[PATCH 0/6] Out-of-line static calls for powerpc64 ELF V2

Benjamin Gray bgray at linux.ibm.com
Fri Sep 16 16:23:24 AEST 2022


Implementation of out-of-line static calls for the PowerPC 64-bit ELF V2 ABI.
Static calls replace an indirect branch with a direct branch that is patched at
runtime. In the out-of-line variant, the caller directly calls a trampoline, and
the trampoline is patched to branch directly to the target.

More context on the challenges posed by the ELF V2 ABI is in the RFC:
https://lore.kernel.org/linuxppc-dev/20220901055823.152983-1-bgray@linux.ibm.com/

This resolves the stack issue from the RFC by marking the trampoline as not
preserving the TOC, so the linker inserts its own TOC-saving trampoline and
restores the TOC when the target returns.

This is sub-optimal (a separate TOC-saving trampoline is not strictly necessary),
but it needs no support beyond what the ABI already provides (unlike the other two
suggestions in the RFC). Microbenchmarking on a Power9 shows a performance improvement
for kernel-kernel-kernel calls when the indirect branch predictor is disabled.
However, the generic implementation performs better in every other case, and with
branch prediction enabled the generic implementation performs like the control cases.

    |    Case    | Generic (calls/ms) | Static (calls/ms) |
    |------------|--------------------|-------------------|
    | control_kk |             221536 |            221443 |
    | control_mm |             221941 |            221913 |
    | kkk        |              89657 |            177835 |
    | kkm        |              89835 |             53853 |
    | kmk        |             101808 |             52280 |
    | kmm        |             101973 |             52347 |
    | mkk        |              97621 |             78044 |
    | mkm        |              97738 |             38370 |
    | mmk        |              98839 |             68436 |
    | mmm        |              98967 |             68511 |

    control_kk, control_mm: direct call, no static call trampoline
    xyz: caller -> trampoline -> target, each k (kernel) or m (module),
         e.g. kkm is kernel caller -> kernel trampoline -> module target

The benchmark uses a noinline, page-aligned target that increments a counter and
then executes 64 NOPs to smooth out some processor timing quirks. The target is
called in a loop like:

	while (!READ_ONCE(stop))
		static_call(bench_sc)(&counter);

The loop is also page aligned. The benchmark is stopped by a timer.

The kernel trampoline hardcodes the TOC offset because including the asm constants
header pulls in an unrelated macro whose name is identical to the enum name it was
generated from, which confuses the compiler when it reaches that enum definition.


Benjamin Gray (6):
  powerpc/code-patching: Implement generic text patching function
  powerpc/module: Handle caller-saved TOC in module linker
  powerpc/module: Optimise nearby branches in ELF V2 ABI stub
  static_call: Move static call selftest to static_call_selftest.c
  powerpc/64: Add support for out-of-line static calls
  powerpc/64: Add tests for out-of-line static calls

 arch/powerpc/Kconfig                     |  22 +-
 arch/powerpc/include/asm/code-patching.h |   2 +
 arch/powerpc/include/asm/static_call.h   |  80 ++++++-
 arch/powerpc/kernel/Makefile             |   4 +-
 arch/powerpc/kernel/module_64.c          |  25 ++-
 arch/powerpc/kernel/static_call.c        | 203 ++++++++++++++++-
 arch/powerpc/kernel/static_call_test.c   | 263 +++++++++++++++++++++++
 arch/powerpc/lib/code-patching.c         | 135 ++++++++----
 kernel/Makefile                          |   1 +
 kernel/static_call_inline.c              |  43 ----
 kernel/static_call_selftest.c            |  41 ++++
 11 files changed, 713 insertions(+), 106 deletions(-)
 create mode 100644 arch/powerpc/kernel/static_call_test.c
 create mode 100644 kernel/static_call_selftest.c

--
2.37.3

