[RFC PATCH 0/4] Out-of-line static calls for powerpc64 ELF V2

Thu Sep 1 15:58:19 AEST 2022

WIP implementation of out-of-line static calls for the PowerPC 64-bit ELF V2
ABI. Static calls patch an indirect branch into a direct branch at runtime.
In the out-of-line variant, the caller calls a trampoline directly, and the
trampoline is patched to branch directly to the target. The current
implementation has a known issue, described in detail below, and is
presented here for comments and suggestions.
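
To make the out-of-line shape concrete, here is a minimal userspace model.
It is only a sketch: portable C cannot rewrite instructions, so a function
pointer stands in for the patched direct branch, and every name in it
(`handler_tramp`, `handler_update`, and so on) is hypothetical rather than
kernel API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel API. */
typedef int (*handler_fn)(int);

int add_one(int x) { return x + 1; }
int times_two(int x) { return x * 2; }

/*
 * Stands in for the branch instruction inside the trampoline. In the real
 * implementation this would be a direct branch whose target is rewritten by
 * code patching, not a pointer that is loaded and called indirectly.
 */
handler_fn tramp_target = add_one;

/* The "trampoline": callers always call this one fixed symbol. */
int handler_tramp(int x)
{
	return tramp_target(x);
}

/* Stands in for patching the trampoline to branch to a new target. */
void handler_update(handler_fn fn)
{
	assert(fn != NULL);
	tramp_target = fn;
}
```

Callers never change: they keep calling `handler_tramp()`, and only the
trampoline is retargeted by `handler_update()`.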

64-bit ELF V2 specifies a table of contents (TOC) pointer stored in r2.
Functions that use a TOC address data relative to its value (global
variables, for example). When the caller and target use different TOCs,
the static call implementation must ensure each runs with its own TOC in
r2, so that neither tries to use the other's.

However, while the trampoline can change the caller's TOC to the target's
TOC, it cannot restore the caller's TOC when the target returns. For the
trampoline to do this would require the target to return to the trampoline,
and so the return address back to the caller would need to be saved to
the stack. But the trampoline cannot move the stack because the target
may be expecting parameters relative to the stack pointer (when registers
are insufficient or varargs are used). And as static calls are usable in
generic code, there can be no arch-specific restrictions on parameters
that would sidestep this issue.

Normally the TOC change issue is resolved by the caller, which saves
and restores its TOC if necessary. For static calls, though, the caller
sees the trampoline as a local function, so it assumes the call does not
change the TOC and treats r2 as nonvolatile (no save and restore is added).
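
The failure mode can be modelled in userspace C. This is a sketch under
loud assumptions: a plain variable stands in for register r2, the TOC
values are arbitrary constants, and all function names are hypothetical.
It only illustrates the control flow described above, not the actual
powerpc code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a plain variable standing in for register r2. */
uintptr_t r2;

/* Arbitrary made-up TOC values for the two sides of the call. */
#define KERNEL_TOC ((uintptr_t)0x1000)
#define MODULE_TOC ((uintptr_t)0x2000)

/* Target living in a "module": it needs the module's TOC in r2. */
int module_target(int x)
{
	assert(r2 == MODULE_TOC);
	return x + 1;
}

/*
 * Trampoline: switches r2 to the target's TOC, then hands over to the
 * target. This models a tail branch: the target returns straight to the
 * caller, so the trampoline gets no chance to restore r2.
 */
int toc_tramp(int x)
{
	r2 = MODULE_TOC;
	return module_target(x);
}

/*
 * Caller in the "kernel": it treats the trampoline as a local function,
 * so it neither saves nor restores r2 around the call. Returns nonzero
 * if r2 still holds the caller's TOC afterwards.
 */
int kernel_caller_toc_intact(void)
{
	r2 = KERNEL_TOC;
	(void)toc_tramp(41);
	return r2 == KERNEL_TOC;
}
```

In this model `kernel_caller_toc_intact()` returns 0: the caller resumes
with the module's TOC still in r2, which is the known issue.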

This is a similar problem to that faced by livepatch. Static calls may have
a few more options, though, because every call is made through the
`static_call` macro, which allows annotation and insertion of inline
assembly at every callsite.

I can think of several possible solutions, but they are relatively complex:

1. Patching the callsites at runtime, as is done for inline static calls.
    This also requires some inline assembly to save r2 to the TOC pointer
    doubleword slot on the stack before each static call, as the caller may
    not have done so in its prologue. That should be easy to add, though,
    because static calls are invoked through the `static_call` macro, which
    can be modified appropriately. The runtime patching then replaces the
    `nop` trailing the function call with an instruction that restores
    this r2 value.

    The patching itself can probably be done at compile time at kernel callsites.

2. Use the livepatch stack method of growing the base of the stack backwards.
    I haven't looked too closely at the implementation though, especially
    regarding how much room is available.

    The benefit of this method is that there can be zero overhead when the
    trampoline and target share a TOC. So the trampoline in kernel-only
    calls can still just be a single direct branch.

3. Remove the local entry point from the trampoline. This makes the trampoline
    less efficient, as it cannot assume r2 to be correct, but should at least
    cause the caller to automatically save and restore r2 without manual patching.
    From the ABI manual:

    > 2.2.1. Function Call Linkage Protocols
    >   A function that uses a TOC pointer always has a separate local entry point
    >   [...], and preserves r2 when called via its local entry point.
    >
    > 2.2.2.1. Register Roles
    >   (a) Register r2 is nonvolatile with respect to calls between functions
    >       in the same compilation unit, except under the conditions in footnote (b)
    >   (b) Register r2 is volatile and available for use in a function that does not
    >       use a TOC pointer and that does not guarantee that it preserves r2.

    So not having a local entry point implies not using a TOC pointer, which
    implies r2 is volatile if the trampoline does not guarantee that it preserves
    r2. However, experimenting with such a trampoline showed that the caller
    still did not preserve its TOC when necessary, even when the trampoline used
    instructions that wrote to r2. There may be an attribute that can mark the
    necessary information, but I could not find one.
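
As a rough illustration of option 1, the callsite-level save and restore
can be modelled in userspace C. This is a sketch, not the proposed
implementation: a plain variable stands in for r2, a local variable stands
in for the TOC pointer doubleword slot on the stack, and every name is
hypothetical. In a real callsite the save would be inline assembly emitted
by the `static_call` macro and the restore would be the patched `nop`.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a plain variable standing in for register r2. */
uintptr_t model_r2;

/* Arbitrary made-up TOC values. */
#define CALLER_TOC ((uintptr_t)0x1000)
#define TARGET_TOC ((uintptr_t)0x2000)

int model_target(int x)
{
	assert(model_r2 == TARGET_TOC);
	return x + 1;
}

/* Trampoline: installs the target's TOC and never restores the caller's. */
int model_tramp(int x)
{
	model_r2 = TARGET_TOC;
	return model_target(x);
}

/* Hypothetical callsite wrapper modelling what option 1 would emit. */
int model_static_call(int x)
{
	/* Models saving r2 to the TOC pointer doubleword slot before the call. */
	uintptr_t toc_save_slot = model_r2;
	int ret = model_tramp(x);

	/* Models the trailing nop patched into a reload of r2 from that slot. */
	model_r2 = toc_save_slot;
	return ret;
}

/* Returns 1 if the call produced 42 and the caller's TOC survived it. */
int caller_toc_intact_after_call(void)
{
	int ret;

	model_r2 = CALLER_TOC;
	ret = model_static_call(41);
	return ret == 42 && model_r2 == CALLER_TOC;
}
```

Here `caller_toc_intact_after_call()` returns 1, because the wrapper puts
the caller's TOC back after the target returns. The cost is a save and a
restore at every callsite that may cross a TOC boundary.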

Benjamin Gray (3):
  static_call: Move static call selftest to static_call.c
  powerpc/64: Add support for out-of-line static calls
  powerpc/64: Add tests for out-of-line static calls

Russell Currey (1):
  powerpc/code-patching: add patch_memory() for writing RO text

 arch/powerpc/Kconfig                     |  23 +-
 arch/powerpc/include/asm/code-patching.h |   2 +
 arch/powerpc/include/asm/static_call.h   |  45 +++-
 arch/powerpc/kernel/Makefile             |   4 +-
 arch/powerpc/kernel/static_call.c        | 184 +++++++++++++++-
 arch/powerpc/kernel/static_call_test.c   | 257 +++++++++++++++++++++++
 arch/powerpc/lib/code-patching.c         |  65 ++++++
 kernel/static_call.c                     |  43 ++++
 kernel/static_call_inline.c              |  43 ----
 9 files changed, 613 insertions(+), 53 deletions(-)
 create mode 100644 arch/powerpc/kernel/static_call_test.c

base-commit: c5e4d5e99162ba8025d58a3af7ad103f155d2df7
--
2.37.2