[powerpc] ftrace warning kernel/trace/ftrace.c:2068 with code-patching selftests
Ard Biesheuvel
ardb at kernel.org
Fri Jan 28 00:59:31 AEDT 2022
On Thu, 27 Jan 2022 at 14:24, Mark Rutland <mark.rutland at arm.com> wrote:
>
> On Thu, Jan 27, 2022 at 02:07:03PM +0100, Ard Biesheuvel wrote:
> > On Thu, 27 Jan 2022 at 13:59, Mark Rutland <mark.rutland at arm.com> wrote:
> > >
> > > On Thu, Jan 27, 2022 at 01:22:17PM +0100, Ard Biesheuvel wrote:
> > > > On Thu, 27 Jan 2022 at 13:20, Mark Rutland <mark.rutland at arm.com> wrote:
> > > > > On Thu, Jan 27, 2022 at 01:03:34PM +0100, Ard Biesheuvel wrote:
> > > > >
> > > > > > These architectures use place-relative extables for the same reason:
> > > > > > place relative references are resolved at build time rather than at
> > > > > > runtime during relocation, making a build time sort feasible.
> > > > > >
> > > > > > arch/alpha/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > > > arch/arm64/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > > > arch/ia64/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > > > arch/parisc/include/asm/uaccess.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > > > arch/powerpc/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > > > arch/riscv/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > > > arch/s390/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > > > arch/x86/include/asm/extable.h:#define ARCH_HAS_RELATIVE_EXTABLE
> > > > > >
> > > > > > Note that the swap routine becomes something like the below, given
> > > > > > that the relative references need to be fixed up after the entry
> > > > > > changes place in the sorted list.
> > > > > >
> > > > > > static void swap_ex(void *a, void *b, int size)
> > > > > > {
> > > > > > struct exception_table_entry *x = a, *y = b, tmp;
> > > > > > int delta = b - a;
> > > > > >
> > > > > > tmp = *x;
> > > > > > x->insn = y->insn + delta;
> > > > > > y->insn = tmp.insn - delta;
> > > > > > ...
> > > > > > }
> > > > > >
> > > > > > As a bonus, the resulting footprint of the table in the image is
> > > > > > reduced by 8x, given that every 8 byte pointer has an accompanying 24
> > > > > > byte RELA record, so we go from 32 bytes to 4 bytes for every call to
> > > > > > __gnu_mcount_mc.
> > > > >
> > > > > Absolutely -- it'd be great if we could do that for the callsite locations; the
> > > > > difficulty is that the entries are generated by the compiler itself, so we'd
> > > > > either need some build/link time processing to convert each absolute 64-bit
> > > > > value to a relative 32-bit offset, or new compiler options to generate those as
> > > > > relative offsets from the outset.
> > > >
> > > > Don't we use scripts/recordmcount.pl for that?
> > >
> > > Not quite -- we could adjust it to do so, but today it doesn't consider
> > > existing mcount_loc entries, and just generates new ones where the compiler has
> > > generated calls to mcount, which it finds by scanning the instructions in the
> > > binary. Today it is not used:
> > >
> > > * On arm64 when we default to using `-fpatchable-function-entry=N`. That makes
> > > the compiler insert 2 NOPs in the function prologue, and log the location of
> > > that NOP sled to a section called. `__patchable_function_entries`.
> > >
> > > We need the compiler to do that since we can't reliably identify 2 NOPs in a
> > > function prologue as being intended to be a patch site, as e.g. there could
> > > be notrace functions where the compiler had to insert NOPs for alignment of a
> > > subsequent brnach or similar.
> > >
> > > * On architectures with `-nop-mcount`. On these, it's necessary to use
> > > `-mrecord-mcount` to have the compiler log the patch-site, for the same
> > > reason as with `-fpatchable-function-entry`.
> > >
> > > * On architectures with `-mrecord-mcount` generally, since this avoids the
> > > overhead of scanning each object.
> > >
> > > * On x86 when objtool is used.
> > >
> >
> > Right.
> >
> > I suppose that on arm64, we can work around this by passing
> > --apply-dynamic-relocs to the linker, so that all R_AARCH64_RELATIVE
> > targets are prepopulated with the link time value of the respective
> > addresses. It does cause some bloat, which is why we disable that
> > today, but we could make that dependent on ftrace being enabled.
>
> We'd also need to teach the build-time sort to update the relocations, unless
> you mean to also change the boot-time reloc code to RMW with the offset?
>
Why would that be necessary? Every RELA entry has the same effect on
its target address, as it just adds a fixed offset.
> I think for right now the best thing is to disable the build-time sort for
> arm64, but maybe something like that is the right thing to do longer term.
>
Fair enough.
> > I do wonder how much over head we accumulate, though, by having all
> > these relocations, but I suppose that is the situation today in any
> > case.
>
> Yeah; I suspect if we want to do something about that we want to do it more
> generally, and would probably need to do something like the x86 approach and
> rewrite the relocs at build-time to something more compressed. If we applied
> the dynamic relocs with the link-time address we'd only need 4 bytes to
> identify each pointer to apply an offset to.
>
> I'm not exactly sure how we could do that, nor what the trade-off look like in
> practice.
>
It would make sense for -fpic codegen to use relative offsets in
__mcount_loc, but since we don't actually use -fpic on arm64, that
doesn't really help :-)
More information about the Linuxppc-dev
mailing list