powerpc-part: was: Re: [PATCH v6] livepatch: Clear relocation targets on a module removal

Song Liu song at kernel.org
Tue Dec 13 19:13:46 AEDT 2022


)() ()On Mon, Dec 12, 2022 at 9:12 AM Petr Mladek <pmladek at suse.com> wrote:
>
> On Fri 2022-12-09 11:59:35, Song Liu wrote:
> > On Fri, Dec 9, 2022 at 3:41 AM Petr Mladek <pmladek at suse.com> wrote:
> > > On Mon 2022-11-28 17:57:06, Song Liu wrote:
> > > > On Fri, Nov 18, 2022 at 8:24 AM Petr Mladek <pmladek at suse.com> wrote:
> > > > >
> > > > > > --- a/arch/powerpc/kernel/module_64.c
> > > > > > +++ b/arch/powerpc/kernel/module_64.c
> > > > > > +#ifdef CONFIG_LIVEPATCH
> > > > > > +void clear_relocate_add(Elf64_Shdr *sechdrs,
> > > > > > +                    const char *strtab,
> > > > > > +                    unsigned int symindex,
> > > > > > +                    unsigned int relsec,
> > > > > > +                    struct module *me)
> > > > > > +{
>
> [...]
>
> > > > > > +
> > > > > > +             instruction = (u32 *)location;
> > > > > > +             if (is_mprofile_ftrace_call(symname))
> > > > > > +                     continue;
> > >
> > > Why do we ignore these symbols?
> > >
> > > I can't find any counter-part in apply_relocate_add(). It looks super
> > > tricky. It would deserve a comment.
> > >
> > > And I have no idea how we could maintain these exceptions.
> > >
> > > > > > +             if (!instr_is_relative_link_branch(ppc_inst(*instruction)))
> > > > > > +                     continue;
> > >
> > > Same here. It looks super tricky and there is no explanation.
> >
> > The two checks are from restore_r2(). But I cannot really remember
> > why we needed them. It is probably an updated version from an earlier
> > version (3 year earlier..).
>
> This is a good sign that it has to be explained in a comment.
> Or even better, it should not by copy pasted.
>
> > > > > > +             instruction += 1;
> > > > > > +             patch_instruction(instruction, ppc_inst(PPC_RAW_NOP()));
>
> I believe that this is not enough. apply_relocate_add() does this:
>
> int apply_relocate_add(Elf64_Shdr *sechdrs,
> [...]
>                        struct module *me)
> {
> [...]
>                 case R_PPC_REL24:
>                         /* FIXME: Handle weak symbols here --RR */
>                         if (sym->st_shndx == SHN_UNDEF ||
>                             sym->st_shndx == SHN_LIVEPATCH) {
> [...]
>                         if (!restore_r2(strtab + sym->st_name,
>                                                         (u32 *)location + 1, me))
> [...]                                   return -ENOEXEC;
>
> --->                    if (patch_instruction((u32 *)location, ppc_inst(value)))
>                                 return -EFAULT;
>
> , where restore_r2() does:
>
> static int restore_r2(const char *name, u32 *instruction, struct module *me)
> {
> [...]
>         /* ld r2,R2_STACK_OFFSET(r1) */
> --->    if (patch_instruction(instruction, ppc_inst(PPC_INST_LD_TOC)))
>                 return 0;
> [...]
> }
>
> By other words, apply_relocate_add() modifies two instructions:
>
>    + patch_instruction() called in restore_r2() writes into "location + 1"
>    + patch_instruction() called in apply_relocate_add() writes into "location"
>
> IMHO, we have to clear both.
>
> IMHO, we need to implement a function that reverts the changes done
> in restore_r2(). Also we need to revert the changes done in
> apply_relocate_add().

I finally got time to read all the details again and recalled what
happened with the code.

The failure happens when we
1) call apply_relocate_add() on klp load (or module first load,
   if klp was loaded first);
2) do nothing when the module is unloaded;
3) call apply_relocate_add() on module reload, which failed.

The failure happens at this check in restore_r2():

        if (*instruction != PPC_RAW_NOP()) {
                pr_err("%s: Expected nop after call, got %08x at %pS\n",
                        me->name, *instruction, instruction);
                return 0;
        }

Therefore, apply_relocate_add only fails when "location + 1"
is not NOP. And to make it not fail, we only need to write NOP to
"location + 1" in clear_relocate_add().

IIUC, you want clear_relocate_add() to undo everything we did
in apply_relocate_add(); while I was writing clear_relocate_add()
to make the next apply_relocate_add() not fail.

I agree that, based on the name, clear_relocate_add() should
undo everything by apply_relocate_add(). But I am not sure how
to handle some cases. For example, how do we undo

                case R_PPC64_ADDR32:
                        /* Simply set it */
                        *(u32 *)location = value;
                       break;

Shall we just write zeros? I don't think this matters.

I think this is the question we should answer first:
What shall clear_relocate_add() do?
1) undo everything by apply_relocate_add();
2) only do things needed to make the next
   apply_relocate_add succeed;
3) something between 1) and 2).

WDYT?

Thanks,
Song

>
> Please, avoid code duplication as much as possible. Especially,
> the two checks is_mprofile_ftrace_call() and
> instr_is_relative_link_branch() must be shared. IMHO, it is
> the only way to keep the code maintainable. We must make sure that
> we will clear the instructions only when they were patched. And
> copy pasting various tricky exceptions is a way to hell.
>
>
> > > int update_relocate_add(Elf64_Shdr *sechdrs,
> > >                        const char *strtab,
> > >                        unsigned int symindex,
> > >                        unsigned int relsec,
> > >                        struct module *me,
> > >                        bool apply)
> > > {
> > >         unsigned int i;
> > >         Elf64_Rela *rela = (void *)sechdrs[relsec].sh_addr;
> > >         Elf64_Sym *sym;
> > >         Elf64_Xword r_type;
> > >         unsigned long *location;
> > >
> > >         if (apply) {
> > >                 pr_debug("Applying ADD relocate section %u to %u\n", relsec,
> > >                        sechdrs[relsec].sh_info);
> > >         } else {
> > >                 pr_debug("Clearing ADD relocate section %u\n", relsec");
> > >         }
> > >
> > >         for (i = 0; i < sechdrs[relsec].sh_size / sizeof(*rela); i++) {
> > >                 /* This is where to make the change */
> > >                 location = (void *)sechdrs[sechdrs[relsec].sh_info].sh_addr
> > >                         + rela[i].r_offset;
> > >                 /* This is the symbol it is referring to */
> > >                 sym = (Elf64_Sym *)sechdrs[symindex].sh_addr
> > >                         + ELF64_R_SYM(rela[i].r_info);
> > >
> > >                 r_type = ELF64_R_TYPE(rela[i].r_info);
> > >
> > >                 if (apply) {
> > >                         apply_relocate_location(sym, location, r_type, rela[i].r_addend);
> > >                 } else {
> > >                         clear_relocate_location(sym, location, r_type);
> > >                 }
> >
> > I personally don't like too many "if (apply) {...} else {...}" patterns in
> > a function. And these new functions confuse me sometimes:
> >
> >     update_relocate_add(..., apply);
> >     apply_relocate_location();
> >     clear_relocate_location();
>
> Feel free to come up with another way how to avoid code duplication.
>
> > And I did think there wasn't too much duplicated code.
>
> I think that it looks very different when you are writing or reading
> or mantainting the code. It might be easier to write code and modify
> it. It is more complicated to find the differences later. Also it is
> more complicated to do the same changes many times when the common
> code is updated later.
>
> Best Regards,
> Petr


More information about the Linuxppc-dev mailing list