[PATCH RFC] powerpc: Panic on jump label code patching failure
Ritesh Harjani (IBM)
ritesh.list at gmail.com
Tue Sep 9 13:19:05 AEST 2025
Christophe Leroy <christophe.leroy at csgroup.eu> writes:
> Le 06/09/2025 à 05:52, Ritesh Harjani a écrit :
>> [Vous ne recevez pas souvent de courriers de riteshh at linux.ibm.com. Découvrez pourquoi ceci est important à https://aka.ms/LearnAboutSenderIdentification ]
>>
>> Andrew Donnellan <ajd at linux.ibm.com> writes:
>>
>>> If patch_branch() or patch_instruction() fails while updating a jump
>>> label, we presently fail silently, leading to unpredictable behaviour
>>> later on.
>>>
>>> Change arch_jump_label_transform() to panic on a code patching failure,
>>> matching the existing behaviour of arch_static_call_transform().
>>>
>>> Reported-by: Erhard Furtner <erhard_f at mailbox.org>
>>> Signed-off-by: Andrew Donnellan <ajd at linux.ibm.com>
>>>
>>> ---
>>>
>>> Ran into this while debugging an issue that Erhard reported to me about my
>>> PAGE_TABLE_CHECK series on a G4, where updating a static key failed
>>> silently, but only for one call site, leading to an incorrect reference
>>> count later on. This looks to be due to the issue fixed in [0]. A loud
>>> failure would have saved us all considerable debugging time.
>>>
>>> Should I change the return type of arch_jump_label_transform() and handle
>>> this in an arch-independent way? Are there other users of code patching
>>> in powerpc that ought to be hardened?
>>>
>>> Or is this excessive?
>>>
>>> [0] https://patchwork.ozlabs.org/project/linuxppc-dev/patch/4b5e6eb281d7b1ea77619bee17095f905a125168.1757003584.git.christophe.leroy@csgroup.eu/
>>> ---
>>> arch/powerpc/kernel/jump_label.c | 9 +++++++--
>>> 1 file changed, 7 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/powerpc/kernel/jump_label.c b/arch/powerpc/kernel/jump_label.c
>>> index 2659e1ac8604..80d41ed7ac50 100644
>>> --- a/arch/powerpc/kernel/jump_label.c
>>> +++ b/arch/powerpc/kernel/jump_label.c
>>> @@ -12,9 +12,14 @@ void arch_jump_label_transform(struct jump_entry *entry,
>>> enum jump_label_type type)
>>> {
>>> u32 *addr = (u32 *)jump_entry_code(entry);
>>> + int err;
>>>
>>> if (type == JUMP_LABEL_JMP)
>>> - patch_branch(addr, jump_entry_target(entry), 0);
>>> + err = patch_branch(addr, jump_entry_target(entry), 0);
>>> else
>>> - patch_instruction(addr, ppc_inst(PPC_RAW_NOP()));
>>> + err = patch_instruction(addr, ppc_inst(PPC_RAW_NOP()));
>>> +
>>> + if (err)
>>> + panic("%s: patching failed, err %d, type %d, addr %pS, target %pS\n",
>>> + __func__, err, type, addr, (void *)jump_entry_target(entry));
>>> }
>>
>> arch_jump_label_transform() is mainly getting called from
>> __jump_level_update() and it's used for enabling or updating static keys / branch.
>>
>> But static keys can also be used by drivers / module subsystem whose
>> initialization happens late. Although I understand that if the above
>> fails, it might fail much before, from the arch setup code itself, but
>> panic() still feels like a big hammer.
>
> But not being able to patch the kernel as required means that you get a
> kernel behaving differently from what is expected.
>
> Imagine a kernel running on a board that is controlling a saw. There is
> a patch_instruction() to activate the safety feature which detects when
> your hands are too close to the blade. Do you want the kernel to
> continue running seamlessly when that patch_instruction() fails ? I'm
> sure you don't !
>
;) Sure. Not a fan of playing with blades or saws. :)
>>
>> Would pr_err() print with WARN_ON_ONCE(1) would suffice in case of an
>> err?
>
> No, that's not enough, you can't rely on a kernel that will no behave as
> expected.
>
Sure, Christophe. Thanks for the clarification.
My main concern was that during module load time we should not bring the
system down. But as I see the behavior in arch_static_call_transform()
is also the same.
And as you said, maybe it's good to panic if an important functionality
like patch_instruction() itself fails which means the kernel may not be
doing what we are expecting it to.
>>
>> Also you said you ran into a problem at just one call site where above
>> was silently failing. With the above change are you able to hit the
>> panic() now? Because from what I see in patch_instruction(), it mainly
>> will boil down to calling __patch_mem() which always returns 0.
>
> As far as I can see, __patch_mem() returns -EPERM when
> __put_kernel_nofault() fails:
>
> static int __patch_mem(void *exec_addr, unsigned long val, void
> *patch_addr, bool is_dword)
> {
> if (!IS_ENABLED(CONFIG_PPC64) || likely(!is_dword)) {
> /* For big endian correctness: plain address would use the wrong half */
> u32 val32 = val;
>
> __put_kernel_nofault(patch_addr, &val32, u32, failed);
> } else {
> __put_kernel_nofault(patch_addr, &val, u64, failed);
> }
>
> asm ("dcbst 0, %0; sync; icbi 0,%1; sync; isync" :: "r" (patch_addr),
> "r" (exec_addr));
>
> return 0;
>
> failed:
> mb(); /* sync */
> return -EPERM;
> }
>
Right, I somehow missed that "_nofault" part.
>
>> Although there are other places where there can be an error returned,
>> so I was wondering if that is what you were hitting or something else?
>
> Andrew was hitting the -EPERM because the memory area was read-only.
yes, that make sense.
>
> Christophe
Thanks for the comments!
-ritesh
More information about the Linuxppc-dev
mailing list