[PATCH RFC] powerpc: Panic on jump label code patching failure

Tue Sep 9 13:19:05 AEST 2025

Christophe Leroy <christophe.leroy at csgroup.eu> writes:

> Le 06/09/2025 à 05:52, Ritesh Harjani a écrit :
>> [Vous ne recevez pas souvent de courriers de riteshh at linux.ibm.com. Découvrez pourquoi ceci est important à https://aka.ms/LearnAboutSenderIdentification ]
>> 
>> Andrew Donnellan <ajd at linux.ibm.com> writes:
>> 
>>> If patch_branch() or patch_instruction() fails while updating a jump
>>> label, we presently fail silently, leading to unpredictable behaviour
>>> later on.
>>>
>>> Change arch_jump_label_transform() to panic on a code patching failure,
>>> matching the existing behaviour of arch_static_call_transform().
>>>
>>> Reported-by: Erhard Furtner <erhard_f at mailbox.org>
>>> Signed-off-by: Andrew Donnellan <ajd at linux.ibm.com>
>>>
>>> ---
>>>
>>> Ran into this while debugging an issue that Erhard reported to me about my
>>> PAGE_TABLE_CHECK series on a G4, where updating a static key failed
>>> silently, but only for one call site, leading to an incorrect reference
>>> count later on. This looks to be due to the issue fixed in [0]. A loud
>>> failure would have saved us all considerable debugging time.
>>>
>>> Should I change the return type of arch_jump_label_transform() and handle
>>> this in an arch-independent way? Are there other users of code patching
>>> in powerpc that ought to be hardened?
>>>
>>> Or is this excessive?
>>>
>>> [0] https://patchwork.ozlabs.org/project/linuxppc-dev/patch/4b5e6eb281d7b1ea77619bee17095f905a125168.1757003584.git.christophe.leroy@csgroup.eu/
>>> ---
>>>   arch/powerpc/kernel/jump_label.c | 9 +++++++--
>>>   1 file changed, 7 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/arch/powerpc/kernel/jump_label.c b/arch/powerpc/kernel/jump_label.c
>>> index 2659e1ac8604..80d41ed7ac50 100644
>>> --- a/arch/powerpc/kernel/jump_label.c
>>> +++ b/arch/powerpc/kernel/jump_label.c
>>> @@ -12,9 +12,14 @@ void arch_jump_label_transform(struct jump_entry *entry,
>>>                               enum jump_label_type type)
>>>   {
>>>        u32 *addr = (u32 *)jump_entry_code(entry);
>>> +     int err;
>>>
>>>        if (type == JUMP_LABEL_JMP)
>>> -             patch_branch(addr, jump_entry_target(entry), 0);
>>> +             err = patch_branch(addr, jump_entry_target(entry), 0);
>>>        else
>>> -             patch_instruction(addr, ppc_inst(PPC_RAW_NOP()));
>>> +             err = patch_instruction(addr, ppc_inst(PPC_RAW_NOP()));
>>> +
>>> +     if (err)
>>> +             panic("%s: patching failed, err %d, type %d, addr %pS, target %pS\n",
>>> +                   __func__, err, type, addr, (void *)jump_entry_target(entry));
>>>   }
>> 
>> arch_jump_label_transform() is mainly getting called from
>> __jump_level_update() and it's used for enabling or updating static keys / branch.
>> 
>> But static keys can also be used by drivers / module subsystem whose
>> initialization happens late. Although I understand that if the above
>> fails, it might fail much before, from the arch setup code itself, but
>> panic() still feels like a big hammer.
>
> But not being able to patch the kernel as required means that you get a 
> kernel behaving differently from what is expected.
>
> Imagine a kernel running on a board that is controlling a saw. There is 
> a patch_instruction() to activate the safety feature which detects when 
> your hands are too close to the blade. Do you want the kernel to 
> continue running seamlessly when that patch_instruction() fails ? I'm 
> sure you don't !
>

;) Sure. Not a fan of playing with blades or saws. :) 

>> 
>> Would pr_err() print with WARN_ON_ONCE(1) would suffice in case of an
>> err?
>
> No, that's not enough, you can't rely on a kernel that will no behave as 
> expected.
>

Sure, Christophe. Thanks for the clarification.

My main concern was that during module load time we should not bring the
system down. But as I see the behavior in arch_static_call_transform()
is also the same.
And as you said, maybe it's good to panic if an important functionality
like patch_instruction() itself fails which means the kernel may not be
doing what we are expecting it to.

>> 
>> Also you said you ran into a problem at just one call site where above
>> was silently failing. With the above change are you able to hit the
>> panic() now? Because from what I see in patch_instruction(), it mainly
>> will boil down to calling __patch_mem() which always returns 0.
>
> As far as I can see, __patch_mem() returns -EPERM when 
> __put_kernel_nofault() fails:
>
> static int __patch_mem(void *exec_addr, unsigned long val, void 
> *patch_addr, bool is_dword)
> {
> 	if (!IS_ENABLED(CONFIG_PPC64) || likely(!is_dword)) {
> 		/* For big endian correctness: plain address would use the wrong half */
> 		u32 val32 = val;
>
> 		__put_kernel_nofault(patch_addr, &val32, u32, failed);
> 	} else {
> 		__put_kernel_nofault(patch_addr, &val, u64, failed);
> 	}
>
> 	asm ("dcbst 0, %0; sync; icbi 0,%1; sync; isync" :: "r" (patch_addr),
> 							    "r" (exec_addr));
>
> 	return 0;
>
> failed:
> 	mb();  /* sync */
> 	return -EPERM;
> }
>

Right, I somehow missed that "_nofault" part.

>
>> Although there are other places where there can be an error returned,
>> so I was wondering if that is what you were hitting or something else?
>
> Andrew was hitting the -EPERM because the memory area was read-only.

yes, that make sense.

>
> Christophe

Thanks for the comments!

-ritesh