[PATCH RFC] powerpc: Panic on jump label code patching failure

Sat Sep 6 16:42:07 AEST 2025

Le 06/09/2025 à 05:52, Ritesh Harjani a écrit :
> [Vous ne recevez pas souvent de courriers de riteshh at linux.ibm.com. Découvrez pourquoi ceci est important à https://aka.ms/LearnAboutSenderIdentification ]
> 
> Andrew Donnellan <ajd at linux.ibm.com> writes:
> 
>> If patch_branch() or patch_instruction() fails while updating a jump
>> label, we presently fail silently, leading to unpredictable behaviour
>> later on.
>>
>> Change arch_jump_label_transform() to panic on a code patching failure,
>> matching the existing behaviour of arch_static_call_transform().
>>
>> Reported-by: Erhard Furtner <erhard_f at mailbox.org>
>> Signed-off-by: Andrew Donnellan <ajd at linux.ibm.com>
>>
>> ---
>>
>> Ran into this while debugging an issue that Erhard reported to me about my
>> PAGE_TABLE_CHECK series on a G4, where updating a static key failed
>> silently, but only for one call site, leading to an incorrect reference
>> count later on. This looks to be due to the issue fixed in [0]. A loud
>> failure would have saved us all considerable debugging time.
>>
>> Should I change the return type of arch_jump_label_transform() and handle
>> this in an arch-independent way? Are there other users of code patching
>> in powerpc that ought to be hardened?
>>
>> Or is this excessive?
>>
>> [0] https://patchwork.ozlabs.org/project/linuxppc-dev/patch/4b5e6eb281d7b1ea77619bee17095f905a125168.1757003584.git.christophe.leroy@csgroup.eu/
>> ---
>>   arch/powerpc/kernel/jump_label.c | 9 +++++++--
>>   1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/powerpc/kernel/jump_label.c b/arch/powerpc/kernel/jump_label.c
>> index 2659e1ac8604..80d41ed7ac50 100644
>> --- a/arch/powerpc/kernel/jump_label.c
>> +++ b/arch/powerpc/kernel/jump_label.c
>> @@ -12,9 +12,14 @@ void arch_jump_label_transform(struct jump_entry *entry,
>>                               enum jump_label_type type)
>>   {
>>        u32 *addr = (u32 *)jump_entry_code(entry);
>> +     int err;
>>
>>        if (type == JUMP_LABEL_JMP)
>> -             patch_branch(addr, jump_entry_target(entry), 0);
>> +             err = patch_branch(addr, jump_entry_target(entry), 0);
>>        else
>> -             patch_instruction(addr, ppc_inst(PPC_RAW_NOP()));
>> +             err = patch_instruction(addr, ppc_inst(PPC_RAW_NOP()));
>> +
>> +     if (err)
>> +             panic("%s: patching failed, err %d, type %d, addr %pS, target %pS\n",
>> +                   __func__, err, type, addr, (void *)jump_entry_target(entry));
>>   }
> 
> arch_jump_label_transform() is mainly getting called from
> __jump_level_update() and it's used for enabling or updating static keys / branch.
> 
> But static keys can also be used by drivers / module subsystem whose
> initialization happens late. Although I understand that if the above
> fails, it might fail much before, from the arch setup code itself, but
> panic() still feels like a big hammer.

But not being able to patch the kernel as required means that you get a 
kernel behaving differently from what is expected.

Imagine a kernel running on a board that is controlling a saw. There is 
a patch_instruction() to activate the safety feature which detects when 
your hands are too close to the blade. Do you want the kernel to 
continue running seamlessly when that patch_instruction() fails ? I'm 
sure you don't !

> 
> Would pr_err() print with WARN_ON_ONCE(1) would suffice in case of an
> err?

No, that's not enough, you can't rely on a kernel that will no behave as 
expected.

> 
> Also you said you ran into a problem at just one call site where above
> was silently failing. With the above change are you able to hit the
> panic() now? Because from what I see in patch_instruction(), it mainly
> will boil down to calling __patch_mem() which always returns 0.

As far as I can see, __patch_mem() returns -EPERM when 
__put_kernel_nofault() fails:

static int __patch_mem(void *exec_addr, unsigned long val, void 
*patch_addr, bool is_dword)
{
	if (!IS_ENABLED(CONFIG_PPC64) || likely(!is_dword)) {
		/* For big endian correctness: plain address would use the wrong half */
		u32 val32 = val;

		__put_kernel_nofault(patch_addr, &val32, u32, failed);
	} else {
		__put_kernel_nofault(patch_addr, &val, u64, failed);
	}

	asm ("dcbst 0, %0; sync; icbi 0,%1; sync; isync" :: "r" (patch_addr),
							    "r" (exec_addr));

	return 0;

failed:
	mb();  /* sync */
	return -EPERM;
}

> Although there are other places where there can be an error returned,
> so I was wondering if that is what you were hitting or something else?

Andrew was hitting the -EPERM because the memory area was read-only.

Christophe