[RFC PATCH] powerpc/lib: Fixing use a temporary mm for code patching

Christophe Leroy christophe.leroy at c-s.fr
Wed Apr 15 19:12:35 AEST 2020



On 15/04/2020 at 07:16, Christopher M Riedl wrote:
>> On March 26, 2020 9:42 AM Christophe Leroy <christophe.leroy at c-s.fr> wrote:
>>
>>   
>> This patch fixes the RFC series identified below.
>> It fixes three points:
>> - Failure with CONFIG_PPC_KUAP
>> - Failure to write due to the DIRTY bit not being set on the 8xx
>> - Inadequately complex WARN post-verification
>>
>> However, it has an impact on the CPU load. Here is the time
>> needed on an 8xx to run the ftrace selftests without and
>> with this series:
>> - Without CONFIG_STRICT_KERNEL_RWX		==> 38 seconds
>> - With CONFIG_STRICT_KERNEL_RWX			==> 40 seconds
>> - With CONFIG_STRICT_KERNEL_RWX + this series	==> 43 seconds
>>
>> Link: https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=166003
>> Signed-off-by: Christophe Leroy <christophe.leroy at c-s.fr>
>> ---
>>   arch/powerpc/lib/code-patching.c | 5 ++++-
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
>> index f156132e8975..4ccff427592e 100644
>> --- a/arch/powerpc/lib/code-patching.c
>> +++ b/arch/powerpc/lib/code-patching.c
>> @@ -97,6 +97,7 @@ static int map_patch(const void *addr, struct patch_mapping *patch_mapping)
>>   	}
>>   
>>   	pte = mk_pte(page, pgprot);
>> +	pte = pte_mkdirty(pte);
>>   	set_pte_at(patching_mm, patching_addr, ptep, pte);
>>   
>>   	init_temp_mm(&patch_mapping->temp_mm, patching_mm);
>> @@ -168,7 +169,9 @@ static int do_patch_instruction(unsigned int *addr, unsigned int instr)
>>   			(offset_in_page((unsigned long)addr) /
>>   				sizeof(unsigned int));
>>   
>> +	allow_write_to_user(patch_addr, sizeof(instr));
>>   	__patch_instruction(addr, instr, patch_addr);
>> +	prevent_write_to_user(patch_addr, sizeof(instr));
>>
> 
> On radix we can map the page with PAGE_KERNEL protection which ends up
> setting EAA[0] in the radix PTE. This means the KUAP (AMR) protection is
> ignored (ISA v3.0b Fig. 35) since we are accessing the page from MSR[PR]=0.
> 
> Can we employ a similar approach on the 8xx? I would prefer *not* to wrap
> the __patch_instruction() with the allow_/prevent_write_to_user() KUAP things
> because this is a temporary kernel mapping which really isn't userspace in
> the usual sense.
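
To make the trade-off concrete, here is a rough sketch of the two
alternatives, as they would sit in arch/powerpc/lib/code-patching.c.
This is not actual patch code: patching_mm, patching_addr, ptep and
patch_addr are the names used by the quoted RFC, and locking, the
temporary-mm switch and cache/TLB maintenance are omitted.

/*
 * Radix idea: map the page with PAGE_KERNEL so the PTE is privileged
 * (EAA[0] set) and the AMR-based KUAP check is ignored for MSR[PR]=0
 * accesses -- no KUAP window needed around the store.
 */
static int write_insn_privileged_map(struct page *page, pte_t *ptep,
				     unsigned int *addr, unsigned int instr,
				     unsigned int *patch_addr)
{
	pte_t pte = pte_mkdirty(mk_pte(page, PAGE_KERNEL));

	set_pte_at(patching_mm, patching_addr, ptep, pte);
	return __patch_instruction(addr, instr, patch_addr);
}

/*
 * This fix: keep the user-style mapping and open/close the KUAP window
 * around the store, which also covers the 8xx and hash/32 where
 * anything below PAGE_OFFSET is user territory.
 */
static int write_insn_user_map(unsigned int *addr, unsigned int instr,
			       unsigned int *patch_addr)
{
	int err;

	allow_write_to_user(patch_addr, sizeof(instr));
	err = __patch_instruction(addr, instr, patch_addr);
	prevent_write_to_user(patch_addr, sizeof(instr));
	return err;
}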

On the 8xx, that's pretty different.

The PTE doesn't control whether a page is a user page or a kernel page.
The only thing that is set in the PTE is whether the page is linked to
a given PID or not.
PAGE_KERNEL means that the page can be addressed with any PID.

The user access right is given by a kind of zone, which sits in the PGD
entry. Every page above PAGE_OFFSET is defined as belonging to zone 0,
and every page below PAGE_OFFSET as belonging to zone 1.

By default, zone 0 can only be accessed by the kernel, and zone 1 can
only be accessed by userspace. When the kernel wants to access zone 1,
it temporarily changes the properties of zone 1 to allow both kernel
and user accesses.

So, if your mapping is below PAGE_OFFSET, it is in zone 1 and the
kernel must unlock that zone to access it.
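
As a conceptual sketch of what "unlocking" zone 1 amounts to on the
8xx: the kernel rewrites the data-side access protection groups (the
MD_AP SPR) so that the group covering addresses below PAGE_OFFSET
becomes kernel-accessible, and restores the default afterwards. That is
roughly what allow/prevent_write_to_user() do with CONFIG_PPC_KUAP on
the 8xx. The apg_* arguments below stand in for the real group
encodings, which are not spelled out here.

#include <asm/reg.h>	/* mtspr() and the 8xx MMU SPR definitions */

/* Sketch only: zone 1 becomes accessible to both kernel and user. */
static inline void mpc8xx_unlock_user_zone(unsigned long apg_kernel_and_user)
{
	mtspr(SPRN_MD_AP, apg_kernel_and_user);
}

/* Sketch only: zone 1 goes back to its default, user-only rights. */
static inline void mpc8xx_lock_user_zone(unsigned long apg_user_only)
{
	mtspr(SPRN_MD_AP, apg_user_only);
}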


And it is more or less the same on hash/32, where this is managed by
segment registers. One segment register covers a 256 Mbyte area. By
default, the kernel can only read pages below PAGE_OFFSET; only
userspace can write to them, provided the PTE allows it. When the
kernel needs to write at an address below PAGE_OFFSET, it must change
the segment properties in the corresponding segment register.
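
Again just as an illustration (the helper and bit names here follow my
recollection of the book3s/32 KUAP code and may not match exactly):
flipping the Ks supervisor-key bit in the segment register covering the
address is what grants or revokes kernel write access to that
256 Mbyte segment.

#include <asm/reg.h>			/* mfsrin(), mtsrin() */
#include <asm/book3s/32/mmu-hash.h>	/* SR_KS (assumed location) */
#include <asm/synch.h>			/* isync() */

/* Sketch only: clear Ks so the kernel can write below PAGE_OFFSET in
 * this segment; ranges spanning several segments would need a loop. */
static inline void sr_allow_kernel_write(unsigned long addr)
{
	mtsrin(mfsrin(addr) & ~SR_KS, addr);	/* kernel writes allowed */
	isync();
}

/* Sketch only: set Ks again so the segment is back to kernel read-only. */
static inline void sr_prevent_kernel_write(unsigned long addr)
{
	mtsrin(mfsrin(addr) | SR_KS, addr);
	isync();
}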

So, in both cases, if we want the mapping to be local to a task while 
still allowing kernel access, we would have to define a new special 
area between TASK_SIZE and PAGE_OFFSET which belongs to the kernel zone.

That looks complex to me for a small benefit, especially as the 8xx is 
not SMP, and neither are most of the hash/32 targets.

Christophe

