[RFC PATCH] powerpc/lib: Fixing use a temporary mm for code patching

Christopher M. Riedl cmr at informatik.wtf
Tue Apr 21 14:22:49 AEST 2020


On Sat Apr 18, 2020 at 12:27 PM, Christophe Leroy wrote:
>
> 
>
> 
> Le 15/04/2020 à 18:22, Christopher M Riedl a écrit :
> >> On April 15, 2020 4:12 AM Christophe Leroy <christophe.leroy at c-s.fr> wrote:
> >>
> >>   
> >> Le 15/04/2020 à 07:16, Christopher M Riedl a écrit :
> >>>> On March 26, 2020 9:42 AM Christophe Leroy <christophe.leroy at c-s.fr> wrote:
> >>>>
> >>>>    
> >>>> This patch fixes the RFC series identified below.
> >>>> It fixes three points:
> >>>> - Failure with CONFIG_PPC_KUAP
> >>>> - Failure to write do to lack of DIRTY bit set on the 8xx
> >>>> - Inadequaly complex WARN post verification
> >>>>
> >>>> However, it has an impact on the CPU load. Here is the time
> >>>> needed on an 8xx to run the ftrace selftests without and
> >>>> with this series:
> >>>> - Without CONFIG_STRICT_KERNEL_RWX		==> 38 seconds
> >>>> - With CONFIG_STRICT_KERNEL_RWX			==> 40 seconds
> >>>> - With CONFIG_STRICT_KERNEL_RWX + this series	==> 43 seconds
> >>>>
> >>>> Link: https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=166003
> >>>> Signed-off-by: Christophe Leroy <christophe.leroy at c-s.fr>
> >>>> ---
> >>>>    arch/powerpc/lib/code-patching.c | 5 ++++-
> >>>>    1 file changed, 4 insertions(+), 1 deletion(-)
> >>>>
> >>>> diff --git a/arch/powerpc/lib/code-patching.c b/arch/powerpc/lib/code-patching.c
> >>>> index f156132e8975..4ccff427592e 100644
> >>>> --- a/arch/powerpc/lib/code-patching.c
> >>>> +++ b/arch/powerpc/lib/code-patching.c
> >>>> @@ -97,6 +97,7 @@ static int map_patch(const void *addr, struct patch_mapping *patch_mapping)
> >>>>    	}
> >>>>    
> >>>>    	pte = mk_pte(page, pgprot);
> >>>> +	pte = pte_mkdirty(pte);
> >>>>    	set_pte_at(patching_mm, patching_addr, ptep, pte);
> >>>>    
> >>>>    	init_temp_mm(&patch_mapping->temp_mm, patching_mm);
> >>>> @@ -168,7 +169,9 @@ static int do_patch_instruction(unsigned int *addr, unsigned int instr)
> >>>>    			(offset_in_page((unsigned long)addr) /
> >>>>    				sizeof(unsigned int));
> >>>>    
> >>>> +	allow_write_to_user(patch_addr, sizeof(instr));
> >>>>    	__patch_instruction(addr, instr, patch_addr);
> >>>> +	prevent_write_to_user(patch_addr, sizeof(instr));
> >>>>
> >>>
> >>> On radix we can map the page with PAGE_KERNEL protection which ends up
> >>> setting EAA[0] in the radix PTE. This means the KUAP (AMR) protection is
> >>> ignored (ISA v3.0b Fig. 35) since we are accessing the page from MSR[PR]=0.
> >>>
> >>> Can we employ a similar approach on the 8xx? I would prefer *not* to wrap
> >>> the __patch_instruction() with the allow_/prevent_write_to_user() KUAP things
> >>> because this is a temporary kernel mapping which really isn't userspace in
> >>> the usual sense.
> >>
> >> On the 8xx, that's pretty different.
> >>
> >> The PTE doesn't control whether a page is user page or a kernel page.
> >> The only thing that is set in the PTE is whether a page is linked to a
> >> given PID or not.
> >> PAGE_KERNEL tells that the page can be addressed with any PID.
> >>
> >> The user access right is given by a kind of zone, which is in the PGD
> >> entry. Every pages above PAGE_OFFSET are defined as belonging to zone 0.
> >> Every pages below PAGE_OFFSET are defined as belonging to zone 1.
> >>
> >> By default, zone 0 can only be accessed by kernel, and zone 1 can only
> >> be accessed by user. When kernel wants to access zone 1, it temporarily
> >> changes properties of zone 1 to allow both kernel and user accesses.
> >>
> >> So, if your mapping is below PAGE_OFFSET, it is in zone 1 and kernel
> >> must unlock it to access it.
> >>
> >>
> >> And this is more or less the same on hash/32. This is managed by segment
> >> registers. One segment register corresponds to a 256Mbytes area. Every
> >> pages below PAGE_OFFSET can only be read by default by kernel. Only user
> >> can write if the PTE allows it. When the kernel needs to write at an
> >> address below PAGE_OFFSET, it must change the segment properties in the
> >> corresponding segment register.
> >>
> >> So, for both cases, if we want to have it local to a task while still
> >> allowing kernel access, it means we have to define a new special area
> >> between TASK_SIZE and PAGE_OFFSET which belongs to kernel zone.
> >>
> >> That looks complex to me for a small benefit, especially as 8xx is not
> >> SMP and neither are most of the hash/32 targets.
> >>
> > 
> > Agreed. So I guess the solution is to differentiate between radix/non-radix
> > and use PAGE_SHARED for non-radix along with the KUAP functions when KUAP
> > is enabled. Hmm, I need to think about this some more, especially if it's
> > acceptable to temporarily map kernel text as PAGE_SHARED for patching. Do
> > you see any obvious problems on 8xx and hash/32 w/ using PAGE_SHARED?
>
> 
> No it shouldn't be a problem AFAICS, except maybe the CPU overhead it
> brings as I mentioned previously (ftrace selftests going from 40 to 43
> seconds ie 8% overhead.
>
Ok great. I will have some performance numbers for POWER8 and POWER9 with
the next spin of the RFC
> 
> > 
> > I don't necessarily want to drop the local mm patching idea for non-radix
> > platforms since that means we would have to maintain two implementations.
> > 
>
> 
> What's the problem with RADIX, why can't PAGE_SHARED be used on radix ?
>
It's not a problem. I would actually prefer to use PAGE_KERNEL since the
mapping is really for a kernel page. On radix using PAGE_KERNEL allows us
to avoid the KUAP functions due to the HW implementation (AMR and EAA).
> 
> Christophe
>
> 
>
> 



More information about the Linuxppc-dev mailing list