[PATCH] powerpc/603: Fix protection of user pages mapped with PROT_NONE

Christophe Leroy christophe.leroy at csgroup.eu
Mon Feb 1 22:39:16 AEDT 2021



Le 01/02/2021 à 11:22, PLATTNER Christoph a écrit :
> Hello to all, and thank you very much for first and second fast response.
> 
> I do not have a long history on PowerPC MMU environment, I hacked into this topic
> for about 3 months for analyzing that problem- so, sorry, if I am wrong in some points ...

Yes you are wrong on some points, sorry, see below.


> 
> What I learn so far from this MPC5121e (variant of e300c4 core):
> - It uses book3s32 hash-code, but it DOES NOT provide KEY hash method, so always the
>     branch  "if (! Hash) ...." is taken, so, I assume that "key 0" and "key 1" setups are not
>     used on this CPU (not supporting MMU_FTR_HPTE_TABLE)

hash method is not used, this is SW TLB loading that is used, but still, all the PP and Ks/Kp keys 
defined in the segment register are used, see e300 core reference manual §6.4.2 Page Memory Protection

> - The PP bits are NOT checked by the CPU in HW, even if set to 00, the CPU does not react.
>     As far I have understood, the TLB miss routines are responsible for checking permissions.
>     The TLB miss routines check the Linux PTE styled entries and generates the PP bits
>     for the TLB entry. The PowerPC PP bits are never check elsewhere on that CPU models ...

PP bits ARE checked hoppefully. If it was not the case, then the TLB miss routines would install a 
TLB on a read, then the user could do a write without any verification being done ?

Refer to e300 Core reference Manual, §6.1.4 Memory Protection Facilities

As I explained in the patch, the problem is not that the HW doesn't check the permission. It is that 
user accessed been done with key 0 as programmed in the segment registers, PP 00 means RW access.

> - The PTE entries in Linux are fully "void" in sense of this CPU type, as this CPU does not
>     read any PTEs from RAM (no HW support in contrast to x86 or ARM or later ppc...).

No, the PTE are read by the TLB miss exception handlers and writen into TLB entries.

> 
> In summary - as far as I understand it now - we have to handle the PTE bits differently
> (Linux style) for PROT_NONE permissions - OR - we have to expand the permission
> checking like my proposed experimental patch. (PROT_NONE is not NUMA related only,
> but may not used very often ...).

Yes, expanding the permission checking is the easiest solution, hence the patch I sent out based on 
your proposal.

> 
> Another related point:
> According e300 RM (manual) the ACCESSED bit in the PTE shall be set on TLB miss, as
> it is an indication, that page is used. In 4.4 kernel this write back of the _PAGE_ACCESSED
> bit was performed after successful permission check:
> 
>          bne-    DataAddressInvalid      /* return if access not permitted */
>          ori     r0,r0,_PAGE_ACCESSED    /* set _PAGE_ACCESSED in pte */
>          /*
>           * NOTE! We are assuming this is not an SMP system, otherwise
>           * we would need to update the pte atomically with lwarx/stwcx.
>           */
>          stw     r0,0(r2)                /* update PTE (accessed bit) */
>          /* Convert linux-style PTE to low word of PPC-style PTE */
> 
> Bit is set (ori ...) and written back (stw ...) to Linux PTE. May be, this is not needed, as the
> PTE is never seen by the PPC chip. But I do not understand, WHY the PAGE_ACCCESSED
> is used for permission check in the late 5.4 kernel (not used in 4.4 kernel):
> 
> 	cmplw	0,r1,r3
>   	mfspr	r2, SPRN_SDR1
> 	li	r1, _PAGE_PRESENT | _PAGE_ACCESSED
> 	rlwinm	r2, r2, 28, 0xfffff000
>   	bgt-	112f
> 
> What is the reason or relevance for checking this here ?
> Was not checked in 4.4, bit or-ed afterwards, as it is accessed now.
> Do you know the reason of change on this point ?

PAGE_ACCESSED is important for memory management, linux kernel need it.

But instead of spending time at every miss to perform a write which will be a no-op in 99% of cases, 
we prefer bailing out to the page_fault logic when the accessed bit is not set. Then the page_fault 
logic will set the bit.
This also allowed to simplify the handling in __set_pte()_at function by avoiding races in the 
update of PTEs.

> 
> Another remark to Core manual relevant for this:
> There is the reference manual for e300 core available (e300 RM). It includes
> many remarks in range of Memory Management section, that many features
> are optional or variable for dedicated implementations. On the other hand,
> the MPC5121e reference manual refers to the e300 core RM, but DOES NOT
> information, which of the optional points are there or nor. According my
> analysis, MPC5121e does not include any of the optional features.
> 

Not sure what you mean. As far as I understand, that chapter tells you that some functionnalities 
are optional for the powerpc architectecture, and provided (or not) by the e300 core. The MPC5121 
supports all the things that are defined by e300 core.


> 
> Thanks a lot for first reactions

You are welcome, don't hesitate if you have additional questions.

Christophe


More information about the Linuxppc-dev mailing list