[PATCH] powerpc/603: Fix protection of user pages mapped with PROT_NONE

PLATTNER Christoph christoph.plattner at thalesgroup.com
Mon Feb 1 21:22:16 AEDT 2021


Hello to all, and thank you very much for the first and second fast responses.

I do not have a long history with the PowerPC MMU environment; I have been digging into this
topic for about 3 months to analyze this problem - so, sorry if I am wrong on some points ...

What I have learned so far about this MPC5121e (a variant of the e300c4 core):
- It uses the book3s32 hash code, but it DOES NOT provide the KEY hash method, so the
   branch "if (!Hash) ..." is always taken. I therefore assume that the "key 0" and "key 1"
   setups are not used on this CPU (it does not support MMU_FTR_HPTE_TABLE).
- The PP bits are NOT checked by the CPU in HW; even if set to 00, the CPU does not react.
   As far as I have understood, the TLB miss routines are responsible for checking permissions.
   The TLB miss routines check the Linux-style PTE entries and generate the PP bits
   for the TLB entry. The PowerPC PP bits are never checked anywhere else on these CPU models ...
- The PTE entries in Linux are fully "void" as far as this CPU type is concerned, since this CPU
   does not read any PTEs from RAM (no HW page table walk, in contrast to x86, ARM, or later PPC ...).

In summary - as far as I understand it now - we either have to handle the PTE bits differently
(Linux style) for PROT_NONE permissions - OR - we have to expand the permission
checking as in my proposed experimental patch. (PROT_NONE is not only NUMA-related,
but it may not be used very often ...)

Another related point:
According to the e300 RM (reference manual), the ACCESSED bit in the PTE shall be set on a
TLB miss, as it indicates that the page has been used. In the 4.4 kernel this write-back of the
_PAGE_ACCESSED bit was performed after a successful permission check:

        bne-    DataAddressInvalid      /* return if access not permitted */
        ori     r0,r0,_PAGE_ACCESSED    /* set _PAGE_ACCESSED in pte */
        /*
         * NOTE! We are assuming this is not an SMP system, otherwise
         * we would need to update the pte atomically with lwarx/stwcx.
         */
        stw     r0,0(r2)                /* update PTE (accessed bit) */
        /* Convert linux-style PTE to low word of PPC-style PTE */

The bit is set (ori ...) and written back (stw ...) to the Linux PTE. Maybe this is not needed, as
the PTE is never seen by the PPC chip. But I do not understand WHY _PAGE_ACCESSED
is used for the permission check in the later 5.4 kernel (it was not used in 4.4):

	cmplw	0,r1,r3
	mfspr	r2, SPRN_SDR1
	li	r1, _PAGE_PRESENT | _PAGE_ACCESSED
	rlwinm	r2, r2, 28, 0xfffff000
	bgt-	112f

What is the reason or relevance of checking this here?
It was not checked in 4.4; the bit was OR-ed in afterwards, since the page is being accessed now.
Do you know the reason for the change on this point?
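For comparison, the two quoted flows can be modelled roughly like this. This is only a sketch of my reading of the two code sequences (the flag values and the names miss_44/miss_54 are invented for the example); it restates the question in C, it does not answer it:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values, assumed for this example only. */
#define _PAGE_PRESENT  0x001
#define _PAGE_ACCESSED 0x100

/* 4.4-style flow: permission check without _PAGE_ACCESSED, then the
 * bit is set on the way out (the ori/stw sequence quoted above). */
static unsigned int miss_44(unsigned int pte, bool *ok)
{
	*ok = (pte & _PAGE_PRESENT) != 0;
	if (*ok)
		pte |= _PAGE_ACCESSED;	/* write-back to the Linux PTE */
	return pte;
}

/* 5.4-style flow: _PAGE_ACCESSED is part of the required mask, so a
 * never-accessed page does not get a TLB entry from the fastpath. */
static bool miss_54(unsigned int pte)
{
	unsigned int required = _PAGE_PRESENT | _PAGE_ACCESSED;
	return (pte & required) == required;
}
```

With this model, a fresh page (only _PAGE_PRESENT set) is serviced by the 4.4 flow but rejected by the 5.4 flow, which is exactly the behavioural difference asked about above.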

Another remark on the core manual relevant to this:
There is a reference manual available for the e300 core (e300 RM). Its Memory Management
section includes many remarks that various features are optional or implementation-dependent.
On the other hand, the MPC5121e reference manual refers to the e300 core RM, but DOES NOT
state which of the optional features are present. According to my
analysis, the MPC5121e does not include any of the optional features.


Thanks a lot for the first reactions
Christoph


-----Original Message-----
From: Christophe Leroy <christophe.leroy at csgroup.eu> 
Sent: Montag, 1. Februar 2021 07:30
To: Benjamin Herrenschmidt <benh at kernel.crashing.org>; Paul Mackerras <paulus at samba.org>; Michael Ellerman <mpe at ellerman.id.au>; PLATTNER Christoph <christoph.plattner at thalesgroup.com>
Cc: linux-kernel at vger.kernel.org; linuxppc-dev at lists.ozlabs.org
Subject: [PATCH] powerpc/603: Fix protection of user pages mapped with PROT_NONE

On book3s/32, page protection is defined by the PP bits in the PTE, which provide the following protection depending on the access keys defined in the matching segment register:
- PP 00 means RW with key 0 and N/A with key 1.
- PP 01 means RW with key 0 and RO with key 1.
- PP 10 means RW with both key 0 and key 1.
- PP 11 means RO with both key 0 and key 1.
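The table above can be written as a small predicate. This is an illustration only, not kernel code; pp_allows is a name invented for the example:

```c
#include <assert.h>
#include <stdbool.h>

/* Sketch of the PP-bit protection table: returns true when an access
 * (read, or write if is_write) is permitted for the given PP value
 * (0..3) and segment key (0 or 1). */
static bool pp_allows(unsigned int pp, unsigned int key, bool is_write)
{
	switch (pp) {
	case 0:	/* PP 00: RW with key 0, no access with key 1 */
		return key == 0;
	case 1:	/* PP 01: RW with key 0, RO with key 1 */
		return key == 0 || !is_write;
	case 2:	/* PP 10: RW with both keys */
		return true;
	case 3:	/* PP 11: RO with both keys */
		return !is_write;
	default:
		return false;
	}
}
```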

Since the implementation of kernel userspace access protection, PP bits have been set as follows:
- PP00 for pages without _PAGE_USER
- PP01 for pages with _PAGE_USER and _PAGE_RW
- PP11 for pages with _PAGE_USER and without _PAGE_RW

For kernelspace segments, kernel accesses are performed with key 0 and user accesses are performed with key 1. As PP00 is used for non-_PAGE_USER pages, the user can't access kernel pages not flagged _PAGE_USER, while the kernel can.

For userspace segments, both kernel and user accesses are performed with key 0, therefore pages not flagged _PAGE_USER are still accessible to the user.

This shouldn't be an issue, because userspace is expected to be accessible to the user. But unlike most other architectures, powerpc implements PROT_NONE protection by removing _PAGE_USER flag instead of flagging the page as not valid. This means that pages in userspace that are not flagged _PAGE_USER shall remain inaccessible.

To get the expected behaviour, just mimic other architectures in the TLB miss handler by checking _PAGE_USER permission on userspace accesses as if it was the _PAGE_PRESENT bit.
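The effect of the fix can be sketched in C. This is an illustration only: the flag values are assumptions copied in for the example (the authoritative definitions live in arch/powerpc/include/asm/book3s/32/pgtable.h), and tlbmiss_ok is an invented name modelling the andc./bne- check in the handlers:

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag values, assumed for this example only. */
#define _PAGE_PRESENT  0x001
#define _PAGE_USER     0x004
#define _PAGE_ACCESSED 0x100

/* The TLB miss handlers reject the access if any required bit is
 * missing from the Linux PTE (the andc./bne- sequence). */
static bool tlbmiss_ok(unsigned int pte, unsigned int required)
{
	return (pte & required) == required;
}
```

A PROT_NONE user page keeps _PAGE_PRESENT but loses _PAGE_USER, so once _PAGE_USER joins the required mask for userspace addresses, the access faults as expected.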

Note that this problem only exists on 603 cores. The 604+ have a hash table, and the hash_page() function already implements the verification of _PAGE_USER permission on userspace pages.

Reported-by: Christoph Plattner <christoph.plattner at thalesgroup.com>
Fixes: f342adca3afc ("powerpc/32s: Prepare Kernel Userspace Access Protection")
Cc: stable at vger.kernel.org
Signed-off-by: Christophe Leroy <christophe.leroy at csgroup.eu>
---
 arch/powerpc/kernel/head_book3s_32.S | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/head_book3s_32.S b/arch/powerpc/kernel/head_book3s_32.S
index 858fbc8b19f3..0004e8a6a58e 100644
--- a/arch/powerpc/kernel/head_book3s_32.S
+++ b/arch/powerpc/kernel/head_book3s_32.S
@@ -453,11 +453,12 @@ InstructionTLBMiss:
 	cmplw	0,r1,r3
 #endif
 	mfspr	r2, SPRN_SDR1
-	li	r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC
+	li	r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC | _PAGE_USER
 	rlwinm	r2, r2, 28, 0xfffff000
 #ifdef CONFIG_MODULES
 	bgt-	112f
 	lis	r2, (swapper_pg_dir - PAGE_OFFSET)@ha	/* if kernel address, use */
+	li	r1,_PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_EXEC
 	addi	r2, r2, (swapper_pg_dir - PAGE_OFFSET)@l	/* kernel page table */
 #endif
 112:	rlwimi	r2,r3,12,20,29		/* insert top 10 bits of address */
@@ -516,10 +517,11 @@ DataLoadTLBMiss:
 	lis	r1, TASK_SIZE@h		/* check if kernel address */
 	cmplw	0,r1,r3
 	mfspr	r2, SPRN_SDR1
-	li	r1, _PAGE_PRESENT | _PAGE_ACCESSED
+	li	r1, _PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_USER
 	rlwinm	r2, r2, 28, 0xfffff000
 	bgt-	112f
 	lis	r2, (swapper_pg_dir - PAGE_OFFSET)@ha	/* if kernel address, use */
+	li	r1, _PAGE_PRESENT | _PAGE_ACCESSED
 	addi	r2, r2, (swapper_pg_dir - PAGE_OFFSET)@l	/* kernel page table */
 112:	rlwimi	r2,r3,12,20,29		/* insert top 10 bits of address */
 	lwz	r2,0(r2)		/* get pmd entry */
@@ -593,10 +595,11 @@ DataStoreTLBMiss:
 	lis	r1, TASK_SIZE@h		/* check if kernel address */
 	cmplw	0,r1,r3
 	mfspr	r2, SPRN_SDR1
-	li	r1, _PAGE_RW | _PAGE_DIRTY | _PAGE_PRESENT | _PAGE_ACCESSED
+	li	r1, _PAGE_RW | _PAGE_DIRTY | _PAGE_PRESENT | _PAGE_ACCESSED | _PAGE_USER
 	rlwinm	r2, r2, 28, 0xfffff000
 	bgt-	112f
 	lis	r2, (swapper_pg_dir - PAGE_OFFSET)@ha	/* if kernel address, use */
+	li	r1, _PAGE_RW | _PAGE_DIRTY | _PAGE_PRESENT | _PAGE_ACCESSED
 	addi	r2, r2, (swapper_pg_dir - PAGE_OFFSET)@l	/* kernel page table */
 112:	rlwimi	r2,r3,12,20,29		/* insert top 10 bits of address */
 	lwz	r2,0(r2)		/* get pmd entry */
--
2.25.0


