[PATCH v5 7/7] powerpc/uaccess: Implement masked user access

Gabriel Paubert paubert at iram.es
Wed Jan 7 19:34:27 AEDT 2026


Hi Christophe,

On Wed, Dec 24, 2025 at 12:20:55PM +0100, Christophe Leroy (CS GROUP) wrote:
> From: Christophe Leroy <christophe.leroy at csgroup.eu>
> 
> Masked user access avoids the address/size verification by access_ok().
> Although its main purpose is to avoid speculation in the verification
> of the user address and size, and hence the need for speculation
> mitigation, it also has the advantage of reducing the number of
> instructions required, so it even benefits platforms that don't need
> speculation mitigation, especially when the size of the copy is not
> known at build time.
> 
> So implement masked user access on powerpc. The only requirement is
> a memory gap that faults, between the top of user space and the real
> start of the kernel area.
> 
> On 64 bits platforms the address space is divided that way:
> 
> 	0xffffffffffffffff	+------------------+
> 				|                  |
> 				|   kernel space   |
> 				|                  |
> 	0xc000000000000000	+------------------+  <== PAGE_OFFSET
> 				|//////////////////|
> 				|//////////////////|
> 	0x8000000000000000	|//////////////////|
> 				|//////////////////|
> 				|//////////////////|
> 	0x0010000000000000	+------------------+  <== TASK_SIZE_MAX
> 				|                  |
> 				|    user space    |
> 				|                  |
> 	0x0000000000000000	+------------------+
> 
> The kernel is always above 0x8000000000000000 and user space always
> below, with a gap in-between. This leads to a 3-instruction sequence:
> 
>  150:	7c 69 fe 76 	sradi   r9,r3,63
>  154:	79 29 00 40 	clrldi  r9,r9,1
>  158:	7c 63 48 78 	andc    r3,r3,r9
> 
> This sequence leaves r3 unmodified when it is below 0x8000000000000000
> and clamps it to 0x8000000000000000 if it is above.
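
For reference, a standalone C model of that 3-instruction sequence
(equivalent to mask_user_address_simple() in the patch below; assumes
64-bit longs and an arithmetic right shift of negative values, which
the kernel relies on anyway):

static unsigned long mask64(unsigned long addr)
{
	/* sradi: all-ones if the top bit of addr is set, else 0 */
	unsigned long mask = (unsigned long)((long)addr >> 63);

	/* clrldi: clear the top bit of the mask */
	mask &= ~(1UL << 63);

	/* andc: addr if below 2^63, else 0x8000000000000000 */
	return addr & ~mask;
}

The 3-instruction 32-bit sequence further down is the same trick with
a shift by 31.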
> 
> On 32-bit it is trickier. In theory user space can go up to
> 0xbfffffff while the kernel will usually start at 0xc0000000, so a
> gap needs to be added in-between. Although in theory a single 4k page
> would suffice, it is easier and more efficient to enforce a 128k gap
> below the kernel, as it simplifies the masking.
> 
> e500 has the isel instruction, which allows selecting one value or
> the other without a branch, and that instruction is not speculative,
> so use it. Although GCC usually generates code using that
> instruction, it is safer to use inline assembly to be sure. The
> result is:
> 
>   14:	3d 20 bf fe 	lis     r9,-16386
>   18:	7c 03 48 40 	cmplw   r3,r9
>   1c:	7c 69 18 5e 	iselgt  r3,r9,r3
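
In plain C that is just a branchless compare-and-select; a minimal
model, where the TASK_SIZE value is inferred from the lis constant
above (0xbffe0000, i.e. PAGE_OFFSET 0xc0000000 minus the 128k gap):

#define TASK_SIZE 0xbffe0000UL

static unsigned long mask_isel(unsigned long addr)
{
	/* cmplw + iselgt: clamp to TASK_SIZE when addr is above it */
	return addr > TASK_SIZE ? TASK_SIZE : addr;
}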
> 
> On the others, when kernel space is above 0x80000000 and user space
> is below it, the logic in mask_user_address_simple() leads to a
> 3-instruction sequence:
> 
>   64:	7c 69 fe 70 	srawi   r9,r3,31
>   68:	55 29 00 7e 	clrlwi  r9,r9,1
>   6c:	7c 63 48 78 	andc    r3,r3,r9
> 
> This is the default on powerpc 8xx.
> 
> When the limit between user space and kernel space is not
> 0x80000000, mask_user_address_32() is used and a 6-instruction
> sequence is generated:


Actually I took the opportunity of the recent flu epidemic here
(first me, then my son and finally my wife) to work a bit on this, and
found a way to shrink the gap to 64k with 6 instructions. The exact
sequence depends on the MSB of the boundary, but it's just flipping
between "or" and "andc".

The test code below uses different constant names and interfaces (no
__user annotation, for a start), but adapting your
mask_user_address_32 to use it is trivial.


#define LIMIT 0x70000000 // or 0xc0000000
#define MASK  (LIMIT - 0x10000)
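
For the two example limits these work out to (invalid addresses are
clamped to MASK, i.e. into the 64k gap, which must fault):

	LIMIT 0xc0000000 -> MASK 0xbfff0000	/* gap [0xbfff0000, 0xc0000000) */
	LIMIT 0x70000000 -> MASK 0x6fff0000	/* gap [0x6fff0000, 0x70000000) */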


/* The generated code is (several gcc versions tested):
 * for LIMIT 0xc0000000
 * 00000000 <masked_addr>:
 *	addis   r9,r3,16385
 *	andc    r9,r3,r9
 *	srawi   r9,r9,31
 *	andc    r3,r3,r9
 *	andis.  r9,r9,49151
 *	or      r3,r3,r9
 *	blr
 * for LIMIT 0x70000000
 * 00000000 <masked_addr>:
 *	addis   r9,r3,4097
 *	or      r9,r9,r3
 *	srawi   r9,r9,31
 *	andc    r3,r3,r9
 *	andis.  r9,r9,28671
 *	or      r3,r3,r9
 *	blr
 */

With some values of LIMIT, for example 0x70010000, the compiler
generates "rlwinm" instead of "andis.", but that's the only variation
I've seen.

The C code is:

unsigned long masked_addr(unsigned long addr)
{
	unsigned long mask;
	signed long tmp;

	if (MASK & 0x80000000) {
		tmp = addr - MASK; // positive if invalid
		tmp = addr & ~tmp; // positive if valid, else negative
	} else {
		tmp = addr + (0x80000000 - MASK); // negative if invalid
		tmp |= addr; // positive if valid, else negative
	}
	tmp >>= 31;                  // 0 if valid, -1 if not
	mask = tmp & MASK;           // 0 if valid, else MASK
	return (addr & ~tmp) | mask; // addr if valid, else MASK (in the gap)
}
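
A quick userspace harness to sanity-check the boundaries (paste it
below the code above; assumes 32-bit unsigned long, e.g. build with
-m32; the same checks hold for both values of LIMIT):

#include <assert.h>
#include <stdio.h>

int main(void)
{
	/* valid addresses pass through unchanged */
	assert(masked_addr(0) == 0);
	assert(masked_addr(MASK - 1) == MASK - 1);

	/* the gap and everything above it clamp to MASK,
	   which lies inside the faulting 64k gap */
	assert(masked_addr(MASK) == MASK);
	assert(masked_addr(LIMIT) == MASK);
	assert(masked_addr(0xffffffff) == MASK);

	printf("boundary checks passed\n");
	return 0;
}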

Regards,
Gabriel

> 
>   24:	54 69 7c 7e 	srwi    r9,r3,17
>   28:	21 29 57 ff 	subfic  r9,r9,22527
>   2c:	7d 29 fe 70 	srawi   r9,r9,31
>   30:	75 2a b0 00 	andis.  r10,r9,45056
>   34:	7c 63 48 78 	andc    r3,r3,r9
>   38:	7c 63 53 78 	or      r3,r3,r10
> 
> The constraint is that TASK_SIZE must be 128K-aligned in order to
> get the optimal number of instructions.
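
The 6-instruction listing above corresponds to TASK_SIZE =
0xb0000000: 22527 is (0xb0000000 >> 17) - 1, and andis. with 45056 =
0xb000 rebuilds 0xb0000000. A standalone model of
mask_user_address_32() with that value baked in, assuming 32-bit
longs:

static unsigned long mask32(unsigned long addr)
{
	/* srwi + subfic + srawi: all-ones iff addr >= 0xb0000000 */
	unsigned long m = (unsigned long)((long)(22527 - (addr >> 17)) >> 31);

	/* andc + andis. + or: addr if valid, else TASK_SIZE */
	return (addr & ~m) | (0xb0000000 & m);
}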
> 
> When CONFIG_PPC_BARRIER_NOSPEC is not defined, fall back on the
> test-based masking, as it is quicker than the 6-instruction sequence
> but not quicker than the 3-instruction sequences above.
> 
> As an example, although barrier_nospec() is a no-op on the 8xx,
> this change has the following impact on strncpy_from_user(): the
> length of the function is reduced from 488 to 340 bytes:
> 
> Start of the function with the patch:
> 
> 00000000 <strncpy_from_user>:
>    0:	7c ab 2b 79 	mr.     r11,r5
>    4:	40 81 01 40 	ble     144 <strncpy_from_user+0x144>
>    8:	7c 89 fe 70 	srawi   r9,r4,31
>    c:	55 29 00 7e 	clrlwi  r9,r9,1
>   10:	7c 84 48 78 	andc    r4,r4,r9
>   14:	3d 20 dc 00 	lis     r9,-9216
>   18:	7d 3a c3 a6 	mtspr   794,r9
>   1c:	2f 8b 00 03 	cmpwi   cr7,r11,3
>   20:	40 9d 00 b4 	ble     cr7,d4 <strncpy_from_user+0xd4>
> ...
> 
> Start of the function without the patch:
> 
> 00000000 <strncpy_from_user>:
>    0:	7c a0 2b 79 	mr.     r0,r5
>    4:	40 81 01 10 	ble     114 <strncpy_from_user+0x114>
>    8:	2f 84 00 00 	cmpwi   cr7,r4,0
>    c:	41 9c 01 30 	blt     cr7,13c <strncpy_from_user+0x13c>
>   10:	3d 20 80 00 	lis     r9,-32768
>   14:	7d 24 48 50 	subf    r9,r4,r9
>   18:	7f 80 48 40 	cmplw   cr7,r0,r9
>   1c:	7c 05 03 78 	mr      r5,r0
>   20:	41 9d 01 00 	bgt     cr7,120 <strncpy_from_user+0x120>
>   24:	3d 20 80 00 	lis     r9,-32768
>   28:	7d 25 48 50 	subf    r9,r5,r9
>   2c:	7f 84 48 40 	cmplw   cr7,r4,r9
>   30:	38 e0 ff f2 	li      r7,-14
>   34:	41 9d 00 e4 	bgt     cr7,118 <strncpy_from_user+0x118>
>   38:	94 21 ff e0 	stwu    r1,-32(r1)
>   3c:	3d 20 dc 00 	lis     r9,-9216
>   40:	7d 3a c3 a6 	mtspr   794,r9
>   44:	2b 85 00 03 	cmplwi  cr7,r5,3
>   48:	40 9d 01 6c 	ble     cr7,1b4 <strncpy_from_user+0x1b4>
> ...
>  118:	7c e3 3b 78 	mr      r3,r7
>  11c:	4e 80 00 20 	blr
>  120:	7d 25 4b 78 	mr      r5,r9
>  124:	3d 20 80 00 	lis     r9,-32768
>  128:	7d 25 48 50 	subf    r9,r5,r9
>  12c:	7f 84 48 40 	cmplw   cr7,r4,r9
>  130:	38 e0 ff f2 	li      r7,-14
>  134:	41 bd ff e4 	bgt     cr7,118 <strncpy_from_user+0x118>
>  138:	4b ff ff 00 	b       38 <strncpy_from_user+0x38>
>  13c:	38 e0 ff f2 	li      r7,-14
>  140:	4b ff ff d8 	b       118 <strncpy_from_user+0x118>
> ...
> 
> Signed-off-by: Christophe Leroy <christophe.leroy at csgroup.eu>
> ---
> v4: Rebased on top of the core-scoped-uaccess tag and simplified as suggested by Gabriel
> 
> v3: Rewrite mask_user_address_simple() for a smaller result on powerpc64, suggested by Gabriel
> 
> v2: Added 'likely()' to the test in mask_user_address_fallback()
> ---
>  arch/powerpc/include/asm/task_size_32.h |  6 +-
>  arch/powerpc/include/asm/uaccess.h      | 76 +++++++++++++++++++++++++
>  2 files changed, 79 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/task_size_32.h b/arch/powerpc/include/asm/task_size_32.h
> index 42a64bbd1964..725ddbf06217 100644
> --- a/arch/powerpc/include/asm/task_size_32.h
> +++ b/arch/powerpc/include/asm/task_size_32.h
> @@ -13,7 +13,7 @@
>  #define MODULES_SIZE	(CONFIG_MODULES_SIZE * SZ_1M)
>  #define MODULES_VADDR	(MODULES_END - MODULES_SIZE)
>  #define MODULES_BASE	(MODULES_VADDR & ~(UL(SZ_4M) - 1))
> -#define USER_TOP	MODULES_BASE
> +#define USER_TOP	(MODULES_BASE - SZ_4M)
>  #endif
>  
>  #ifdef CONFIG_PPC_BOOK3S_32
> @@ -21,11 +21,11 @@
>  #define MODULES_SIZE	(CONFIG_MODULES_SIZE * SZ_1M)
>  #define MODULES_VADDR	(MODULES_END - MODULES_SIZE)
>  #define MODULES_BASE	(MODULES_VADDR & ~(UL(SZ_256M) - 1))
> -#define USER_TOP	MODULES_BASE
> +#define USER_TOP	(MODULES_BASE - SZ_4M)
>  #endif
>  
>  #ifndef USER_TOP
> -#define USER_TOP	ASM_CONST(CONFIG_PAGE_OFFSET)
> +#define USER_TOP	((ASM_CONST(CONFIG_PAGE_OFFSET) - SZ_128K) & ~(UL(SZ_128K) - 1))
>  #endif
>  
>  #if CONFIG_TASK_SIZE < USER_TOP
> diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
> index 721d65dbbb2e..ba1d878c3f40 100644
> --- a/arch/powerpc/include/asm/uaccess.h
> +++ b/arch/powerpc/include/asm/uaccess.h
> @@ -2,6 +2,8 @@
>  #ifndef _ARCH_POWERPC_UACCESS_H
>  #define _ARCH_POWERPC_UACCESS_H
>  
> +#include <linux/sizes.h>
> +
>  #include <asm/processor.h>
>  #include <asm/page.h>
>  #include <asm/extable.h>
> @@ -435,6 +437,80 @@ static __must_check __always_inline bool __user_access_begin(const void __user *
>  #define user_access_save	prevent_user_access_return
>  #define user_access_restore	restore_user_access
>  
> +/*
> + * Masking the user address is an alternative to a conditional
> + * user_access_begin that can avoid the fencing. This only works
> + * for dense accesses starting at the address.
> + */
> +static inline void __user *mask_user_address_simple(const void __user *ptr)
> +{
> +	unsigned long addr = (unsigned long)ptr;
> +	unsigned long mask = (unsigned long)(((long)addr >> (BITS_PER_LONG - 1)) & LONG_MAX);
> +
> +	return (void __user *)(addr & ~mask);
> +}
> +
> +static inline void __user *mask_user_address_isel(const void __user *ptr)
> +{
> +	unsigned long addr;
> +
> +	asm("cmplw %1, %2; iselgt %0, %2, %1" : "=r"(addr) : "r"(ptr), "r"(TASK_SIZE) : "cr0");
> +
> +	return (void __user *)addr;
> +}
> +
> +/* TASK_SIZE is a multiple of 128K for shifting by 17 to the right */
> +static inline void __user *mask_user_address_32(const void __user *ptr)
> +{
> +	unsigned long addr = (unsigned long)ptr;
> +	unsigned long mask = (unsigned long)((long)((TASK_SIZE >> 17) - 1 - (addr >> 17)) >> 31);
> +
> +	addr = (addr & ~mask) | (TASK_SIZE & mask);
> +
> +	return (void __user *)addr;
> +}
> +
> +static inline void __user *mask_user_address_fallback(const void __user *ptr)
> +{
> +	unsigned long addr = (unsigned long)ptr;
> +
> +	return (void __user *)(likely(addr < TASK_SIZE) ? addr : TASK_SIZE);
> +}
> +
> +static inline void __user *mask_user_address(const void __user *ptr)
> +{
> +#ifdef MODULES_VADDR
> +	const unsigned long border = MODULES_VADDR;
> +#else
> +	const unsigned long border = PAGE_OFFSET;
> +#endif
> +
> +	if (IS_ENABLED(CONFIG_PPC64))
> +		return mask_user_address_simple(ptr);
> +	if (IS_ENABLED(CONFIG_E500))
> +		return mask_user_address_isel(ptr);
> +	if (TASK_SIZE <= UL(SZ_2G) && border >= UL(SZ_2G))
> +		return mask_user_address_simple(ptr);
> +	if (IS_ENABLED(CONFIG_PPC_BARRIER_NOSPEC))
> +		return mask_user_address_32(ptr);
> +	return mask_user_address_fallback(ptr);
> +}
> +
> +static __always_inline void __user *__masked_user_access_begin(const void __user *p,
> +							       unsigned long dir)
> +{
> +	void __user *ptr = mask_user_address(p);
> +
> +	might_fault();
> +	allow_user_access(ptr, dir);
> +
> +	return ptr;
> +}
> +
> +#define masked_user_access_begin(p) __masked_user_access_begin(p, KUAP_READ_WRITE)
> +#define masked_user_read_access_begin(p) __masked_user_access_begin(p, KUAP_READ)
> +#define masked_user_write_access_begin(p) __masked_user_access_begin(p, KUAP_WRITE)
> +
>  #define arch_unsafe_get_user(x, p, e) do {			\
>  	__long_type(*(p)) __gu_val;				\
>  	__typeof__(*(p)) __user *__gu_addr = (p);		\
> -- 
> 2.49.0
> 
> 