[PATCH v5 7/7] powerpc/uaccess: Implement masked user access
Gabriel Paubert
paubert at iram.es
Wed Jan 7 19:34:27 AEDT 2026
Hi Christophe,
On Wed, Dec 24, 2025 at 12:20:55PM +0100, Christophe Leroy (CS GROUP) wrote:
> From: Christophe Leroy <christophe.leroy at csgroup.eu>
>
> Masked user access avoids the address/size verification by access_ok().
> Although its main purpose is to avoid speculation in the
> verification of user address and size, and hence the need for
> speculation mitigation, it also has the advantage of reducing the
> number of instructions required, so it even benefits platforms that
> don't need speculation mitigation, especially when the size of the
> copy is not known at build time.
>
> So implement masked user access on powerpc. The only requirement is
> to have a memory gap that faults between the top of user space and
> the real start of the kernel area.
>
> On 64-bit platforms the address space is divided as follows:
>
> 0xffffffffffffffff +------------------+
> | |
> | kernel space |
> | |
> 0xc000000000000000 +------------------+ <== PAGE_OFFSET
> |//////////////////|
> |//////////////////|
> 0x8000000000000000 |//////////////////|
> |//////////////////|
> |//////////////////|
> 0x0010000000000000 +------------------+ <== TASK_SIZE_MAX
> | |
> | user space |
> | |
> 0x0000000000000000 +------------------+
>
> The kernel is always above 0x8000000000000000 and user space always
> below, with a gap in between. This leads to a 3-instruction sequence:
>
> 150: 7c 69 fe 76 sradi r9,r3,63
> 154: 79 29 00 40 clrldi r9,r9,1
> 158: 7c 63 48 78 andc r3,r3,r9
>
> This sequence leaves r3 unmodified when it is below 0x8000000000000000
> and clamps it to 0x8000000000000000 if it is above.
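(For reference, and not part of the patch: a minimal C sketch of that
64-bit clamp, equivalent to what the generic mask_user_address_simple()
further down compiles to; the helper name is mine.)

static unsigned long mask_addr_64(unsigned long addr)
{
	/* all-ones when the top bit of addr is set, 0 otherwise */
	unsigned long mask = (unsigned long)((long)addr >> 63);

	/* keep addr below 2^63 unchanged, collapse anything at or
	 * above to 0x8000000000000000, which always faults */
	return addr & ~(mask & 0x7fffffffffffffffUL);
}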
>
> On 32-bit it is trickier. In theory user space can go up to
> 0xbfffffff while the kernel will usually start at 0xc0000000, so a
> gap needs to be added in between. Although in theory a single 4k page
> would suffice, it is easier and more efficient to enforce a 128k gap
> below the kernel, as it simplifies the masking.
>
> e500 has the isel instruction, which allows selecting one value or
> the other without a branch, and that instruction is not speculative,
> so use it. Although GCC usually generates code using that
> instruction, it is safer to use inline assembly to be sure. The
> result is:
>
> 14: 3d 20 bf fe lis r9,-16386
> 18: 7c 03 48 40 cmplw r3,r9
> 1c: 7c 69 18 5e iselgt r3,r9,r3
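(In plain C this is just a branchless select, roughly the sketch below;
the inline assembly is there because a compiler is free to emit a
conditional branch instead of isel, and a branch can be speculated.)

static unsigned long mask_addr_isel(unsigned long addr, unsigned long limit)
{
	/* what the cmplw + iselgt pair computes: clamp addr to limit */
	return addr > limit ? limit : addr;
}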
>
> On other platforms, when kernel space is above 0x80000000 and user
> space is below it, the logic in mask_user_address_simple() leads to a
> 3-instruction sequence:
>
> 64: 7c 69 fe 70 srawi r9,r3,31
> 68: 55 29 00 7e clrlwi r9,r9,1
> 6c: 7c 63 48 78 andc r3,r3,r9
>
> This is the default on powerpc 8xx.
>
> When the limit between user space and kernel space is not 0x80000000,
> mask_user_address_32() is used and a 6-instruction sequence is
> generated:
Actually I took the opportunity of the recent flu epidemic here
(first me, then my son and finally my wife) to work a bit on this, and
found a way to shrink the gap to 64k while keeping 6 instructions. The
exact sequence depends on the MSB of the boundary, but it's just a
matter of flipping between "or" and "andc".
The test code below uses different constant names and a different
interface (no __user annotation for a start), but adapting your
mask_user_address_32() to use it is trivial.
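Purely as an untested sketch of what that adaptation might look like
(placing the 64k faulting gap right above TASK_SIZE and requiring 64k
alignment of TASK_SIZE are my assumptions, not something I have built):

static inline void __user *mask_user_address_32(const void __user *ptr)
{
	unsigned long addr = (unsigned long)ptr;
	signed long tmp;

	/* On ppc32 TASK_SIZE is a compile-time constant, so the if ()
	 * folds away; the constants stay addis/andis.-friendly as long
	 * as TASK_SIZE is 64k aligned and [TASK_SIZE, TASK_SIZE + 64k)
	 * faults. */
	if (TASK_SIZE & 0x80000000) {
		tmp = addr - TASK_SIZE;
		tmp = addr & ~tmp;	/* negative iff addr >= TASK_SIZE */
	} else {
		tmp = addr + (0x80000000 - TASK_SIZE);
		tmp |= addr;		/* negative iff addr >= TASK_SIZE */
	}
	tmp >>= 31;			/* 0 if valid, -1 if not */

	return (void __user *)((addr & ~tmp) | (tmp & TASK_SIZE));
}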
#define LIMIT 0x70000000 // or 0xc0000000
#define MASK (LIMIT-0x10000)
/* The generated code is (several gcc versions tested):
* for LIMIT 0xc0000000
* 00000000 <mask_addr>:
* addis r9,r3,16385
* andc r9,r3,r9
* srawi r9,r9,31
* andc r3,r3,r9
* andis. r9,r9,49151
* or r3,r3,r9
* blr
* for LIMIT 0x70000000
* 00000000 <mask_addr>:
* addis r9,r3,4097
* or r9,r9,r3
* srawi r9,r9,31
* andc r3,r3,r9
* andis. r9,r9,28671
* or r3,r3,r9
* blr
*/
With some values of LIMIT, for example 0x70010000, the compiler
generates "rlwinm" instead of "andis.", but that's the only variation
I've seen.
The C code is:
unsigned long masked_addr(unsigned long addr)
{
	unsigned long mask;
	signed long tmp;

	if (MASK & 0x80000000) {
		tmp = addr - MASK;	// positive if invalid
		tmp = addr & ~tmp;	// positive if valid, else negative
	} else {
		tmp = addr + (0x80000000 - MASK); // negative if invalid
		tmp |= addr;		// positive if valid, else negative
	}
	tmp >>= 31;			// 0 if valid, -1 if not
	mask = tmp & MASK;		// 0 if valid, else MASK
	return (addr & ~tmp) | mask;	// addr if valid, else MASK (start of the faulting gap)
}
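For what it's worth, a quick userspace check of the boundary values
(this assumes the LIMIT/MASK/masked_addr definitions above and 32-bit
longs, e.g. a -m32 or ppc32 build; check_masking() is just a name for
the test, not part of the proposal):

#include <assert.h>
#include <stdio.h>

/* Valid addresses must come back unchanged, invalid ones must land in
 * the 64k faulting gap [MASK, LIMIT). */
static void check_masking(unsigned long addr)
{
	unsigned long r = masked_addr(addr);

	if (addr < MASK)
		assert(r == addr);
	else
		assert(r >= MASK && r < LIMIT);
}

int main(void)
{
	unsigned long probes[] = { 0, 1, MASK - 1, MASK, LIMIT - 1, LIMIT,
				   LIMIT + 1, 0x80000000UL, 0xffffffffUL };
	unsigned int i;

	for (i = 0; i < sizeof(probes) / sizeof(probes[0]); i++)
		check_masking(probes[i]);
	printf("masking OK for the boundary values\n");
	return 0;
}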
Regards,
Gabriel
>
> 24: 54 69 7c 7e srwi r9,r3,17
> 28: 21 29 57 ff subfic r9,r9,22527
> 2c: 7d 29 fe 70 srawi r9,r9,31
> 30: 75 2a b0 00 andis. r10,r9,45056
> 34: 7c 63 48 78 andc r3,r3,r9
> 38: 7c 63 53 78 or r3,r3,r10
>
> The constraint is that TASK_SIZE must be aligned to 128K in order to
> get the optimal number of instructions.
>
> When CONFIG_PPC_BARRIER_NOSPEC is not defined, fall back on the
> test-based masking, as it is quicker than the 6-instruction sequence
> but not quicker than the 3-instruction sequences above.
>
> As an example, although barrier_nospec() is a no-op on the 8xx, this
> change has the following impact on strncpy_from_user(): the length of
> the function is reduced from 488 to 340 bytes:
>
> Start of the function with the patch:
>
> 00000000 <strncpy_from_user>:
> 0: 7c ab 2b 79 mr. r11,r5
> 4: 40 81 01 40 ble 144 <strncpy_from_user+0x144>
> 8: 7c 89 fe 70 srawi r9,r4,31
> c: 55 29 00 7e clrlwi r9,r9,1
> 10: 7c 84 48 78 andc r4,r4,r9
> 14: 3d 20 dc 00 lis r9,-9216
> 18: 7d 3a c3 a6 mtspr 794,r9
> 1c: 2f 8b 00 03 cmpwi cr7,r11,3
> 20: 40 9d 00 b4 ble cr7,d4 <strncpy_from_user+0xd4>
> ...
>
> Start of the function without the patch:
>
> 00000000 <strncpy_from_user>:
> 0: 7c a0 2b 79 mr. r0,r5
> 4: 40 81 01 10 ble 114 <strncpy_from_user+0x114>
> 8: 2f 84 00 00 cmpwi cr7,r4,0
> c: 41 9c 01 30 blt cr7,13c <strncpy_from_user+0x13c>
> 10: 3d 20 80 00 lis r9,-32768
> 14: 7d 24 48 50 subf r9,r4,r9
> 18: 7f 80 48 40 cmplw cr7,r0,r9
> 1c: 7c 05 03 78 mr r5,r0
> 20: 41 9d 01 00 bgt cr7,120 <strncpy_from_user+0x120>
> 24: 3d 20 80 00 lis r9,-32768
> 28: 7d 25 48 50 subf r9,r5,r9
> 2c: 7f 84 48 40 cmplw cr7,r4,r9
> 30: 38 e0 ff f2 li r7,-14
> 34: 41 9d 00 e4 bgt cr7,118 <strncpy_from_user+0x118>
> 38: 94 21 ff e0 stwu r1,-32(r1)
> 3c: 3d 20 dc 00 lis r9,-9216
> 40: 7d 3a c3 a6 mtspr 794,r9
> 44: 2b 85 00 03 cmplwi cr7,r5,3
> 48: 40 9d 01 6c ble cr7,1b4 <strncpy_from_user+0x1b4>
> ...
> 118: 7c e3 3b 78 mr r3,r7
> 11c: 4e 80 00 20 blr
> 120: 7d 25 4b 78 mr r5,r9
> 124: 3d 20 80 00 lis r9,-32768
> 128: 7d 25 48 50 subf r9,r5,r9
> 12c: 7f 84 48 40 cmplw cr7,r4,r9
> 130: 38 e0 ff f2 li r7,-14
> 134: 41 bd ff e4 bgt cr7,118 <strncpy_from_user+0x118>
> 138: 4b ff ff 00 b 38 <strncpy_from_user+0x38>
> 13c: 38 e0 ff f2 li r7,-14
> 140: 4b ff ff d8 b 118 <strncpy_from_user+0x118>
> ...
>
> Signed-off-by: Christophe Leroy <christophe.leroy at csgroup.eu>
> ---
> v4: Rebase on top of core-scoped-uaccess tag and simplified as suggested by Gabriel
>
> v3: Rewrite mask_user_address_simple() for a smaller result on powerpc64, suggested by Gabriel
>
> v2: Added 'likely()' to the test in mask_user_address_fallback()
> ---
> arch/powerpc/include/asm/task_size_32.h | 6 +-
> arch/powerpc/include/asm/uaccess.h | 76 +++++++++++++++++++++++++
> 2 files changed, 79 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/task_size_32.h b/arch/powerpc/include/asm/task_size_32.h
> index 42a64bbd1964..725ddbf06217 100644
> --- a/arch/powerpc/include/asm/task_size_32.h
> +++ b/arch/powerpc/include/asm/task_size_32.h
> @@ -13,7 +13,7 @@
> #define MODULES_SIZE (CONFIG_MODULES_SIZE * SZ_1M)
> #define MODULES_VADDR (MODULES_END - MODULES_SIZE)
> #define MODULES_BASE (MODULES_VADDR & ~(UL(SZ_4M) - 1))
> -#define USER_TOP MODULES_BASE
> +#define USER_TOP (MODULES_BASE - SZ_4M)
> #endif
>
> #ifdef CONFIG_PPC_BOOK3S_32
> @@ -21,11 +21,11 @@
> #define MODULES_SIZE (CONFIG_MODULES_SIZE * SZ_1M)
> #define MODULES_VADDR (MODULES_END - MODULES_SIZE)
> #define MODULES_BASE (MODULES_VADDR & ~(UL(SZ_256M) - 1))
> -#define USER_TOP MODULES_BASE
> +#define USER_TOP (MODULES_BASE - SZ_4M)
> #endif
>
> #ifndef USER_TOP
> -#define USER_TOP ASM_CONST(CONFIG_PAGE_OFFSET)
> +#define USER_TOP ((ASM_CONST(CONFIG_PAGE_OFFSET) - SZ_128K) & ~(UL(SZ_128K) - 1))
> #endif
>
> #if CONFIG_TASK_SIZE < USER_TOP
> diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
> index 721d65dbbb2e..ba1d878c3f40 100644
> --- a/arch/powerpc/include/asm/uaccess.h
> +++ b/arch/powerpc/include/asm/uaccess.h
> @@ -2,6 +2,8 @@
> #ifndef _ARCH_POWERPC_UACCESS_H
> #define _ARCH_POWERPC_UACCESS_H
>
> +#include <linux/sizes.h>
> +
> #include <asm/processor.h>
> #include <asm/page.h>
> #include <asm/extable.h>
> @@ -435,6 +437,80 @@ static __must_check __always_inline bool __user_access_begin(const void __user *
> #define user_access_save prevent_user_access_return
> #define user_access_restore restore_user_access
>
> +/*
> + * Masking the user address is an alternative to a conditional
> + * user_access_begin that can avoid the fencing. This only works
> + * for dense accesses starting at the address.
> + */
> +static inline void __user *mask_user_address_simple(const void __user *ptr)
> +{
> + unsigned long addr = (unsigned long)ptr;
> + unsigned long mask = (unsigned long)(((long)addr >> (BITS_PER_LONG - 1)) & LONG_MAX);
> +
> + return (void __user *)(addr & ~mask);
> +}
> +
> +static inline void __user *mask_user_address_isel(const void __user *ptr)
> +{
> + unsigned long addr;
> +
> + asm("cmplw %1, %2; iselgt %0, %2, %1" : "=r"(addr) : "r"(ptr), "r"(TASK_SIZE) : "cr0");
> +
> + return (void __user *)addr;
> +}
> +
> +/* TASK_SIZE is a multiple of 128K for shifting by 17 to the right */
> +static inline void __user *mask_user_address_32(const void __user *ptr)
> +{
> + unsigned long addr = (unsigned long)ptr;
> + unsigned long mask = (unsigned long)((long)((TASK_SIZE >> 17) - 1 - (addr >> 17)) >> 31);
> +
> + addr = (addr & ~mask) | (TASK_SIZE & mask);
> +
> + return (void __user *)addr;
> +}
> +
> +static inline void __user *mask_user_address_fallback(const void __user *ptr)
> +{
> + unsigned long addr = (unsigned long)ptr;
> +
> + return (void __user *)(likely(addr < TASK_SIZE) ? addr : TASK_SIZE);
> +}
> +
> +static inline void __user *mask_user_address(const void __user *ptr)
> +{
> +#ifdef MODULES_VADDR
> + const unsigned long border = MODULES_VADDR;
> +#else
> + const unsigned long border = PAGE_OFFSET;
> +#endif
> +
> + if (IS_ENABLED(CONFIG_PPC64))
> + return mask_user_address_simple(ptr);
> + if (IS_ENABLED(CONFIG_E500))
> + return mask_user_address_isel(ptr);
> + if (TASK_SIZE <= UL(SZ_2G) && border >= UL(SZ_2G))
> + return mask_user_address_simple(ptr);
> + if (IS_ENABLED(CONFIG_PPC_BARRIER_NOSPEC))
> + return mask_user_address_32(ptr);
> + return mask_user_address_fallback(ptr);
> +}
> +
> +static __always_inline void __user *__masked_user_access_begin(const void __user *p,
> + unsigned long dir)
> +{
> + void __user *ptr = mask_user_address(p);
> +
> + might_fault();
> + allow_user_access(ptr, dir);
> +
> + return ptr;
> +}
> +
> +#define masked_user_access_begin(p) __masked_user_access_begin(p, KUAP_READ_WRITE)
> +#define masked_user_read_access_begin(p) __masked_user_access_begin(p, KUAP_READ)
> +#define masked_user_write_access_begin(p) __masked_user_access_begin(p, KUAP_WRITE)
> +
> #define arch_unsafe_get_user(x, p, e) do { \
> __long_type(*(p)) __gu_val; \
> __typeof__(*(p)) __user *__gu_addr = (p); \
> --
> 2.49.0
>
>