[PATCH] powerpc: Simplify access_ok()

David Laight david.laight.linux at gmail.com
Mon Mar 23 10:04:02 AEDT 2026


On Sun, 22 Mar 2026 20:02:56 +0100
"Christophe Leroy (CS GROUP)" <chleroy at kernel.org> wrote:

> Le 22/03/2026 à 12:03, David Laight a écrit :
> > On Tue, 17 Mar 2026 19:07:04 +0100
> > "Christophe Leroy (CS GROUP)" <chleroy at kernel.org> wrote:
> >   
> >> With the implementation of masked user access, we always have a memory
> >> gap between user memory space and kernel memory space, so use it to
> >> simplify access_ok() by relying on access fault in case of an access
> >> in the gap.
> >>
> >> Most of the time the size is known at build time.
> >>
> >> On powerpc64, the kernel space starts at 0x8000000000000000 which is
> >> always more than two times TASK_USER_MAX so when the size is known at
> >> build time and lower than TASK_USER_MAX, only the address needs to be
> >> verified. If not, a binary or of address and size must be lower than
> >> TASK_USER_MAX. As TASK_USER_MAX is a power of 2, just check that
> >> there is no bit set outside of TASK_USER_MAX - 1 mask.
> >>
> >> On powerpc32, there is a guaranteed gap of 128KB so when the size is
> >> known at build time and not greater than 128KB, just check that the
> >> address is below TASK_SIZE. Otherwise use the original formula.  
> > 
> > Given that the whole thing relies on the kernel code 'obeying the rules'
> > is it enough to require that the accesses will be 'moderately sequential'?
> > Provided there are no jumps greater than 128k the length can be ignored.
> > 
> > I think Linus thought about doing that for x86-64.  
> 
> You mean ignoring length completely ?

That is the thought.
The 'masked_user_access' code does that and it could always be done
provided all the call sites are checked.
The length was needed historically because the 386 allowed kernel
writes to read only pages - so COW had to be tested in software.
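The length-free scheme the thread describes can be modelled in plain C: clamp any address at or above the user range into the guard gap, so a wild pointer faults on access instead of being range-checked up front. This is only a sketch; TASK_SIZE_MAX is the value quoted later in the thread, and mask_user_address() is an illustrative name, not the kernel's exact per-arch helper (the real x86 version is a branchless cmp/cmov sequence).

```c
#include <stdint.h>

#define TASK_SIZE_MAX (1ULL << 52)	/* 0x0010000000000000, from the thread */

/* Illustrative model of masked user access: out-of-range addresses are
 * clamped to TASK_SIZE_MAX, which lies inside the user/kernel gap, so the
 * eventual dereference faults instead of reaching kernel memory. */
static inline uint64_t mask_user_address(uint64_t addr)
{
	/* all-ones if addr is above the user range, else zero */
	uint64_t over = -(uint64_t)(addr > TASK_SIZE_MAX);

	return (addr & ~over) | (TASK_SIZE_MAX & over);
}
```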

> 
> Yes we can probably do that on 64 bits. I don't know about x86_64 but 
> powerpc64 has TASK_USER < 0x0010000000000000 and kernel space is above 
> 0x8000000000000000, so oring size with address and comparing it to 
> 0x0010000000000000 doesn't add much cost compared to just comparing the 
> address.
> 
> > 
> > I can't imagine that happening unless there is code that probes the end of
> > the user buffer before starting a transfer - and that is pretty pointless.  
> > There are places that skip a few bytes (or just access in the wrong
> > order)
> > but it is likely to be alignment padding, and code should be doing the
> > access_ok() check for each fragment - not on the entire buffer.  
> 
> I don't follow you. Why not for the entire buffer ? We try to minimise 
> the amount of stac/clac (or equivalent) and access_ok() is associated 
> with stac. When we use access_begin/access_end we tend to try and 
> regroup everything in a single block.

I mean that the access_ok/stac/clac are done together at the place where
the copy is done, rather than doing the access_ok() on the system call
interface and the actual copies much later.
There are a few ioctls (probably netlink and some of the sctp sockopts)
which process multiple 'items' in one user buffer - they'll do the
access_ok/stac/clac multiple times.

The only problem would be code that processes a buffer that contains
lots of items like:
	struct item {
		unsigned int size;
		char cmd[size];
	};
and advances to the next item believing the 'size' field (checked against
the outer buffer size) but without a separate access_ok() for the later
commands.
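That hazardous pattern can be sketched in userspace C: walking variable-sized items inside one buffer, where only a per-item bounds check against the outer buffer keeps a corrupt 'size' from steering later accesses past the checked range. The struct mirrors the one above; count_items() and its layout are illustrative, not from any real ioctl.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct item {
	uint32_t size;		/* total size of this item, header included */
	unsigned char cmd[];	/* 'size - 4' bytes of payload */
};

/* Count items, validating that each header's 'size' keeps the walk
 * inside [buf, buf + len); a bad size must not advance past the end. */
static int count_items(const unsigned char *buf, size_t len)
{
	size_t off = 0;
	int n = 0;

	while (len - off >= sizeof(uint32_t)) {
		uint32_t size;

		memcpy(&size, buf + off, sizeof(size));
		if (size < sizeof(uint32_t) || size > len - off)
			return -1;	/* corrupt size: would walk off the buffer */
		off += size;
		n++;
	}
	return off == len ? n : -1;
}
```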
	

> 
> >   
> >>
> >> Signed-off-by: Christophe Leroy (CS GROUP) <chleroy at kernel.org>
> >> ---
> >>   arch/powerpc/include/asm/uaccess.h | 26 ++++++++++++++++++++++++++
> >>   1 file changed, 26 insertions(+)
> >>
> >> diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h
> >> index 570b3d91e2e4..ec210ae62be7 100644
> >> --- a/arch/powerpc/include/asm/uaccess.h
> >> +++ b/arch/powerpc/include/asm/uaccess.h
> >> @@ -15,8 +15,34 @@
> >>   #define TASK_SIZE_MAX		TASK_SIZE_USER64
> >>   #endif
> >>   
> >> +#define __access_ok __access_ok
> >> +
> >>   #include <asm-generic/access_ok.h>
> >>   
> >> +/*
> >> + * On powerpc64, TASK_SIZE_MAX is 0x0010000000000000, so even if both ptr and size
> >> + * are TASK_SIZE_MAX we are still inside the memory gap. So make it simple.
> >> + */
> >> +static __always_inline int __access_ok(const void __user *ptr, unsigned long size)
> >> +{
> >> +	unsigned long addr = (unsigned long)ptr;
> >> +
> >> +	if (IS_ENABLED(CONFIG_PPC64)) {
> >> +		BUILD_BUG_ON(!is_power_of_2(TASK_SIZE_MAX));
> >> +		BUILD_BUG_ON(TASK_SIZE_MAX > 0x0010000000000000);
> >> +
> >> +		if (__builtin_constant_p(size))
> >> +			return size <= TASK_SIZE_MAX && !(addr & ~(TASK_SIZE_MAX - 1));
> >> +		else
> >> +			return !((size | addr) & ~(TASK_SIZE_MAX - 1));  
> > 
> > The compiler may know an upper bound for 'size' even when it isn't a constant.
> > It might be 32bit or from 'size = is_compat_foo ? 16 : 24', so:
> > 		if (statically_true(size < TASK_SIZE_MAX))
> > 			return !(addr & ~(TASK_SIZE_MAX - 1));
> > 		return !((size | addr) & ~(TASK_SIZE_MAX - 1));  
> 
> I think you are missing the case where size is constant and > TASK_SIZE_MAX.

It is no different to a variable size that happens to be > TASK_SIZE_MAX.
But I suspect the compiler will optimise it to 'return 0'.
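The statically_true() variant can be modelled outside the kernel. The macro is reproduced from its kernel definition in include/linux/compiler.h; __access_ok_sketch() is an illustrative stand-in for the patch's helper, with TASK_SIZE_MAX as quoted in the thread. A constant size above TASK_SIZE_MAX simply falls through to the OR path, which the compiler folds to a constant 0, as suspected above. Note __builtin_constant_p is a GCC/Clang extension.

```c
#include <stdint.h>

#define TASK_SIZE_MAX (1ULL << 52)	/* 0x0010000000000000, from the thread */

/* Kernel definition, from include/linux/compiler.h */
#define statically_true(x) (__builtin_constant_p(x) && (x))

static inline int __access_ok_sketch(uint64_t addr, uint64_t size)
{
	/* If the compiler can prove size small (a literal, a ?: between
	 * constants, a narrow type), only the address needs checking. */
	if (statically_true(size < TASK_SIZE_MAX))
		return !(addr & ~(TASK_SIZE_MAX - 1));
	/* Otherwise fold size into a single range check via OR. */
	return !((size | addr) & ~(TASK_SIZE_MAX - 1));
}
```

Both branches compute the same verdict for in-range inputs, so the optimisation is purely about dropping the size term when it is provably redundant.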

> Or maybe that case should be caught with a BUILD_BUG ?

Probably not worth slowing down the compile.

	David

> 
> Christophe
> 
> >   
> >> +	} else {
> >> +		if (__builtin_constant_p(size) && size < SZ_128K)  
> > 
> > Again the compiler may know an upper bound even if the value isn't constant:
> > 		if (statically_true(size < SZ_128K))
> > 
> > 	David
> >   
> >> +			return addr < TASK_SIZE;
> >> +		else
> >> +			return size <= TASK_SIZE && addr <= TASK_SIZE - size;
> >> +	}
> >> +}
> >> +
> >>   /*
> >>    * These are the main single-value transfer routines.  They automatically
> >>    * use the right size if we just have the right pointer type.  
> >   
> 


