[PATCH 0/5] powerpc: Implement masked user access

David Laight david.laight.linux at gmail.com
Tue Jun 24 18:32:58 AEST 2025


On Tue, 24 Jun 2025 07:27:47 +0200
Christophe Leroy <christophe.leroy at csgroup.eu> wrote:

> > On 22/06/2025 at 18:20, David Laight wrote:
> > On Sun, 22 Jun 2025 11:52:38 +0200
> > Christophe Leroy <christophe.leroy at csgroup.eu> wrote:
> >   
> >> Masked user access avoids the address/size verification by access_ok().
> >> Although its main purpose is to skip the speculation in the
> >> verification of user address and size, and hence avoid the need for
> >> speculation mitigation, it also has the advantage of reducing the
> >> number of instructions needed, so it also benefits platforms that
> >> don't need speculation mitigation, especially when the size of the
> >> copy is not known at build time.  
> > 
> > It also removes a conditional branch that is quite likely to be
> > statically predicted 'the wrong way'.  
> 
> But include/asm-generic/access_ok.h defines access_ok() as:
> 
> 	#define access_ok(addr, size) likely(__access_ok(addr, size))
> 
> So GCC uses the 'unlikely' variant of the branch instruction to force 
> the correct prediction, doesn't it?

Nope...
Most architectures don't have likely/unlikely variants of branch
instructions, so all gcc can do is decide which path is the
fall-through and whether the branch is forwards or backwards.
Additionally, unless there is code in both the 'if' and 'else' clauses,
the [un]likely seems to have no effect.
So on simple cpus that predict 'backwards branches taken' you can get
the desired effect - but it may need an 'asm comment' to force the
compiler to generate the required branches (e.g. a forwards branch
directly to a backwards unconditional jump).
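For reference, likely()/unlikely() are just wrappers around
__builtin_expect() - this is the generic definition from
include/linux/compiler.h:

	#define likely(x)	__builtin_expect(!!(x), 1)
	#define unlikely(x)	__builtin_expect(!!(x), 0)

And a minimal sketch of the 'asm comment' trick (illustrative only,
the names here are made up): an empty asm statement in the unlikely
path stops gcc merging the two paths, so the error path ends up as a
forwards branch to out-of-line code:

	if (unlikely(uaddr > limit)) {
		asm volatile("");	/* keep this path out of line */
		return -EFAULT;
	}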

On x86 it is all more complicated.
I think the pre-fetch code is likely to assume 'not taken' (but it
might use stale info from the cache line).
The predictor itself never does 'static prediction' - it is always
based on the referenced branch-prediction data structure.
So, unless you are in a loop (e.g. running a benchmark!), there is
pretty much a 50% chance of a branch mispredict.

I've been trying to benchmark different versions of the u64 * u64 / u64
function - and I think mispredicted branches make a big difference.
I need to sit down and sequence the test cases so that I can see
the effect of each branch!
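Roughly this sort of harness, as a sketch (userspace x86; the
rdtsc/lfence framing and the names are mine, and mul_u64_u64_div_u64
just stands in for whichever version is under test):

	#include <stdint.h>
	#include <x86intrin.h>

	uint64_t mul_u64_u64_div_u64(uint64_t a, uint64_t b, uint64_t c);

	/* Bracket a single call with serialised cycle-counter reads. */
	static uint64_t time_one(uint64_t a, uint64_t b, uint64_t c)
	{
		uint64_t start, end;

		_mm_lfence();
		start = __rdtsc();
		mul_u64_u64_div_u64(a, b, c);
		_mm_lfence();
		end = __rdtsc();
		return end - start;
	}

Running all the inputs that take one branch direction in a batch, then
interleaving them with inputs that take the other, should show the
mispredict cost as the difference.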

> 
> >   
> >> Unlike x86_64, which masks the address to 'all bits set' when the
> >> user address is invalid, here the address is set to an address in
> >> the gap. It avoids relying on the zero page to catch offset
> >> accesses. On book3s/32 it makes sure the opened access remains in
> >> the user segment. The extra cost is a single instruction in the
> >> masking.  
> > 
> > That isn't true (any more).
> > Linus changed the check to (approx):
> > 	if (uaddr > TASK_SIZE)
> > 		uaddr = TASK_SIZE;
> > (Implemented with a conditional move)  
> 
> Ah ok, I overlooked that. I didn't know the cmov instruction; it seems
> similar to the isel instruction on powerpc e500.

It got added with the Pentium Pro - I learnt the 8086 :-)
I suspect x86 got there first...

Although it is called a 'conditional move', I very much suspect the
register write is actually unconditional.
So the hardware implementation is much the same as 'add with carry',
except the ALU operation is a simple multiplex.
Which means it is unlikely to be speculative.
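It also suggests why the clamp above is cheap. A minimal sketch of
what Linus's check compiles to (kernel-style pseudo-code, the function
name is made up - the real x86-64 masking is done in asm): with -O2
gcc typically turns the ternary into cmp + cmov, so there is no branch
left to mispredict:

	static inline unsigned long clamp_user_addr(unsigned long uaddr)
	{
		/* The clamped value lands in the gap above user space,
		 * so any access through it faults. */
		return uaddr > TASK_SIZE ? TASK_SIZE : uaddr;
	}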

	David


> 
> Christophe
> 


