[PATCH] powerpc/64s: Fix compiler store ordering to SLB shadow area

Nicholas Piggin npiggin at gmail.com
Fri Jun 1 08:52:27 AEST 2018


On Fri, 01 Jun 2018 00:22:21 +1000
Michael Ellerman <mpe at ellerman.id.au> wrote:

> Nicholas Piggin <npiggin at gmail.com> writes:
> 
> > The stores to update the SLB shadow area must be made as they appear
> > in the C code, so that the hypervisor does not see an entry with
> > mismatched vsid and esid. Use WRITE_ONCE for this.
> >
> > GCC has been observed to elide the first store to esid in the update,
> > which means that if the hypervisor interrupts the guest after storing
> > to vsid, it could see an entry with old esid and new vsid, which may
> > result in memory corruption.
> >
> > Signed-off-by: Nicholas Piggin <npiggin at gmail.com>
> > ---
> >  arch/powerpc/mm/slb.c | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/powerpc/mm/slb.c b/arch/powerpc/mm/slb.c
> > index 66577cc66dc9..2f4b33b24b3b 100644
> > --- a/arch/powerpc/mm/slb.c
> > +++ b/arch/powerpc/mm/slb.c
> > @@ -63,14 +63,14 @@ static inline void slb_shadow_update(unsigned long ea, int ssize,
> >  	 * updating it.  No write barriers are needed here, provided
> >  	 * we only update the current CPU's SLB shadow buffer.
> >  	 */
> > -	p->save_area[index].esid = 0;
> > -	p->save_area[index].vsid = cpu_to_be64(mk_vsid_data(ea, ssize, flags));
> > -	p->save_area[index].esid = cpu_to_be64(mk_esid_data(ea, ssize, index));
> > +	WRITE_ONCE(p->save_area[index].esid, 0);
> > +	WRITE_ONCE(p->save_area[index].vsid, cpu_to_be64(mk_vsid_data(ea, ssize, flags)));
> > +	WRITE_ONCE(p->save_area[index].esid, cpu_to_be64(mk_esid_data(ea, ssize, index)));  
> 
> What's the code-gen for that look like? I suspect it's terrible?

Yeah, it's not great.

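The codegen problem, roughly (an illustrative sketch with made-up helper
names, assuming the usual kernel definitions of __be64, cpu_to_be64()
and WRITE_ONCE()): GCC doesn't currently fold the cpu_to_be64() byteswap
into a byte-reversed store when the access is volatile, so the swap is
done in registers and followed by a plain std.

/* Illustrative only -- these helpers don't exist in the kernel. */
static inline void shadow_store_plain(__be64 *slot, u64 val)
{
        *slot = cpu_to_be64(val);               /* swap+store can fuse into stdbrx */
}

static inline void shadow_store_once(__be64 *slot, u64 val)
{
        WRITE_ONCE(*slot, cpu_to_be64(val));    /* swap in registers, then std */
}
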
> 
> Should we just do it in inline-asm I wonder?

There should be no fundamental correctness reason why the compiler
can't perform a volatile store with a byte-reversed store instruction.
The other option would be to add a compiler barrier() between each of
the plain stores. The reason I didn't is that in theory we don't need
to invalidate all of memory here, although in practice the resulting
code generation would probably be better.
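As a sketch of that barrier() alternative (not what the patch does), the
update in slb_shadow_update() would look something like:

/* Plain stores, ordered by compiler barriers.  barrier() clobbers all
 * memory, which is stronger than strictly needed here, but the stores
 * stay non-volatile so the byteswaps can still fuse into stdbrx. */
p->save_area[index].esid = 0;
barrier();
p->save_area[index].vsid = cpu_to_be64(mk_vsid_data(ea, ssize, flags));
barrier();
p->save_area[index].esid = cpu_to_be64(mk_esid_data(ea, ssize, index));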

Thanks,
Nick


