[Skiboot] [PATCH 4/6] xive: Use io_complete() when changing the ESB mask bits

Fri Dec 8 14:01:34 AEDT 2017

On Fri, 2017-12-08 at 11:01 +1100, Oliver wrote:
> On Fri, Dec 8, 2017 at 4:48 AM, Benjamin Herrenschmidt
> <benh at kernel.crashing.org> wrote:
> > On Thu, 2017-12-07 at 12:47 +1100, Oliver wrote:
> > > On Thu, Dec 7, 2017 at 4:39 AM, Benjamin Herrenschmidt
> > > <benh at kernel.crashing.org> wrote:
> > > > Thus ensuring that the load has completed before anything else
> > > > is done (ie, the interrupt is actually masked).
> > > > 
> > > > Signed-off-by: Benjamin Herrenschmidt <benh at kernel.crashing.org>
> > > > ---
> > > >  hw/xive.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/hw/xive.c b/hw/xive.c
> > > > index 76939b79..104e1e85 100644
> > > > --- a/hw/xive.c
> > > > +++ b/hw/xive.c
> > > > @@ -2451,7 +2451,7 @@ static void xive_update_irq_mask(struct xive_src *s, uint32_t idx, bool masked)
> > > >         if (s->flags & XIVE_SRC_SHIFT_BUG)
> > > >                 offset <<= 4;
> > > > 
> > > > -       in_be64(mmio_base + offset);
> > > > +       in_complete(in_be64(mmio_base + offset));
> > > 
> > > Can you elaborate a bit of what this is doing an why it needs to be done?
> > 
> > Well, the cset says it :-) Basically the twi/isync construct is the
> > same we use in the kernel in every in_* and it's supposed to make
> > the processor "consume" the load, and not execute past the isync until
> > the load value has returned.
> 
> Sure, but why is this necessary/preferable to, say, an eieio barrier?

An eieio is just a barrier. It will order previous loads or stores with
later ones. It provides no guarantee about the load having actually
completed and returned data to the CPU. The loads (or stores) will be
*issued* in order by the barrier, but the load can be stuck in the
device for a while and the CPU can continue execute past it until it
needs the resulting data (which can be especially long when we don't
care about said result at all as above).

This can be a problem for example in timed IO constructs. For example
imagine a resent sequence for a device needs the bit set for 1us.
something like:

	- store 1 to reset register
	- load from reset register (to "push" the store)
	- delay 1us
	- store 0 to reset register

Now the problem is that nothing in the delay loop will consume the
loaded value so the load might actually only complete late into the
loop and thus the delay isn't respected.

The twi "trap never" followed by isync forces the core to consume the
load data and prevent execution beyond the sequence. The twi itself
consumes the data and isync guarantees that any previous potentially
trapping instruction has completed. (Hopefully the core doesn't
optimize that twi form, but that sequence is old AIX lore so I assume
it's still ok).

In the case of the mask bits, it's just a quick optimisation, ie, it
forces the core to "push out" the ESB load and wait for its completion
before continuing.

It's also handy when we shut down the xive as it guarantees the
completion of the side effect load before we start tearing things
down. (Mind you we do a xive sync so it's probably not strictly
necessary).

Ben.

> > 
> > Ben.
> > 
> > > 
> > > >  }
> > > > 
> > > >  static int64_t xive_sync(struct xive *x)
> > > > --
> > > > 2.14.3
> > > > 
> > > > _______________________________________________
> > > > Skiboot mailing list
> > > > Skiboot at lists.ozlabs.org
> > > > https://lists.ozlabs.org/listinfo/skiboot
> > > 
> > > Tested-by: Oliver O'Halloran <oohall at gmail.com>