[Skiboot] [PATCH 5/6] xive: Fix occasional VC checkstops in xive_reset

Fri Dec 8 05:00:53 AEDT 2017

On Thu, 2017-12-07 at 16:44 +1100, Oliver wrote:
> On Thu, Dec 7, 2017 at 4:39 AM, Benjamin Herrenschmidt
> <benh at kernel.crashing.org> wrote:
> > The current workaround for the scrub bug described in
> > __xive_cache_scrub() has an issue in that it can leave
> > dirty invalid entries in the cache.
> > 
> > When cleaning up EQs or VPs during reset, if we then
> > remove the underlying indirect page for these entries,
> > the XIVE will checkstop when trying to flush them out
> > of the cache.
> > 
> > This replaces the existing workaround with a new pair of
> > workarounds for VPs and EQs:
> > 
> >  - The VP one does the dummy watch on another entry than
> > the one we scrubbed (which does the job of pushing old
> > stores out) using an entry that is known to be backed by
> > a permanent indirect page.
> > 
> >  - The EQ one switches to a more efficient workaround
> > which consists of doing a non-side-effect ESB load from
> > the EQ's ESe control bits.
> > 
> > Signed-off-by: Benjamin Herrenschmidt <benh at kernel.crashing.org>
> > ---
> >  hw/xive.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++++++---
> >  1 file changed, 50 insertions(+), 3 deletions(-)
> > 
> > diff --git a/hw/xive.c b/hw/xive.c
> > index 104e1e85..b08c6783 100644
> > --- a/hw/xive.c
> > +++ b/hw/xive.c
> > @@ -1251,6 +1251,48 @@ static int64_t __xive_cache_watch(struct xive *x, enum xive_cache_type ctype,
> >                                   void *new_data, bool light_watch,
> >                                   bool synchronous);
> > 
> > +static void xive_scrub_workaround_vp(struct xive *x, uint32_t block, uint32_t idx __unused)
> > +{
> > +       /* VP variant of the workaround described in __xive_cache_scrub(),
> > +        * we need to be careful to use for that workaround an NVT that
> > +        * sits on the same xive but isn NOT part of a donated indirect
> > +        * entry.
> > +        *
> > +        * The reason is that the dummy cache watch will re-create a
> > +        * dirty entry in the cache, even if the entry is marked
> > +        * invalid.
> > +        *
> > +        * Thus if we are about to dispose of the indirect entry backing
> > +        * it, we'll cause a checkstop later on when trying to write it
> > +        * out.
> > +        *
> > +        * Note: This means the workaround only works for block group
> > +        * mode.
> > +        */
> > +#ifdef USE_BLOCK_GROUP_MODE
> > +       __xive_cache_watch(x, xive_cache_vpc, block, INITIAL_VP_COUNT, 0,
> > +                          0, NULL, true, false);
> > +#else
> > +       /* WARNING: Some workarounds related to cache scrubs require us to
> > +        * have at least one firmware owned (permanent) indirect entry for
> > +        * each XIVE instance. This currently only happens in block group
> > +        * mode
> > +        */
> > +#warning Block group mode should not be disabled
> > +#endif
> 
> This is kinda gross. As far as I can tell the only time we need to
> worry about the indirect backing going away is in the xive_reset()
> path. Why not just disable the caches while we're doing the reset,
> clear all the structures and re-enable them once we're done?

Which would be rather gross too ... does the cache watch even work
when we disable them ? Do they get properly flushed or do they need to
be manually flushed when disabled ? How do that behave vs. the existing
known store ordering issues with the cache scrubs ? etc...

I worry about introducing new corner cases since that part of the HW
is rather finnicky. The above is what the HW folks tell me to do,
and this patch has already gone to field testing, I'm not that keen
on doing something different.

Note: there is a typo, it should be INITIAL_VP_BASE but the value
is identical so the testing is still valid, however I'll respin with
the right name later.

> > +}
> > +
> > +static void xive_scrub_workaround_eq(struct xive *x, uint32_t block __unused, uint32_t idx)
> > +{
> > +       void *mmio;
> > +
> > +       /* EQ variant of the workaround described in __xive_cache_scrub(),
> > +        * a simple non-side effect load from ESn will do
> > +        */
> > +       mmio = x->eq_mmio + idx * 0x20000;
> > +       in_complete(in_be64(mmio + 0x800));
> > +}
> > +
> >  static int64_t __xive_cache_scrub(struct xive *x, enum xive_cache_type ctype,
> >                                   uint64_t block, uint64_t idx,
> >                                   bool want_inval, bool want_disable)
> > @@ -1270,6 +1312,9 @@ static int64_t __xive_cache_scrub(struct xive *x, enum xive_cache_type ctype,
> >          * invalidate, then after the scrub, we do a dummy cache
> >          * watch which will make the HW read the data back, which
> >          * should be ordered behind all the preceding stores.
> > +        *
> > +        * Update: For EQs we can do a non-side effect ESB load instead
> > +        * which is faster.
> >          */
> >         want_inval = true;
> > 
> > @@ -1331,9 +1376,11 @@ static int64_t __xive_cache_scrub(struct xive *x, enum xive_cache_type ctype,
> >         /* Workaround for HW bug described above (only applies to
> >          * EQC and VPC
> >          */
> > -       if (ctype == xive_cache_eqc || ctype == xive_cache_vpc)
> > -               __xive_cache_watch(x, ctype, block, idx, 0, 0, NULL,
> > -                                  true, false);
> > +       if (ctype == xive_cache_eqc)
> > +               xive_scrub_workaround_eq(x, block, idx);
> > +       else if (ctype == xive_cache_vpc)
> > +               xive_scrub_workaround_vp(x, block, idx);
> > +
> >         return 0;
> >  }
> > 
> > --
> > 2.14.3
> > 
> > _______________________________________________
> > Skiboot mailing list
> > Skiboot at lists.ozlabs.org
> > https://lists.ozlabs.org/listinfo/skiboot
> 
> Tested-by: Oliver O'Halloran <oohall at gmail.com>