[Skiboot] [PATCH 5/6] xive: Fix occasional VC checkstops in xive_reset
Oliver
oohall at gmail.com
Thu Dec 7 16:44:29 AEDT 2017
On Thu, Dec 7, 2017 at 4:39 AM, Benjamin Herrenschmidt
<benh at kernel.crashing.org> wrote:
> The current workaround for the scrub bug described in
> __xive_cache_scrub() has an issue in that it can leave
> dirty invalid entries in the cache.
>
> When cleaning up EQs or VPs during reset, if we then
> remove the underlying indirect page for these entries,
> the XIVE will checkstop when trying to flush them out
> of the cache.
>
> This replaces the existing workaround with a new pair of
> workarounds for VPs and EQs:
>
> - The VP one does the dummy watch on another entry than
> the one we scrubbed (which does the job of pushing old
> stores out) using an entry that is known to be backed by
> a permanent indirect page.
>
> - The EQ one switches to a more efficient workaround
> which consists of doing a non-side-effect ESB load from
> the EQ's ESe control bits.
>
> Signed-off-by: Benjamin Herrenschmidt <benh at kernel.crashing.org>
> ---
> hw/xive.c | 53 ++++++++++++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 50 insertions(+), 3 deletions(-)
>
> diff --git a/hw/xive.c b/hw/xive.c
> index 104e1e85..b08c6783 100644
> --- a/hw/xive.c
> +++ b/hw/xive.c
> @@ -1251,6 +1251,48 @@ static int64_t __xive_cache_watch(struct xive *x, enum xive_cache_type ctype,
> void *new_data, bool light_watch,
> bool synchronous);
>
> +static void xive_scrub_workaround_vp(struct xive *x, uint32_t block, uint32_t idx __unused)
> +{
> + /* VP variant of the workaround described in __xive_cache_scrub(),
> + * we need to be careful to use for that workaround an NVT that
> + * sits on the same xive but is NOT part of a donated indirect
> + * entry.
> + *
> + * The reason is that the dummy cache watch will re-create a
> + * dirty entry in the cache, even if the entry is marked
> + * invalid.
> + *
> + * Thus if we are about to dispose of the indirect entry backing
> + * it, we'll cause a checkstop later on when trying to write it
> + * out.
> + *
> + * Note: This means the workaround only works for block group
> + * mode.
> + */
> +#ifdef USE_BLOCK_GROUP_MODE
> + __xive_cache_watch(x, xive_cache_vpc, block, INITIAL_VP_COUNT, 0,
> + 0, NULL, true, false);
> +#else
> + /* WARNING: Some workarounds related to cache scrubs require us to
> + * have at least one firmware owned (permanent) indirect entry for
> + * each XIVE instance. This currently only happens in block group
> + * mode
> + */
> +#warning Block group mode should not be disabled
> +#endif
This is kinda gross. As far as I can tell the only time we need to
worry about the indirect backing going away is in the xive_reset()
path. Why not just disable the caches while we're doing the reset,
clear all the structures and re-enable them once we're done?
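The alternative suggested above can be sketched as follows. This is a toy model of the idea only: the function names, the flag, and the flat arrays are hypothetical stand-ins, not skiboot's actual cache-control registers or reset sequence.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for whatever VC/PC cache enable bits the real
 * hardware sequence would have to poke. */
bool xive_caches_on = true;

void xive_caches_disable(void) { xive_caches_on = false; }
void xive_caches_enable(void)  { xive_caches_on = true; }

/* Toy model of the per-instance structures cleared during xive_reset() */
uint64_t eqs[16];
uint64_t vps[16];

void xive_reset_structures(void)
{
	/* With the caches off, clearing the structures (and later
	 * withdrawing donated indirect pages) cannot leave dirty
	 * invalid entries behind to checkstop on during a flush. */
	assert(!xive_caches_on);
	memset(eqs, 0, sizeof(eqs));
	memset(vps, 0, sizeof(vps));
}

void xive_reset_sketch(void)
{
	xive_caches_disable();
	xive_reset_structures();
	xive_caches_enable();
}
```

The point of the ordering is that no scrub workaround is needed at all while the caches are disabled, so the indirect backing pages can be reclaimed safely before re-enabling.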
> +}
> +
> +static void xive_scrub_workaround_eq(struct xive *x, uint32_t block __unused, uint32_t idx)
> +{
> + void *mmio;
> +
> + /* EQ variant of the workaround described in __xive_cache_scrub(),
> + * a simple non-side effect load from ESn will do
> + */
> + mmio = x->eq_mmio + idx * 0x20000;
> + in_complete(in_be64(mmio + 0x800));
> +}
> +
> static int64_t __xive_cache_scrub(struct xive *x, enum xive_cache_type ctype,
> uint64_t block, uint64_t idx,
> bool want_inval, bool want_disable)
> @@ -1270,6 +1312,9 @@ static int64_t __xive_cache_scrub(struct xive *x, enum xive_cache_type ctype,
> * invalidate, then after the scrub, we do a dummy cache
> * watch which will make the HW read the data back, which
> * should be ordered behind all the preceding stores.
> + *
> + * Update: For EQs we can do a non-side effect ESB load instead
> + * which is faster.
> */
> want_inval = true;
>
> @@ -1331,9 +1376,11 @@ static int64_t __xive_cache_scrub(struct xive *x, enum xive_cache_type ctype,
> /* Workaround for HW bug described above (only applies to
> * EQC and VPC)
> */
> - if (ctype == xive_cache_eqc || ctype == xive_cache_vpc)
> - __xive_cache_watch(x, ctype, block, idx, 0, 0, NULL,
> - true, false);
> + if (ctype == xive_cache_eqc)
> + xive_scrub_workaround_eq(x, block, idx);
> + else if (ctype == xive_cache_vpc)
> + xive_scrub_workaround_vp(x, block, idx);
> +
> return 0;
> }
>
> --
> 2.14.3
>
> _______________________________________________
> Skiboot mailing list
> Skiboot at lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/skiboot
Tested-by: Oliver O'Halloran <oohall at gmail.com>