[Skiboot] [PATCH] npu2-opencapi: Mask 2 XSL errors
Andrew Donnellan
ajd at linux.ibm.com
Thu May 30 10:51:25 AEST 2019
On 23/5/19 10:17 pm, Frederic Barrat wrote:
> Commit f8dfd699f584 ("hw/npu2: Setup an error interrupt on some
> opencapi FIRs") converted some FIR bits default action from system
> checkstop to raising an error interrupt. For 2 XSL error events that
> can be triggered by a misbehaving AFU, the error interrupt is raised
> twice, once for each link (the XSL logic in the NPU is shared between
> 2 links). So a badly behaving AFU could impact another, unsuspecting
> opencapi adapter.
> It doesn't look good and it turns out we can do better. We can mask
> those 2 XSL errors. The error will also be picked up by the OTL logic,
> which is per link. So we'll still get an error interrupt, but only on
> the relevant link, and the other opencapi adapter can stay functional.
>
> Fixes: f8dfd699f584 ("hw/npu2: Setup an error interrupt on some opencapi FIRs")
> Signed-off-by: Frederic Barrat <fbarrat at linux.ibm.com>
Reviewed-by: Andrew Donnellan <ajd at linux.ibm.com>
> ---
> hw/npu2-opencapi.c | 29 ++++++++++++++++++++---------
> 1 file changed, 20 insertions(+), 9 deletions(-)
>
> diff --git a/hw/npu2-opencapi.c b/hw/npu2-opencapi.c
> index 7a90cfac..0ad206a9 100644
> --- a/hw/npu2-opencapi.c
> +++ b/hw/npu2-opencapi.c
> @@ -1583,26 +1583,37 @@ static void mask_nvlink_fir(struct npu2 *p)
>
> static int enable_interrupts(struct npu2 *p)
> {
> - uint64_t reg, val_xsl, val_override;
> + uint64_t reg, xsl_fault, xstop_override, xsl_mask;
>
> /*
> - * Enable translation interrupts for all bricks and override
> - * every brick-fatal error to send an interrupt instead of
> - * checkstopping.
> + * We need to:
> + * - enable translation interrupts for all bricks
> + * - override most brick-fatal errors from FIR2 to send an
> + * interrupt instead of the default action of checkstopping
> + * the systems, since we can just fence the brick and keep
> + * the system alive.
> + * - the exception to the above is 2 FIRs for XSL errors
> + * resulting of bad AFU behavior, for which we don't want to
> + * checkstop but can't configure to send an error interrupt
> + * either, as the XSL errors are reported on 2 links (the
> + * XSL is shared between 2 links). Instead, we mask
> + * them. The XSL errors will result in an OTL error, which
> + * is reported only once, for the correct link.
> *
> * FIR bits configured to trigger an interrupt must have their
> * default action masked
> */
> - val_xsl = PPC_BIT(0) | PPC_BIT(1) | PPC_BIT(2) | PPC_BIT(3);
> - val_override = 0x0FFFEFC00FF1B000;
> + xsl_fault = PPC_BIT(0) | PPC_BIT(1) | PPC_BIT(2) | PPC_BIT(3);
> + xstop_override = 0x0FFFEFC00F91B000;
> + xsl_mask = PPC_BIT(41) | PPC_BIT(42);
>
> xscom_read(p->chip_id, p->xscom_base + NPU2_MISC_FIR2_MASK, ®);
> - reg |= val_xsl | val_override;
> + reg |= xsl_fault | xstop_override | xsl_mask;
> xscom_write(p->chip_id, p->xscom_base + NPU2_MISC_FIR2_MASK, reg);
>
> reg = npu2_scom_read(p->chip_id, p->xscom_base, NPU2_MISC_IRQ_ENABLE2,
> NPU2_MISC_DA_LEN_8B);
> - reg |= val_xsl | val_override;
> + reg |= xsl_fault | xstop_override;
> npu2_scom_write(p->chip_id, p->xscom_base, NPU2_MISC_IRQ_ENABLE2,
> NPU2_MISC_DA_LEN_8B, reg);
>
> @@ -1613,7 +1624,7 @@ static int enable_interrupts(struct npu2 *p)
> */
> reg = npu2_scom_read(p->chip_id, p->xscom_base, NPU2_MISC_FENCE_ENABLE2,
> NPU2_MISC_DA_LEN_8B);
> - reg |= val_override;
> + reg |= xstop_override;
> npu2_scom_write(p->chip_id, p->xscom_base, NPU2_MISC_FENCE_ENABLE2,
> NPU2_MISC_DA_LEN_8B, reg);
>
>
--
Andrew Donnellan OzLabs, ADL Canberra
ajd at linux.ibm.com IBM Australia Limited
More information about the Skiboot
mailing list