[Skiboot] [PATCH v3] capi: Poll Err/Status register during CAPP recovery
Alastair D'Silva
alastair at au1.ibm.com
Mon Mar 19 15:30:15 AEDT 2018
This patch updates do_capp_recovery_scoms() to poll the CAPP
> Err/Status control register, check for CAPP-Recovery to complete/fail
> based on indications of BITS-1,5,9 and then proceed with the
> CAPP-Recovery scoms iif recovery completed successfully. This would
> prevent cases where we bring-up the PCIe link while recovery
> sequencer
> on CAPP is still busy with casting out cache lines.
>
> In case CAPP-Recovery didn't complete successfully an error is
> returned
> from do_capp_recovery_scoms() asking phb4_creset() to keep the phb4
> fenced and mark it as broken.
>
> The loop that implements polling of Err/Status register will also log
> an error on the PHB when it continues for more than 168ms which is
> the
> max time to failure for CAPP-Recovery.
>
> Signed-off-by: Vaibhav Jain <vaibhav at linux.vnet.ibm.com>
> ---
> Changelog:
> v3 -> Introduced a timeout for the Poll loop of 168ms [Christophe]
>
> v2 -> Added an extra check for Bit(0) in Err/Status reg at the
> beginning to check if recovery mode was entered. [Christophe]
> ---
> hw/phb4.c | 85 ++++++++++++++++++++++++++++++++++++++++++++++++++---
> ----------
> 1 file changed, 68 insertions(+), 17 deletions(-)
>
> diff --git a/hw/phb4.c b/hw/phb4.c
> index 47175df2..30f46f9a 100644
> --- a/hw/phb4.c
> +++ b/hw/phb4.c
> @@ -2857,25 +2857,75 @@ static int64_t load_capp_ucode(struct phb4
> *p)
> return rc;
> }
>
> -static void do_capp_recovery_scoms(struct phb4 *p)
> +static int do_capp_recovery_scoms(struct phb4 *p)
> {
> - uint64_t reg;
> - uint32_t offset;
> + uint64_t rc, reg, end;
> + uint64_t offset = PHB4_CAPP_REG_OFFSET(p);
>
> - PHBDBG(p, "Doing CAPP recovery scoms\n");
> -
> - offset = PHB4_CAPP_REG_OFFSET(p);
> - /* disable snoops */
> - xscom_write(p->chip_id, SNOOP_CAPI_CONFIG + offset, 0);
> - load_capp_ucode(p);
> - /* clear err rpt reg*/
> - xscom_write(p->chip_id, CAPP_ERR_RPT_CLR + offset, 0);
> - /* clear capp fir */
> - xscom_write(p->chip_id, CAPP_FIR + offset, 0);
>
> + /* Get the status of CAPP recovery */
> xscom_read(p->chip_id, CAPP_ERR_STATUS_CTRL + offset, ®);
> - reg &= ~(PPC_BIT(0) | PPC_BIT(1));
> - xscom_write(p->chip_id, CAPP_ERR_STATUS_CTRL + offset, reg);
> +
> + /* No recovery in progress ignore */
> + if ((reg & PPC_BIT(0)) == 0) {
> + PHBDBG(p, "CAPP: No recovery in progress\n");
> + return 0;
> + }
> +
> + PHBDBG(p, "CAPP: Waiting for recovery to complete\n");
> + /* recovery timer failure period 168ms */
> + end = mftb() + msecs_to_tb(168);
> + while ((reg & (PPC_BIT(1) | PPC_BIT(5) | PPC_BIT(9))) == 0)
> {
> +
> + time_wait_ms(5);
> + xscom_read(p->chip_id, CAPP_ERR_STATUS_CTRL +
> offset, ®);
> +
> + if (end && tb_compare(mftb(), end) != TB_AAFTERB) {
> + PHBERR(p, "CAPP: Capp recovery Timed-
> out.\n");
> + end = 0;
> + break;
> + }
> + }
> +
> + /* Check if the recovery failed or passed */
> + if (reg & PPC_BIT(1)) {
> + PHBDBG(p, "Doing CAPP recovery scoms\n");
> + /* disable snoops */
> + xscom_write(p->chip_id, SNOOP_CAPI_CONFIG + offset,
> 0);
> + load_capp_ucode(p);
> +
> + /* clear err rpt reg*/
> + xscom_write(p->chip_id, CAPP_ERR_RPT_CLR + offset,
> 0);
> +
> + /* clear capp fir */
> + xscom_write(p->chip_id, CAPP_FIR + offset, 0);
> +
> + /* Just reset Bit-0,1 and dont touch any other bit
> */
> + xscom_read(p->chip_id, CAPP_ERR_STATUS_CTRL +
> offset, ®);
> + reg &= ~(PPC_BIT(0) | PPC_BIT(1));
> + xscom_write(p->chip_id, CAPP_ERR_STATUS_CTRL +
> offset, reg);
> +
> + PHBDBG(p, "CAPP recovery complete\n");
> + rc = OPAL_SUCCESS;
> +
> + } else {
> + /* Most likely will checkstop here due to FIR ACTION
> for
> + * failed recovery. So this message would never be
> logged.
> + * But if we still enter here then return an error
> forcing a
> + * fence of the PHB.
> + */
> + if (reg & PPC_BIT(5))
> + PHBERR(p, "CAPP: Capp recovery Failed\n");
> + else if (reg & PPC_BIT(9))
> + PHBERR(p, "CAPP: Capp recovery hang
> detected\n");
> + else if (end != 0)
> + PHBERR(p, "CAPP: Unknown recovery
> failure\n");
> +
> + PHBDBG(p, "CAPP: Err/Status-reg=0x%016llx\n", reg);
> + rc = OPAL_HARDWARE;
> + }
> +
> + return rc;
> }
>
> static int64_t phb4_creset(struct pci_slot *slot)
> @@ -2934,8 +2984,9 @@ static int64_t phb4_creset(struct pci_slot
> *slot)
> PHBDBG(p, "CRESET: No pending
> transactions\n");
>
> /* capp recovery */
> - if (p->flags & PHB4_CAPP_RECOVERY)
> - do_capp_recovery_scoms(p);
> + if (p->flags & PHB4_CAPP_RECOVERY &&
> + do_capp_recovery_scoms(p))
Sorry, I missed a comment here on the last mail. Elsewhere in the
skiboot we explicitly check for ==/!= OPAL_SUCCESS, rather than
assuming the value evaluates to false.
> + goto error;
>
> /* Clear errors in PFIR and NFIR */
> xscom_write(p->chip_id, p->pci_stk_xscom +
> 0x1,
>
--
Alastair D'Silva
Open Source Developer
Linux Technology Centre, IBM Australia
mob: 0423 762 819
More information about the Skiboot
mailing list