[Skiboot] [PATCH] phb4: Reset pfir and nfir if new errors reported during ETU reset
Oliver
oohall at gmail.com
Mon Sep 17 13:22:44 AEST 2018
We should probably just merge this; if it turns out there are other
problems, we can fix them later.
On Thu, Aug 30, 2018 at 10:57 PM, Vaibhav Jain <vaibhav at linux.ibm.com> wrote:
> During fast-reboot a PCI device can continue sending requests even
> after ETU-Reset is asserted. This causes new errors to be reported
> in the ETU FIR registers, so the cached values in nfir_cache and
> pfir_cache no longer match what the registers hold.
>
> Presently during step-2 of CRESET nfir_cache and pfir_cache values are
> used to bring the PHB out of reset state. However if these variables
> are out of date the nfir/pfir registers are never reset completely and
> ETU still remains frozen.
>
> Hence this patch updates step-2 of phb4_creset to re-read the values of
> nfir/pfir registers to check if any new errors were reported after
> ETU-reset was asserted, report these new errors and reset the
> nfir/pfir registers. This should bring the ETU out of reset
> successfully.
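For anyone following along, the stale-cache problem described above can be
modelled in a few lines of standalone C. This is only a toy sketch, not
skiboot code: struct fake_fir, the FAKE_ERR_* bit values and both demo
functions are made up for illustration, and it assumes the +0x1 offset used
in the existing code is the FIR's AND alias (a write keeps only the bits set
in the written value).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for one hardware FIR; not a real skiboot structure. */
struct fake_fir {
	uint64_t bits;
};

/* Arbitrary example bits; they do not correspond to real FIR bit meanings. */
#define FAKE_ERR_OLD 0x8000000000000000ull
#define FAKE_ERR_NEW 0x0000000000000100ull

/* Models a write to the assumed AND alias: only bits set in 'mask' survive. */
static void fake_fir_write_and(struct fake_fir *fir, uint64_t mask)
{
	assert(fir != NULL);
	fir->bits &= mask;
}

/* Clear using a snapshot taken before the device raised a second error. */
uint64_t stale_cache_demo(void)
{
	struct fake_fir fir = { .bits = FAKE_ERR_OLD };
	uint64_t cache = fir.bits;         /* snapshot when the fence was detected */

	fir.bits |= FAKE_ERR_NEW;          /* device keeps sending: new error latches */
	fake_fir_write_and(&fir, ~cache);  /* clears only what the snapshot knew about */
	return fir.bits;                   /* FAKE_ERR_NEW is still set */
}

/* Re-read just before leaving reset, which is the idea behind the patch. */
uint64_t reread_demo(void)
{
	struct fake_fir fir = { .bits = FAKE_ERR_OLD };
	uint64_t cache = fir.bits;

	fir.bits |= FAKE_ERR_NEW;
	fake_fir_write_and(&fir, ~cache);  /* same stale clear as above */

	cache = fir.bits;                  /* fresh read picks up the new error */
	if (cache)
		fir.bits = 0;              /* plain write of 0, as in the patch */
	return fir.bits;
}
```

In this model stale_cache_demo() leaves the new error bit set while
reread_demo() ends with a clean register, which matches the behaviour the
commit message describes.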
>
> Signed-off-by: Vaibhav Jain <vaibhav at linux.ibm.com>
> ---
> hw/phb4.c | 27 +++++++++++++++++++++++++++
> 1 file changed, 27 insertions(+)
>
> diff --git a/hw/phb4.c b/hw/phb4.c
> index d1245dce..9c4b54b5 100644
> --- a/hw/phb4.c
> +++ b/hw/phb4.c
> @@ -3148,6 +3148,33 @@ static int64_t phb4_creset(struct pci_slot *slot)
> xscom_write(p->chip_id, p->pe_stk_xscom + 0x1,
> ~p->nfir_cache);
>
> + /* Re-read errors in PFIR and NFIR and reset any new
> + * error reported. This may happen as after fundamental
> + * reset was asserted in previous step the device may
> + * still be sending TLPs causing fence to be raised.
> + */
> + xscom_read(p->chip_id, p->pci_stk_xscom +
> + XPEC_PCI_STK_PCI_FIR, &p->pfir_cache);
> + xscom_read(p->chip_id, p->pe_stk_xscom +
> + XPEC_NEST_STK_PCI_NFIR, &p->nfir_cache);
> +
> + if (p->pfir_cache || p->nfir_cache) {
> + PHBERR(p, "CRESET: PHB still fenced !!\n");
> + PHBERR(p, "PCI FIR=0x%016llx\n",
> + p->pfir_cache);
> + PHBERR(p, "NEST FIR=0x%016llx\n",
> + p->nfir_cache);
The AIB error log register is also useful here; can you dump that as well?
Something like this does the trick:
diff --git a/hw/phb4.c b/hw/phb4.c
index f463d20d15be..0f6d8bc5eb0b 100644
--- a/hw/phb4.c
+++ b/hw/phb4.c
@@ -3200,9 +3200,16 @@ static int64_t phb4_creset(struct pci_slot *slot)
XPEC_NEST_STK_PCI_NFIR, &p->nfir_cache);
if (p->pfir_cache || p->nfir_cache) {
+ uint64_t err_aib;
+
+ xscom_read(p->chip_id, p->pci_stk_xscom +
+ XPEC_PCI_STK_PBAIB_ERR_REPORT,
+ &err_aib);
+
PHBERR(p, "CRESET: PHB still fenced !!\n");
PHBERR(p, "PCI FIR=0x%016llx\n",
p->pfir_cache);
PHBERR(p, "NEST FIR=0x%016llx\n",
p->nfir_cache);
+ PHBERR(p, "AIB ERR=0x%016llx\n", err_aib);
> + /* Dump other error registers */
> + phb4_eeh_dump_regs(p);
At this point the ETU is still in reset, and trying to dump the error
registers floods the msglog with XSCOM errors. I think this is because
the HV indirect register access XSCOMs don't work while reset is
asserted, so this call should probably be removed.
Otherwise, Reviewed-by: Oliver O'Halloran <oohall at gmail.com>
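To illustrate the msglog flooding mentioned above, here is a rough standalone
model in C. It is a sketch under assumptions, not skiboot code: struct
fake_phb, fake_indirect_read(), fake_dump_regs() and FAKE_NR_DUMP_REGS are
all hypothetical, and it simply takes the guess above (indirect register
accesses fail while ETU reset is asserted) as given.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define FAKE_NR_DUMP_REGS 8	/* arbitrary count chosen for the example */

/* Hypothetical PHB state; not the real struct phb4. */
struct fake_phb {
	bool etu_in_reset;
	unsigned int xscom_errors;	/* one per failed access, i.e. one log line */
};

/* Models an indirect register read that fails while reset is asserted. */
static int fake_indirect_read(struct fake_phb *p, uint64_t *val)
{
	assert(p != NULL && val != NULL);
	if (p->etu_in_reset) {
		p->xscom_errors++;
		return -1;
	}
	*val = 0;
	return 0;
}

/* Models a register dump: one indirect read per register. */
static void fake_dump_regs(struct fake_phb *p)
{
	uint64_t val;
	int i;

	for (i = 0; i < FAKE_NR_DUMP_REGS; i++)
		(void)fake_indirect_read(p, &val);
}

/* Dump while still in reset: every read fails. */
unsigned int dump_errors_unguarded(void)
{
	struct fake_phb p = { .etu_in_reset = true, .xscom_errors = 0 };

	fake_dump_regs(&p);
	return p.xscom_errors;
}

/* Skip the dump while in reset: no failed reads at all. */
unsigned int dump_errors_skipped(void)
{
	struct fake_phb p = { .etu_in_reset = true, .xscom_errors = 0 };

	if (!p.etu_in_reset)
		fake_dump_regs(&p);
	return p.xscom_errors;
}
```

In this model every register in the dump produces one failed access while
reset is asserted, so dropping the call at this point in CRESET avoids all
of them.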
> +
> + /* Reset the PHB errors */
> + xscom_write(p->chip_id, p->pci_stk_xscom +
> + XPEC_PCI_STK_PCI_FIR, 0);
> + xscom_write(p->chip_id, p->pe_stk_xscom +
> + XPEC_NEST_STK_PCI_NFIR, 0);
> + }
> +
> /* Clear PHB from reset */
> xscom_write(p->chip_id,
> p->pci_stk_xscom + XPEC_PCI_STK_ETU_RESET, 0x0);
> --
> 2.17.1