[Skiboot] [PATCH] phb4: Don't probe a PHB if its garded

Oliver oohall at gmail.com
Wed Aug 22 15:29:24 AEST 2018


On Tue, Aug 21, 2018 at 8:14 PM, Vaibhav Jain <vaibhav at linux.ibm.com> wrote:
> Presently phb4_probe_stack() causes an exception while trying to probe
> a PHB if its garded. This causes skiboot to go into a reboot loop with
> following exception log:
>
>  ***********************************************
>  Fatal MCE at 000000003006ecd4   .probe_phb4+0x570
>  CFAR : 00000000300b98a0
>  <snip>
>  Aborting!
> CPU 0018 Backtrace:
>  S: 0000000031cc37e0 R: 000000003001a51c   ._abort+0x4c
>  S: 0000000031cc3860 R: 0000000030028170   .exception_entry+0x180
>  S: 0000000031cc3a40 R: 0000000000001f10 *
>  S: 0000000031cc3c20 R: 000000003006ecb0   .probe_phb4+0x54c
>  S: 0000000031cc3e30 R: 0000000030014ca4   .main_cpu_entry+0x5b0
>  S: 0000000031cc3f00 R: 0000000030002700   boot_entry+0x1b8
>
> This is caused as phb4_probe_stack() will ignore all xscom read/write
> errors to enable PHB Bars and then tries to perform an mmio to read
> PHB Version registers that cause the fatal MCE.
>
> We fix this by ignoring the PHB probe if the first xscom_write() to
> populate the PHB Bar register fails, which indicates that there is
> something wrong with the PHB.

There's a fix for this in upstream hostboot too. Unfortunately, the HB
version in op-build hasn't been bumped in months due to compiler
issues so this is probably the best way to fix this bug.

Reviewed-by: Oliver O'Halloran <oohall at gmail.com>

> Signed-off-by: Vaibhav Jain <vaibhav at linux.ibm.com>

This should probably be CCed to stable so it goes into the sable builds too.

> ---
>  hw/phb4.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
>
> diff --git a/hw/phb4.c b/hw/phb4.c
> index d1245dce..89b0b859 100644
> --- a/hw/phb4.c
> +++ b/hw/phb4.c
> @@ -5546,6 +5546,7 @@ static void phb4_probe_stack(struct dt_node *stk_node, uint32_t pec_index,
>         char *path;
>         uint64_t capp_ucode_base;
>         unsigned int max_link_speed;
> +       int rc;
>
>         gcid = dt_get_chip_id(stk_node);
>         stk_index = dt_prop_get_u32(stk_node, "reg");
> @@ -5567,9 +5568,17 @@ static void phb4_probe_stack(struct dt_node *stk_node, uint32_t pec_index,
>
>         /* Initialize PHB register BAR */
>         phys_map_get(gcid, PHB4_REG_SPC, phb_num, &phb_bar, NULL);
> -       xscom_write(gcid, nest_stack + XPEC_NEST_STK_PHB_REG_BAR, phb_bar << 8);
> -       bar_en |= XPEC_NEST_STK_BAR_EN_PHB;
> +       rc = xscom_write(gcid, nest_stack + XPEC_NEST_STK_PHB_REG_BAR,
> +                        phb_bar << 8);
> +
> +       /* A scom error here probably indicates a defective/garded PHB */
> +       if (rc != OPAL_SUCCESS) {
> +               prerror("PHB[%d:%d] Unable to set PHB BAR. Error=%d\n",
> +                     gcid, phb_num, rc);
> +               return;

Setting status = "broken" in the device-tree might also be a good idea.

> +       }
>
> +       bar_en |= XPEC_NEST_STK_BAR_EN_PHB;
>
>         /* Same with INT BAR (ESB) */
>         phys_map_get(gcid, PHB4_XIVE_ESB, phb_num, &irq_bar, NULL);
> --
> 2.17.1
>
> _______________________________________________
> Skiboot mailing list
> Skiboot at lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/skiboot


More information about the Skiboot mailing list