[Skiboot] [PATCH] xscom: Don't log xscom errors caused bu OPAL calls

Oliver O'Halloran oohall at gmail.com
Fri Dec 6 13:51:46 AEDT 2019


On Fri, Dec 6, 2019 at 2:32 AM Gautham R Shenoy <ego at linux.vnet.ibm.com> wrote:
>
> Hello Oliver,
>
>
> On Fri, Dec 06, 2019 at 12:14:34AM +1100, Oliver O'Halloran wrote:
> > The XSCOM read/write OPAL calls are largely there to support running
> > PRD in the OS. PRD itself handles submitting error logs (if needed)
> > when a XSCOM operation fails so there's no need to send an error log
> > from inside of OPAL.
> >
> > Cc: Gautham R. Shenoy <ego at linux.vnet.ibm.com>
> > Signed-off-by: Oliver O'Halloran <oohall at gmail.com>
>
> This is much better, simpler version than what I posted.
>
> Only one question.
> > ---
> >  core/opal.c |  3 +++
> >  hw/xscom.c  | 15 +++++++++++----
> >  2 files changed, 14 insertions(+), 4 deletions(-)
> >
> > diff --git a/core/opal.c b/core/opal.c
> > index da746e805a3f..922dc5abddca 100644
> > --- a/core/opal.c
> > +++ b/core/opal.c
> > @@ -189,6 +189,9 @@ int64_t opal_exit_check(int64_t retval, struct stack_frame *eframe)
> >                     "Spent %llu msecs in OPAL call %llu!\n",
> >                     call_time, token);
> >       }
> > +
> > +     this_cpu->current_token = 0;
> > +
> >       return retval;
> >  }
> >
> > diff --git a/hw/xscom.c b/hw/xscom.c
> > index f3f4e1039ecf..381cf2c4aaa4 100644
> > --- a/hw/xscom.c
> > +++ b/hw/xscom.c
> > @@ -272,10 +272,17 @@ static int64_t xscom_handle_error(uint64_t hmer, uint32_t gcid, uint32_t pcb_add
> >               break;
> >       }
> >
> > -     /* XXX: Create error log entry ? */
> > -     log_simple_error(&e_info(OPAL_RC_XSCOM_RW),
> > -             "XSCOM: %s error gcid=0x%x pcb_addr=0x%x stat=0x%x\n",
> > -             is_write ? "write" : "read", gcid, pcb_addr, stat);
> > +     /*
> > +      * If we're in an XSCOM opal call then squash the error
> > +      * we assume that the caller (probably opal-prd) will
> > +      * handle logging it
> > +      */
> > +     if (this_cpu()->current_token != OPAL_XSCOM_READ &&
> > +         this_cpu()->current_token != OPAL_XSCOM_WRITE) {
>
>
> Shouldn't we restrict to not creating PELs only when the error is
> OPAL_XSCOM_ADDR_ERROR, as this was the only observable error when the
> XSCOM read/writes initiated by HBRT were failing ?

I'm pretty sure HBRT will creates PEL for any (unexpected) xscom
failure so we'd just be doubling up. The other error results we return
are:

#define OPAL_XSCOM_BUSY         OPAL_BUSY
#define OPAL_XSCOM_CHIPLET_OFF  OPAL_WRONG_STATE
#define OPAL_XSCOM_PARTIAL_GOOD -25
#define OPAL_XSCOM_ADDR_ERROR   -26
#define OPAL_XSCOM_CLOCK_ERROR  -27
#define OPAL_XSCOM_PARITY_ERROR -28
#define OPAL_XSCOM_TIMEOUT      -29
#define OPAL_XSCOM_CTR_OFFLINED -30

IMO these are all variations on the same theme. The caller did a scom
to a register that isn't accessible for $reason so we might as well
handle it consistently. If the error needs to be logged then the
caller can deal with it.


>
> > +             log_simple_error(&e_info(OPAL_RC_XSCOM_RW),
> > +                     "XSCOM: %s error gcid=0x%x pcb_addr=0x%x stat=0x%x\n",
> > +                     is_write ? "write" : "read", gcid, pcb_addr, stat);
> > +     }
> >
> >       /* We need to reset the XSCOM or we'll hang on the next access */
> >       xscom_reset(gcid, false);
> > --
> > 2.21.0
> >
>
> --
> Thanks and Regards
> gautham.


More information about the Skiboot mailing list