[Skiboot] [PATCH 5/7] hmi: Don't cause an unrecoverable HMI if a CPU is asleep

Russell Currey ruscur at russell.cc
Thu Mar 17 16:27:57 AEDT 2016


On Thu, 2016-03-17 at 16:22 +1100, Alistair Popple wrote:
> On Tue, 15 Mar 2016 18:33:55 Russell Currey wrote:
> > 
> > decode_core_fir looks at the FIR for every CPU.  If the XSCOM read
> > fails, which it will if the CPU is asleep, the code path results in
> > raising an unrecoverable HMI.  This is incorrect as if the CPU is
> > asleep, it's likely not the cause of any problems.
> > 
> > Resolve this by skipping the CPU if it's asleep.
> > 
> > If the sleeping CPU was actually the cause of the HMI and no other
> > components were found to have errors (i.e other CPUs, NX, CAPP), an
> > unknown, unrecoverable HMI is raised anyway.  This patch just prevents
> > unrecoverable errors being thrown when a recoverable component has a
> > HMI
> > and a CPU happens to be sleeping.
> > 
> > Signed-off-by: Russell Currey <ruscur at russell.cc>
> > ---
> >  core/hmi.c | 13 +++++++++++--
> >  1 file changed, 11 insertions(+), 2 deletions(-)
> > 
> > diff --git a/core/hmi.c b/core/hmi.c
> > index 127686f..09bf610 100644
> > --- a/core/hmi.c
> > +++ b/core/hmi.c
> > @@ -299,6 +299,7 @@ static bool decode_core_fir(struct cpu_thread *cpu,
> >  	uint32_t core_id;
> >  	int i;
> >  	bool found = false;
> > +	int64_t ret;
> >  
> >  	/* Sanity check */
> >  	if (!cpu || !hmi_evt)
> > @@ -307,10 +308,18 @@ static bool decode_core_fir(struct cpu_thread
> > *cpu,
> >  	core_id = pir_to_core_id(cpu->pir);
> >  
> >  	/* Get CORE FIR register value. */
> > -	if (xscom_read(cpu->chip_id, XSCOM_ADDR_P8_EX(core_id,
> > CORE_FIR),
> > -							&core_fir) !=
> > 0) {
> > +	ret = xscom_read(cpu->chip_id, XSCOM_ADDR_P8_EX(core_id,
> > CORE_FIR),
> > +			 &core_fir);
> > +
> > +	if (ret == OPAL_HARDWARE) {
> >  		prerror("HMI: XSCOM error reading CORE FIR\n");
> >  		return false;
> > +	} else if (ret == OPAL_WRONG_STATE) {
> > +		/* CPU is asleep, so it probably didn't cause the
> > checkstop */
> It might be worth expanding this comment with an explanation of what
> happens 
> if this core is the cause of the checkstop.
> 
> > 
> > +		prlog(PR_DEBUG,
> > +		      "HMI: FIR read failed, chip %d core %d
> > asleep\n",
> > +		      cpu->chip_id, core_id);
> > +		return true;
> Doesn't returning true here cause a HMI event to be raised by 
> queue_hmi_event() in find_core_checkstop_reason()? It looks like 
> decode_core_fir() returns true when a core FIR is set so I don't see how 
> callers could differentiate each case?
You're right, it should return false, I got confused.  This should return
false and the XSCOM read error case should return true and raise a HMI of
some kind, there's no way that should *just* be an error message.
> 
> > 
> >  	}
> >  
> >  	prlog(PR_INFO, "HMI: CHIP ID: %x, CORE ID: %x, FIR:
> > %016llx\n",
> > 



More information about the Skiboot mailing list