[Skiboot] [PATCH 5/7] hmi: Don't cause an unrecoverable HMI if a CPU is asleep

Alistair Popple alistair at popple.id.au
Thu Mar 17 16:22:18 AEDT 2016


On Tue, 15 Mar 2016 18:33:55 Russell Currey wrote:
> decode_core_fir looks at the FIR for every CPU.  If the XSCOM read
> fails, which it will if the CPU is asleep, the code path results in
> raising an unrecoverable HMI.  This is incorrect as if the CPU is
> asleep, it's likely not the cause of any problems.
> 
> Resolve this by skipping the CPU if it's asleep.
> 
> If the sleeping CPU was actually the cause of the HMI and no other
> components were found to have errors (i.e other CPUs, NX, CAPP), an
> unknown, unrecoverable HMI is raised anyway.  This patch just prevents
> unrecoverable errors being thrown when a recoverable component has a HMI
> and a CPU happens to be sleeping.
> 
> Signed-off-by: Russell Currey <ruscur at russell.cc>
> ---
>  core/hmi.c | 13 +++++++++++--
>  1 file changed, 11 insertions(+), 2 deletions(-)
> 
> diff --git a/core/hmi.c b/core/hmi.c
> index 127686f..09bf610 100644
> --- a/core/hmi.c
> +++ b/core/hmi.c
> @@ -299,6 +299,7 @@ static bool decode_core_fir(struct cpu_thread *cpu,
>  	uint32_t core_id;
>  	int i;
>  	bool found = false;
> +	int64_t ret;
>  
>  	/* Sanity check */
>  	if (!cpu || !hmi_evt)
> @@ -307,10 +308,18 @@ static bool decode_core_fir(struct cpu_thread *cpu,
>  	core_id = pir_to_core_id(cpu->pir);
>  
>  	/* Get CORE FIR register value. */
> -	if (xscom_read(cpu->chip_id, XSCOM_ADDR_P8_EX(core_id, CORE_FIR),
> -							&core_fir) != 0) {
> +	ret = xscom_read(cpu->chip_id, XSCOM_ADDR_P8_EX(core_id, CORE_FIR),
> +			 &core_fir);
> +
> +	if (ret == OPAL_HARDWARE) {
>  		prerror("HMI: XSCOM error reading CORE FIR\n");
>  		return false;
> +	} else if (ret == OPAL_WRONG_STATE) {
> +		/* CPU is asleep, so it probably didn't cause the checkstop */

It might be worth expanding this comment with an explanation of what happens 
if this core is the cause of the checkstop.

> +		prlog(PR_DEBUG,
> +		      "HMI: FIR read failed, chip %d core %d asleep\n",
> +		      cpu->chip_id, core_id);
> +		return true;

Doesn't returning true here cause a HMI event to be raised by 
queue_hmi_event() in find_core_checkstop_reason()? It looks like 
decode_core_fir() returns true when a core FIR is set so I don't see how 
callers could differentiate each case?

>  	}
>  
>  	prlog(PR_INFO, "HMI: CHIP ID: %x, CORE ID: %x, FIR: %016llx\n",
> 



More information about the Skiboot mailing list