[Skiboot] [PATCH] OPAL:Handle mbox response with bad status:0x24 during FSP termination

Stewart Smith stewart at linux.vnet.ibm.com
Tue Feb 23 14:02:00 AEDT 2016


Mamatha Inamdar <mamatha4 at linux.vnet.ibm.com> writes:
> Problem Description:
> During FSP termination/reset, the FSP received an mbox command from OPAL
> for "Fetching platform management function data". Because the FSP was in
> termination state, the DMAE operation failed to write the data to
> hypervisor memory, so the FSP sent an mbox response with status 0x24 to
> OPAL. OPAL committed a predictive log with SRC BB822411 and sent back
> response status 0xFE. FSP IPMI does not understand this host-side
> failure, so IPMI logs an error.
>
> Fix: When OPAL receives bad response status 0x24 from the FSP due to a
> DMAE error, commit an informational log and return response status
> SUCCESS; for all other bad status responses, commit a predictive
> log.

Hi!

So I was trying to reproduce this on a FW840 machine doing "smgr
resetReload" on the FSP side. While I get a bunch of hidden error logs
from the FSP (and, mysteriously, on reset/reload the FSP seems to
re-inform us of the error logs that have previously been acknowledged),
I don't seem to get that specific SRC... can you share how you managed
to reproduce/test this?


> diff --git a/hw/fsp/fsp-ipmi.c b/hw/fsp/fsp-ipmi.c
> index 750d144..f803f17 100644
> --- a/hw/fsp/fsp-ipmi.c
> +++ b/hw/fsp/fsp-ipmi.c
> @@ -50,6 +50,10 @@ DEFINE_LOG_ENTRY(OPAL_RC_IPMI_RESP, OPAL_PLATFORM_ERR_EVT, OPAL_IPMI,
>  		 OPAL_PLATFORM_FIRMWARE, OPAL_PREDICTIVE_ERR_GENERAL,
>  		 OPAL_NA);
>  
> +DEFINE_LOG_ENTRY(OPAL_RC_IPMI_DMA_ERROR_RESP, OPAL_PLATFORM_ERR_EVT, OPAL_IPMI,
> +		 OPAL_PLATFORM_FIRMWARE, OPAL_INFO,
> +		 OPAL_NA);
> +
>  struct fsp_ipmi_msg {
>  	struct list_node	link;
>  	struct ipmi_msg		ipmi_msg;
> @@ -281,13 +285,19 @@ static bool fsp_ipmi_read_response(struct fsp_msg *msg)
>  	assert(msg->data.words[1] == PSI_DMA_PLAT_RESP_BUF);
>  
>  	if (status != FSP_STATUS_SUCCESS) {
> -		log_simple_error(&e_info(OPAL_RC_IPMI_RESP), "IPMI: Response "
> -				 "with bad status:0x%02x\n", status);
> +		if (status == FSP_STATUS_DMA_ERROR)
> +			log_simple_error(&e_info(OPAL_RC_IPMI_DMA_ERROR_RESP), "IPMI: Received "
> +				"DMA ERROR response from FSP, possibly because the FSP "
> +				"is in termination state:0x%02x\n", status);
> +		else
> +			log_simple_error(&e_info(OPAL_RC_IPMI_RESP), "IPMI: FSP response "
> +				 "received with bad status:0x%02x\n", status);
> +
>  		fsp_ipmi_cmd_done(ipmi_msg->cmd,
>  				  IPMI_NETFN_RETURN_CODE(ipmi_msg->netfn),
>  				  IPMI_ERR_UNSPECIFIED);
>  		return fsp_ipmi_send_response(FSP_RSP_PLAT_DATA |
> -					      FSP_STATUS_GENERIC_ERROR);
> +					      FSP_STATUS_SUCCESS);

So... responding with success here seems really counterintuitive for
the error case.
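
For illustration only, here is a minimal standalone sketch of one possible
reading of the patch description: reply with success only for the 0x24 DMA
error and keep the generic error reply for every other bad status. This is
not skiboot code. All sketch_* and SKETCH_* names are hypothetical, the
0x24 and 0xfe values are taken from the description above, and 0x00 for
success is an assumption.

```c
#include <assert.h>
#include <stdint.h>

/* 0x24 and 0xfe are the values quoted in the patch description;
 * 0x00 for success is an assumption made for this sketch. */
#define SKETCH_STATUS_SUCCESS		0x00
#define SKETCH_STATUS_DMA_ERROR		0x24
#define SKETCH_STATUS_GENERIC_ERROR	0xfe

/* Hypothetical severity labels standing in for OPAL_INFO and
 * OPAL_PREDICTIVE_ERR_GENERAL. */
enum sketch_severity {
	SKETCH_SEV_NONE,
	SKETCH_SEV_INFO,
	SKETCH_SEV_PREDICTIVE,
};

struct sketch_policy {
	enum sketch_severity severity;	/* log to commit on the host */
	uint8_t reply_status;		/* status to send back to the FSP */
};

/* Map the status received from the FSP to a log severity and a reply. */
static struct sketch_policy sketch_classify(uint8_t status)
{
	struct sketch_policy p = { SKETCH_SEV_NONE, SKETCH_STATUS_SUCCESS };

	if (status == SKETCH_STATUS_SUCCESS)
		return p;

	if (status == SKETCH_STATUS_DMA_ERROR) {
		/* Expected while the FSP is terminating: log quietly, reply OK. */
		p.severity = SKETCH_SEV_INFO;
		return p;
	}

	/* Any other bad status keeps the predictive log and error reply. */
	p.severity = SKETCH_SEV_PREDICTIVE;
	p.reply_status = SKETCH_STATUS_GENERIC_ERROR;
	return p;
}
```

With this mapping, sketch_classify(0x24) yields an informational log and a
success reply, while any other non-zero status still yields a predictive
log and the 0xfe reply. The hunk as posted instead replies with success
for every bad status.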

-- 
Stewart Smith
OPAL Architect, IBM.


