[Skiboot] [PATCH] OPAL:Handle mbox response with bad status:0x24 during FSP termination
Mamatha Inamdar
mamatha4 at linux.vnet.ibm.com
Mon Apr 4 16:57:20 AEST 2016
Hi Stewart,
Sorry for the late response.
On 02/23/2016 08:32 AM, Stewart Smith wrote:
> Mamatha Inamdar <mamatha4 at linux.vnet.ibm.com> writes:
>> Problem Description:
>> During FSP termination/reset, the FSP received an mbox command from OPAL
>> for "Fetching platform management function data". Because the FSP was in
>> the termination state, the DMAE operation failed to write memory data to
>> the hypervisor, so the FSP sent an mbox response with status 0x24 to OPAL.
>> OPAL then committed a predictive log with SRC BB822411 and sent back
>> response status 0xFE, which FSP IPMI does not understand; FSP IPMI
>> determines the failure at the host on its own and logs the error.
>>
>> Fix: When OPAL receives bad response status 0x24 from the FSP due to a
>> DMAE error, commit an informational log; for all other bad response
>> statuses, commit a predictive log. In both cases, return SUCCESS as the
>> response status.
> Hi!
>
> So I was trying to reproduce this on a FW840 machine doing "smgr
> resetReload" on the FSP side. While I get a bunch of hidden error logs
> from the FSP (and, mysteriously, on reset/reload the FSP seems to
> re-inform us of the error logs that have previously been acknowledged),
> I don't seem to get that specific SRC... can you share how you managed
> to reproduce/test this?
It's difficult to recreate the issue. Based on the traces, we observed
that during termination the FSP receives an mbox command from OPAL.
Because the FSP is in the termination state, the DMAE operation fails, so
the FSP sends an mbox response with status 0x24 to OPAL, and OPAL commits
a predictive log with SRC BB822411.
Mahesh and I discussed this issue with Gajendra (FSP mbox component
owner) and came up with a fix: when OPAL receives bad response status
0x24 from the FSP due to a DMAE error, it commits an informational log
and returns SUCCESS as the response status; for all other bad response
statuses it commits a predictive log.
>> diff --git a/hw/fsp/fsp-ipmi.c b/hw/fsp/fsp-ipmi.c
>> index 750d144..f803f17 100644
>> --- a/hw/fsp/fsp-ipmi.c
>> +++ b/hw/fsp/fsp-ipmi.c
>> @@ -50,6 +50,10 @@ DEFINE_LOG_ENTRY(OPAL_RC_IPMI_RESP, OPAL_PLATFORM_ERR_EVT, OPAL_IPMI,
>> OPAL_PLATFORM_FIRMWARE, OPAL_PREDICTIVE_ERR_GENERAL,
>> OPAL_NA);
>>
>> +DEFINE_LOG_ENTRY(OPAL_RC_IPMI_DMA_ERROR_RESP, OPAL_PLATFORM_ERR_EVT, OPAL_IPMI,
>> + OPAL_PLATFORM_FIRMWARE, OPAL_INFO,
>> + OPAL_NA);
>> +
>> struct fsp_ipmi_msg {
>> struct list_node link;
>> struct ipmi_msg ipmi_msg;
>> @@ -281,13 +285,19 @@ static bool fsp_ipmi_read_response(struct fsp_msg *msg)
>> assert(msg->data.words[1] == PSI_DMA_PLAT_RESP_BUF);
>>
>> if (status != FSP_STATUS_SUCCESS) {
>> - log_simple_error(&e_info(OPAL_RC_IPMI_RESP), "IPMI: Response "
>> - "with bad status:0x%02x\n", status);
>> + if (status == FSP_STATUS_DMA_ERROR)
>> + log_simple_error(&e_info(OPAL_RC_IPMI_DMA_ERROR_RESP), "IPMI: Received "
>> + "DMA ERROR response from FSP, this may be because the FSP "
>> + "is in termination state:0x%02x\n", status);
>> + else
>> + log_simple_error(&e_info(OPAL_RC_IPMI_RESP), "IPMI: FSP response "
>> + "received with bad status:0x%02x\n", status);
>> +
>> fsp_ipmi_cmd_done(ipmi_msg->cmd,
>> IPMI_NETFN_RETURN_CODE(ipmi_msg->netfn),
>> IPMI_ERR_UNSPECIFIED);
>> return fsp_ipmi_send_response(FSP_RSP_PLAT_DATA |
>> - FSP_STATUS_GENERIC_ERROR);
>> + FSP_STATUS_SUCCESS);
> So... responding with success here seems really counter intuitive for
> the error case.
According to the FSP team, OPAL should not send back response status 0xFE
for any bad IPMI response: FSP IPMI determines the failure at the host on
its own and logs the error. Hence we send SUCCESS as the response and
commit the error log on the OPAL side (predictive, or informational for
the 0x24 DMA error).