[Skiboot] [PATCH] fsp: return OPAL_BUSY_EVENT on failure sending FSP_CMD_POWERDOWN_NORM

Stewart Smith stewart at linux.vnet.ibm.com
Wed Oct 11 19:57:38 AEDT 2017


Stewart Smith <stewart at linux.vnet.ibm.com> writes:
> We had a race condition between FSP Reset/Reload and powering down
> the system from the host:
>
> Roughly:
>
>   FSP                Host
>   ---                ----
>   Power on
>                      Power on
>   (inject EPOW)
>   (trigger FSP R/R)
>                      Processes EPOW event, starts shutting down
>                      calls OPAL_CEC_POWER_DOWN
>   (is still in R/R)
>                      gets OPAL_INTERNAL_ERROR, spins in opal_poll_events
>   (FSP comes back)
>                      spinning in opal_poll_events
>   (thinks host is running)
>
> The call to OPAL_CEC_POWER_DOWN is only made once as the reset/reload
> error path for fsp_sync_msg() is to return -1, which means we give
> the OS OPAL_INTERNAL_ERROR, which is fine, except that our own API
> docs give us the opportunity to return OPAL_BUSY when trying again
> later may be successful, and we're ambiguous as to whether you should retry
> on OPAL_INTERNAL_ERROR.
>
> For reference, the Linux code looks like this:
>>static void __noreturn pnv_power_off(void)
>>{
>>        long rc = OPAL_BUSY;
>>
>>        pnv_prepare_going_down();
>>
>>        while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
>>                rc = opal_cec_power_down(0);
>>                if (rc == OPAL_BUSY_EVENT)
>>                        opal_poll_events(NULL);
>>                else
>>                        mdelay(10);
>>        }
>>        for (;;)
>>                opal_poll_events(NULL);
>>}
>
> Which means that *practically* our only option is to return OPAL_BUSY
> or OPAL_BUSY_EVENT.
>
> We choose OPAL_BUSY_EVENT for FSP systems as we do want to ensure we're
> running pollers to communicate with the FSP and do the final bits of
> Reset/Reload handling before we power off the system.
>
> Additionally, we really should update our documentation to point out all
> of these return codes and what action an OS should take for each.
>
> CC: stable
> Reported-by: Pridhiviraj Paidipeddi <ppaidipe at linux.vnet.ibm.com>
> Signed-off-by: Stewart Smith <stewart at linux.vnet.ibm.com>
> ---
>  doc/opal-api/opal-cec-power-down-5.rst | 18 +++++++++++++++---
>  doc/opal-api/return-codes.rst          |  6 +++++-
>  platforms/ibm-fsp/common.c             |  2 +-
>  3 files changed, 21 insertions(+), 5 deletions(-)

Merged to master as of 696d378d7b7295366e115e89a785640bf72a5043
and 5.4.x as of 70af010ad33300eb150187c3076c525617565d33
(and made it into 5.4.8)

-- 
Stewart Smith
OPAL Architect, IBM.


