[Skiboot] [PATCH] fsp: return OPAL_BUSY_EVENT on failure sending FSP_CMD_POWERDOWN_NORM
Stewart Smith
stewart at linux.vnet.ibm.com
Mon Oct 2 12:08:25 AEDT 2017
We had a race condition between FSP Reset/Reload and powering down
the system from the host:
Roughly:
FSP Host
--- ----
Power on
Power on
(inject EPOW)
(trigger FSP R/R)
Processes EPOW event, starts shutting down
calls OPAL_CEC_POWER_DOWN
(is still in R/R)
gets OPAL_INTERNAL_ERROR, spins in opal_poll_events
(FSP comes back)
spinning in opal_poll_events
(thinks host is running)
The call to OPAL_CEC_POWER_DOWN is only made once as the reset/reload
error path for fsp_sync_msg() is to return -1, which means we give
the OS OPAL_INTERNAL_ERROR, which is fine, except that our own API
docs give us the opportunity to return OPAL_BUSY when trying again
later may be successful, and we're ambiguous as to if you should retry
on OPAL_INTERNAL_ERROR.
For reference, the linux code looks like this:
>static void __noreturn pnv_power_off(void)
>{
> long rc = OPAL_BUSY;
>
> pnv_prepare_going_down();
>
> while (rc == OPAL_BUSY || rc == OPAL_BUSY_EVENT) {
> rc = opal_cec_power_down(0);
> if (rc == OPAL_BUSY_EVENT)
> opal_poll_events(NULL);
> else
> mdelay(10);
> }
> for (;;)
> opal_poll_events(NULL);
>}
Which means that *practically* our only option is to return OPAL_BUSY
or OPAL_BUSY_EVENT.
We choose OPAL_BUSY_EVENT for FSP systems as we do want to ensure we're
running pollers to communicate with the FSP and do the final bits of
Reset/Reload handling before we power off the system.
Additionally, we really should update our documentation to point all
of these return codes and what action an OS should take.
CC: stable
Reported-by: Pridhiviraj Paidipeddi <ppaidipe at linux.vnet.ibm.com>
Signed-off-by: Stewart Smith <stewart at linux.vnet.ibm.com>
---
doc/opal-api/opal-cec-power-down-5.rst | 18 +++++++++++++++---
doc/opal-api/return-codes.rst | 6 +++++-
platforms/ibm-fsp/common.c | 2 +-
3 files changed, 21 insertions(+), 5 deletions(-)
diff --git a/doc/opal-api/opal-cec-power-down-5.rst b/doc/opal-api/opal-cec-power-down-5.rst
index 6daea3de65ff..bdcb84e3b958 100644
--- a/doc/opal-api/opal-cec-power-down-5.rst
+++ b/doc/opal-api/opal-cec-power-down-5.rst
@@ -24,16 +24,28 @@ Return Values
-------------
``OPAL_SUCCESS``
- the power down was updated successful
+ the power down request was successful.
+ This may/may not result in immediate power down. An OS should
+ spin in a loop after getting `OPAL_SUCCESS` as it is likely that there
+ will be a delay before instructions stop being executed.
``OPAL_BUSY``
- unable to power down, try again later
+ unable to power down, try again later.
+
+``OPAL_BUSY_EVENT``
+ Unable to power down, call `opal_run_pollers` and try again.
``OPAL_PARAMETER``
a parameter was incorrect
``OPAL_INTERNAL_ERROR``
- hal code sent incorrect data to hardware device
+ Something went wrong, and waiting and trying again is unlikely to be
+ successful. Although, considering that in a shutdown code path, there's
+ unlikely to be any other valid option to take, retrying is perfectly valid.
+
+ In older OPAL versions (prior to skiboot v5.9), on IBM FSP systems, this
+ return code was returned erroneously instead of OPAL_BUSY_EVENT during an
+ FSP Reset/Reload.
``OPAL_UNSUPPORTED``
this platform does not support being powered off.
diff --git a/doc/opal-api/return-codes.rst b/doc/opal-api/return-codes.rst
index 03ea5c19d728..3ea4a3d85169 100644
--- a/doc/opal-api/return-codes.rst
+++ b/doc/opal-api/return-codes.rst
@@ -40,7 +40,8 @@ OPAL_BUSY
#define OPAL_BUSY -2
-Try again later.
+Try again later. Related to `OPAL_BUSY_EVENT`, but `OPAL_BUSY` indicates that the
+caller need not call `OPAL_POLL_EVENTS` itself. **TODO** Clarify current situation.
OPAL_PARTIAL
------------
@@ -126,6 +127,9 @@ OPAL_BUSY_EVENT
#define OPAL_BUSY_EVENT -12
+The same as `OPAL_BUSY` but signals that the OS should call `OPAL_POLL_EVENTS` as
+that may be required to get into a state where the call will succeed.
+
OPAL_HARDWARE_FROZEN
--------------------
::
diff --git a/platforms/ibm-fsp/common.c b/platforms/ibm-fsp/common.c
index 237b63fb4f05..0a9b06f77e33 100644
--- a/platforms/ibm-fsp/common.c
+++ b/platforms/ibm-fsp/common.c
@@ -223,7 +223,7 @@ int64_t ibm_fsp_cec_power_down(uint64_t request)
printf("FSP: Sending shutdown command to FSP...\n");
if (fsp_sync_msg(fsp_mkmsg(FSP_CMD_POWERDOWN_NORM, 1, request), true))
- return OPAL_INTERNAL_ERROR;
+ return OPAL_BUSY_EVENT;
fsp_reset_links();
return OPAL_SUCCESS;
--
2.13.5
More information about the Skiboot
mailing list