[Skiboot] [RESEND PATCH v4] hw/xscom: Reset XSCOM engine after finite number of retries when busy
Stewart Smith
stewart at linux.vnet.ibm.com
Tue Jul 5 17:16:18 AEST 2016
Vipin K Parashar <vipin at linux.vnet.ibm.com> writes:
> OPAL retries XSCOM read/write operations forever till it succeeds.
> This can cause XSCOM ops to hang forever when XSCOM engine remains
> busy for some reason. Changed it to retry XSCOM operations only
> XSCOM_BUSY_MAX_RETRIES number of times instead of retrying forever.
> Also added logic to reset XSCOM engine after XSCOM_BUSY_RESET_THRESHOLD
> number of retries to unblock it when it remains busy.
>
> Cc: stable # 9c2d82394fd2 ("xscom: Return OPAL_WRONG_STATE on XSCOM ops..")
> Signed-off-by: Vipin K Parashar <vipin at linux.vnet.ibm.com>
> Signed-off-by: Vaidyanathan Srinivasan <svaidy at linux.vnet.ibm.com>
> ---
> Changes in v4:
> - Changed xscom operation retries loop counter variable to 'retries'.
> - Added a FIXME comment to replace 10 ms delay with exact HW delay
> or better method to confirm XSCOM reset completion.
>
> Changes in v3:
> - Added delay of 10ms after XSCOM engine reset when XSCOM is found busy.
> - Modified 'if' condition to check return value of xscom_handle_error().
>
> Changes in v2:
> - Changed newly added macro names to slightly more intuitive names.
> Used XSCOM_BUSY_MAX_RETRIES to signify total retries allowed if
> XSCOM remains busy and XSCOM_BUSY_RESET_THRESHOLD to hold threshold
> count for resetting XSCOM before retrying XSCOM operation again.
>
> hw/xscom.c | 74 +++++++++++++++++++++++++++++++++++++++++-------------
> include/errorlog.h | 1 +
> include/xscom.h | 6 +++++
> 3 files changed, 64 insertions(+), 17 deletions(-)
>
> diff --git a/hw/xscom.c b/hw/xscom.c
> index 4d88ce5..2a91a7b 100644
> --- a/hw/xscom.c
> +++ b/hw/xscom.c
> @@ -23,6 +23,7 @@
> #include <centaur.h>
> #include <errorlog.h>
> #include <opal-api.h>
> +#include <timebase.h>
>
> /* Mask of bits to clear in HMER before an access */
> #define HMER_CLR_MASK (~(SPR_HMER_XSCOM_FAIL | \
> @@ -41,6 +42,10 @@ DEFINE_LOG_ENTRY(OPAL_RC_XSCOM_RESET, OPAL_PLATFORM_ERR_EVT, OPAL_XSCOM,
> OPAL_CEC_HARDWARE, OPAL_PREDICTIVE_ERR_GENERAL,
> OPAL_NA);
>
> +DEFINE_LOG_ENTRY(OPAL_RC_XSCOM_BUSY, OPAL_PLATFORM_ERR_EVT, OPAL_XSCOM,
> + OPAL_CEC_HARDWARE, OPAL_PREDICTIVE_ERR_GENERAL,
> + OPAL_NA);
> +
> /* xscom details to trigger xstop */
> static struct {
> uint64_t addr;
> @@ -118,18 +123,49 @@ static void xscom_reset(uint32_t gcid)
> */
> }
>
> -static int xscom_handle_error(uint64_t hmer, uint32_t gcid, uint32_t pcb_addr,
> - bool is_write)
> +static int64_t xscom_handle_error(uint64_t hmer, uint32_t gcid, uint32_t pcb_addr,
> + bool is_write, int64_t retries)
> {
> + struct timespec ts;
> unsigned int stat = GETFIELD(SPR_HMER_XSCOM_STATUS, hmer);
>
> /* XXX Figure out error codes from doc and error
> * recovery procedures
> */
> switch(stat) {
> - /* XSCOM blocked, just retry */
> + /*
> + * XSCOM engine is blocked, need to retry. Reset XSCOM engine
> + * after crossing retry threshold before retrying again.
> + */
> case 1:
> + if (retries && !(retries % XSCOM_BUSY_RESET_THRESHOLD)) {
> + prlog(PR_NOTICE, "XSCOM: Busy even after %d retries, "
> + "resetting XSCOM now. Total retries = %lld\n",
> + XSCOM_BUSY_RESET_THRESHOLD, retries);
> + xscom_reset(gcid);
> +
> + /*
> + * Its observed that sometimes immediate retry of
> + * XSCOM operation returns wrong data. Adding a
> + * delay for XSCOM reset to be effective. Delay of
> + * 10 ms is found to be working fine experimentally.
> + * FIXME: Replace 10ms delay by exact delay needed
> + * or other alternate method to confirm XSCOM reset
> + * completion, after checking from HW folks.
> + */
> + ts.tv_sec = 0;
> + ts.tv_nsec = 10 * 1000;
> + nanosleep_nopoll(&ts, NULL);
As much as I dislike sleeping, I think this is about the best way to do
it without getting a bit nuts, and the previous behaviour was just to
sit in a tight loop and wait forever... and 10ms is significantly less
time than forever.
Merged to master as of e761222593a1ae932cddbc81239b6a7cd98ddb70
and cherry picked back to 5.2.x as of 86673d26ca2a2f49e3b4fa216140cfc8eae1500a
--
Stewart Smith
OPAL Architect, IBM.
More information about the Skiboot
mailing list