[Skiboot] [PATCH v2 2/2] mbox: Reduce default BMC timeouts

Nicholas Piggin npiggin at gmail.com
Fri Mar 23 11:42:40 AEDT 2018


On Fri, 23 Mar 2018 10:56:52 +1100
Cyril Bur <cyril.bur at au1.ibm.com> wrote:

> Rebooting a BMC can take 70 seconds. Skiboot cannot possibly spin for
> 70 seconds waiting for a BMC to come back. This also makes the current
> default of 30 seconds a bit pointless, is it far too short to be a
> worse case wait time but too long to avoid hitting hardlockup detectors
> and wrecking havoc inside host linux.
> 
> Just change it to three seconds so that host linux will survive and
> that, reads and writes will fail but at least the host stays up.
> 
> Also refactored the waiting loop just a bit so that it's easier to read.
> 
> Reported-by: Pridhiviraj Paidipeddi <ppaidipe at linux.vnet.ibm.com>
> Tested-by: Pridhiviraj Paidipeddi <ppaidipe at linux.vnet.ibm.com>
> Signed-off-by: Cyril Bur <cyril.bur at au1.ibm.com>

Thanks for splitting it out. There's one open-coded wait_for_bmc
timeout now that may be a bit out of date after this change to
defaults -- should that just be changed to use mbox_flash->timeout
too now?

Reviewed-by: Nicholas Piggin <npiggin at gmail.com>


> ---
> v2: Split this patch out from the other one and lowered the default
>     timeout.
> 
>  libflash/mbox-flash.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/libflash/mbox-flash.c b/libflash/mbox-flash.c
> index 7bff34c4..dd1f11c2 100644
> --- a/libflash/mbox-flash.c
> +++ b/libflash/mbox-flash.c
> @@ -42,7 +42,7 @@
>  /* Same technique as BUILD_BUG_ON from linux */
>  #define CHECK_HANDLER_SIZE(handlers) ((void)sizeof(char[1 - 2*!!(ARRAY_SIZE(handlers) != (MBOX_COMMAND_COUNT + 1))]))
>  
> -#define MBOX_DEFAULT_TIMEOUT 30
> +#define MBOX_DEFAULT_TIMEOUT 3 /* seconds */
>  
>  #define MSG_CREATE(init_command) { .command = init_command }
>  
> @@ -334,15 +334,14 @@ static int wait_for_bmc(struct mbox_flash_data *mbox_flash, unsigned int timeout
>  {
>  	unsigned long last = 1, start = tb_to_secs(mftb());
>  	prlog(PR_TRACE, "Waiting for BMC\n");
> -	while (mbox_flash->busy && timeout_sec) {
> +	while (mbox_flash->busy && timeout_sec > last) {
>  		long now = tb_to_secs(mftb());
>  		if (now - start > last) {
> -			timeout_sec--;
> -			last = now - start;
>  			if (last < timeout_sec / 2)
>  				prlog(PR_TRACE, "Been waiting for the BMC for %lu secs\n", last);
>  			else
>  				prlog(PR_ERR, "BMC NOT RESPONDING %lu second wait\n", last);
> +			last++;
>  		}
>  		/*
>  		 * Both functions are important.



More information about the Skiboot mailing list