[Skiboot-stable] [PATCH] Fix hang in pnv_platform_error_reboot path due to TOD failure.

Stewart Smith stewart at linux.ibm.com
Fri Mar 29 15:52:32 AEDT 2019


Mahesh J Salgaonkar <mahesh at linux.vnet.ibm.com> writes:
> From: Mahesh Salgaonkar <mahesh at linux.vnet.ibm.com>
>
> On a TOD failure with the TB stuck, when linux heads down the
> pnv_platform_error_reboot() path due to an unrecoverable HMI event, the
> panic cpu gets stuck in OPAL inside ipmi_queue_msg_sync(). At this point
> all other cpus are in smp_handle_nmi_ipi() waiting for the panic cpu to
> proceed. But with the panic cpu stuck inside OPAL, linux never recovers
> or reboots.
>
> p0 c1 t0
> NIA : 0x000000003001dd3c <.time_wait+0x64>
> CFAR : 0x000000003001dce4 <.time_wait+0xc>
> MSR : 0x9000000002803002
> LR : 0x000000003002ecf8 <.ipmi_queue_msg_sync+0xec>
>
> STACK: SP NIA
> 0x0000000031c236e0 0x0000000031c23760 (big-endian)
> 0x0000000031c23760 0x000000003002ecf8 <.ipmi_queue_msg_sync+0xec>
> 0x0000000031c237f0 0x00000000300aa5f8 <.hiomap_queue_msg_sync+0x7c>
> 0x0000000031c23880 0x00000000300aaadc <.hiomap_window_move+0x150>
> 0x0000000031c23950 0x00000000300ab1d8 <.ipmi_hiomap_write+0xcc>
> 0x0000000031c23a90 0x00000000300a7b18 <.blocklevel_raw_write+0xbc>
> 0x0000000031c23b30 0x00000000300a7c34 <.blocklevel_write+0xfc>
> 0x0000000031c23bf0 0x0000000030030be0 <.flash_nvram_write+0xd4>
> 0x0000000031c23c90 0x000000003002c128 <.opal_write_nvram+0xd0>
> 0x0000000031c23d20 0x00000000300051e4 <opal_entry+0x134>
> 0xc000001fea6e7870 0xc0000000000a9060 <opal_nvram_write+0x80>
> 0xc000001fea6e78c0 0xc000000000030b84 <nvram_write_os_partition+0x94>
> 0xc000001fea6e7960 0xc0000000000310b0 <nvram_pstore_write+0xb0>
> 0xc000001fea6e7990 0xc0000000004792d4 <pstore_dump+0x1d4>
> 0xc000001fea6e7ad0 0xc00000000018a570 <kmsg_dump+0x140>
> 0xc000001fea6e7b40 0xc000000000028e5c <panic_flush_kmsg_end+0x2c>
> 0xc000001fea6e7b60 0xc0000000000a7168 <pnv_platform_error_reboot+0x68>
> 0xc000001fea6e7bd0 0xc0000000000ac9b8 <hmi_event_handler+0x1d8>
> 0xc000001fea6e7c80 0xc00000000012d6c8 <process_one_work+0x1b8>
> 0xc000001fea6e7d20 0xc00000000012da28 <worker_thread+0x88>
> 0xc000001fea6e7db0 0xc0000000001366f4 <kthread+0x164>
> 0xc000001fea6e7e20 0xc00000000000b65c <ret_from_kernel_thread+0x5c>
>
> This is because there is a while loop towards the end of
> ipmi_queue_msg_sync() which keeps looping until "sync_msg" no longer
> matches "msg". It loops over time_wait_ms() until the exit condition is
> met. In the normal scenario time_wait_ms() runs the pollers, so the ipmi
> backend gets a chance to check the ipmi response and set sync_msg to NULL.
>
> 	while (sync_msg == msg)
> 		time_wait_ms(10);
>
> But when the TB is in a failed state, time_wait_ms()->time_wait_poll()
> returns immediately without calling the pollers, and hence we end up
> looping forever. This patch fixes this hang by calling opal_run_pollers()
> in the TB failed state as well.
>
> Fixes: 1764f2452 ("opal: Fix hang in time_wait* calls on HMI for TB errors.")
> Cc: skiboot-stable at lists.ozlabs.org
> Signed-off-by: Mahesh Salgaonkar <mahesh at linux.vnet.ibm.com>
> ---
>  core/timebase.c |   10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/core/timebase.c b/core/timebase.c
> index 0ae11d253..b1a02a4a1 100644
> --- a/core/timebase.c
> +++ b/core/timebase.c
> @@ -29,6 +29,16 @@ static void time_wait_poll(unsigned long duration)
>  	unsigned long period = msecs_to_tb(5);
>  
>  	if (this_cpu()->tb_invalid) {
> +		/*
> +		 * Run the pollers to give backends a chance to process
> +		 * responses.
> +		 *
> +		 * In an unrecoverable TOD failure, this lets the ipmi
> +		 * backend handle the ipmi response from the BMC so that
> +		 * ipmi_queue_msg_sync() can get un-stuck, avoiding a
> +		 * linux kernel hang during a panic caused by TOD failure.
> +		 */
> +		opal_run_pollers();
>  		cpu_relax();
>  		return;
>  	}
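
As a rough C sketch of the interaction the commit message describes
(simplified and paraphrased from core/ipmi.c and core/timebase.c; locking,
error handling and the exact signatures are approximated here):

    /* ipmi_queue_msg_sync(): the panic cpu parks here until the backend
     * completes the message and clears sync_msg. */
    void ipmi_queue_msg_sync(struct ipmi_msg *msg)
    {
            sync_msg = msg;
            ipmi_queue_msg_head(msg);

            /* Only a poller run lets the backend consume the response
             * and set sync_msg back to NULL. */
            while (sync_msg == msg)
                    time_wait_ms(10);
    }

    /* time_wait_poll() before this patch: with the timebase dead we bail
     * out straight away, so the pollers never run and the loop above can
     * never terminate. */
    static void time_wait_poll(unsigned long duration)
    {
            if (this_cpu()->tb_invalid) {
                    cpu_relax();
                    return;         /* pollers skipped -> the hang above */
            }
            /* ... normal path: spin on the TB, running opal_run_pollers() ... */
    }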

Hah! Well, that's obvious once you see it. The bit that only slightly
worries me is whether we have any poller that could run and then depend on
the timebase, or call time_wait_poll() and end up running the pollers
recursively... Although, since the testing for this didn't explode
horribly, it's probably at least *mostly* okay.
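
To make that concrete: with the patch applied, a hypothetical poller like
the one below (illustrative only, not actual skiboot code) would recurse,
because time_wait_ms() in the tb_invalid case now calls opal_run_pollers()
itself:

    /* Registered via opal_add_poller(); imagine it wants a short delay. */
    static void some_backend_poll(void *data __unused)
    {
            /* With tb_invalid set this goes time_wait_ms() ->
             * time_wait_poll() -> opal_run_pollers(), which runs
             * some_backend_poll() again: unbounded recursion. */
            time_wait_ms(1);
    }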

It'd be fun to do some chaos-generating tests (lots of I2C and PNOR
activity, etc.), inject TOD errors, and see if we could make it explode...
Actually, I'd almost bet on it.

I wonder if we can come up with some ways to help it *not* explode in
that state.
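
One option would be to make poller runs refuse to nest; a rough sketch (the
in_poller flag is an assumption for illustration, not necessarily how
opal_run_pollers() is structured today):

    void opal_run_pollers(void)
    {
            struct cpu_thread *cpu = this_cpu();

            /* A poller that wanders back in here (e.g. via time_wait_*
             * with tb_invalid set) just returns instead of recursing. */
            if (cpu->in_poller)
                    return;
            cpu->in_poller = true;

            /* ... walk the list of registered pollers as today ... */

            cpu->in_poller = false;
    }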

Anyway, merged to master as of 2c71d7032484f4267ca425b5a58c5a6f06a2eaf3.

-- 
Stewart Smith
OPAL Architect, IBM.


