[Skiboot] [Skiboot-stable] [PATCH] Fix hang in pnv_platform_error_reboot path due to TOD failure.

Mahesh Jagannath Salgaonkar mahesh at linux.vnet.ibm.com
Fri Mar 29 19:42:34 AEDT 2019


On 3/29/19 10:22 AM, Stewart Smith wrote:
> Mahesh J Salgaonkar <mahesh at linux.vnet.ibm.com> writes:
>> From: Mahesh Salgaonkar <mahesh at linux.vnet.ibm.com>
>>
>> On TOD failure, with the TB stuck, when Linux heads down the
>> pnv_platform_error_reboot() path due to an unrecoverable HMI event, the
>> panic CPU gets stuck in OPAL inside ipmi_queue_msg_sync(). At this point
>> all other CPUs are in smp_handle_nmi_ipi() waiting for the panic CPU to
>> proceed, but with the panic CPU stuck inside OPAL, Linux never
>> recovers or reboots.
>>
>> p0 c1 t0
>> NIA : 0x000000003001dd3c <.time_wait+0x64>
>> CFAR : 0x000000003001dce4 <.time_wait+0xc>
>> MSR : 0x9000000002803002
>> LR : 0x000000003002ecf8 <.ipmi_queue_msg_sync+0xec>
>>
>> STACK: SP NIA
>> 0x0000000031c236e0 0x0000000031c23760 (big-endian)
>> 0x0000000031c23760 0x000000003002ecf8 <.ipmi_queue_msg_sync+0xec>
>> 0x0000000031c237f0 0x00000000300aa5f8 <.hiomap_queue_msg_sync+0x7c>
>> 0x0000000031c23880 0x00000000300aaadc <.hiomap_window_move+0x150>
>> 0x0000000031c23950 0x00000000300ab1d8 <.ipmi_hiomap_write+0xcc>
>> 0x0000000031c23a90 0x00000000300a7b18 <.blocklevel_raw_write+0xbc>
>> 0x0000000031c23b30 0x00000000300a7c34 <.blocklevel_write+0xfc>
>> 0x0000000031c23bf0 0x0000000030030be0 <.flash_nvram_write+0xd4>
>> 0x0000000031c23c90 0x000000003002c128 <.opal_write_nvram+0xd0>
>> 0x0000000031c23d20 0x00000000300051e4 <opal_entry+0x134>
>> 0xc000001fea6e7870 0xc0000000000a9060 <opal_nvram_write+0x80>
>> 0xc000001fea6e78c0 0xc000000000030b84 <nvram_write_os_partition+0x94>
>> 0xc000001fea6e7960 0xc0000000000310b0 <nvram_pstore_write+0xb0>
>> 0xc000001fea6e7990 0xc0000000004792d4 <pstore_dump+0x1d4>
>> 0xc000001fea6e7ad0 0xc00000000018a570 <kmsg_dump+0x140>
>> 0xc000001fea6e7b40 0xc000000000028e5c <panic_flush_kmsg_end+0x2c>
>> 0xc000001fea6e7b60 0xc0000000000a7168 <pnv_platform_error_reboot+0x68>
>> 0xc000001fea6e7bd0 0xc0000000000ac9b8 <hmi_event_handler+0x1d8>
>> 0xc000001fea6e7c80 0xc00000000012d6c8 <process_one_work+0x1b8>
>> 0xc000001fea6e7d20 0xc00000000012da28 <worker_thread+0x88>
>> 0xc000001fea6e7db0 0xc0000000001366f4 <kthread+0x164>
>> 0xc000001fea6e7e20 0xc00000000000b65c <ret_from_kernel_thread+0x5c>
>>
>> This is because there is a while loop towards the end of
>> ipmi_queue_msg_sync() that keeps looping until "sync_msg" no longer
>> matches "msg", calling time_wait_ms() until the exit condition is met.
>> In the normal scenario the time_wait_ms() calls run the pollers, so the
>> ipmi backend gets a chance to check the ipmi response and set sync_msg
>> to NULL.
>>
>> 	while (sync_msg == msg)
>> 		time_wait_ms(10);
>>
>> But when the TB is in a failed state, time_wait_ms()->time_wait_poll()
>> returns immediately without calling the pollers, so we end up looping
>> forever. This patch fixes the hang by calling opal_run_pollers() in the
>> TB-failed state as well.
>>
>> Fixes: 1764f2452 ("opal: Fix hang in time_wait* calls on HMI for TB errors.")
>> Cc: skiboot-stable at lists.ozlabs.org
>> Signed-off-by: Mahesh Salgaonkar <mahesh at linux.vnet.ibm.com>
>> ---
>>  core/timebase.c |   10 ++++++++++
>>  1 file changed, 10 insertions(+)
>>
>> diff --git a/core/timebase.c b/core/timebase.c
>> index 0ae11d253..b1a02a4a1 100644
>> --- a/core/timebase.c
>> +++ b/core/timebase.c
>> @@ -29,6 +29,16 @@ static void time_wait_poll(unsigned long duration)
>>  	unsigned long period = msecs_to_tb(5);
>>  
>>  	if (this_cpu()->tb_invalid) {
>> +		/*
>> +		 * Run pollers to allow backends to process responses.
>> +		 *
>> +		 * In the case of an unrecoverable TOD failure, running the
>> +		 * pollers lets the ipmi backend handle the ipmi response
>> +		 * from the BMC and helps ipmi_queue_msg_sync() get un-stuck,
>> +		 * avoiding a Linux kernel hang during panic due to the
>> +		 * TOD failure.
>> +		 */
>> +		opal_run_pollers();
>>  		cpu_relax();
>>  		return;
>>  	}
> 
> Hah! Well, that's obvious once you see it. I guess the bit that only
> slightly worries me is if we have any poller that could run and then
> depend on timebase or call time_wait_poll() and then do recursive
> pollers... Although, since the testing for this didn't explode horribly,
> it's probably at least *mostly* okay.
> 
> It'd be fun to do some chaos-generating tests (lots of I2C, PNOR
> activity, etc.) though, and inject TOD errors to see if we could make it
> explode... Actually, I'd almost bet on it.
> 
> I wonder if we can come up with some ways to help it *not* explode in
> that state.

How about adding a nested_poller check inside the tb_invalid case and
returning immediately to stop the recursion? Something like the sketch
below.
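
Purely as a sketch: nested_poller here would be a hypothetical per-CPU
flag on struct cpu_thread (the field name and placement are assumptions,
not the merged fix). The tb_invalid branch from the patch above would
become roughly:

	if (this_cpu()->tb_invalid) {
		/*
		 * nested_poller: hypothetical flag marking that we are
		 * already running pollers from this path. If a poller
		 * winds up back in time_wait_poll(), skip
		 * opal_run_pollers() to avoid recursing and just back off.
		 */
		if (this_cpu()->nested_poller) {
			cpu_relax();
			return;
		}
		this_cpu()->nested_poller = true;
		opal_run_pollers();
		this_cpu()->nested_poller = false;
		cpu_relax();
		return;
	}

Keeping the guard in a per-CPU flag avoids locks and any dependence on
the broken timebase, since the pollers are being run here precisely
because the TB is invalid.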

Thanks,
-Mahesh.

> 
> Anyway, merged to master as of 2c71d7032484f4267ca425b5a58c5a6f06a2eaf3.
> 


