[Skiboot] [PATCH] hw/fsp/rtc: read/write cached rtc tod on fsp hir.
Ananth N Mavinakayanahalli
ananth at linux.vnet.ibm.com
Tue Apr 4 18:33:48 AEST 2017
On Mon, Apr 03, 2017 at 07:27:21AM +0530, Pridhiviraj Paidipeddi wrote:
> Currently fsp-rtc reads/writes the cached RTC TOD on an FSP
> reset. Use the latest fsp_in_rr() function to properly read the cached RTC
> value when an FSP reset is initiated by the HIR.
>
> Below is the kernel trace seen when we set the hw clock as the HIR process starts.
>
> [ 1727.775824] NMI watchdog: BUG: soft lockup - CPU#57 stuck for 23s! [hwclock:7688]
> [ 1727.775856] Modules linked in: vmx_crypto ibmpowernv ipmi_powernv uio_pdrv_genirq ipmi_devintf powernv_op_panel uio ipmi_msghandler powernv_rng leds_powernv ip_tables x_tables autofs4 ses enclosure scsi_transport_sas crc32c_vpmsum lpfc ipr tg3 scsi_transport_fc
> [ 1727.775883] CPU: 57 PID: 7688 Comm: hwclock Not tainted 4.10.0-14-generic #16-Ubuntu
> [ 1727.775883] task: c000000fdfdc8400 task.stack: c000000fdfef4000
> [ 1727.775884] NIP: c00000000090540c LR: c0000000000846f4 CTR: 000000003006dd70
> [ 1727.775885] REGS: c000000fdfef79a0 TRAP: 0901 Not tainted (4.10.0-14-generic)
> [ 1727.775886] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>
> [ 1727.775889] CR: 28024442 XER: 20000000
> [ 1727.775890] CFAR: c00000000008472c SOFTE: 1
> GPR00: 0000000030005128 c000000fdfef7c20 c00000000144c900 fffffffffffffff4
> GPR04: 0000000028024442 c00000000090540c 9000000000009033 0000000000000000
> GPR08: 0000000000000000 0000000031fc4000 c000000000084710 9000000000001003
> GPR12: c0000000000846e8 c00000000fba0100
> [ 1727.775897] NIP [c00000000090540c] opal_set_rtc_time+0x4c/0xb0
> [ 1727.775899] LR [c0000000000846f4] opal_return+0xc/0x48
> [ 1727.775899] Call Trace:
> [ 1727.775900] [c000000fdfef7c20] [c00000000090540c] opal_set_rtc_time+0x4c/0xb0 (unreliable)
> [ 1727.775901] [c000000fdfef7c60] [c000000000900828] rtc_set_time+0xb8/0x1b0
> [ 1727.775903] [c000000fdfef7ca0] [c000000000902364] rtc_dev_ioctl+0x454/0x630
> [ 1727.775904] [c000000fdfef7d40] [c00000000035b1f4] do_vfs_ioctl+0xd4/0x8c0
> [ 1727.775906] [c000000fdfef7de0] [c00000000035bab4] SyS_ioctl+0xd4/0xf0
> [ 1727.775907] [c000000fdfef7e30] [c00000000000b184] system_call+0x38/0xe0
> [ 1727.775908] Instruction dump:
> [ 1727.775909] f821ffc1 39200000 7c832378 91210028 38a10020 39200000 38810028 f9210020
> [ 1727.775911] 4bfffe6d e8810020 80610028 4b77f61d <60000000> 7c7f1b78 3860000a 2fbffff4
>
> This is found when executing the testcase
> https://github.com/open-power/op-test-framework/blob/master/testcases/fspresetReload.py
>
> With this fix, I ran the FSP HIR torture testcase in the above test,
> and it works fine.
>
> Signed-off-by: Pridhiviraj Paidipeddi <ppaidipe at linux.vnet.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth at linux.vnet.ibm.com>
This will work, but we will also need to audit the other FSP_RESET_START
cases.
A particular sequence of actions (with timeouts) needs to be executed to
put an FSP in a state ready to be reset (HIR case). The actual RESET_START
notification is sent after the HIR sequence actually triggers the FSP
reset. The window between the start of the HIR sequence and the actual
notification is where this problem can occur.