[Skiboot] [PATCH] npu2: Move NPU2_XTS_BDF_MAP_VALID assignment to context init

Stewart Smith stewart at linux.ibm.com
Tue Apr 10 16:30:39 AEST 2018


Reza Arbab <arbab at linux.ibm.com> writes:
> A bad GPU or other condition may leave us with a subset of links that
> never get initialized. If an ATSD is sent to one of those bricks, it
> will never complete, leaving us waiting forever for a response:
>
> watchdog: BUG: soft lockup - CPU#23 stuck for 23s! [acos:2050]
> ...
> Modules linked in: nvidia_uvm(O) nvidia(O)
> CPU: 23 PID: 2050 Comm: acos Tainted: G        W  O    4.14.0 #2
> task: c0000000285cfc00 task.stack: c000001fea860000
> NIP:  c0000000000abdf0 LR: c0000000000acc48 CTR: c0000000000ace60
> REGS: c000001fea863550 TRAP: 0901   Tainted: G        W  O     (4.14.0)
> MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 28004484  XER: 20040000
> CFAR: c0000000000abdf4 SOFTE: 1
> GPR00: c0000000000acc48 c000001fea8637d0 c0000000011f7c00 c000001fea863820
> GPR04: 0000000002000000 0004100026000000 c0000000012778c8 c00000000127a560
> GPR08: 0000000000000001 0000000000000080 c000201cc7cb7750 ffffffffffffffff
> GPR12: 0000000000008000 c000000003167e80
> NIP [c0000000000abdf0] mmio_invalidate_wait+0x90/0xc0
> LR [c0000000000acc48] mmio_invalidate.isra.11+0x158/0x370
>
> ATSDs are only sent to bricks which have a valid entry in the XTS_BDF
> table. So to prevent the hang, don't set NPU2_XTS_BDF_MAP_VALID unless
> we make it all the way to creating a context for the BDF.
>
> Signed-off-by: Reza Arbab <arbab at linux.ibm.com>
> ---
>  hw/npu2.c | 15 ++++++++-------
>  1 file changed, 8 insertions(+), 7 deletions(-)

Thanks, merged to master as of 4724d2c07fa63b2b95f0f42fb13e07856251e48a

-- 
Stewart Smith
OPAL Architect, IBM.


