power9 NUMA crash while reading debugfs imc_cmd

Michael Ellerman mpe at ellerman.id.au
Fri Jun 28 13:12:48 AEST 2019


Qian Cai <cai at lca.pw> writes:
> Reading the debugfs imc_cmd file for a memory-less node triggers the crash below
> on this power9 machine, which has the following NUMA layout.

What type of machine is it?

cheers

> I don't understand why I only saw it recently on linux-next, where it
> was tested every day. I can reproduce it back to 4.20, whereas 4.18 seems
> to work fine.
>
> # cat /sys/kernel/debug/powerpc/imc/imc_cmd_252 (On a 4.18-based kernel)
> 0x0000000000000000
>
> # numactl -H
> available: 6 nodes (0,8,252-255)
> node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63
> node 0 size: 130210 MB
> node 0 free: 128406 MB
> node 8 cpus: 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127
> node 8 size: 130784 MB
> node 8 free: 130051 MB
> node 252 cpus:
> node 252 size: 0 MB
> node 252 free: 0 MB
> node 253 cpus:
> node 253 size: 0 MB
> node 253 free: 0 MB
> node 254 cpus:
> node 254 size: 0 MB
> node 254 free: 0 MB
> node 255 cpus:
> node 255 size: 0 MB
> node 255 free: 0 MB
> node distances:
> node   0   8  252  253  254  255 
>   0:  10  40  80  80  80  80 
>   8:  40  10  80  80  80  80 
>  252:  80  80  10  80  80  80 
>  253:  80  80  80  10  80  80 
>  254:  80  80  80  80  10  80 
>  255:  80  80  80  80  80  10
>
> # cat /sys/kernel/debug/powerpc/imc/imc_cmd_252
>
> [ 1139.415461][ T5301] Faulting instruction address: 0xc0000000000d0d58
> [ 1139.415492][ T5301] Oops: Kernel access of bad area, sig: 11 [#1]
> [ 1139.415509][ T5301] LE PAGE_SIZE=64K MMU=Radix MMU=Hash SMP NR_CPUS=256 DEBUG_PAGEALLOC NUMA PowerNV
> [ 1139.415542][ T5301] Modules linked in: i2c_opal i2c_core ip_tables x_tables xfs sd_mod bnx2x mdio ahci libahci tg3 libphy libata firmware_class dm_mirror dm_region_hash dm_log dm_mod
> [ 1139.415595][ T5301] CPU: 67 PID: 5301 Comm: cat Not tainted 5.2.0-rc6-next-20190627+ #19
> [ 1139.415634][ T5301] NIP:  c0000000000d0d58 LR: c00000000049aa18 CTR: c0000000000d0d50
> [ 1139.415675][ T5301] REGS: c00020194548f9e0 TRAP: 0300   Not tainted  (5.2.0-rc6-next-20190627+)
> [ 1139.415705][ T5301] MSR:  9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 28022822  XER: 00000000
> [ 1139.415777][ T5301] CFAR: c00000000049aa14 DAR: 000000000003fc08 DSISR: 40000000 IRQMASK: 0
> [ 1139.415777][ T5301] GPR00: c00000000049aa18 c00020194548fc70 c0000000016f8b00 000000000003fc08
> [ 1139.415777][ T5301] GPR04: c00020194548fcd0 0000000000000000 0000000014884e73 ffffffff00011eaa
> [ 1139.415777][ T5301] GPR08: 000000007eea5a52 c0000000000d0d50 0000000000000000 0000000000000000
> [ 1139.415777][ T5301] GPR12: c0000000000d0d50 c000201fff7f8c00 0000000000000000 0000000000000000
> [ 1139.415777][ T5301] GPR16: 000000000000000d 00007fffeb0c3368 ffffffffffffffff 0000000000000000
> [ 1139.415777][ T5301] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000020000
> [ 1139.415777][ T5301] GPR24: 0000000000000000 0000000000000000 0000000000020000 000000010ec90000
> [ 1139.415777][ T5301] GPR28: c00020194548fdf0 c00020049a584ef8 0000000000000000 c00020049a584ea8
> [ 1139.416116][ T5301] NIP [c0000000000d0d58] imc_mem_get+0x8/0x20
> [ 1139.416143][ T5301] LR [c00000000049aa18] simple_attr_read+0x118/0x170
> [ 1139.416158][ T5301] Call Trace:
> [ 1139.416182][ T5301] [c00020194548fc70] [c00000000049a970] simple_attr_read+0x70/0x170 (unreliable)
> [ 1139.416255][ T5301] [c00020194548fd10] [c00000000054385c] debugfs_attr_read+0x6c/0xb0
> [ 1139.416305][ T5301] [c00020194548fd60] [c000000000454c1c] __vfs_read+0x3c/0x70
> [ 1139.416363][ T5301] [c00020194548fd80] [c000000000454d0c] vfs_read+0xbc/0x1a0
> [ 1139.416392][ T5301] [c00020194548fdd0] [c00000000045519c] ksys_read+0x7c/0x140
> [ 1139.416434][ T5301] [c00020194548fe20] [c00000000000b108] system_call+0x5c/0x70
> [ 1139.416473][ T5301] Instruction dump:
> [ 1139.416511][ T5301] 4e800020 60000000 7c0802a6 60000000 7c801d28 38600000 4e800020 60000000
> [ 1139.416572][ T5301] 60000000 60000000 7c0802a6 60000000 <7d201c28> 38600000 f9240000 4e800020
> [ 1139.416636][ T5301] ---[ end trace c44d1fb4ace04784 ]---
> [ 1139.520686][ T5301] 
> [ 1140.520820][ T5301] Kernel panic - not syncing: Fatal exception

