early soft lockup in 6.15-rc2 on PowerNV
Dan Horák
dan at danny.cz
Wed Apr 16 18:45:52 AEST 2025
Hi,
After updating from 6.14 to the Fedora-built 6.15-rc2 kernel, I am getting a
soft lockup early in boot and an NVMe-related timeout/crash later
(could they be related?). I am checking first whether this is a known issue,
as I have not started bisecting yet.
[ 0.000000] dt-cpu-ftrs: setup for ISA 3000
[ 0.000000] dt-cpu-ftrs: final cpu/mmu features = 0x0001f86b8f5fb187 0x3c007041
[ 0.000000] radix-mmu: Page sizes from device-tree:
[ 0.000000] radix-mmu: Page size shift = 12 AP=0x0
[ 0.000000] radix-mmu: Page size shift = 16 AP=0x5
[ 0.000000] radix-mmu: Page size shift = 21 AP=0x1
[ 0.000000] radix-mmu: Page size shift = 30 AP=0x2
[ 0.000000] Activating Kernel Userspace Access Prevention
[ 0.000000] Activating Kernel Userspace Execution Prevention
[ 0.000000] radix-mmu: Mapped 0x0000000000000000-0x0000000003a00000 with 2.00 MiB pages (exec)
[ 0.000000] radix-mmu: Mapped 0x0000000003a00000-0x0000000040000000 with 2.00 MiB pages
[ 0.000000] radix-mmu: Mapped 0x0000000040000000-0x0000000800000000 with 1.00 GiB pages
[ 0.000000] radix-mmu: Mapped 0x0000200000000000-0x00002007c0000000 with 1.00 GiB pages
[ 0.000000] radix-mmu: Mapped 0x00002007c0000000-0x00002007fac00000 with 2.00 MiB pages
[ 0.000000] radix-mmu: Mapped 0x00002007fac00000-0x00002007fad00000 with 64.0 KiB pages
[ 0.000000] radix-mmu: Mapped 0x00002007fcd00000-0x00002007fce00000 with 64.0 KiB pages
[ 0.000000] radix-mmu: Mapped 0x00002007fce00000-0x0000200800000000 with 2.00 MiB pages
[ 0.000000] radix-mmu: Mapped 0x00002007fad00000-0x00002007fcd00000 with 64.0 KiB pages
[ 0.000000] radix-mmu: Initializing Radix MMU
[ 0.000000] Linux version 6.15.0-0.rc2.22.fc43.ppc64le (mockbuild at a0290efb436b46e8b89e5361c3c4e240) (gcc (GCC) 15.0.1 20250410 (Red Hat 15.0.1-0), GNU ld version 2.44-3.fc43) #1 SMP Mon Apr 14 13:53:55 UTC 2025
[ 0.000000] OF: reserved mem: 0x00002007fcd30000..0x00002007fce2ffff (1024 KiB) map non-reusable HCODE at 2007fcd30000
[ 0.000000] OF: reserved mem: 0x00002007fd0e0000..0x00002007fd1dffff (1024 KiB) map non-reusable OCC at 2007fd0e0000
[ 0.000000] OF: reserved mem: 0x00002007fcd00000..0x00002007fcd2ffff (192 KiB) map non-reusable RINGOVD at 2007fcd00000
[ 0.000000] OF: reserved mem: 0x00002007fce30000..0x00002007fd0dffff (2752 KiB) map non-reusable WOFDATA at 2007fce30000
[ 0.000000] OF: reserved mem: 0x0000000035e00000..0x0000000038341fff (38152 KiB) map non-reusable ibm,firmware-allocs-memory at 35e00000
[ 0.000000] OF: reserved mem: 0x0000200000000000..0x0000200002a43fff (43280 KiB) map non-reusable ibm,firmware-allocs-memory at 200000000000
[ 0.000000] OF: reserved mem: 0x0000000030000000..0x00000000303fffff (4096 KiB) map non-reusable ibm,firmware-code at 30000000
[ 0.000000] OF: reserved mem: 0x0000000031000000..0x0000000031bfffff (12288 KiB) map non-reusable ibm,firmware-data at 31000000
[ 0.000000] OF: reserved mem: 0x0000000030400000..0x0000000030ffffff (12288 KiB) map non-reusable ibm,firmware-heap at 30400000
[ 0.000000] OF: reserved mem: 0x0000000031c00000..0x0000000035dfffff (67584 KiB) map non-reusable ibm,firmware-stacks at 31c00000
[ 0.000000] OF: reserved mem: 0x00002007fd230000..0x00002007fd66ffff (4352 KiB) map non-reusable ibm,hbrt-code-image at 2007fd230000
[ 0.000000] OF: reserved mem: 0x00002007fd670000..0x00002007fd7fffff (1600 KiB) map non-reusable ibm,hbrt-data at 2007fd670000
[ 0.000000] OF: reserved mem: 0x00002007fd800000..0x00002007fdbfffff (4096 KiB) map non-reusable ibm,homer-image at 2007fd800000
[ 0.000000] OF: reserved mem: 0x00002007fdc00000..0x00002007fdffffff (4096 KiB) map non-reusable ibm,homer-image at 2007fdc00000
[ 0.000000] OF: reserved mem: 0x00002007ff800000..0x00002007ffffffff (8192 KiB) map non-reusable ibm,occ-common-area at 2007ff800000
[ 0.000000] OF: reserved mem: 0x00002007fd200000..0x00002007fd20ffff (64 KiB) map non-reusable ibm,sbe-comm at 2007fd200000
[ 0.000000] OF: reserved mem: 0x00002007fd220000..0x00002007fd22ffff (64 KiB) map non-reusable ibm,sbe-comm at 2007fd220000
[ 0.000000] OF: reserved mem: 0x00002007fd1f0000..0x00002007fd1fffff (64 KiB) map non-reusable ibm,sbe-ffdc at 2007fd1f0000
[ 0.000000] OF: reserved mem: 0x00002007fd210000..0x00002007fd21ffff (64 KiB) map non-reusable ibm,sbe-ffdc at 2007fd210000
[ 0.000000] OF: reserved mem: 0x00002007fd1e0000..0x00002007fd1effff (64 KiB) map non-reusable ibm,secure-crypt-algo-code at 2007fd1e0000
[ 0.000000] Found initrd at 0xc000000006380000:0xc00000000a057310
[ 0.000000] OPAL: Found memory mapped LPC bus on chip 0
[ 0.000000] Hardware name: T2P9D01 REV 1.00 POWER9 0x4e1202 opal:skiboot-bc106a0 PowerNV
[ 0.000000] CPU maps initialized for 4 threads per core
[ 0.000000] -----------------------------------------------------
[ 0.000000] phys_mem_size = 0x1000000000
[ 0.000000] dcache_bsize = 0x80
[ 0.000000] icache_bsize = 0x80
[ 0.000000] cpu_features = 0x0001f86b8f5fb187
[ 0.000000] possible = 0x003ffbfbcf5fb187
[ 0.000000] always = 0x0000000380008181
[ 0.000000] cpu_user_features = 0xdc0065c2 0xaef00000
[ 0.000000] mmu_features = 0x3c007641
[ 0.000000] firmware_features = 0x0000000110000000
[ 0.000000] vmalloc start = 0xc008000000000000
[ 0.000000] IO start = 0xc00a000000000000
[ 0.000000] vmemmap start = 0xc00c000000000000
[ 0.000000] -----------------------------------------------------
[ 0.000000] NODE_DATA(0) allocated [mem 0x7ffd2dc00-0x7ffd3597f]
[ 0.000000] NODE_DATA(8) allocated [mem 0x2007ff420b00-0x2007ff42887f]
[ 0.000000] kvm_cma_reserve: reserving 3276 MiB for global area
[ 0.000000] cma: Reserved 3276 MiB at 0x0000000000000000
[ 0.000000] rfi-flush: mttrig type flush available
[ 0.000000] count-cache-flush: flush disabled.
[ 0.000000] link-stack-flush: software flush enabled.
[ 0.000000] stf-barrier: eieio barrier available
[ 0.000000] OPAL nvram setup, 589824 bytes
[ 0.000000] barrier-nospec: using ORI speculation barrier
[ 0.000000] Zone ranges:
[ 0.000000] Normal [mem 0x0000000000000000-0x00002007ffffffff]
[ 0.000000] Device empty
[ 0.000000] Movable zone start for each node
[ 0.000000] Early memory node ranges
[ 0.000000] node 0: [mem 0x0000000000000000-0x00000007ffffffff]
[ 0.000000] node 8: [mem 0x0000200000000000-0x00002007ffffffff]
[ 0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x00000007ffffffff]
[ 0.000000] Initmem setup node 8 [mem 0x0000200000000000-0x00002007ffffffff]
[ 0.000000] percpu: Embedded 4 pages/cpu s124056 r0 d138088 u262144
[ 0.000000] Kernel command line: root=/dev/mapper/Linux-Root ro rd.lvm.lv=Linux/Root rd.md.uuid=60936c65:08d9f6bc:b191c895:332a4d53 rd.md.uuid=06128381:0df3ab4b:02ebd84d:84921066 rd.md.uuid=3c52d341:6485ed32:9da81f4c:706b231f console=tty1 console=hvc0
[ 0.000000] random: crng init done
[ 0.000000] printk: log_buf_len individual max cpu contribution: 4096 bytes
[ 0.000000] printk: log_buf_len total cpu_extra contributions: 258048 bytes
[ 0.000000] printk: log_buf_len min size: 262144 bytes
[ 0.000000] printk: log buffer data + meta data: 524288 + 1835008 = 2359296 bytes
[ 0.000000] printk: early log buf free: 254416(97%)
[ 0.000000] Fallback order for Node 0: 0 8
[ 0.000000] Fallback order for Node 8: 8 0
[ 0.000000] Built 2 zonelists, mobility grouping on. Total pages: 1048576
[ 0.000000] Policy zone: Normal
[ 0.000000] mem auto-init: stack:all(zero), heap alloc:on, heap free:off
[ 0.000000] SLUB: HWalign=128, Order=0-3, MinObjects=0, CPUs=64, Nodes=9
[ 0.000000] ftrace: allocating 54036 entries in 20 pages
[ 0.000000] ftrace: allocated 20 pages with 2 groups
[ 0.000000] rcu: Hierarchical RCU implementation.
[ 0.000000] rcu: RCU event tracing is enabled.
[ 0.000000] rcu: RCU restricting CPUs from NR_CPUS=2048 to nr_cpu_ids=64.
[ 0.000000] Rude variant of Tasks RCU enabled.
[ 0.000000] Tracing variant of Tasks RCU enabled.
[ 0.000000] rcu: RCU calculated value of scheduler-enlistment delay is 10 jiffies.
[ 0.000000] rcu: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=64
[ 0.000000] RCU Tasks Rude: Setting shift to 6 and lim to 1 rcu_task_cb_adjust=1 rcu_task_cpu_ids=64.
[ 0.000000] RCU Tasks Trace: Setting shift to 6 and lim to 1 rcu_task_cb_adjust=1 rcu_task_cpu_ids=64.
[ 0.000000] NR_IRQS: 512, nr_irqs: 512, preallocated irqs: 16
[ 0.000000] xive: Interrupt handling initialized with native backend
[ 0.000000] xive: Using priority 7 for all interrupts
[ 0.000000] xive: Using 64kB queues
[ 0.000000] rcu: srcu_init: Setting srcu_struct sizes based on contention.
[ 0.000002] time_init: 56 bit decrementer (max: 7fffffffffffff)
[ 0.000006] clocksource: timebase: mask: 0xffffffffffffffff max_cycles: 0x761537d007, max_idle_ns: 440795202126 ns
[ 0.000010] clocksource: timebase mult[1f40000] shift[24] registered
[ 0.001157] kfence: initialized - using 33554432 bytes for 255 objects at 0x(____ptrval____)-0x(____ptrval____)
[ 0.001583] Console: colour dummy device 80x25
[ 0.001591] printk: legacy console [tty1] enabled
[ 0.002206] mempolicy: Enabling automatic NUMA balancing. Configure with numa_balancing= or the kernel.numa_balancing sysctl
[ 0.002222] pid_max: default: 65536 minimum: 512
[ 0.002787] LSM: initializing lsm=lockdown,capability,yama,selinux,bpf,landlock,ipe,ima,evm
[ 0.003040] Yama: becoming mindful.
[ 0.003053] SELinux: Initializing.
[ 0.004545] LSM support for eBPF active
[ 0.004613] landlock: Up and running.
[ 0.009052] Dentry cache hash table entries: 8388608 (order: 10, 67108864 bytes, vmalloc hugepage)
[ 0.011240] Inode-cache hash table entries: 4194304 (order: 9, 33554432 bytes, vmalloc hugepage)
[ 0.011462] Mount-cache hash table entries: 131072 (order: 4, 1048576 bytes, vmalloc)
[ 0.011546] Mountpoint-cache hash table entries: 131072 (order: 4, 1048576 bytes, vmalloc)
[ 0.030196] POWER9 performance monitor hardware support registered
[ 0.030381] rcu: Hierarchical SRCU implementation.
[ 0.030387] rcu: Max phase no-delay instances is 1000.
[ 0.030496] Timer migration: 3 hierarchy levels; 8 children per group; 2 crossnode level
[ 0.031884] smp: Bringing up secondary CPUs ...
[ 2.861944] smp: Brought up 2 nodes, 64 CPUs
[ 2.861964] numa: Node 0 CPUs: 0-31
[ 2.861977] numa: Node 8 CPUs: 32-63
[ 2.866399] Memory: 63016960K/67108864K available (25152K kernel code, 4416K rwdata, 24000K rodata, 9792K init, 1796K bss, 476160K reserved, 3356672K cma-reserved)
[ 2.874121] devtmpfs: initialized
[ 24.037685] watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]
[ 24.037690] CPU#0 Utilization every 4s during lockup:
[ 24.037692] #1: 101% system, 0% softirq, 0% hardirq, 0% idle
[ 24.037697] #2: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 24.037701] #3: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 24.037704] #4: 101% system, 0% softirq, 0% hardirq, 0% idle
[ 24.037707] #5: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 24.037711] Modules linked in:
[ 24.037716] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.15.0-0.rc2.22.fc43.ppc64le #1 VOLUNTARY
[ 24.037722] Hardware name: T2P9D01 REV 1.00 POWER9 0x4e1202 opal:skiboot-bc106a0 PowerNV
[ 24.037725] NIP: c00000000308a72c LR: c00000000308a7d0 CTR: c0000000018012c0
[ 24.037729] REGS: c000200006637a50 TRAP: 0900 Not tainted (6.15.0-0.rc2.22.fc43.ppc64le)
[ 24.037733] MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 48000828 XER: 00000000
[ 24.037750] CFAR: 0000000000000000 IRQMASK: 0
[ 24.037750] GPR00: c00000000308a7d0 c000200006637cf0 c0000000025baa00 0000000000000040
[ 24.037750] GPR04: c0002007ff390b00 0000000000010000 0000000000000000 c0002007ff3a0b00
[ 24.037750] GPR08: 00000000002007ff 000000000012d092 0000000000000000 0000000000000000
[ 24.037750] GPR12: 0000000000000000 c000000003fb0000 c000000000011320 0000000000000000
[ 24.037750] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 24.037750] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 24.037750] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 24.037750] GPR28: 0000000000000000 c000000003f10be0 c0000000019efaf8 0000000000037940
[ 24.037806] NIP [c00000000308a72c] memory_dev_init+0xb4/0x194
[ 24.037815] LR [c00000000308a7d0] memory_dev_init+0x158/0x194
[ 24.037820] Call Trace:
[ 24.037822] [c000200006637cf0] [c00000000308a7d0] memory_dev_init+0x158/0x194 (unreliable)
[ 24.037830] [c000200006637d70] [c000000003089bd0] driver_init+0x74/0xa0
[ 24.037836] [c000200006637d90] [c00000000300f628] kernel_init_freeable+0x204/0x288
[ 24.037843] [c000200006637df0] [c000000000011344] kernel_init+0x2c/0x1b8
[ 24.037849] [c000200006637e50] [c00000000000debc] ret_from_kernel_user_thread+0x14/0x1c
[ 24.037855] --- interrupt: 0 at 0x0
[ 24.037858] Code: 7c651b78 40820010 3fa20195 3bbd61e0 48000080 3c62ff89 389e00c8 3863e510 4bf7a625 60000000 39290001 7c284840 <41800088> 792aaac2 7c2a2840 4080ffec
[ 48.045039] watchdog: BUG: soft lockup - CPU#0 stuck for 44s! [swapper/0:1]
[ 48.045043] CPU#0 Utilization every 4s during lockup:
[ 48.045045] #1: 101% system, 0% softirq, 0% hardirq, 0% idle
[ 48.045049] #2: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 48.045053] #3: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 48.045056] #4: 101% system, 0% softirq, 0% hardirq, 0% idle
[ 48.045059] #5: 100% system, 0% softirq, 0% hardirq, 0% idle
[ 48.045063] Modules linked in:
[ 48.045067] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Tainted: G L ------ --- 6.15.0-0.rc2.22.fc43.ppc64le #1 VOLUNTARY
[ 48.045073] Tainted: [L]=SOFTLOCKUP
[ 48.045075] Hardware name: T2P9D01 REV 1.00 POWER9 0x4e1202 opal:skiboot-bc106a0 PowerNV
[ 48.045077] NIP: c00000000308a72c LR: c00000000308a7d0 CTR: c0000000018012c0
[ 48.045081] REGS: c000200006637a50 TRAP: 0900 Tainted: G L ------ --- (6.15.0-0.rc2.22.fc43.ppc64le)
[ 48.045085] MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 48000828 XER: 00000000
[ 48.045100] CFAR: 0000000000000000 IRQMASK: 0
[ 48.045100] GPR00: c00000000308a7d0 c000200006637cf0 c0000000025baa00 0000000000000040
[ 48.045100] GPR04: c0002007ff390b00 0000000000010000 0000000000000000 c0002007ff3a0b00
[ 48.045100] GPR08: 00000000002007ff 00000000000a65fd 0000000000000000 0000000000000000
[ 48.045100] GPR12: 0000000000000000 c000000003fb0000 c000000000011320 0000000000000000
[ 48.045100] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 48.045100] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 48.045100] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 48.045100] GPR28: 0000000000000000 c000000003f10be0 c0000000019efaf8 000000000007f880
[ 48.045155] NIP [c00000000308a72c] memory_dev_init+0xb4/0x194
[ 48.045161] LR [c00000000308a7d0] memory_dev_init+0x158/0x194
[ 48.045166] Call Trace:
[ 48.045167] [c000200006637cf0] [c00000000308a7d0] memory_dev_init+0x158/0x194 (unreliable)
[ 48.045175] [c000200006637d70] [c000000003089bd0] driver_init+0x74/0xa0
[ 48.045181] [c000200006637d90] [c00000000300f628] kernel_init_freeable+0x204/0x288
[ 48.045187] [c000200006637df0] [c000000000011344] kernel_init+0x2c/0x1b8
[ 48.045193] [c000200006637e50] [c00000000000debc] ret_from_kernel_user_thread+0x14/0x1c
[ 48.045199] --- interrupt: 0 at 0x0
[ 48.045202] Code: 7c651b78 40820010 3fa20195 3bbd61e0 48000080 3c62ff89 389e00c8 3863e510 4bf7a625 60000000 39290001 7c284840 <41800088> 792aaac2 7c2a2840 4080ffec
[ 62.919422] rcu: INFO: rcu_sched self-detected stall on CPU
[ 62.919431] rcu: 0-....: (5999 ticks this GP) idle=7764/1/0x4000000000000002 softirq=103/103 fqs=2993
[ 62.919450] rcu: (t=6000 jiffies g=-935 q=2 ncpus=64)
[ 62.919459] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Tainted: G L ------ --- 6.15.0-0.rc2.22.fc43.ppc64le #1 VOLUNTARY
[ 62.919465] Tainted: [L]=SOFTLOCKUP
[ 62.919467] Hardware name: T2P9D01 REV 1.00 POWER9 0x4e1202 opal:skiboot-bc106a0 PowerNV
[ 62.919470] NIP: c00000000308a744 LR: c00000000308a7d0 CTR: c0000000018012c0
[ 62.919473] REGS: c000200006637a50 TRAP: 0900 Tainted: G L ------ --- (6.15.0-0.rc2.22.fc43.ppc64le)
[ 62.919477] MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 88000828 XER: 00000000
[ 62.919492] CFAR: 0000000000000000 IRQMASK: 0
[ 62.919492] GPR00: c00000000308a7d0 c000200006637cf0 c0000000025baa00 0000000000000040
[ 62.919492] GPR04: c0002007ff390b00 0000000000010000 0000000000000000 c0002007ff3a0b00
[ 62.919492] GPR08: 00000000002007ff 000000000012fdce 00000000000012f8 0000000000000000
[ 62.919492] GPR12: 0000000000000000 c000000003fb0000 c000000000011320 0000000000000000
[ 62.919492] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 62.919492] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 62.919492] GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[ 62.919492] GPR28: 0000000000000000 c000000003f10be0 c0000000019efaf8 00000000000b3d80
[ 62.919546] NIP [c00000000308a744] memory_dev_init+0xcc/0x194
[ 62.919552] LR [c00000000308a7d0] memory_dev_init+0x158/0x194
[ 62.919557] Call Trace:
[ 62.919558] [c000200006637cf0] [c00000000308a7d0] memory_dev_init+0x158/0x194 (unreliable)
[ 62.919565] [c000200006637d70] [c000000003089bd0] driver_init+0x74/0xa0
[ 62.919572] [c000200006637d90] [c00000000300f628] kernel_init_freeable+0x204/0x288
[ 62.919578] [c000200006637df0] [c000000000011344] kernel_init+0x2c/0x1b8
[ 62.919584] [c000200006637e50] [c00000000000debc] ret_from_kernel_user_thread+0x14/0x1c
[ 62.919589] --- interrupt: 0 at 0x0
And for the NVMe issue:
...
[ 114.881200] [drm] vm size is 256 GB, 2 levels, block size is 10-bit, fragment size is 9-bit
[ 114.884117] amdgpu 0000:01:00.0: BAR 2 [mem 0x6000010000000-0x60000101fffff 64bit pref]: releasing
[ 114.884153] amdgpu 0000:01:00.0: BAR 0 [mem 0x6000000000000-0x600000fffffff 64bit pref]: releasing
[ 114.884197] pci 0000:00:00.0: bridge window [mem 0x6000000000000-0x6003fbff0ffff 64bit pref]: releasing
[ 114.884232] pci 0000:00:00.0: bridge window [mem 0x6000000000000-0x600017fffffff 64bit pref]: assigned
[ 114.884258] amdgpu 0000:01:00.0: BAR 0 [mem 0x6000000000000-0x60000ffffffff 64bit pref]: assigned
[ 114.884301] amdgpu 0000:01:00.0: BAR 2 [mem 0x6000100000000-0x60001001fffff 64bit pref]: assigned
[ 114.884334] pci 0000:00:00.0: PCI bridge to [bus 01]
[ 114.884354] pci 0000:00:00.0: bridge window [mem 0x600c000000000-0x600c07fefffff]
[ 114.884377] pci 0000:00:00.0: bridge window [mem 0x6000000000000-0x6003fbff0ffff 64bit pref]
[ 114.884428] amdgpu 0000:01:00.0: amdgpu: VRAM: 4096M 0x000000F400000000 - 0x000000F4FFFFFFFF (4096M used)
[ 114.884461] amdgpu 0000:01:00.0: amdgpu: GART: 256M 0x000000FF00000000 - 0x000000FF0FFFFFFF
[ 114.884486] [drm] Detected VRAM RAM=4096M, BAR=4096M
[ 114.884501] [drm] RAM width 128bits GDDR5
[ 114.884516] amdgpu 0000:01:00.0: dma_iommu_get_required_mask: returning bypass mask 0xfffffffffffffff
[ 114.884819] [drm] amdgpu: 4096M of VRAM memory ready
[ 114.884837] [drm] amdgpu: 32570M of GTT memory ready.
[ 114.884923] [drm] GART: num cpu pages 4096, num gpu pages 65536
[ 114.885601] [drm] PCIE GART of 256M enabled (table at 0x000000F4FFF80000).
[ 114.890493] [drm] Chained IB support enabled!
[ 114.911510] amdgpu: hwmgr_sw_init smu backed is polaris10_smu
[ 114.957192] [drm] Found UVD firmware Version: 1.130 Family ID: 16
[ 114.974490] [drm] Found VCE firmware Version: 53.26 Binary ID: 3
[ 115.001810] [drm] Display Core v3.2.325 initialized on DCE 11.2
[ 115.143971] [drm] UVD and UVD ENC initialized successfully.
[ 115.271914] [drm] VCE initialized successfully.
[ 115.275652] kfd kfd: amdgpu: skipped device 1002:67e3, PCI rejects atomics 730<0
[ 115.275695] amdgpu 0000:01:00.0: amdgpu: SE 2, SH per SE 1, CU per SH 8, active_cu_number 16
[ 115.280222] amdgpu 0000:01:00.0: amdgpu: Using BACO for runtime pm
[ 115.281521] amdgpu 0000:01:00.0: [drm] Registered 5 planes with drm panic
[ 115.281550] [drm] Initialized amdgpu 3.63.0 for 0000:01:00.0 on minor 0
[ 115.334341] Console: switching to colour frame buffer device 240x75
[ 115.351211] amdgpu 0000:01:00.0: [drm] fb0: amdgpudrmfb frame buffer device
[ OK ] Stopped systemd-vconsole-setup.service - Virtual Console Setup.
Stopping systemd-vconsole-setup.service - Virtual Console Setup...
Starting systemd-vconsole-setup.service - Virtual Console Setup...
[ OK ] Finished systemd-vconsole-setup.service - Virtual Console Setup.
[ 125.951686] pci 0030:02:07.0: deferred probe pending: pci: wait for supplier /interrupt-controller at 0
[ 125.951754] pci 0031:00:00.0: deferred probe pending: pci: wait for supplier /interrupt-controller at 0
[ 125.951800] pci 0032:00:00.0: deferred probe pending: pci: wait for supplier /interrupt-controller at 0
[ 125.951844] pci 0033:00:00.0: deferred probe pending: pci: wait for supplier /interrupt-controller at 0
[ 125.951888] pci 0000:00:00.0: deferred probe pending: pci: wait for supplier /interrupt-controller at 0
[ 125.951944] pci 0001:00:00.0: deferred probe pending: pci: wait for supplier /interrupt-controller at 0
[ 125.952016] pci 0002:00:00.0: deferred probe pending: pci: wait for supplier /interrupt-controller at 0
[ 125.952099] pci 0003:00:00.0: deferred probe pending: pci: wait for supplier /interrupt-controller at 0
[ 125.952170] pci 0004:00:00.0: deferred probe pending: pci: wait for supplier /interrupt-controller at 0
[ 125.952230] pci 0005:00:00.0: deferred probe pending: pci: wait for supplier /interrupt-controller at 0
[ 125.952325] pci 0005:01:00.0: deferred probe pending: pci: wait for supplier /interrupt-controller at 0
[ 125.952409] pci 0030:00:00.0: deferred probe pending: pci: wait for supplier /interrupt-controller at 0
[ 125.952482] pci 0030:01:00.0: deferred probe pending: pci: wait for supplier /interrupt-controller at 0
[ 125.952565] pci 0030:02:04.0: deferred probe pending: pci: wait for supplier /interrupt-controller at 0
[ 125.952648] pci 0030:02:05.0: deferred probe pending: pci: wait for supplier /interrupt-controller at 0
[ 125.952705] pci 0030:02:06.0: deferred probe pending: pci: wait for supplier /interrupt-controller at 0
[ 345.065439618,3] PHB#0030[8:0]: brdgCtl = 00000002
[ 345.065504727,3] PHB#0030[8:0]: deviceStatus = 00060020
[ 345.065555303,3] PHB#0030[8:0]: slotStatus = 00402000
[ 345.065598361,3] PHB#0030[8:0]: linkStatus = a0830008
[ 345.065646434,3] PHB#0030[8:0]: devCmdStatus = 00100107
[ 345.065688036,3] PHB#0030[8:0]: devSecStatus = 00000800
[ 345.065725760,3] PHB#0030[8:0]: rootErrorStatus = 00000000
[ 345.065761005,3] PHB#0030[8:0]: corrErrorStatus = 00000000
[ 345.065799111,3] PHB#0030[8:0]: uncorrErrorStatus = 00000000
[ 345.065842333,3] PHB#0030[8:0]: devctl = 00000020
[ 345.065884810,3] PHB#0030[8:0]: devStat = 00000006
[ 345.065929734,3] PHB#0030[8:0]: tlpHdr1 = 00000000
[ 345.065976000,3] PHB#0030[8:0]: tlpHdr2 = 00000000
[ 345.066022862,3] PHB#0030[8:0]: tlpHdr3 = 00000000
[ 345.066063591,3] PHB#0030[8:0]: tlpHdr4 = 00000000
[ 345.066118027,3] PHB#0030[8:0]: sourceId = 00000000
[ 345.066166060,3] PHB#0030[8:0]: nFir = 0000000000000000
[ 345.066216807,3] PHB#0030[8:0]: nFirMask = 0030001c00000000
[ 345.066262342,3] PHB#0030[8:0]: nFirWOF = 0000000000000000
[ 345.066307131,3] PHB#0030[8:0]: phbPlssr = 0000001c00000000
[ 345.066347956,3] PHB#0030[8:0]: phbCsr = 0000001c00000000
[ 345.066400929,3] PHB#0030[8:0]: lemFir = 0000000100000080
[ 345.066464941,3] PHB#0030[8:0]: lemErrorMask = 0000000000000000
[ 345.066508121,3] PHB#0030[8:0]: lemWOF = 0000000000000080
[ 345.066552807,3] PHB#0030[8:0]: phbErrorStatus = 0000028000000000
[ 345.066598507,3] PHB#0030[8:0]: phbFirstErrorStatus = 0000020000000000
[ 345.066645892,3] PHB#0030[8:0]: phbErrorLog0 = 2148000098000240
[ 345.066694406,3] PHB#0030[8:0]: phbErrorLog1 = a008400000000000
[ 345.066738324,3] PHB#0030[8:0]: phbTxeErrorStatus = 0000000000000000
[ 345.066789036,3] PHB#0030[8:0]: phbTxeFirstErrorStatus = 0000000000000000
[ 345.066839733,3] PHB#0030[8:0]: phbTxeErrorLog0 = 0000000000000000
[ 345.066890491,3] PHB#0030[8:0]: phbTxeErrorLog1 = 0000000000000000
[ 345.066934140,3] PHB#0030[8:0]: phbRxeArbErrorStatus = 0000000000000000
[ 345.066976699,3] PHB#0030[8:0]: phbRxeArbFrstErrorStatus = 0000000000000000
[ 345.067020438,3] PHB#0030[8:0]: phbRxeArbErrorLog0 = 0000000000000000
[ 345.067067083,3] PHB#0030[8:0]: phbRxeArbErrorLog1 = 0000000000000000
[ 345.067117696,3] PHB#0030[8:0]: phbRxeMrgErrorStatus = 0000000000000000
[ 345.067164954,3] PHB#0030[8:0]: phbRxeMrgFrstErrorStatus = 0000000000000000
[ 345.067212157,3] PHB#0030[8:0]: phbRxeMrgErrorLog0 = 0000000000000000
[ 345.067255830,3] PHB#0030[8:0]: phbRxeMrgErrorLog1 = 0000000000000000
[ 345.067296445,3] PHB#0030[8:0]: phbRxeTceErrorStatus = 2000000000000000
[ 345.067337662,3] PHB#0030[8:0]: phbRxeTceFrstErrorStatus = 2000000000000000
[ 345.067388492,3] PHB#0030[8:0]: phbRxeTceErrorLog0 = c0000000000001fa
[ 345.067439384,3] PHB#0030[8:0]: phbRxeTceErrorLog1 = 0000000000000000
[ 345.067485879,3] PHB#0030[8:0]: phbPblErrorStatus = 0000000000020000
[ 345.067528771,3] PHB#0030[8:0]: phbPblFirstErrorStatus = 0000000000020000
[ 345.067571445,3] PHB#0030[8:0]: phbPblErrorLog0 = 0000000000000000
[ 345.067612425,3] PHB#0030[8:0]: phbPblErrorLog1 = 0000000000000000
[ 345.067663057,3] PHB#0030[8:0]: phbPcieDlpErrorLog1 = 0000000000000000
[ 345.067713544,3] PHB#0030[8:0]: phbPcieDlpErrorLog2 = 0000000000000000
[ 345.067756077,3] PHB#0030[8:0]: phbPcieDlpErrorStatus = 0000000000000000
[ 345.067804777,3] PHB#0030[8:0]: phbRegbErrorStatus = 0000004000000000
[ 345.067846597,3] PHB#0030[8:0]: phbRegbFirstErrorStatus = 0000004000000000
[ 345.067887318,3] PHB#0030[8:0]: phbRegbErrorLog0 = 8800000c00000000
[ 345.067932570,3] PHB#0030[8:0]: phbRegbErrorLog1 = 0000000007011000
[ 345.067980596,3] PHB#0030[8:0]: PEST[506] = 8300b03800000000 8000000000000000
[ 345.068048045,3] PHB#0030[8:0]: PEST[507] = 8300b03800000000 8000000000000000
[ 345.068099305,3] PHB#0030[8:0]: PEST[511] = 3740002a01000000 0000000000000000
[ 140.099956] EEH: Recovering PHB#30-PE#1fa
[ 140.100001] EEH: PE location: N/A, PHB location: N/A
[ 140.100032] EEH: Frozen PHB#30-PE#1fa detected
[ 140.100071] EEH: Call Trace:
[ 140.100096] EEH: [00000000ffe66fe6] __eeh_send_failure_event+0xa4/0x180
[ 140.100147] EEH: [00000000cde11bd8] eeh_dev_check_failure+0x3d8/0x740
[ 140.100183] EEH: [0000000063d788bb] nvme_timeout+0x288/0x750 [nvme]
[ 140.100223] EEH: [0000000043ae3de7] blk_mq_handle_expired+0x98/0xf0
[ 140.100259] EEH: [0000000018e27476] bt_iter+0xec/0x120
[ 140.100293] EEH: [00000000ffb65dd3] blk_mq_queue_tag_busy_iter+0x414/0xa60
[ 140.100331] EEH: [0000000024de88c5] blk_mq_timeout_work+0x1c8/0x230
[ 140.100848] EEH: [000000003e6b6b37] process_one_work+0x1f0/0x520
[ 140.101347] EEH: [00000000f4e3d3a4] worker_thread+0x33c/0x510
[ 140.102032] EEH: [00000000ee3ba07d] kthread+0x150/0x160
[ 140.102037] EEH: [000000009f50efe6] start_kernel_thread+0x14/0x18
[ 140.102041] EEH: This PCI device has failed 1 times in the last hour and will be permanently disabled after 5 failures.
EEH: Recovering PHB#30-PE#1fa
[ 345.069730133,3] PHB#0030[8:0]: brdgCtl = 00000002
[ 345.069770540,3] PHB#0030[8:0]: deviceStatus = 00060020
[ 345.069818070,3] PHB#0030[8:0]: slotStatus = 00402000
[ 345.069857447,3] PHB#0030[8:0]: linkStatus = a0830008
[ 345.069900881,3] PHB#0030[8:0]: devCmdStatus = 00100107
[ 345.069940313,3] PHB#0030[8:0]: devSecStatus = 00000800
[ 345.069989240,3] PHB#0030[8:0]: rootErrorStatus = 00000000
[ 345.070039547,3] PHB#0030[8:0]: corrErrorStatus = 00000000
[ 345.070080475,3] PHB#0030[8:0]: uncorrErrorStatus = 00000000
[ 345.070161917,3] PHB#0030[8:0]: devctl = 00000020
[ 345.070208284,3] PHB#0030[8:0]: devStat = 00000006
[ 345.070273572,3] PHB#0030[8:0]: tlpHdr1 = 00000000
[ 345.070317083,3] PHB#0030[8:0]: tlpHdr2 = 00000000
[ 345.070356757,3] PHB#0030[8:0]: tlpHdr3 = 00000000
[ 345.070393297,3] PHB#0030[8:0]: tlpHdr4 = 00000000
[ 345.070427096,3] PHB#0030[8:0]: sourceId = 00000000
[ 345.070463542,3] PHB#0030[8:0]: nFir = 0000000000000000
[ 345.070515927,3] PHB#0030[8:0]: nFirMask = 0030001c00000000
[ 345.070562368,3] PHB#0030[8:0]: nFirWOF = 0000000000000000
[ 345.070608115,3] PHB#0030[8:0]: phbPlssr = 0000001c00000000
[ 345.070654917,3] PHB#0030[8:0]: phbCsr = 0000001c00000000
[ 345.070702546,3] PHB#0030[8:0]: lemFir = 0000000100000080
[ 345.070753051,3] PHB#0030[8:0]: lemErrorMask = 0000000000000000
[ 345.070805729,3] PHB#0030[8:0]: lemWOF = 0000000000000080
[ 345.070852396,3] PHB#0030[8:0]: phbErrorStatus = 0000028000000000
[ 345.070898231,3] PHB#0030[8:0]: phbFirstErrorStatus = 0000020000000000
[ 345.070939973,3] PHB#0030[8:0]: phbErrorLog0 = 2148000098000240
[ 345.070978718,3] PHB#0030[8:0]: phbErrorLog1 = a008400000000000
[ 345.071017848,3] PHB#0030[8:0]: phbTxeErrorStatus = 0000000000000000
[ 345.071060984,3] PHB#0030[8:0]: phbTxeFirstErrorStatus = 0000000000000000
[ 345.071111129,3] PHB#0030[8:0]: phbTxeErrorLog0 = 0000000000000000
[ 345.071159107,3] PHB#0030[8:0]: phbTxeErrorLog1 = 0000000000000000
[ 345.071206324,3] PHB#0030[8:0]: phbRxeArbErrorStatus = 0000000000000000
[ 345.071261659,3] PHB#0030[8:0]: phbRxeArbFrstErrorStatus = 0000000000000000
[ 345.071306657,3] PHB#0030[8:0]: phbRxeArbErrorLog0 = 0000000000000000
[ 345.071357111,3] PHB#0030[8:0]: phbRxeArbErrorLog1 = 0000000000000000
[ 345.071405231,3] PHB#0030[8:0]: phbRxeMrgErrorStatus = 0000000000000000
[ 345.071452755,3] PHB#0030[8:0]: phbRxeMrgFrstErrorStatus = 0000000000000000
[ 345.071499236,3] PHB#0030[8:0]: phbRxeMrgErrorLog0 = 0000000000000000
[ 345.071543098,3] PHB#0030[8:0]: phbRxeMrgErrorLog1 = 0000000000000000
[ 345.071582643,3] PHB#0030[8:0]: phbRxeTceErrorStatus = 2000000000000000
[ 345.071623042,3] PHB#0030[8:0]: phbRxeTceFrstErrorStatus = 2000000000000000
[ 345.071697314,3] PHB#0030[8:0]: phbRxeTceErrorLog0 = c0000000000001fa
[ 345.071745637,3] PHB#0030[8:0]: phbRxeTceErrorLog1 = 0000000000000000
[ 345.071791896,3] PHB#0030[8:0]: phbPblErrorStatus = 0000000000020000
[ 345.071835205,3] PHB#0030[8:0]: phbPblFirstErrorStatus = 0000000000020000
[ 345.071878565,3] PHB#0030[8:0]: phbPblErrorLog0 = 0000000000000000
[ 345.071923843,3] PHB#0030[8:0]: phbPblErrorLog1 = 0000000000000000
[ 345.071972070,3] PHB#0030[8:0]: phbPcieDlpErrorLog1 = 0000000000000000
[ 345.072031244,3] PHB#0030[8:0]: phbPcieDlpErrorLog2 = 0000000000000000
[ 345.072077340,3] PHB#0030[8:0]: phbPcieDlpErrorStatus = 0000000000000000
[ 345.072133718,3] PHB#0030[8:0]: phbRegbErrorStatus = 0000004000000000
[ 345.072184534,3] PHB#0030[8:0]: phbRegbFirstErrorStatus = 0000004000000000
[ 345.072228261,3] PHB#0030[8:0]: phbRegbErrorLog0 = 8800000c00000000
[ 345.072277480,3] PHB#0030[8:0]: phbRegbErrorLog1 = 0000000007011000
[ 345.072324956,3] PHB#0030[8:0]: PEST[506] = 8300b03800000000 8000000000000000
[ 345.072384578,3] PHB#0030[8:0]: PEST[507] = 8300b03800000000 8000000000000000
[ 345.072447509,3] PHB#0030[8:0]: PEST[511] = 3740002a01000000 0000000000000000
[ 140.102044] EEH: Notify device drivers to shutdown
[ 140.102046] EEH: Beginning: 'error_detected(IO frozen)'
[ 140.102050] PCI 0030:0e:00.0#01fa: EEH: Invoking nvme->error_detected(IO frozen)
[ 140.102055] nvme nvme1: frozen state error detected, reset controller
EEH: PE location: N/A, PHB location: N/A
EEH: Frozen PHB#30-PE#1fa detected
EEH: Call Trace:
EEH: [00000000ffe66fe6] __eeh_send_failure_event+0xa4/0x180
EEH: [00000000cde11bd8] eeh_dev_check_failure+0x3d8/0x740
EEH: [0000000063d788bb] nvme_timeout+0x288/0x750 [nvme]
EEH: [0000000043ae3de7] blk_mq_handle_expired+0x98/0xf0
EEH: [0000000018e27476] bt_iter+0xec/0x120
EEH: [00000000ffb65dd3] blk_mq_queue_tag_busy_iter+0x414/0xa60
EEH: [0000000024de88c5] blk_mq_timeout_work+0x1c8/0x230
EEH: [000000003e6b6b37] process_one_work+0x1f0/0x520
EEH: [00000000f4e3d3a4] worker_thread+0x33c/0x510
EEH: [00000000ee3ba07d] kthread+0x150/0x160
EEH: [000000009f50efe6] start_kernel_thread+0x14/0x18
EEH: This PCI device has failed 1 times in the last hour and will be permanently disabled after 5 failures.
EEH: Notify device drivers to shutdown
EEH: Beginning: 'error_detected(IO frozen)'
PCI 0030:0e:00.0#01fa: EEH: Invoking nvme->error_detected(IO frozen)
nvme nvme1: frozen state error detected, reset controller
[ 140.242588] nvme1n1: I/O Cmd(0x2) @ LBA 1875384832, 128 blocks, I/O Error (sct 0x3 / sc 0x71)
[ 140.243142] I/O error, dev nvme1n1, sector 1875384832 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
nvme1n1: I/O Cmd(0x2) @ LBA 1875384832, 128 blocks, I/O Error (sct 0x3 / sc 0x71)
I/O error, dev nvme1n1, sector 1875384832 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
[ 140.271706] nvme nvme1: Failed to get ANA log: -4
nvme nvme1: Failed to get ANA log: -4
[ 140.291182] PCI 0030:0e:00.0#01fa: EEH: nvme driver reports: 'need reset'
[ 140.291190] EEH: Finished:'error_detected(IO frozen)' with aggregate recovery state:'need reset'
[ 140.292301] EEH: Collect temporary log
[ 140.292840] EEH: of node=0030:0e:00.0
[ 140.293359] EEH: PCI device/vendor: a808144d
[ 140.293884] EEH: PCI cmd/status register: 00100142
[ 140.294394] EEH: PCI-E capabilities and status follow:
[ 140.294912] EEH: PCI-E 00: 0002b010 10648fc1 00002830 00437043
[ 140.295437] EEH: PCI-E 10: 10430000 00000000 00000000 00000000
[ 140.295950] EEH: PCI-E 20: 00000000
[ 140.296471] EEH: PCI-E AER capability register set follows:
[ 140.297001] EEH: PCI-E AER 00: 14820001 00000000 00400000 00462030
[ 140.297533] EEH: PCI-E AER 10: 00000000 0000e000 000003e0 00000000
[ 140.298071] EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000
[ 140.298591] EEH: PCI-E AER 30: 00000000 00000000
[ 140.299113] PHB4 PHB#48 Diag-data (Version: 1)
[ 140.299636] brdgCtl: 00000002
[ 140.300158] RootSts: 00060020 00402000 a0830008 00100107 00000800
[ 140.300700] PhbSts: 0000001c00000000 0000001c00000000
[ 140.301319] Lem: 0000000100000080 0000000000000000 0000000000000080
[ 140.302109] PhbErr: 0000028000000000 0000020000000000 2148000098000240 a008400000000000
[ 140.302114] RxeTceErr: 2000000000000000 2000000000000000 c0000000000001fa 0000000000000000
[ 140.302118] PblErr: 0000000000020000 0000000000020000 0000000000000000 0000000000000000
[ 140.302121] RegbErr: 0000004000000000 0000004000000000 8800000c00000000 0000000007011000
[ 140.302129] PE[1fa] A/B: 8300b03800000000 8000000000000000
[ 140.302133] PE[..1fb] A/B: as above
[ 140.302135] EEH: Reset without hotplug activity
PCI 0030:0e:00.0#01fa: EEH: nvme driver reports: 'need reset'
Thanks,
Dan
More information about the Linuxppc-dev mailing list