ppc64el kernel bug?

Michael Ellerman mpe at ellerman.id.au
Thu Jan 6 15:03:34 AEDT 2022


Nathan Lynch <nathanl at linux.ibm.com> writes:
> Kip Warner <kip at thevertigo.com> writes:
>>    Dec 25 06:52:52 romulus-server kernel: [28835.277591] BUG: Unable to handle kernel data access on write at 0x132b47d38499fd58
>>    Dec 25 06:52:52 romulus-server kernel: [28835.277624] Faulting instruction address: 0xc0000000004d0434
>>    Dec 25 06:52:52 romulus-server kernel: [28835.277636] Oops: Kernel access of bad area, sig: 11 [#150]
>>    Dec 25 06:52:52 romulus-server kernel: [28835.277656] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA PowerNV
>>    Dec 25 06:52:52 romulus-server kernel: [28835.277669] Modules linked in: veth nft_masq zfs(PO) zunicode(PO) zzstd(O) zlua(O) zcommon(PO) znvpair(PO) zavl(PO) icp(PO) spl(O) vhost_vsock vmw_vsock_virtio_transport_common vhost vhost_iotlb vsock xt_CHECKSUM nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter xt_tcpudp nft_compat bridge stp llc nf_tables nfnetlink binfmt_misc dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua joydev input_leds ipmi_powernv mac_hid ipmi_devintf ipmi_msghandler ofpart cmdlinepart at24 powernv_flash mtd uio_pdrv_genirq opal_prd uio ibmpowernv vmx_crypto sch_fq_codel jc42 ip_tables x_tables autofs4 xfs btrfs blake2b_generic raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor hid_generic usbhid hid raid6_pq libcrc32c raid1 raid0 multipath linear nouveau ses enclosure scsi_transport_sas ast drm_vram_helper i2c_algo_bit drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core drm crct10dif_vpmsum
>>    Dec 25 06:52:52 romulus-server kernel: [28835.277776]  crc32c_vpmsum xhci_pci tg3 aacraid xhci_pci_renesas drm_panel_orientation_quirks
>>    Dec 25 06:52:52 romulus-server kernel: [28835.277918] CPU: 26 PID: 144937 Comm: postgres Tainted: P      D    O      5.11.0-41-generic #45-Ubuntu
>>    Dec 25 06:52:52 romulus-server kernel: [28835.277943] NIP: c0000000004d0434 LR: c0000000004d032c CTR: c0000000010a90e0
>>    Dec 25 06:52:52 romulus-server kernel: [28835.277975] REGS: c000000056b9f6b0 TRAP: 0380   Tainted: P      D    O       (5.11.0-41-generic)
>>    Dec 25 06:52:52 romulus-server kernel: [28835.278008] MSR: 9000000000009033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR: 88002281  XER: 0000008c
>>    Dec 25 06:52:52 romulus-server kernel: [28835.278050] CFAR: c0000000004d041c IRQMASK: 0 
>>    Dec 25 06:52:52 romulus-server kernel: [28835.278050] GPR00: c0000000004d032c c000000056b9f950 c000000002409a00 0000000000000000 
>>    Dec 25 06:52:52 romulus-server kernel: [28835.278050] GPR04: 0000000000400cc0 0000000000000097 ffffffffffffffff c000000ffda9d0d0 
>>    Dec 25 06:52:52 romulus-server kernel: [28835.278050] GPR08: 0000000ffbd90000 132b47d38499fce8 0000000000000070 d4ff277338704e25 
>>    Dec 25 06:52:52 romulus-server kernel: [28835.278050] GPR12: 0000000000002000 c000000ffffd2c00 0000000000000000 c000000116c512d0 
>>    Dec 25 06:52:52 romulus-server kernel: [28835.278050] GPR16: 0000000000000154 c000000116c51570 c000000056b9fc88 0000000000000154 
>>    Dec 25 06:52:52 romulus-server kernel: [28835.278050] GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
>>    Dec 25 06:52:52 romulus-server kernel: [28835.278050] GPR24: c000000000ecccc0 0000000000000001 c0000000024588fc c000000000ec9954 
>>    Dec 25 06:52:52 romulus-server kernel: [28835.278050] GPR28: ffffffffffffffff c00000001d597e40 0000000000400cc0 c000000003018880 
>>    Dec 25 06:52:52 romulus-server kernel: [28835.278213] NIP [c0000000004d0434] kmem_cache_alloc_node+0x1d4/0x490
>>    Dec 25 06:52:52 romulus-server kernel: [28835.278237] LR [c0000000004d032c] kmem_cache_alloc_node+0xcc/0x490
>>    Dec 25 06:52:52 romulus-server kernel: [28835.278268] Call Trace:
>>    Dec 25 06:52:52 romulus-server kernel: [28835.278283] [c000000056b9f950] [c0000000004d032c] kmem_cache_alloc_node+0xcc/0x490 (unreliable)
>>    Dec 25 06:52:52 romulus-server kernel: [28835.278328] [c000000056b9f9c0] [c000000000ec9954] __alloc_skb+0x74/0x2d0
>>    Dec 25 06:52:52 romulus-server kernel: [28835.278369] [c000000056b9fa20] [c000000000ecccc0] alloc_skb_with_frags+0x70/0x2e0
>>    Dec 25 06:52:52 romulus-server kernel: [28835.278403] [c000000056b9faa0] [c000000000ec0f38] sock_alloc_send_pskb+0x1d8/0x200
>>    Dec 25 06:52:52 romulus-server kernel: [28835.278436] [c000000056b9fb10] [c0000000010a93a8] unix_stream_sendmsg+0x2c8/0x710
>>    Dec 25 06:52:52 romulus-server kernel: [28835.278471] [c000000056b9fc10] [c000000000eb64e0] sock_sendmsg+0x80/0xb0
>>    Dec 25 06:52:52 romulus-server kernel: [28835.278494] [c000000056b9fc40] [c000000000ebab88] __sys_sendto+0xf8/0x1a0
>>    Dec 25 06:52:52 romulus-server kernel: [28835.278526] [c000000056b9fd90] [c000000000ebaca0] sys_send+0x30/0x40
>>    Dec 25 06:52:52 romulus-server kernel: [28835.278558] [c000000056b9fdb0] [c000000000036ffc] system_call_exception+0x10c/0x230
>>    Dec 25 06:52:52 romulus-server kernel: [28835.278601] [c000000056b9fe10] [c00000000000d374] system_call_vectored_common+0xf4/0x26c
>>    Dec 25 06:52:52 romulus-server kernel: [28835.278634] --- interrupt: 3000 at 0x7ec638a194f4
>>    Dec 25 06:52:52 romulus-server kernel: [28835.278654] NIP: 00007ec638a194f4 LR: 0000000000000000 CTR: 0000000000000000
>>    Dec 25 06:52:52 romulus-server kernel: [28835.278685] REGS: c000000056b9fe80 TRAP: 3000   Tainted: P      D    O       (5.11.0-41-generic)
>>    Dec 25 06:52:52 romulus-server kernel: [28835.278719] MSR: 900000000280f033 <SF,HV,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE>  CR: 48008281 XER: 00000000
>>    Dec 25 06:52:52 romulus-server kernel: [28835.278766] IRQMASK: 0 
>>    Dec 25 06:52:52 romulus-server kernel: [28835.278766] GPR00: 000000000000014e 00007fffe99c1800 00007ec638a47f00 0000000000000009 
>>    Dec 25 06:52:52 romulus-server kernel: [28835.278766] GPR04: 00000043809d1148 0000000000000154 0000000000000000 0000000000001ae8 
>>    Dec 25 06:52:52 romulus-server kernel: [28835.278766] GPR08: 0000004362347d00 0000000000000000 0000000000000000 0000000000000000 
>>    Dec 25 06:52:52 romulus-server kernel: [28835.278766] GPR12: 0000000000000000 00007ec6348e0890 0000000000000000 ffffffffffffffff 
>>    Dec 25 06:52:52 romulus-server kernel: [28835.278766] GPR16: 0000000000000000 000000436233f7a0 0000000000000001 0000000000000000 
>>    Dec 25 06:52:52 romulus-server kernel: [28835.278766] GPR20: 00007fffe99c18ac 0000004362344f48 0000000000000004 00007fffe99c18b0 
>>    Dec 25 06:52:52 romulus-server kernel: [28835.278766] GPR24: 0000000006000001 0000000000000000 0000000000000154 00000043809d1148 
>>    Dec 25 06:52:52 romulus-server kernel: [28835.278766] GPR28: 0000000000000000 00007ec6348d9938 00000043809ceb00 000000000000000b 
>>    Dec 25 06:52:52 romulus-server kernel: [28835.278992] NIP [00007ec638a194f4] 0x7ec638a194f4
>>    Dec 25 06:52:52 romulus-server kernel: [28835.279020] LR [0000000000000000] 0x0
>>    Dec 25 06:52:52 romulus-server kernel: [28835.279038] --- interrupt: 3000
>>    Dec 25 06:52:52 romulus-server kernel: [28835.279054] Instruction dump:
>>    Dec 25 06:52:52 romulus-server kernel: [28835.279072] f9210020 41820098 2e1cffff 3b200001 2c2a0000 41820088 41920010 894a0007 
>>    Dec 25 06:52:52 romulus-server kernel: [28835.279110] 7c1c5000 40820078 815f0028 e97f00b8 <7ce9502a> 7c095214 886d0988 9b2d0988 
>>    Dec 25 06:52:52 romulus-server kernel: [28835.279141] ---[ end trace fe7ee98d0b7beb6a ]---
>
> Perhaps slab corruption, but the 'D' taint flag (TAINT_DIE) means the
> kernel oopsed at least once before this. Probably best to look at that
> one first.
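
As a rough cross-check of the slab corruption guess, the faulting instruction can be decoded by hand. The sketch below takes the word marked <...> in the instruction dump and the GPR09/GPR10 values from the register dump, splits the word into Power ISA X-form fields, and recomputes the effective address. The helper name decode_x_form is made up for this illustration. The word appears to decode as ldx r7,r9,r10, and r9 + r10 matches the reported fault address, so r9 looks like a garbage pointer (it is nowhere near the 0xc000... kernel range). That would be consistent with a corrupted freelist pointer, but it is not proof of one.

```python
# Sketch: decode the faulting instruction word from the oops and recompute
# the effective address from the dumped registers. Illustration only.

def decode_x_form(word: int) -> dict:
    """Split a 32-bit Power ISA X-form instruction word into its fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # primary opcode
        "rt": (word >> 21) & 0x1F,      # target register
        "ra": (word >> 16) & 0x1F,      # base register
        "rb": (word >> 11) & 0x1F,      # index register
        "xo": (word >> 1) & 0x3FF,      # extended opcode
    }

# Values copied from the oops quoted above.
INSN = 0x7CE9502A                        # word marked <...> in the instruction dump
GPR = {9: 0x132B47D38499FCE8, 10: 0x70}  # GPR09 and GPR10 from the register dump
FAULT_ADDR = 0x132B47D38499FD58          # address from the "BUG:" line

fields = decode_x_form(INSN)

# Primary opcode 31 with extended opcode 21 is ldx (load doubleword indexed).
is_ldx = fields["opcode"] == 31 and fields["xo"] == 21

# For ldx with RA != 0 the effective address is (RA) + (RB), modulo 2^64.
ea = (GPR[fields["ra"]] + GPR[fields["rb"]]) & 0xFFFFFFFFFFFFFFFF

print(fields)
print("ldx:", is_ldx, "EA:", hex(ea), "matches fault address:", ea == FAULT_ADDR)
```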

You also have the 'P' taint, which means a proprietary module is loaded,
so we (upstream) can't really help with that. You're better off
reporting it to your distro.
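
For reference, the taint letters can be pulled out of the "CPU:" line mechanically. This is a minimal sketch, not a complete decoder: the function name taint_flags is hypothetical, and the table only covers the three letters seen in this report (P, D, O) rather than the kernel's full list. In the module list above, the same letters appear in parentheses after the modules that caused them, e.g. zfs(PO).

```python
import re

# Only the letters that show up in this oops; the kernel defines more.
TAINT_MEANINGS = {
    "P": "proprietary module was loaded",
    "D": "kernel has died recently (an earlier oops or BUG)",
    "O": "externally built (out-of-tree) module was loaded",
}

def taint_flags(line: str) -> list:
    """Return the taint letters found after 'Tainted:' in an oops line."""
    _, sep, rest = line.partition("Tainted:")
    if not sep:
        return []
    flags = []
    for token in rest.split():
        # Taint letters are uppercase; stop at the kernel version string.
        if not re.fullmatch(r"[A-Z]+", token):
            break
        flags.extend(token)
    return flags

# Line copied from the oops quoted above (timestamp prefix dropped).
LINE = ("CPU: 26 PID: 144937 Comm: postgres Tainted: P      D    O      "
        "5.11.0-41-generic #45-Ubuntu")

for flag in taint_flags(LINE):
    print(flag, "-", TAINT_MEANINGS.get(flag, "not in this table"))
```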

If it's easily reproducible, you could boot with slub_debug=FZP (sanity
checks, red zoning and poisoning) and see if that catches the slab
corruption earlier. That might help us identify the actual problem.
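
After rebooting, it is worth confirming the option actually made it onto the kernel command line. The sketch below parses a command-line string and reports which slub_debug flags are set. The function name slub_debug_flags and the sample string are made up for illustration, and it only handles the simple slub_debug=<flags>[,<slab>...] form, not the bare slub_debug option.

```python
# Meanings of the three flags suggested above.
SLUB_DEBUG_MEANINGS = {
    "F": "sanity checks",
    "Z": "red zoning",
    "P": "poisoning",
}

def slub_debug_flags(cmdline: str):
    """Return the set of slub_debug flag letters, or None if the option is absent."""
    for arg in cmdline.split():
        if arg.startswith("slub_debug="):
            value = arg.split("=", 1)[1]
            # Anything after the first comma names specific slab caches.
            return set(value.split(",", 1)[0])
    return None

# Hypothetical command line; on a live system read /proc/cmdline instead.
SAMPLE = "root=/dev/sda2 ro slub_debug=FZP quiet"

flags = slub_debug_flags(SAMPLE)
if flags is None:
    print("slub_debug not set")
else:
    for flag in sorted(flags):
        print(flag, "-", SLUB_DEBUG_MEANINGS.get(flag, "not in this table"))
```

On the affected machine the input would come from something like open("/proc/cmdline").read() rather than the sample string.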

cheers


More information about the Linuxppc-dev mailing list