[powerpc] Lockups seen during/just after boot (bisected)

Sachin Sant sachinp at linux.ibm.com
Thu Nov 23 22:27:56 AEDT 2023


While booting a recent -next kernel (6.7.0-rc2-next-20231122) on an IBM Power
server, I observed hard lockups either during boot or shortly after it.

[ 3631.015775] watchdog: CPU 3 self-detected hard LOCKUP @ __update_freelist_slow+0x74/0x90
[ 3631.015783] watchdog: CPU 3 TB:7766577908812231, last heartbeat TB:7766572528409444 (10508ms ago)
[ 3631.015784] Modules linked in: rpadlpar_io(E) rpaphp(E) xsk_diag(E) nft_fib_inet(E) nft_fib_ipv4(E) nft_fib_ipv6(E) nft_fib(E) nft_reject_inet(E) nf_reject_ipv4(E) nf_reject_ipv6(E) nft_reject(E) nft_ct(E) nft_chain_nat(E) nf_nat(E) nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) bonding(E) tls(E) rfkill(E) ip_set(E) nf_tables(E) nfnetlink(E) sunrpc(E) binfmt_misc(E) pseries_rng(E) aes_gcm_p10_crypto(E) drm(E) drm_panel_orientation_quirks(E) xfs(E) libcrc32c(E) sd_mod(E) sr_mod(E) t10_pi(E) crc64_rocksoft_generic(E) cdrom(E) crc64_rocksoft(E) crc64(E) sg(E) ibmvscsi(E) scsi_transport_srp(E) ibmveth(E) vmx_crypto(E) fuse(E)
[ 3631.015811] CPU: 3 PID: 167427 Comm: sed Kdump: loaded Tainted: G E 6.7.0-rc2-next-20231122 #1
[ 3631.015813] Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1030.20 (NH1030_058) hv:phyp pSeries
[ 3631.015814] NIP: c000000000561f34 LR: c00000000056b108 CTR: c0000000004f4c50
[ 3631.015816] REGS: c000000e87743d60 TRAP: 0900 Tainted: G E (6.7.0-rc2-next-20231122)
[ 3631.015817] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 42042222 XER: 20040000
[ 3631.015822] CFAR: 0000000000000000 IRQMASK: 1 
[ 3631.015822] GPR00: c000000000096968 c000000e87b43ca0 c000000001522100 c00c00000001d700 
[ 3631.015822] GPR04: c0000000075f2000 0000000000200009 c0000000075f0000 0000000000200008 
[ 3631.015822] GPR08: 0000000000001000 0000000000000001 003ffff800000a41 c0000000b189d000 
[ 3631.015822] GPR12: c0000000004f4c50 c000000effffcb00 0000000000000000 0000000000000000 
[ 3631.015822] GPR16: 0000000000000001 c000000002b82a80 0000000000000000 0000000000000000 
[ 3631.015822] GPR20: c000000003016c00 0000000000000000 0000000000210d00 c000000e81273978 
[ 3631.015822] GPR24: 0000000000000000 0000000000000001 c0000000075f0000 c0000000075f0000 
[ 3631.015822] GPR28: c000000e81273900 0000000000200008 c00c00000001d700 c0000000075f2000 
[ 3631.015840] NIP [c000000000561f34] __update_freelist_slow+0x74/0x90
[ 3631.015842] LR [c00000000056b108] __slab_free+0x138/0x4a0
[ 3631.015845] Call Trace:
[ 3631.015845] [c000000e87b43ca0] [c00c00000001d700] 0xc00c00000001d700 (unreliable)
[ 3631.015849] [c000000e87b43d80] [c000000000096968] __tlb_remove_table+0xe8/0x150
[ 3631.015853] [c000000e87b43db0] [c0000000004f4cac] tlb_remove_table_rcu+0x5c/0xa0
[ 3631.015856] [c000000e87b43de0] [c000000000243314] rcu_do_batch+0x234/0x680
[ 3631.015859] [c000000e87b43e90] [c000000000247a80] rcu_core+0x170/0x2d0
[ 3631.015862] [c000000e87b43ee0] [c00000000102054c] __do_softirq+0x15c/0x3c0
[ 3631.015866] [c000000e87b43fe0] [c0000000000182d0] do_softirq_own_stack+0x40/0x60
[ 3631.015869] [c0000000672f7610] [c000000000170668] __irq_exit_rcu+0x128/0x150
[ 3631.015872] [c0000000672f7640] [c0000000001711a0] irq_exit+0x20/0x40
[ 3631.015874] [c0000000672f7660] [c00000000002bb58] timer_interrupt+0x128/0x310
[ 3631.015876] [c0000000672f76c0] [c000000000009ffc] decrementer_common_virt+0x28c/0x290
[ 3631.015879] --- interrupt: 900 at smp_call_function_many_cond+0x1d4/0x6a0
[ 3631.015883] NIP: c000000000298a34 LR: c0000000002989e4 CTR: c0000000000c9f50
[ 3631.015884] REGS: c0000000672f76f0 TRAP: 0900 Tainted: G E (6.7.0-rc2-next-20231122)
[ 3631.015885] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 44042822 XER: 20040000
[ 3631.015888] CFAR: 0000000000000000 IRQMASK: 0 
[ 3631.015888] GPR00: c000000000298eb0 c0000000672f7990 c000000001522100 0000000000000012 
[ 3631.015888] GPR04: 0000000000000012 0000000000000000 0000000000000000 0000000000000012 
[ 3631.015888] GPR08: c000000002b92058 0000000000000001 c000000e81bdc060 c0000000070d0c88 
[ 3631.015888] GPR12: c0000000000c9f50 c000000effffcb00 0000000000000000 0000000000000000 
[ 3631.015888] GPR16: 0000000000000000 0000000000000003 0000000000000000 c000000008270ff0 
[ 3631.015888] GPR20: c000000002b961f8 c0000000000a8270 60000000000000e0 c0000000070d0680 
[ 3631.015888] GPR24: 0000000000000000 0000000000000090 c000000e81273d88 c0000000070d0680 
[ 3631.015888] GPR28: c000000e81273d80 c0000000000a9470 c000000e81273d88 c000000002b968b0 
[ 3631.015906] NIP [c000000000298a34] smp_call_function_many_cond+0x1d4/0x6a0
[ 3631.015908] LR [c0000000002989e4] smp_call_function_many_cond+0x184/0x6a0
[ 3631.015911] --- interrupt: 900
[ 3631.015912] [c0000000672f7990] [c000000000298eb0] smp_call_function_many_cond+0x650/0x6a0 (unreliable)
[ 3631.015915] [c0000000672f7a50] [c0000000000a8270] flush_type_needed+0x1d0/0x260
[ 3631.015917] [c0000000672f7a90] [c0000000000a94ec] radix__flush_tlb_page_psize+0x5c/0x300
[ 3631.015919] [c0000000672f7b00] [c0000000004fd7f4] ptep_clear_flush+0xa4/0x160
[ 3631.015921] [c0000000672f7b50] [c0000000004d9218] wp_page_copy+0x348/0xa40
[ 3631.015924] [c0000000672f7c00] [c0000000004e55b0] __handle_mm_fault+0x470/0x8a0
[ 3631.015927] [c0000000672f7d10] [c0000000004e5af4] handle_mm_fault+0x114/0x3b0
[ 3631.015929] [c0000000672f7d60] [c0000000000900ac] ___do_page_fault+0x3ec/0x8c0
[ 3631.015931] [c0000000672f7e20] [c000000000090670] do_page_fault+0x30/0xc0
[ 3631.015933] [c0000000672f7e50] [c000000000008be0] data_access_common_virt+0x210/0x220
[ 3631.015935] --- interrupt: 300 at 0x7fff98f81c18
[ 3631.015936] NIP: 00007fff98f81c18 LR: 00007fff98f81c08 CTR: 00007fff98c53380
[ 3631.015937] REGS: c0000000672f7e80 TRAP: 0300 Tainted: G E (6.7.0-rc2-next-20231122)
[ 3631.015938] MSR: 800000000280f033 <SF,VEC,VSX,EE,PR,FP,ME,IR,DR,RI,LE> CR: 24002422 XER: 20040000
[ 3631.015943] CFAR: 00007fff98f81b0c DAR: 00007fff98fa0000 DSISR: 0a000000 IRQMASK: 0 
[ 3631.015943] GPR00: 00007fff98ff7d44 00007fffe2816a10 00007fff98fa7f00 00007fff98fa0000 
[ 3631.015943] GPR04: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
[ 3631.015943] GPR08: 0000000000000000 0000000000000001 0000000000000001 0000000000002000 
[ 3631.015943] GPR12: 00007fff98c53380 00007fff9905bdc0 0000000000000000 0000000000000000 
[ 3631.015943] GPR16: 0000000000000000 0000000000000000 0000000000000000 00007fff98f9fb30 
[ 3631.015943] GPR20: 00007fffe2816aa8 00007fffe2816a80 00007fffe2816a50 00007fff99050000 
[ 3631.015943] GPR24: 0000000000000000 00007fff9904f668 0000000000000000 00007fff99050988 
[ 3631.015943] GPR28: 00007fff99052040 0000000000000000 00007fff99050000 00007fffe2816a50 
[ 3631.015958] NIP [00007fff98f81c18] 0x7fff98f81c18
[ 3631.015959] LR [00007fff98f81c08] 0x7fff98f81c08
[ 3631.015960] --- interrupt: 300
[ 3631.015960] Code: 60000000 60000000 60000000 e9230028 7c292800 4082ffd4 39400001 f8c30020 f8e30028 4bffffc4 60000000 7c40003c <60000000> e9230000 71290001 4082fff0
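
As a sanity check on the watchdog line in the log above, the two timebase (TB) values can be converted back to the reported "10508ms ago". This is only a sketch: the 512 MHz timebase frequency is an assumption, not something stated in the log. The actual value is the "timebase" field in /proc/cpuinfo on the affected machine. The function name is made up for illustration.

```python
# Timebase values copied from the "self-detected hard LOCKUP" watchdog line.
TB_NOW = 7766577908812231
TB_LAST_HEARTBEAT = 7766572528409444

# Assumption: 512 MHz timebase. Check the "timebase" field in /proc/cpuinfo
# on the affected system before relying on this number.
TB_HZ = 512_000_000


def tb_delta_ms(now: int, last: int, tb_hz: int = TB_HZ) -> int:
    """Convert a timebase tick delta to whole milliseconds."""
    return (now - last) * 1000 // tb_hz


print(tb_delta_ms(TB_NOW, TB_LAST_HEARTBEAT))  # 10508
```

Under that assumption the result agrees with the log: CPU 3 missed its heartbeat for roughly 10.5 seconds with IRQMASK set to 1 while in __update_freelist_slow.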

Git bisect points to the following patch:

commit c8d312e039030edab25836a326bcaeb2a3d4db14
    slub: Delay freezing of partial slabs

Bisect log:

git bisect start
# status: waiting for both good and bad commits
# bad: [288736c822de7fd3b69be317c11eaa8dfb78bf6f] Add linux-next specific files for 20231122
git bisect bad 288736c822de7fd3b69be317c11eaa8dfb78bf6f
# status: waiting for good commit(s), bad commit known
# good: [98b1cc82c4affc16f5598d4fa14b1858671b2263] Linux 6.7-rc2
git bisect good 98b1cc82c4affc16f5598d4fa14b1858671b2263
# good: [9540131d5721e24c00b118ce852c761285515b26] Merge branch 'main' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git
git bisect good 9540131d5721e24c00b118ce852c761285515b26
# good: [99444230e9595fc7050292ce284003d7e7d4b53e] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input.git
git bisect good 99444230e9595fc7050292ce284003d7e7d4b53e
# good: [a95502fad0ab45767b263f46719bba3885e9597c] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/westeri/thunderbolt.git
git bisect good a95502fad0ab45767b263f46719bba3885e9597c
# good: [5317edbc82dbfac690f2ff720667291ecf6ccee0] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/andy/linux-gpio-intel.git
git bisect good 5317edbc82dbfac690f2ff720667291ecf6ccee0
# good: [602bf18307981f3bfd9ebf19921791a4256d3fd1] Merge branch 'for-6.7' into for-next
git bisect good 602bf18307981f3bfd9ebf19921791a4256d3fd1
# good: [d5bf2252dd17fa9fa87206862500884e9a342c9b] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/pinctrl/samsung.git
git bisect good d5bf2252dd17fa9fa87206862500884e9a342c9b
# good: [48b10dee6a7a26cc695bf4932f091d423b94b429] Merge branch 'zstd-next' of https://github.com/terrelln/linux.git
git bisect good 48b10dee6a7a26cc695bf4932f091d423b94b429
# good: [f6a4a72703c035882d7595198ce83d021b6b1c96] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/krisman/unicode.git
git bisect good f6a4a72703c035882d7595198ce83d021b6b1c96
# bad: [dd374e220ba492f95344a638b1efe5b2744fdd73] slub: Update frozen slabs documentations in the source
git bisect bad dd374e220ba492f95344a638b1efe5b2744fdd73
# good: [a3058965bb35490454953aa2c87ea51004839f2f] slub: Prepare __slab_free() for unfrozen partial slab out of node partial list
git bisect good a3058965bb35490454953aa2c87ea51004839f2f
# bad: [c8d312e039030edab25836a326bcaeb2a3d4db14] slub: Delay freezing of partial slabs
git bisect bad c8d312e039030edab25836a326bcaeb2a3d4db14
# good: [00b15a19ee543f0117cb217fcbab8b7b3fd50677] slub: Introduce freeze_slab()
git bisect good 00b15a19ee543f0117cb217fcbab8b7b3fd50677
# first bad commit: [c8d312e039030edab25836a326bcaeb2a3d4db14] slub: Delay freezing of partial slabs
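
For anyone who wants to process the log above with a script, here is a minimal sketch that pulls the verdict lines out of a bisect log and reports the first bad commit. The helper name and the regular expression are mine, not part of git. The embedded text is just the last few comment lines of the log above.

```python
import re

# Tail of the bisect log above (comment lines only).
BISECT_LOG = """\
# bad: [dd374e220ba492f95344a638b1efe5b2744fdd73] slub: Update frozen slabs documentations in the source
# good: [a3058965bb35490454953aa2c87ea51004839f2f] slub: Prepare __slab_free() for unfrozen partial slab out of node partial list
# bad: [c8d312e039030edab25836a326bcaeb2a3d4db14] slub: Delay freezing of partial slabs
# good: [00b15a19ee543f0117cb217fcbab8b7b3fd50677] slub: Introduce freeze_slab()
# first bad commit: [c8d312e039030edab25836a326bcaeb2a3d4db14] slub: Delay freezing of partial slabs
"""

# Matches "# <verdict>: [<40-hex sha>] <subject>" lines.
LINE_RE = re.compile(r"^# (good|bad|first bad commit): \[([0-9a-f]{40})\] (.*)$")


def parse_bisect_log(text):
    """Return (verdict, sha, subject) tuples for each verdict line, in order."""
    entries = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            entries.append(match.groups())
    return entries


entries = parse_bisect_log(BISECT_LOG)
first_bad = next(e for e in entries if e[0] == "first bad commit")
print(first_bad[1][:12], first_bad[2])
```

The full log, saved to a file, can also be fed to "git bisect replay" to reproduce the same bisection state in a linux-next tree.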

- Sachin


