[6.1.0-rc3-next-20221104] Boot failure - kernel BUG at mm/memblock.c:519

Yajun Deng yajun.deng at linux.dev
Wed Nov 9 21:25:09 AEDT 2022


November 9, 2022 6:03 PM, "Yajun Deng" <yajun.deng at linux.dev> wrote:

> Hey Mike,
> 
Sorry, this email should have been sent to Sachin, not Mike.
Please forgive my confusion. So:

Hey Sachin,
Can you help me test the attached file? 
Please use this new patch instead of the one in the memblock tree.

> Can you help me test the attached file? 
> Please use this new patch instead of the one in memblock tree.
> 
> November 8, 2022 3:55 PM, "Mike Rapoport" <rppt at linux.ibm.com> wrote:
> 
>> Hi Yajun,
>> 
>> On Tue, Nov 08, 2022 at 02:27:53AM +0000, Yajun Deng wrote:
>> 
>>> Hi Sachin,
>>> I don't have a powerpc architecture machine. I don't know why this happened.
>>> 
>>> Hi Mike,
>>> Do you have any suggestions?
>> 
>> You can try reproducing the bug in qemu or work with Sachin to debug the
>> issue.
>> 
>>> I tested in tools/testing/memblock, and it was successful.
>> 
>> Memblock tests provide limited coverage still and they don't deal with all
>> possible cases.
>> 
>> For now I'm dropping this patch from the memblock tree until the issue is
>> fixed.
>> 
>>> November 6, 2022 8:07 PM, "Sachin Sant" <sachinp at linux.ibm.com> wrote:
>>> 
>>> While booting recent linux-next on an IBM Power10 Server LPAR,
>>> the following crash is observed:
>>> 
>>> [ 0.000000] numa: Partition configured for 32 NUMA nodes.
>>> [ 0.000000] ------------[ cut here ]------------
>>> [ 0.000000] kernel BUG at mm/memblock.c:519!
>>> [ 0.000000] Oops: Exception in kernel mode, sig: 5 [#1]
>>> [ 0.000000] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
>>> [ 0.000000] Modules linked in:
>>> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.1.0-rc3-next-20221104 #1
>>> [ 0.000000] Hardware name: IBM,9080-HEX POWER10 (raw) 0x800200 0xf000006 of:IBM,FW1030.00 (NH1030_026) hv:phyp pSeries
>>> [ 0.000000] NIP: c0000000004ba240 LR: c0000000004bb240 CTR: c0000000004ba210
>>> [ 0.000000] REGS: c000000002a8b7b0 TRAP: 0700 Not tainted (6.1.0-rc3-next-20221104)
>>> [ 0.000000] MSR: 8000000000021033 <SF,ME,IR,DR,RI,LE> CR: 24042424 XER: 00000001
>>> [ 0.000000] CFAR: c0000000004ba290 IRQMASK: 1
>>> [ 0.000000] GPR00: c0000000004bb240 c000000002a8ba50 c00000000136ee00 c0000010f3ac00a8
>>> [ 0.000000] GPR04: 0000000000000000 c0000010f3ac0090 00000010f3ac0000 0000000000000d00
>>> [ 0.000000] GPR08: 0000000000000001 0000000000000007 0000000000000001 0000000000000081
>>> [ 0.000000] GPR12: c0000000004ba210 c000000002e10000 0000000000000000 000000000000000d
>>> [ 0.000000] GPR16: 000000000f6be620 000000000f6be8e8 000000000f6be788 000000000f6bed58
>>> [ 0.000000] GPR20: 000000000f6f6d58 c0000000029a8de8 00000010f3ad8800 0000000000000080
>>> [ 0.000000] GPR24: 00000010f3ad7b00 0000000000000000 0000000000000100 0000000000000d00
>>> [ 0.000000] GPR28: 00000010f3ad7b00 c0000000029a8de8 c0000000029a8e00 0000000000000006
>>> [ 0.000000] NIP [c0000000004ba240] memblock_merge_regions.isra.12+0x40/0x130
>>> [ 0.000000] LR [c0000000004bb240] memblock_add_range+0x190/0x300
>>> [ 0.000000] Call Trace:
>>> [ 0.000000] [c000000002a8ba50] [0000000000000100] 0x100 (unreliable)
>>> [ 0.000000] [c000000002a8ba90] [c0000000004bb240] memblock_add_range+0x190/0x300
>>> [ 0.000000] [c000000002a8bb10] [c0000000004bb5e0] memblock_reserve+0x70/0xd0
>>> [ 0.000000] [c000000002a8bba0] [c000000002045234] memblock_alloc_range_nid+0x11c/0x1e8
>>> [ 0.000000] [c000000002a8bc60] [c0000000020453a4] memblock_alloc_internal+0xa4/0x110
>>> [ 0.000000] [c000000002a8bcb0] [c0000000020456cc] memblock_alloc_try_nid+0x94/0xcc
>>> [ 0.000000] [c000000002a8bd40] [c00000000200b570] alloc_paca_data+0x7c/0xcc
>>> [ 0.000000] [c000000002a8bdb0] [c00000000200b770] allocate_paca+0x8c/0x28c
>>> [ 0.000000] [c000000002a8be50] [c00000000200a26c] setup_arch+0x1c4/0x4d8
>>> [ 0.000000] [c000000002a8bed0] [c000000002004378] start_kernel+0xb4/0xa84
>>> [ 0.000000] [c000000002a8bf90] [c00000000000da90] start_here_common+0x1c/0x20
>>> [ 0.000000] Instruction dump:
>>> [ 0.000000] 7c0802a6 fba1ffe8 fbc1fff0 fbe1fff8 7c7d1b78 7c9e2378 3be00000 f8010010
>>> [ 0.000000] f821ffc1 e9230000 3969ffff 4800000c <0b0a0000> 7d3f4b78 393f0001 7fbf5840
>>> [ 0.000000] ---[ end trace 0000000000000000 ]---
>>> [ 0.000000]
>>> [ 0.000000] Kernel panic - not syncing: Fatal exception
>>> [ 0.000000] Rebooting in 180 seconds..
>>> 
>>> This problem was introduced with next-20221101. Git bisect points to
>>> the following patch:
>>> 
>>> commit 3f82c9c4ac377082e1230f5299e0ccce07b15e12
>>> Date: Tue Oct 25 15:09:43 2022 +0800
>>> memblock: don't run loop in memblock_add_range() twice
>>> 
>>> Reverting this patch helps boot the kernel to login prompt.
>>> 
>>> Have attached .config
>>> 
>>> - Sachin
>> 
>> --
>> Sincerely yours,
>> Mike.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-memblock-don-t-run-loop-in-memblock_add_range-twice-.patch
Type: application/octet-stream
Size: 4191 bytes
Desc: not available
URL: <http://lists.ozlabs.org/pipermail/linuxppc-dev/attachments/20221109/9ed0deba/attachment.obj>

