Fragmented physical memory on powerpc/32

Sat Sep 10 19:39:20 AEST 2022

+ Adding Mike who might help if the problem is around memblock.

Le 08/09/2022 à 22:17, Pali Rohár a écrit :
> On Thursday 08 September 2022 17:35:11 Pali Rohár wrote:
>> On Thursday 08 September 2022 15:25:14 Christophe Leroy wrote:
>>> Le 08/08/2022 à 20:40, Pali Rohár a écrit :
>>>> On Friday 10 June 2022 00:24:20 Pali Rohár wrote:
>>>>> On Friday 20 May 2022 14:30:02 Pali Rohár wrote:
>>>>>> + linux-mm
>>>>>>
>>>>>> Do you know what are requirements for kernel to support non-contiguous
>>>>>> memory support and what is needed to enable it for 32-bit powerpc?
>>>>>
>>>>> Any hints?
>>>>
>>>> PING?
>>>>
>>>
>>> The tree following patches landed in powerpc/next branch, so they should
>>> soon be visible in linux-next too:
>>>
>>> fc06755e2562 ("powerpc/32: Drop a stale comment about reservation of
>>> gigantic pages")
>>> b0e0d68b1c52 ("powerpc/32: Allow fragmented physical memory")
>>> 0115953dcebe ("powerpc/32: Remove wii_memory_fixups()")
>>
>> Ou, nice! I will try to test it if it allows me to access more than 2GB
>> of RAM from 4GB DDR3 module with 32-bit addressing mode on P2020 CPU.
> 
> Hello! Ok, I have tried it from powerpc/next branch, but seems it does
> not work. I'm getting just early kernel crash.
> 
> [    0.000000] CPU maps initialized for 1 thread per core
> [    0.000000]  (thread shift is 0)
> [    0.000000] -----------------------------------------------------
> [    0.000000] phys_mem_size     = 0xbe500000
> [    0.000000] dcache_bsize      = 0x20
> [    0.000000] icache_bsize      = 0x20
> [    0.000000] cpu_features      = 0x0000000010010108
> [    0.000000]   possible        = 0x0000000010010108
> [    0.000000]   always          = 0x0000000010010108
> [    0.000000] cpu_user_features = 0x84e08000 0x08000000
> [    0.000000] mmu_features      = 0x00020010
> [    0.000000] -----------------------------------------------------
> mpc85xx_rdb_setup_arch()
> [    0.000000] ioremap() called early from of_iomap+0x48/0x80. Use early_ioremap() instead
> [    0.000000] MPC85xx RDB board from Freescale Semiconductor
> [    0.000000] barrier-nospec: using isync; sync as speculation barrier
> [    0.000000] barrier-nospec: patched 182 locations
> [    0.000000] Top of RAM: 0xff700000, Total RAM: 0xbe500000
> [    0.000000] Memory hole size: 1042MB
> [    0.000000] Zone ranges:
> [    0.000000]   Normal   [mem 0x0000000000000000-0x000000002fffffff]
> [    0.000000]   HighMem  [mem 0x0000000030000000-0x00000000ff6fffff]
> [    0.000000] Movable zone start for each node
> [    0.000000] Early memory node ranges
> [    0.000000]   node   0: [mem 0x0000000000000000-0x000000007fffffff]
> [    0.000000]   node   0: [mem 0x00000000c0200000-0x00000000eeffffff]
> [    0.000000]   node   0: [mem 0x00000000f0000000-0x00000000ff6fffff]
> [    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x00000000ff6fffff]
> [    0.000000] MMU: Allocated 1088 bytes of context maps for 255 contexts
> [    0.000000] percpu: Embedded 11 pages/cpu s14196 r8192 d22668 u45056
> [    0.000000] pcpu-alloc: s14196 r8192 d22668 u45056 alloc=11*4096
> [    0.000000] pcpu-alloc: [0] 0 [0] 1
> [    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 777792
> [    0.000000] Kernel command line: root=ubi0:rootfs rootfstype=ubifs ubi.mtd=rootfs,2048 rootflags=chk_data_crc rw console=ttyS0,115200
> [    0.000000] Dentry cache hash table entries: 131072 (order: 7, 524288 bytes, linear)
> [    0.000000] Inode-cache hash table entries: 65536 (order: 6, 262144 bytes, linear)
> [    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
> [    0.000000] Kernel attempted to read user page (7df58) - exploit attempt? (uid: 0)
> [    0.000000] BUG: Unable to handle kernel data access on read at 0x0007df58
> [    0.000000] Faulting instruction address: 0xc01c8348
> [    0.000000] Oops: Kernel access of bad area, sig: 11 [#1]
> [    0.000000] BE PAGE_SIZE=4K SMP NR_CPUS=2 P2020RDB-PC
> [    0.000000] Modules linked in:
> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 6.0.0-rc2-0caacb197b677410bdac81bc34f05235+ #121
> [    0.000000] NIP:  c01c8348 LR: c01cb2bc CTR: 0000000a
> [    0.000000] REGS: c10d7e20 TRAP: 0300   Not tainted  (6.0.0-rc2-0caacb197b677410bdac81bc34f05235+)
> [    0.000000] MSR:  00021000 <CE,ME>  CR: 48044224  XER: 00000000
> [    0.000000] DEAR: 0007df58 ESR: 00000000
> [    0.000000] GPR00: c01cb294 c10d7f10 c1045340 00000001 00000004 c112bcc0 00000015 eedf1000
> [    0.000000] GPR08: 00000003 0007df58 00000000 f0000000 28044228 00000200 00000000 00000000
> [    0.000000] GPR16: 00000000 00000000 00000000 0275cb7a c0000000 00000001 0000075f 00000000
> [    0.000000] GPR24: c1031004 00000000 00000000 00000001 c10f0000 eedf1000 00080000 00080000
> [    0.000000] NIP [c01c8348] free_unref_page_prepare.part.93+0x48/0x60
> [    0.000000] LR [c01cb2bc] free_unref_page+0x84/0x4b8
> [    0.000000] Call Trace:
> [    0.000000] [c10d7f10] [eedf1000] 0xeedf1000 (unreliable)
> [    0.000000] [c10d7f20] [c01cb294] free_unref_page+0x5c/0x4b8
> [    0.000000] [c10d7f70] [c1007644] mem_init+0xd0/0x194
> [    0.000000] [c10d7fa0] [c1000e4c] start_kernel+0x4c0/0x6d0
> [    0.000000] [c10d7ff0] [c00003e0] set_ivor+0x13c/0x178
> [    0.000000] Instruction dump:
> [    0.000000] 552817be 5509103a 7d294214 55293830 7d4a4a14 812a003c 814a0038 5529002a
> [    0.000000] 7c892050 5484c23a 5489eafa 548406fe <7d2a482e> 7d242430 5484077e 90870010
> [    0.000000] ---[ end trace 0000000000000000 ]---
> [    0.000000]
> [    0.000000] Kernel panic - not syncing: Fatal exception
> [    0.000000] Rebooting in 1 seconds..
> [    0.000000] System Halted, OK to turn off power
> 
> 4GB DDR3 SODIMM module is set via Freescale LBC to the whole 4 GB
> address range. And on ranges:
> 0x0000_0000 - 0x7fff_ffff
> 0xc020_0000 - 0xeeff_ffff
> 0xf000_0000 - 0xff6f_ffff
> there is no peripheral device, they are free for DRAM. Between these
> physical ranges are mapped peripheral devices (PCIe and NOR).
> 
> Any idea if I'm doing something wrong or there can be a bug in memory code?
> 
> Quite suspicious is that "Initmem setup node 0" prints one range where
> are also peripherals, not just DRAM. Crash is on address 0xc01c8348
> which belongs to PCIe.
> 

Yes I also find that "Initmem setup node 0" suspicious.

However the crash address 0xc01c8348 is valid kernel address. That's a 
virtual address, not a physical address, so that's not PCIe. That's 
kernel linear mapping, so that's likely physical address 0x001c8348 
offseted by PAGE_OFFSET which is 0xc0000000.

Do you have a way to reproduce this problem under QEMU ?

Thanks
Christophe