[Skiboot] [PATCH] external/mambo: Disable MEMORY_OVERFLOW

Thu Jul 2 17:03:22 AEST 2020

Gustavo Romero <gromero at linux.vnet.ibm.com> writes:
> On 6/28/20 10:27 PM, Michael Ellerman wrote:
>> Gustavo Romero <gromero at linux.vnet.ibm.com> writes:
>>> On 6/25/20 8:56 AM, Michael Ellerman wrote:
>>>> Mambo has a strange feature called MEMORY_OVERFLOW, enabled by
>>>> default, which causes some accesses to non-existent memory addresses
>>>> to transparently "create" memory.
>>>>
>>>> This can be confusing when debugging, eg:
>>>>
>>>>     systemsim % mysim cpu 0 display spr pc
>>>>     0xC0000000000246B8
>>>>     systemsim % mysim memory display 0xC0000000000246B8 8
>>>>     0x0000000000000000
>>>>
>>>> Appears to show that the memory at pc (NIP) is currently zeroes.
>>>>
>>>> The astute observer will note that "mysim memory display" takes
>>>> physical addresses, not effective addresses. So unless this machine
>>>> has > 12XB of RAM, this access should have failed as there is no
>>>> memory at that address.
>>>>
>>>> Turning MEMORY_OVERFLOW off gives us a much more sensible result:
>>>>
>>>>     systemsim % mysim memory display 0xC0000000000246B8 8
>>>>     Illegal Address 0xC0000000000246B8
>>>>
>>>> It doesn't appear to have any effect on accesses done from Linux, with
>>>> the setting enabled or disabled we still get a machine check for bad
>>>> accesses in real mode:
>>>
>>> With that change applied, on mambo P10 running on a POWER8, I'm getting
>>> the following mambo exception, which prevents the kernel from booting further:
>> 
>> This looks like exactly the kind of thing we want to catch, so that's
>> "good" :)
>> 
>>> [...]
>>> 142233280: (536372251): [    0.001554] printk: bootconsole [udbg0] disabled
>>> 142387801: (537126772): [    0.001870] mempolicy: Enabling automatic NUMA balancing. Configure with numa_balancing= or the kernel.numa_balancing sysctl
>>> 142412629: (537151600): [    0.001919] pid_max: default: 32768 minimum: 301
>>> 142549799: (537888770): [    0.002187] Mount-cache hash table entries: 16384 (order: 1, 131072 bytes, linear)
>>> 142570582: (537909553): [    0.002228] Mountpoint-cache hash table entries: 16384 (order: 1, 131072 bytes, linear)
>>> 143544723: (541883694): [    0.004130] EEH: PowerNV platform initialized
>>> 143557364: (541896335): [    0.004155] POWER9 performance monitor hardware support registered
>>> 143629574: (541968545): [    0.004296] rcu: Hierarchical SRCU implementation.
>>> 143904253: (543143224): [    0.004833] smp: Bringing up secondary CPUs ...
>>> WARNING: 145271326: (548660031): Write_Mapped_Memory_Reg: Unknown address: 0x00000E995A3AF7B0, length=8
>>> FATAL ERROR: 145271326: (548660031): Attempt to store non-existent address 0x00000E995A3AF7B0
>>> INFO: 145271326: (548660032): ** Execution stopped: Mambo Error,  **
>>> 145271326: ** finished running 548660032 instructions **
>> 
>> Can you see where the bad store came from? "bt" should give you a backtrace.
>> 
>> Then "p pc" should give you the PC and "di <that value>" should show us
>> what instruction it was.
>
> PC itself is parked, believe it or not, at a 'nop' instruction:

Hmm, maybe we're looking at the wrong CPU/thread?

> 151630615: (157638813): [    0.040390] pstore: Registered nvram as persistent store backend
> 152359208: (158398639): [    0.041813] PCI: Probing PCI hardware
> 152666145: (158718713): [    0.042412] audit: type=2000 audit(1024007219.010:1): state=initialized audit_enabled=0 res=1
> 153055303: (159124575): [    0.043172] cpuidle-powernv: Default stop: psscr = 0x0000000000000300,mask=0x00000000003003ff
> 153074775: (159144887): [    0.043210] cpuidle-powernv: Deepest stop: psscr = 0x0000000000300322,mask=0x00000000003003ff
> 153093510: (159164435): [    0.043247] cpuidle-powernv: First stop level that may lose SPRs = 0x10
> 153108589: (159180151): [    0.043276] cpuidle-powernv: First stop level that may lose timebase = 0x10
> WARNING: 156265663: (162472569): Write_Mapped_Memory_Reg: Unknown address: 0x00000BEE18C96D60, length=8
> FATAL ERROR: 156265663: (162472569): Attempt to store non-existent address 0x00000BEE18C96D60
> INFO: 156265663: (162472570): ** Execution stopped: Mambo Error,  **
> 156265663: ** finished running 162472570 instructions **

I notice this doesn't tell us what CPU caused the bad access.
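
To make the point concrete, here is a minimal sketch that pulls apart the
FATAL ERROR line quoted above. The field names ("first", "insns", "addr") are
my own labels, not Mambo's. The number in parentheses appears to be the
instruction count, since it lines up with the "finished running ...
instructions" line; what the first number means is an assumption on my part.
Nothing in the line identifies a CPU or thread.

```python
import re

# Matches the Mambo "FATAL ERROR" line exactly as it appears in the log above.
# The group names are illustrative labels, not Mambo terminology.
FATAL_RE = re.compile(
    r"FATAL ERROR: (?P<first>\d+): \((?P<insns>\d+)\): "
    r"Attempt to store non-existent address (?P<addr>0x[0-9A-Fa-f]+)"
)


def parse_fatal(line):
    """Return the numeric fields of a FATAL ERROR line, or None if no match."""
    m = FATAL_RE.search(line)
    if m is None:
        return None
    return {
        "first": int(m["first"]),      # meaning assumed, possibly a cycle count
        "insns": int(m["insns"]),      # appears to be the instruction count
        "addr": int(m["addr"], 16),    # the address the store targeted
    }


line = ("FATAL ERROR: 156265663: (162472569): "
        "Attempt to store non-existent address 0x00000BEE18C96D60")
info = parse_fatal(line)
print(info)
```

The parsed result has an address and two counters, but no field that says
which CPU or thread issued the store.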

> systemsim % p pc
> 0xC0000000003A2EC4

"p" implicitly shows CPU 0 unless you've used the "target" command.

> systemsim % di 0xC0000000003A2EC4
> WARNING: 156265663: (162472570): Need to define a CPU
> WARNING: 156265663: (162472570): Need to define a CPU
> EADDR:0xC0000000003A2EC0 RADDR:0x003A2EC0 Enc:0x2C4A007C : dcbt    r0,r9,0
> WARNING: 156265663: (162472570): Need to define a CPU
> WARNING: 156265663: (162472570): Need to define a CPU
> EADDR:0xC0000000003A2EC4 RADDR:0x003A2EC4 Enc:0x00000060 : nop
> WARNING: 156265663: (162472570): Need to define a CPU
> WARNING: 156265663: (162472570): Need to define a CPU
> EADDR:0xC0000000003A2EC8 RADDR:0x003A2EC8 Enc:0x00000060 : nop
> WARNING: 156265663: (162472570): Need to define a CPU
> WARNING: 156265663: (162472570): Need to define a CPU
>
> But just before the 'nop' there is a dcbt. The address passed to the dcbt,
> in GPR 9, is nowhere near the address displayed by mambo:
>
> systemsim % mysim cpu 0:0:0 display gpr 9
> 0x0000000000000240
> systemsim % mysim mcm 0 cpu 0 thread 0 dtranslate 0x240
> data address translation for 0x0000000000000240 failed
> systemsim %
>
> The dcbt instruction is in mm/slub.c; more context:
>
>         prefetch(object + s->offset);
> c0000000003a2eb4:       20 00 3f 81     lwz     r9,32(r31)
>         if (unlikely(!x))
> c0000000003a2eb8:       15 4a 3a 7d     add.    r9,r26,r9
> c0000000003a2ebc:       08 00 82 41     beq     c0000000003a2ec4 <kmem_cache_alloc+0x114>
>         __asm__ __volatile__ ("dcbt 0,%0" : : "r" (x));
> c0000000003a2ec0:       2c 4a 00 7c     dcbt    0,r9
> c0000000003a2ec4:       00 00 00 60     nop
> c0000000003a2ec8:       00 00 00 60     nop
>
> so probably from a prefetch_freepointer() in
> https://github.com/torvalds/linux/blob/master/mm/slub.c#L2815

But we shouldn't be prefetching 0x240, that's a userspace address. So I
suspect something has gone wrong with the debug here.
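
As a sanity check, here is a rough sketch that decodes the encoding Mambo
printed (Enc:0x2C4A007C) and computes the address the dcbt would touch for the
GPR 9 value shown above. The helper names are made up for illustration. The
top-nibble test is a deliberate simplification of "is this a kernel address",
not how the kernel itself classifies addresses.

```python
import struct


def decode_x_form(enc_as_displayed):
    """Decode an X-form instruction from the byte order Mambo displayed.

    The objdump output shows the bytes 2c 4a 00 7c in memory, so the
    instruction word is those four bytes read as little-endian.
    """
    raw = enc_as_displayed.to_bytes(4, "big")
    (insn,) = struct.unpack("<I", raw)
    return {
        "insn": insn,
        "opcode": insn >> 26,          # primary opcode, 31 for dcbt
        "th": (insn >> 21) & 0x1F,     # touch hint field
        "ra": (insn >> 16) & 0x1F,
        "rb": (insn >> 11) & 0x1F,
        "xo": (insn >> 1) & 0x3FF,     # extended opcode, 278 for dcbt
    }


def dcbt_effective_address(fields, gprs):
    """EA = (RA|0) + RB: an RA field of 0 means the value 0, not GPR 0."""
    base = 0 if fields["ra"] == 0 else gprs[fields["ra"]]
    return (base + gprs[fields["rb"]]) & 0xFFFFFFFFFFFFFFFF


def looks_like_kernel_ea(ea):
    """Simplified check: 64-bit powerpc kernel addresses start at 0xC000..."""
    return (ea >> 60) == 0xC


fields = decode_x_form(0x2C4A007C)
gprs = {9: 0x240}                      # GPR 9 as displayed for cpu 0:0:0
ea = dcbt_effective_address(fields, gprs)
print(fields, hex(ea), looks_like_kernel_ea(ea))
```

With r9 = 0x240 the dcbt would touch 0x240, which fails the kernel-address
check, whereas the PC value 0xC0000000003A2EC4 passes it. That fits the
suspicion that the register dump doesn't belong to the access that faulted.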

>>> systemsim % c
>>> 145785082: (550674532): [    0.008506] smp: Brought up 2 nodes, 4 CPUs
>>> 145798101: (550687551): [    0.008527] numa: Node 0 CPUs: 0-1
>>> 145810514: (550799964): [    0.008551] numa: Node 1 CPUs: 2-3
>> 
>> Does it still happen with a single CPU?
>
> It still happens with maxcpus=0 or =1. However, if I disable radix by
> passing disable_radix=1 on the command line, I'm able to boot.

Hmm OK.

> Today I've built upstream mambo and the same issue happens. I still have
> no clue what's happening... so if you have additional things to try let
> me know. It might be an issue with Mambo P10 running on P8.

Just to be clear, it doesn't happen with mambo simulating P9, right?

cheers