[Skiboot] [PATCH] external/mambo: Disable MEMORY_OVERFLOW

Michael Ellerman mpe at ellerman.id.au
Mon Jun 29 11:27:09 AEST 2020


Gustavo Romero <gromero at linux.vnet.ibm.com> writes:
> Hi Michael,
>
> On 6/25/20 8:56 AM, Michael Ellerman wrote:
>> Mambo has a strange feature called MEMORY_OVERFLOW, enabled by
>> default, which causes some accesses to non-existent memory addresses
>> to transparently "create" memory.
>> 
>> This can be confusing when debugging, e.g.:
>> 
>>    systemsim % mysim cpu 0 display spr pc
>>    0xC0000000000246B8
>>    systemsim % mysim memory display 0xC0000000000246B8 8
>>    0x0000000000000000
>> 
>> This appears to show that the memory at pc (NIP) is currently zeroes.
>> 
>> The astute observer will note that "mysim memory display" takes
>> physical addresses, not effective addresses. So unless this machine
>> has more than 12EB of RAM, this access should have failed, as there is
>> no memory at that address.
>> 
>> Turning MEMORY_OVERFLOW off gives us a much more sensible result:
>> 
>>    systemsim % mysim memory display 0xC0000000000246B8 8
>>    Illegal Address 0xC0000000000246B8
>> 
>> It doesn't appear to have any effect on accesses done from Linux; with
>> the setting enabled or disabled, we still get a machine check for bad
>> accesses in real mode:
>
> With that change applied, on mambo P10 running on a POWER8, I'm getting
> the following mambo exception, which prevents the kernel from continuing to boot:

This looks like exactly the kind of thing we want to catch, so that's
"good" :)

> [...]
> 142233280: (536372251): [    0.001554] printk: bootconsole [udbg0] disabled
> 142387801: (537126772): [    0.001870] mempolicy: Enabling automatic NUMA balancing. Configure with numa_balancing= or the kernel.numa_balancing sysctl
> 142412629: (537151600): [    0.001919] pid_max: default: 32768 minimum: 301
> 142549799: (537888770): [    0.002187] Mount-cache hash table entries: 16384 (order: 1, 131072 bytes, linear)
> 142570582: (537909553): [    0.002228] Mountpoint-cache hash table entries: 16384 (order: 1, 131072 bytes, linear)
> 143544723: (541883694): [    0.004130] EEH: PowerNV platform initialized
> 143557364: (541896335): [    0.004155] POWER9 performance monitor hardware support registered
> 143629574: (541968545): [    0.004296] rcu: Hierarchical SRCU implementation.
> 143904253: (543143224): [    0.004833] smp: Bringing up secondary CPUs ...
> WARNING: 145271326: (548660031): Write_Mapped_Memory_Reg: Unknown address: 0x00000E995A3AF7B0, length=8
> FATAL ERROR: 145271326: (548660031): Attempt to store non-existent address 0x00000E995A3AF7B0
> INFO: 145271326: (548660032): ** Execution stopped: Mambo Error,  **
> 145271326: ** finished running 548660032 instructions **

Can you see where the bad store came from? "bt" should give you a backtrace.

Then "p pc" should give you the PC and "di <that value>" should show us
what instruction it was.
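As a rough sanity check on the magnitudes involved (a sketch, not something from the original thread), the top bits of the two physical addresses seen so far can be read off with integer shifts: the PC value passed to "mysim memory display" lies just past 12 EiB, and the failing store target lies around 14 TiB. The constant names below are illustrative only.

```python
# Sanity-check the magnitude of the two physical addresses from this thread.
PC_AS_PHYS = 0xC0000000000246B8  # kernel PC, wrongly used as a physical address
BAD_STORE = 0x00000E995A3AF7B0   # target of the store mambo rejected

EIB = 1 << 60  # one exbibyte
TIB = 1 << 40  # one tebibyte

# divmod keeps everything in integers, so there is no float rounding.
pc_eib, _ = divmod(PC_AS_PHYS, EIB)
store_tib, _ = divmod(BAD_STORE, TIB)

print(f"{PC_AS_PHYS:#x} is just over {pc_eib} EiB")
print(f"{BAD_STORE:#x} is just over {store_tib} TiB")
```

Both values are far larger than the RAM a simulated machine would normally be given, which is consistent with mambo treating them as non-existent once MEMORY_OVERFLOW is off.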

> systemsim % c
> 145785082: (550674532): [    0.008506] smp: Brought up 2 nodes, 4 CPUs
> 145798101: (550687551): [    0.008527] numa: Node 0 CPUs: 0-1
> 145810514: (550799964): [    0.008551] numa: Node 1 CPUs: 2-3

Does it still happen with a single CPU?

> 145820419: (550809869): [    0.008575] Using standard scheduler topology
> 159140153: (604064242): [    0.034590] node 0 initialised, 118050 pages in 10ms
> 159167293: (604195562): [    0.034643] pgdatinit0 (31) used greatest stack depth: 13408 bytes left
> 160000000: [0:0:0]: (PC:0xC0000000001C164C) :      7.1 Mega-Inst/Sec :      7.1 Mega-Cycles/Sec [17 Zaps  0 PA-Zaps] *ON*  [0:0:0] pri=4 extra=0
> 160000000: [0:0:1]: (PC:0xC0000000003A162C) :      7.1 Mega-Inst/Sec *ON*  [0:0:1] pri=4 extra=0
> 160000000: [0:1:0]: (PC:0xC00000000019AA58) :      7.1 Mega-Inst/Sec *ON*  [0:1:0] pri=2 extra=0
> 160000000: [0:1:1]: (PC:0xC00000000019A5F4) :      7.1 Mega-Inst/Sec *ON*  [0:1:1] pri=2 extra=0
> 160134492: (608065254): [    0.036532] devtmpfs: initialized
> 163550958: (621787960): [    0.043205] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
> 163574637: (621811639): [    0.043251] futex hash table entries: 1024 (order: 1, 131072 bytes, linear)
> 164292187: (624696519): [    0.044653] NET: Registered protocol family 16
> 164770384: (626692640): [    0.045587] audit: initializing netlink subsys (disabled)
> 165130649: (628049925): [    0.046290] audit: type=2000 audit(1024007219.020:1): state=initialized audit_enabled=0 res=1
> 165359745: (629069307): [    0.046738] cpuidle: using governor menu
> 165482259: (629491821): [    0.046977] pstore: Registered nvram as persistent store backend
> 167333983: (636947373): [    0.050594] PCI: Probing PCI hardware
> 168184808: (640240704): [    0.052255] cpuidle-powernv: Default stop: psscr = 0x0000000000000300,mask=0x00000000003003ff
> 168204280: (640260428): [    0.052293] cpuidle-powernv: Deepest stop: psscr = 0x0000000000300322,mask=0x00000000003003ff
> 168223015: (640279240): [    0.052330] cpuidle-powernv: First stop level that may lose SPRs = 0x10
> 168238094: (640294710): [    0.052359] cpuidle-powernv: First stop level that may lose timebase = 0x10
> 175060241: (647321380): [    0.065684] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
> 175077028: (647338167): [    0.065717] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
> 177879965: (651314704): [    0.071191] iommu: Default domain type: Translated
> 177936270: (651386119): [    0.071302] vgaarb: loaded
> 178219574: (651669423): [    0.071855] SCSI subsystem initialized
> 178331796: (651781645): [    0.072074] usbcore: registered new interface driver usbfs
> 178364558: (651814407): [    0.072138] usbcore: registered new interface driver hub
> 178397036: (651981802): [    0.072201] usbcore: registered new device driver usb
> 179169296: (653522982): [    0.073710] clocksource: Switched to clocksource timebase
> WARNING: 179516247: (653875633): Write_Mapped_Memory_Reg: Unknown address: 0x00000E995A3AF7B0, length=8
> FATAL ERROR: 179516247: (653875633): Attempt to store non-existent address 0x00000E995A3AF7B0
> INFO: 179516247: (653875634): ** Execution stopped: Mambo Error,  **
> 179516247: ** finished running 653875634 instructions **
> systemsim %
>
> I don't understand yet what's happening. Gleen does not hit it either
> (he is running mambo on a POWER9, though). Which mambo / host combinations
> have you tried it on?

I tested Power8 / Power9 emulated on x86-64 and ppc64le.

cheers

