ppc64 oops..
Linus Torvalds
torvalds at osdl.org
Tue Nov 15 16:27:20 EST 2005
On Tue, 15 Nov 2005, Paul Mackerras wrote:
>
> How much RAM do you have? That address is in the I/O hole (from 2G to
> 4G).
Hmm. I _thought_ I had just 2GB (possibly 4GB) in this machine, but the
bootup says
...
[boot]0100 MM Init
IO Hole assumed to be 80000000 -> ffffffff
[boot]0100 MM Init Done
Linux version 2.6.15-rc1-g4060994c (torvalds at g5.osdl.org) (gcc version 4.0.1 200..
[boot]0012 Setup Arch
Top of RAM: 0x180000000, Total RAM: 0x100000000
Memory hole size: 2048MB
...
On node 0 totalpages: 1572864
DMA zone: 1572864 pages, LIFO batch:64
DMA32 zone: 0 pages, LIFO batch:2
Normal zone: 0 pages, LIFO batch:2
HighMem zone: 0 pages, LIFO batch:2
(I'm now running a newer kernel that has a DMA32 zone, I wasn't running
that when the oops happened).
Which looks like it thinks I have 6GB. That's what "free" thinks too.
Cool. I just got 4GB extra memory without even opening the machine!
Magic kernel.
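The boot-log figures above can be cross-checked with a little arithmetic. This is a minimal sketch, assuming 4KB pages (the log does not state the page size); the macro and function names are hypothetical and only the numeric values come from the log.

```c
#include <stdint.h>
#include <stdio.h>

/* Assumption: 4KB pages; the boot log does not state the page size. */
#define ASSUMED_PAGE_SIZE 4096ULL

/* Values copied from the boot log above. */
#define TOP_OF_RAM  0x180000000ULL  /* "Top of RAM" */
#define TOTAL_RAM   0x100000000ULL  /* "Total RAM" */
#define HOLE_START  0x80000000ULL   /* "IO Hole assumed to be 80000000 -> ffffffff" */
#define HOLE_END    0xffffffffULL
#define TOTAL_PAGES 1572864ULL      /* "On node 0 totalpages" */

/* Bytes spanned by a given number of pages (hypothetical helper). */
uint64_t pages_to_bytes(uint64_t pages)
{
    return pages * ASSUMED_PAGE_SIZE;
}

/* Size of the I/O hole in bytes; the end address is inclusive. */
uint64_t hole_size(void)
{
    return HOLE_END - HOLE_START + 1;
}

/* Print the three quantities so they can be compared with the log. */
void print_memory_summary(void)
{
    printf("totalpages span: 0x%llx bytes\n",
           (unsigned long long)pages_to_bytes(TOTAL_PAGES));
    printf("I/O hole size:   0x%llx bytes\n",
           (unsigned long long)hole_size());
    printf("top minus hole:  0x%llx bytes\n",
           (unsigned long long)(TOP_OF_RAM - hole_size()));
}
```

Under the 4KB assumption, the page count in the DMA zone spans the whole range up to "Top of RAM" (0x180000000, i.e. 6GB), hole included, while subtracting the 2048MB hole gives the 0x100000000 that the log reports as "Total RAM".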
And I just found out how I can instantly crash the kernel again:
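#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	/* Allocate 1GB and touch every page so it is really faulted in. */
	char * buf = malloc(1024*1024*1024);
	memset(buf, 0, 1024*1024*1024);
	sleep(100);
	return 0;
}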
I run two of those programs, and on the second one I get an oops again:
Unable to handle kernel paging request for data at address 0xc0000000ff000000
Faulting instruction address: 0xc000000000030800
Oops: Kernel access of bad area, sig: 11 [#1]
SMP NR_CPUS=2 NUMA POWERMAC
Modules linked in: autofs
NIP: C000000000030800 LR: C0000000000971F0 CTR: 0000000000000020
REGS: c0000001023a38d0 TRAP: 0300 Not tainted (2.6.15-rc1-g4060994c)
MSR: 9000000000009032 <EE,ME,IR,DR> CR: 88000448 XER: 00000000
DAR: C0000000FF000000, DSISR: 0000000042010000
TASK = c00000015af957c0[19554] 'a.out' THREAD: c0000001023a0000 CPU: 1
GPR00: 0000000000000080 C0000001023A3B50 C0000000006C8EF0 C0000000FF000000
GPR04: 00000000BADB9000 C0000000040E6000 C0000000005B6C00 9000000000009032
GPR08: C00000017BFB8A00 C0000000006CAD30 C0000000006CDCA0 0000000000000020
GPR12: 0000000088000442 C0000000005B6C00 0000000000000000 000000001016D918
GPR16: 00000000100D0000 0000000000000000 00000000100D0000 0000000000000000
GPR20: C00000007EC566B0 C00000017BC13980 C00000016F836590 00000000BADB9000
GPR24: 0000000002000000 0000000000000000 0000000000000DC8 C0000000040E6000
GPR28: C000000006D08DC8 C000000006D08000 C0000000005D2EB8 0000000000000000
NIP [C000000000030800] .clear_user_page+0x10/0x60
LR [C0000000000971F0] .__handle_mm_fault+0xda0/0xf10
Call Trace:
[C0000001023A3B50] [C000000000097184] .__handle_mm_fault+0xd34/0xf10 (unreliable)
[C0000001023A3C60] [C000000000496D3C] .do_page_fault+0x4ec/0x7f0
[C0000001023A3E30] [C000000000004760] .handle_page_fault+0x20/0x54
Instruction dump:
4d820020 7c0018a8 7c004878 7c0019ad 40c2fff4 4e800020 60000000 60000000
e922a810 8169000c 80090004 7d6903a6 <7c001fec> 7c630214 4320fff8 e922a808
I.e. it seems to have set up the mem_map[] to point all the way down from
6GB to 0, and then when I've used up the two high GB of memory (the _real_
memory in this machine) it starts allocating memory that it doesn't have,
and that it doesn't have TLB mappings for.
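To see why the faulting address in the oops lands in the I/O hole, one can strip the kernel linear-mapping offset from the DAR value and compare the result with the hole range from the boot log. This is a sketch only: it assumes the usual ppc64 linear-mapping base of 0xc000000000000000, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: ppc64 kernel linear mapping starts at this virtual address. */
#define KERNELBASE 0xc000000000000000ULL

/* Hole range from the boot log: "IO Hole assumed to be 80000000 -> ffffffff". */
#define HOLE_START 0x80000000ULL
#define HOLE_END   0xffffffffULL

/* Convert a linear-mapping kernel address to a physical address. */
uint64_t linear_to_phys(uint64_t vaddr)
{
    return vaddr - KERNELBASE;
}

/* True if the physical address falls inside the I/O hole (inclusive). */
bool in_io_hole(uint64_t paddr)
{
    return paddr >= HOLE_START && paddr <= HOLE_END;
}
```

For the DAR in the oops, linear_to_phys(0xc0000000ff000000) gives 0xff000000, which in_io_hole() reports as inside the 2G to 4G hole, so the page being cleared has no real memory behind it.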
> > (There are other reports of VM-induced problems on -rc1, this is probably
> > not ppc64-related).
>
> Looks that way to me...
No, looks like a ppc64 memory setup bug, although it's quite possibly
brought on by the PageReserved() removal in the VM layer.
Andrew, Nick, Hugh, I really think that removing that "PageReserved()"
test from the page freeing functions was a mistake. I think I'm going to
add it back in.
I bet this happens on all the other architectures too. The bootup has
marked pages reserved, and then frees them all. It used to be that the VM
just silently skipped the reserved pages, now it will add them to the free
lists..
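The difference between the two behaviours can be illustrated with a toy userspace model. This is not kernel code: struct toy_page and both functions are hypothetical stand-ins that only mirror the description above (skip reserved pages versus free everything).

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a page descriptor; not the kernel's struct page. */
struct toy_page {
    bool reserved;      /* marked reserved during bootup */
    bool on_free_list;  /* handed to the allocator */
};

/* Old behaviour as described: reserved pages are silently skipped. */
size_t toy_free_pages_checked(struct toy_page *pages, size_t n)
{
    size_t freed = 0;
    for (size_t i = 0; i < n; i++) {
        if (pages[i].reserved)
            continue;
        pages[i].on_free_list = true;
        freed++;
    }
    return freed;
}

/* Behaviour without the test: every page ends up on the free list. */
size_t toy_free_pages_unchecked(struct toy_page *pages, size_t n)
{
    for (size_t i = 0; i < n; i++)
        pages[i].on_free_list = true;
    return n;
}
```

In this model, a boot path that marks hole pages reserved and then frees the whole range is harmless with the checked variant, but with the unchecked variant the reserved pages become allocatable.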
Linus