crash in kmem_cache_init

Wed Jan 23 09:12:06 EST 2008

On 1/22/08, Olaf Hering <olaf at aepfle.de> wrote:
> On Tue, Jan 22, Mel Gorman wrote:
>
> > http://www.csn.ul.ie/~mel/postings/slab-20080122/partial-revert-slab-changes.patch
> > .. Can you please check on your machine if it fixes your problem?
>
> It does not fix or change the nature of the crash.
>
> > Olaf, please confirm whether you need the patch below as well as the
> > revert to make your machine boot.
>
> It crashes now in a different way if the patch below is applied:

Was this with the revert Mel mentioned applied as well? I get the
feeling both patches are needed to fix up the memoryless SLAB issue.

> Linux version 2.6.24-rc8-ppc64 (olaf at lingonberry) (gcc version 4.1.2 20070115 (prerelease) (SUSE Linux)) #43 SMP Tue Jan 22 22:39:05 CET 2008

<snip>

> early_node_map[1] active PFN ranges
>     1:        0 ->   892928

<snip>

> Unable to handle kernel paging request for data at address 0x00000058
> Faulting instruction address: 0xc0000000000fe018
> cpu 0x0: Vector: 300 (Data Access) at [c00000000075bac0]
>     pc: c0000000000fe018: .setup_cpu_cache+0x184/0x1f4
>     lr: c0000000000fdfa8: .setup_cpu_cache+0x114/0x1f4
>     sp: c00000000075bd40
>    msr: 8000000000009032
>    dar: 58
>  dsisr: 42000000
>   current = 0xc000000000665a50
>   paca    = 0xc000000000666380
>     pid   = 0, comm = swapper
> enter ? for help
> [c00000000075bd40] c0000000000fb368 .kmem_cache_create+0x3c0/0x478 (unreliable)
> [c00000000075be20] c0000000005e6780 .kmem_cache_init+0x284/0x4f4
> [c00000000075bee0] c0000000005bf8ec .start_kernel+0x2f8/0x3fc
> [c00000000075bf90] c000000000008590 .start_here_common+0x60/0xd0
> 0:mon>
>
> 0xc0000000000fe018 is in setup_cpu_cache (/home/olaf/kernel/git/linux-2.6-numa/mm/slab.c:2111).
> 2106                                    BUG_ON(!cachep->nodelists[node]);
> 2107                                    kmem_list3_init(cachep->nodelists[node]);

I might be barking up the wrong tree, but this block above is supposed
to set up the cachep->nodeslists[*] that are used immediately below.
But if the loop wasn't changed from N_NORMAL_MEMORY to N_ONLINE or
whatever, you might get a bad access right below for node 0 that has
no memory, if that's the node we're running on...

> 2108                            }
> 2109                    }
> 2110            }
> 2111            cachep->nodelists[numa_node_id()]->next_reap =
> 2112                            jiffies + REAPTIMEOUT_LIST3 +
> 2113                            ((unsigned long)cachep) % REAPTIMEOUT_LIST3;
> 2114
> 2115            cpu_cache_get(cachep)->avail = 0;

Thanks,
Nish