Next May 11 : BUG during scsi initialization

Matthew Wilcox matthew at wil.cx
Mon May 11 21:52:33 EST 2009


On Mon, May 11, 2009 at 05:16:10PM +0530, Sachin Sant wrote:
> Today's Next tree failed to boot on a Power6 box with following BUG :

This doesn't actually appear to be a SCSI bug ... it looks like SCSI tried
to allocate memory and things went wrong in the memory allocator:

[c0000000c7d038b0] [c0000000005d67d8] ._spin_lock+0x10/0x24
[c0000000c7d03920] [c00000000013fbdc] .__slab_alloc_page+0x344/0x3cc
[c0000000c7d039e0] [c000000000141168] .kmem_cache_alloc+0x13c/0x21c
[c0000000c7d03aa0] [c000000000141b04] .kmem_cache_create+0x294/0x2a8
[c0000000c7d03b90] [d000000000ea14cc] .scsi_init_queue+0x38/0x170 [scsi_mod]

Which memory allocator did you have selected (SLAB, SLUB, SLOB, SLQB)?

> BUG: spinlock bad magic on CPU#1, modprobe/63
> Unable to handle kernel paging request for data at address 0xffffc994838
> Faulting instruction address: 0xc00000000035f5a8
> Oops: Kernel access of bad area, sig: 11 [#1]
> SMP NR_CPUS=1024 DEBUG_PAGEALLOC NUMA pSeries
> Modules linked in: scsi_mod(+)
> NIP: c00000000035f5a8 LR: c00000000035f58c CTR: 0000000000136f8c
> REGS: c0000000c7d03500 TRAP: 0300   Not tainted  (2.6.30-rc5-autotest-next-20090511)
> MSR: 8000000000009032 <EE,ME,IR,DR>  CR: 28222484  XER: 0000000f
> DAR: 00000ffffc994838, DSISR: 0000000040000000
> TASK = c0000000c7cf0a80[63] 'modprobe' THREAD: c0000000c7d00000 CPU: 1
> GPR00: c00000000035f58c c0000000c7d03780 c000000000aaeed8 
> 0000000000000031 GPR04: 0000000000000000 00000000585cf4e0 
> 0000000000673580 80000000565a6cc0 GPR08: 0000000000000000 
> c0000000009ebf50 0000000000000000 c0000000009ebf38 GPR12: 
> 0000000028222482 c000000000b82600 0000000000000000 0000000000000000  
> GPR16: 0000000000000000 0000000000000000 0000000000000000 
> 0000000000000000 GPR20: 0000000000000018 ffffffffffffffff 
> c0000000009bbe40 0000000000000010 GPR24: 0000000000210d00 
> c0000000c6caff80 c0000000dfc732a0 c000000000f61380 GPR28: 
> c0000000007c8350 c0000000008a4280 c000000000a2f928 00000ffffc994550 NIP 
> [c00000000035f5a8] .spin_bug+0x90/0xd4
> LR [c00000000035f58c] .spin_bug+0x74/0xd4
> Call Trace:
> [c0000000c7d03780] [c00000000035f58c] .spin_bug+0x74/0xd4 (unreliable)
> [c0000000c7d03810] [c00000000035f890] ._raw_spin_lock+0x48/0x184
> [c0000000c7d038b0] [c0000000005d67d8] ._spin_lock+0x10/0x24
> [c0000000c7d03920] [c00000000013fbdc] .__slab_alloc_page+0x344/0x3cc
> [c0000000c7d039e0] [c000000000141168] .kmem_cache_alloc+0x13c/0x21c
> [c0000000c7d03aa0] [c000000000141b04] .kmem_cache_create+0x294/0x2a8
> [c0000000c7d03b90] [d000000000ea14cc] .scsi_init_queue+0x38/0x170 [scsi_mod]
> [c0000000c7d03c20] [d000000000ea13c8] .init_scsi+0x1c/0xe8 [scsi_mod]
> [c0000000c7d03ca0] [c0000000000092c0] .do_one_initcall+0x80/0x19c
> [c0000000c7d03d90] [c0000000000c0540] .SyS_init_module+0xe0/0x244
> [c0000000c7d03e30] [c000000000008534] syscall_exit+0x0/0x40
> Instruction dump:
> 7f84e378 e87e8020 38c604d0 e8e902ea 4827fced 60000000 2fbf0000 80bd0004  
> 409e0010 e8de8028 38e0ffff 4800000c <e8ff02ea> 38df04d0 7fa4eb78 811d0008 
> ---[ end trace f725820a6fa9dbb7 ]---
> /init: line 21:    63 Segmentation fault      modprobe $file
>
> I have attached the dmesg log here. Let me know if any other information  
> is required.
>
> Thanks
> -Sachin
>
> -- 
>
> ---------------------------------
> Sachin Sant
> IBM Linux Technology Center
> India Systems and Technology Labs
> Bangalore, India
> ---------------------------------
>

> Using 007bc904 bytes for initrd buffer
> Please wait, loading kernel...
> Allocated 01100000 bytes for kernel @ 02300000
>    Elf64 kernel loaded...
> Loading ramdisk...
> ramdisk loaded 007bc904 @ 03400000
> OF stdout device is: /vdevice/vty at 30000000
> Preparing to boot Linux version 2.6.30-rc5-autotest-next-20090511 (root at mpower6lp5) (gcc version 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux) ) #1 SMP Mon May 11 16:01:58 IST 2009
> Calling ibm,client-architecture... done
> command line: root=/dev/sda3 sysrq=1 insmod=sym53c8xx insmod=ipr crashkernel=512M-:256M IDENT=1242039097 
> memory layout at init:
>   alloc_bottom : 0000000003bc0000
>   alloc_top    : 0000000008000000
>   alloc_top_hi : 0000000008000000
>   rmo_top      : 0000000008000000
>   ram_top      : 0000000008000000
> instantiating rtas at 0x00000000074e0000... done
> boot cpu hw idx 0000000000000000
> copying OF device tree...
> Building dt strings...
> Building dt structure...
> Device tree strings 0x0000000003bd0000 -> 0x0000000003bd15c2
> Device tree struct  0x0000000003be0000 -> 0x0000000003c00000
> Calling quiesce...
> returning from prom_init
> Crash kernel location must be 0x2000000
> Reserving 256MB of memory at 32MB for crashkernel (System RAM: 4096MB)
> Phyp-dump disabled at boot time
> Using pSeries machine description
> Using 1TB segments
> Found initrd at 0xc000000003400000:0xc000000003bbc904
> console [udbg0] enabled
> Partition configured for 2 cpus.
> CPU maps initialized for 2 threads per core
> Starting Linux PPC64 #1 SMP Mon May 11 16:01:58 IST 2009
> -----------------------------------------------------
> ppc64_pft_size                = 0x1a
> physicalMemorySize            = 0x100000000
> htab_hash_mask                = 0x7ffff
> -----------------------------------------------------
> Initializing cgroup subsys cpuset
> Initializing cgroup subsys cpu
> Linux version 2.6.30-rc5-autotest-next-20090511 (root at mpower6lp5) (gcc version 4.3.2 [gcc-4_3-branch revision 141291] (SUSE Linux) ) #1 SMP Mon May 11 16:01:58 IST 2009
> [boot]0012 Setup Arch
> EEH: No capable adapters found
> PPC64 nvram contains 15360 bytes
> Zone PFN ranges:
>   DMA      0x00000000 -> 0x00010000
>   Normal   0x00010000 -> 0x00010000
> Movable zone start PFN for each node
> early_node_map[2] active PFN ranges
>     2: 0x00000000 -> 0x0000e000
>     3: 0x0000e000 -> 0x00010000
> Could not find start_pfn for node 0
> [boot]0015 Setup Done
> Built 3 zonelists in Node order, mobility grouping on.  Total pages: 65472
> Policy zone: DMA
> Kernel command line: root=/dev/sda3 sysrq=1 insmod=sym53c8xx insmod=ipr crashkernel=512M-:256M IDENT=1242039097 
> Experimental hierarchical RCU implementation.
> RCU-based detection of stalled CPUs is enabled.
> Experimental hierarchical RCU init done.
> NR_IRQS:512
> [boot]0020 XICS Init
> [boot]0021 XICS Done
> PID hash table entries: 4096 (order: 12, 32768 bytes)
> clocksource: timebase mult[7d0000] shift[22] registered
> Console: colour dummy device 80x25
> console handover: boot [udbg0] -> real [hvc0]
> allocated 2621440 bytes of page_cgroup
> please try cgroup_disable=memory option if you don't want
> freeing bootmem node 2
> freeing bootmem node 3
> Memory: 3881920k/4194304k available (8896k kernel code, 312384k reserved, 2048k data, 4287k bss, 448k init)
> Calibrating delay loop... 1022.36 BogoMIPS (lpj=5111808)
> Security Framework initialized
> SELinux:  Disabled at boot.
> Dentry cache hash table entries: 524288 (order: 6, 4194304 bytes)
> Inode-cache hash table entries: 262144 (order: 5, 2097152 bytes)
> Mount-cache hash table entries: 4096
> Initializing cgroup subsys ns
> Initializing cgroup subsys cpuacct
> Initializing cgroup subsys memory
> Initializing cgroup subsys devices
> Initializing cgroup subsys freezer
> Processor 1 found.
> Brought up 2 CPUs
> net_namespace: 1936 bytes
> NET: Registered protocol family 16
> IBM eBus Device Driver
> PCI: Probing PCI hardware
> bio: create slab <bio-0> at 0
> usbcore: registered new interface driver usbfs
> usbcore: registered new interface driver hub
> usbcore: registered new device driver usb
> Failed to register trace events module notifier
> NET: Registered protocol family 2
> IP route cache hash table entries: 32768 (order: 2, 262144 bytes)
> TCP established hash table entries: 131072 (order: 5, 2097152 bytes)
> TCP bind hash table entries: 65536 (order: 5, 2097152 bytes)
> TCP: Hash tables configured (established 131072 bind 65536)
> TCP reno registered
> NET: Registered protocol family 1
> Unpacking initramfs...
> IOMMU table initialized, virtual merging enabled
> audit: initializing netlink socket (disabled)
> type=2000 audit(1242039145.533:1): initialized
> Kprobe smoke test started
> Kprobe smoke test passed successfully
> HugeTLB registered 16 MB page size, pre-allocated 0 pages
> HugeTLB registered 16 GB page size, pre-allocated 0 pages
> VFS: Disk quotas dquot_6.5.2
> Dquot-cache hash table entries: 8192 (order 0, 65536 bytes)
> Btrfs loaded
> msgmni has been set to 7580
> alg: No test for stdrng (krng)
> Block layer SCSI generic (bsg) driver version 0.4 loaded (major 254)
> io scheduler noop registered
> io scheduler anticipatory registered
> io scheduler deadline registered
> io scheduler cfq registered (default)
> pci_hotplug: PCI Hot Plug PCI Core version: 0.5
> rpaphp: RPA HOT Plug PCI Controller Driver version: 0.1
> Generic RTC Driver v1.07
> Serial: 8250/16550 driver, 4 ports, IRQ sharing disabled
> pmac_zilog: 0.6 (Benjamin Herrenschmidt <benh at kernel.crashing.org>)
> input: Macintosh mouse button emulation as /devices/virtual/input/input0
> Uniform Multi-Platform E-IDE driver
> ide-gd driver 1.18
> IBM eHEA ethernet device driver (Release EHEA_0101)
> ehea: eth0: Jumbo frames are disabled
> ehea: eth0 -> logical port id #2
> ehci_hcd: USB 2.0 'Enhanced' Host Controller (EHCI) Driver
> ohci_hcd: USB 1.1 'Open' Host Controller (OHCI) Driver
> mice: PS/2 mouse device common for all mice
> EDAC MC: Ver: 2.1.0 May 11 2009
> usbcore: registered new interface driver hiddev
> usbcore: registered new interface driver usbhid
> usbhid: v2.6:USB HID core driver
> TCP cubic registered
> NET: Registered protocol family 15
> registered taskstats version 1
> Freeing unused kernel memory: 448k freed
> doing fast boot
> SysRq : Changing Loglevel
> Loglevel set to 1
> BUG: spinlock bad magic on CPU#1, modprobe/63
> Unable to handle kernel paging request for data at address 0xffffc994838
> Faulting instruction address: 0xc00000000035f5a8
> Oops: Kernel access of bad area, sig: 11 [#1]
> SMP NR_CPUS=1024 DEBUG_PAGEALLOC NUMA pSeries
> Modules linked in: scsi_mod(+)
> NIP: c00000000035f5a8 LR: c00000000035f58c CTR: 0000000000136f8c
> REGS: c0000000c7d03500 TRAP: 0300   Not tainted  (2.6.30-rc5-autotest-next-20090511)
> MSR: 8000000000009032 <EE,ME,IR,DR>  CR: 28222484  XER: 0000000f
> DAR: 00000ffffc994838, DSISR: 0000000040000000
> TASK = c0000000c7cf0a80[63] 'modprobe' THREAD: c0000000c7d00000 CPU: 1
> GPR00: c00000000035f58c c0000000c7d03780 c000000000aaeed8 0000000000000031 
> GPR04: 0000000000000000 00000000585cf4e0 0000000000673580 80000000565a6cc0 
> GPR08: 0000000000000000 c0000000009ebf50 0000000000000000 c0000000009ebf38 
> GPR12: 0000000028222482 c000000000b82600 0000000000000000 0000000000000000 
> GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 
> GPR20: 0000000000000018 ffffffffffffffff c0000000009bbe40 0000000000000010 
> GPR24: 0000000000210d00 c0000000c6caff80 c0000000dfc732a0 c000000000f61380 
> GPR28: c0000000007c8350 c0000000008a4280 c000000000a2f928 00000ffffc994550 
> NIP [c00000000035f5a8] .spin_bug+0x90/0xd4
> LR [c00000000035f58c] .spin_bug+0x74/0xd4
> Call Trace:
> [c0000000c7d03780] [c00000000035f58c] .spin_bug+0x74/0xd4 (unreliable)
> [c0000000c7d03810] [c00000000035f890] ._raw_spin_lock+0x48/0x184
> [c0000000c7d038b0] [c0000000005d67d8] ._spin_lock+0x10/0x24
> [c0000000c7d03920] [c00000000013fbdc] .__slab_alloc_page+0x344/0x3cc
> [c0000000c7d039e0] [c000000000141168] .kmem_cache_alloc+0x13c/0x21c
> [c0000000c7d03aa0] [c000000000141b04] .kmem_cache_create+0x294/0x2a8
> [c0000000c7d03b90] [d000000000ea14cc] .scsi_init_queue+0x38/0x170 [scsi_mod]
> [c0000000c7d03c20] [d000000000ea13c8] .init_scsi+0x1c/0xe8 [scsi_mod]
> [c0000000c7d03ca0] [c0000000000092c0] .do_one_initcall+0x80/0x19c
> [c0000000c7d03d90] [c0000000000c0540] .SyS_init_module+0xe0/0x244
> [c0000000c7d03e30] [c000000000008534] syscall_exit+0x0/0x40
> Instruction dump:
> 7f84e378 e87e8020 38c604d0 e8e902ea 4827fced 60000000 2fbf0000 80bd0004 
> 409e0010 e8de8028 38e0ffff 4800000c <e8ff02ea> 38df04d0 7fa4eb78 811d0008 
> ---[ end trace f725820a6fa9dbb7 ]---
> /init: line 21:    63 Segmentation fault      modprobe $file
> Creating device nodes with udev
> udevd version 128 started
> 
> 
> After the udevadm settle timeout, the events queue contains:
> 
> 632: /devices/vio/30000007
> 
> 
> Boot logging started on /dev/hvc0(/dev/console) at Mon May 11 10:52:56 2009
> 
> 
> After the udevadm settle timeout, the events queue contains:
> 
> 632: /devices/vio/30000007
> 
> 
> 
> 
> After the udevadm settle timeout, the events queue contains:
> 
> 632: /devices/vio/30000007
> 
> 
> 
> 
> After the udevadm settle timeout, the events queue contains:
> 
> 632: /devices/vio/30000007
> 
> 
> Waiting for device /dev/sda3 to appear: ..............................Could not find /dev/sda3.
> Want me to fall back to /dev/sda3? (Y/n) 
> 


-- 
Matthew Wilcox				Intel Open Source Technology Centre
"Bill, look, we understand that you're interested in selling us this
operating system, but compare it to ours.  We can't possibly take such
a retrograde step."



More information about the Linuxppc-dev mailing list