Wait Queue bug triggered on EST SBC8260

diekema_jon diekema at bucks.si.com
Tue May 23 02:43:14 EST 2000


From Dan Malek:
May 20, 00 12:47:56 AM -0400
Re: EST SBC8260 Linux memory mapping rules

> diekema_jon wrote:

> > I have loaded the zImage bits at the link address on the SBC8260,
> > and that works just fine.

> Well, you must have some pretty damn magical tools, because that
> certainly will not work based upon the way the code is written.
> What do you consider the "link address" and "works"?


By "works" I mean being able to run /bin/sash.


zvmlinux is being linked at 0x00400000, and its entry point
is also at this same address.

dell 121} powerpc-linux-nm arch/ppc/mbxboot/zvmlinux | grep ' start$'
00400000 T start

dell 108} powerpc-linux-objdump -h arch/ppc/mbxboot/zvmlinux
arch/ppc/mbxboot/zvmlinux:     file format elf32-powerpc

Sections:
Idx Name          Size      VMA       LMA       File off  Algn
  0 .text         000044d4  00400000  00400000  00010000  2**2
                  CONTENTS, ALLOC, LOAD, READONLY, CODE
  1 .rodata       00000470  004044e0  004044e0  000144e0  2**4
                  CONTENTS, ALLOC, LOAD, READONLY, DATA
  2 .data         0000030c  00405000  00405000  00015000  2**2
                  CONTENTS, ALLOC, LOAD, DATA
  3 .data.init    00000000  00406000  00406000  0008ce71  2**0
                  CONTENTS
  4 .bss          00005270  00406000  00406000  00016000  2**2
                  ALLOC
  5 .gzimage      00071c01  0040b270  0040b270  0001b270  2**0
                  CONTENTS, ALLOC, LOAD, READONLY, DATA

We are using the VxWorks boot ROM on the EST SBC8260 board, and it
understands ELF files.  This boot ROM loads zvmlinux at the
address it was linked at.  Here is an example:


                            VxWorks System Boot

Copyright 1984-1998  Wind River Systems, Inc.

CPU: EST Corp. est8260 -- MPC8260 PowerQUICC II SBC
Version: 5.4
BSP version: 1.2/3
Creation date: Apr 19 2000, 10:24:59

Press any key to stop auto-boot...

Attached TCP/IP interface to motfcc0.
Subnet Mask: 0xff000000
Attaching network interface lo0... done.
Loading... 45680 + 465921
Starting at 0x400000...

loaded at:     00400000 0040B270
board data at: 00FFFFC0 00FFFFE4
relocated to:  00200100 00200124
zimage at:     0040B270 0047CE71
avail ram:     0047D000 01000000

Linux/PPC load: root=/dev/nfs rw nfsroot=126.28.1.117:/target nfsaddrs=126.1.4.5:126.28.1.117::255.0.0.0
Uncompressing Linux...done.
Now booting the kernel
Total memory = 16MB; using 0kB for hash table (at 00000000)
Linux version 2.3.99-pre9 (diekema at dell) (gcc version 2.95.2 19991024 (release)) #45 Sat May 20 21:08:00 EDT 2000
Boot arguments:  root=/dev/nfs rw nfsroot=126.28.1.117:/target nfsaddrs=126.1.4.5:126.28.1.117::255.0.0.0
On node 0 totalpages: 4096
zone(0): 4096 pages.
zone(1): 0 pages.
zone(2): 0 pages.
Calibrating delay loop... 164.66 BogoMIPS
Memory: 14736k available (860k kernel code, 416k data, 48k init) [c0000000,c1000000]
Dentry-cache hash table entries: 2048 (order: 2, 16384 bytes)
Buffer-cache hash table entries: 1024 (order: 0, 4096 bytes)
Page-cache hash table entries: 4096 (order: 2, 16384 bytes)
kmem_create: Poisoning requested, but con given - bdev_cache
Inode-cache hash table entries: 1024 (order: 1, 8192 bytes)
kmem_create: Poisoning requested, but con given - inode_cache
POSIX conformance testing by UNIFIX
Linux NET4.0 for Linux 2.3
Based upon Swansea University Computer Society NET3.039
kmem_create: Poisoning requested, but con given - skbuff_head_cache
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP
IP: routing cache hash table of 512 buckets, 4Kbytes
TCP: Hash tables configured (established 1024 bind 1024)
Starting kswapd v1.6
CPM UART driver version 0.01
ttyS00 at 0x0000 is a SMC
ttyS01 at 0x0040 is a SMC
ttyS02 at 0x8100 is a SCC
ttyS03 at 0x8200 is a SCC
pty: 256 Unix98 ptys configured
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
loop: registered device at major 7
loop: enabling 8 loop devices
eth0: SCC ENET Version 0.1, 00:a0:1e:01:04:05
kmem_create: Forcing size word alignment - nfs_fh
Looking up port of RPC 100003/2 on 126.28.1.117
Looking up port of RPC 100005/2 on 126.28.1.117
VFS: Mounted root (nfs filesystem).
Freeing unused kernel memory: 48k init
bad magic 0 (should be c01fb2e0, creator 0), wq bug, forcing oops.
kernel BUG at sched.c:656!
NIP: C000FB5C XER: 00000000 LR: C000FB5C REGS: c01adc00 TRAP: 0700
MSR: 00081032 EE: 0 PR: 0 FP: 0 ME: 1 IR/DR: 11
TASK = c01ac000[1] 'init' Last syscall: 6
last math 00000000 last altivec 00000000
GPR00: C000FB5C C01ADCB0 C01AC000 0000001B 00001032 C010EF80 C01FB260 C0128502
GPR08: 0000001B C0110000 F00000B8 C01ADBF0 24444028 1001EEB4 00000000 00000000
GPR16: 00000000 00000000 00000000 00000000 00009032 C01DA060 00000000 00000000
GPR24: 00000021 00000001 C01DA060 C010A3E0 C0125000 C0110000 C01FB2D4 C01ADCB0
Call backtrace:
C000FB5C C008CB8C C007F560 C007FE8C C0033CE4 C0033D80 C0032818
C00328CC C00328FC C00048F0 10005548 10005A20 0FF09E78 00000000
Kernel panic: Exception in kernel pc c000fb5c signal 4
Rebooting in 180 seconds..


The root partition gets mounted via NFS, but we then die with a
scheduling-related problem.

dell 138} ./backtrace < z
0xc000fb5c -- 0xc000fad4 + 0x0088   __wake_up
0xc008cb8c -- 0xc008c8bc + 0x02d0   rs_8xx_close
0xc007f560 -- 0xc007f30c + 0x0254   release_dev
0xc007fe8c -- 0xc007fe78 + 0x0014   tty_release
0xc0033ce4 -- 0xc0033c9c + 0x0048   __fput
0xc0033d80 -- 0xc0033d60 + 0x0020   _fput
0xc0032818 -- 0xc0032784 + 0x0094   filp_close
0xc00328cc -- 0xc0032830 + 0x009c   do_close
0xc00328fc -- 0xc00328e8 + 0x0014   sys_close
0xc00048f0 -- 0xc00048f0 + 0x0000   ret_from_syscall_1
0x10005548 -- 0xc0125d84 + 0x4fedf7c4   packet_proto_init
0x10005a20 -- 0xc0125d84 + 0x4fedfc9c   packet_proto_init
0x0ff09e78 -- 0xc0125d84 + 0x4fde40f4   packet_proto_init
0x00000000 -- 0xc0125d84 + 0x3feda27c   packet_proto_init


dell 106} search '*.[hcsS]' | xargs  grep 'wq bug'
./include/linux/wait.h: printk("wq bug, forcing oops.\n"); \

"wq bug" is used in the WQ_BUG macro

#define WQ_BUG() do { \
        printk("wq bug, forcing oops.\n"); \
        BUG(); \
} while (0)

WQ_BUG is used in the CHECK_MAGIC_WQHEAD macro.

#define CHECK_MAGIC_WQHEAD(x) do { \
        if (x->__magic != (long)&(x->__magic)) { \
                printk("bad magic %lx (should be %lx, creator %lx), ", \
                        x->__magic, (long)&(x->__magic), x->__creator); \
                WQ_BUG(); \
        } \
} while (0)



From kernel/sched.c:

static inline void __wake_up_common(wait_queue_head_t *q, unsigned int mode,
                                    const int sync)
{
        struct list_head *tmp, *head;
        struct task_struct *p;
        unsigned long flags;

        if (!q)
                goto out;

        wq_write_lock_irqsave(&q->lock, flags);

#if WAITQUEUE_DEBUG
        CHECK_MAGIC_WQHEAD(q);  <<<<<<<<<<<<<<<-- Magic numbers are wrong!!!
#endif

        head = &q->task_list;
#if WAITQUEUE_DEBUG
        if (!head->next || !head->prev)
                WQ_BUG();
#endif
        list_for_each(tmp, head) {
                unsigned int state;
                wait_queue_t *curr = list_entry(tmp, wait_queue_t, task_list);

#if WAITQUEUE_DEBUG
                CHECK_MAGIC(curr->__magic);
#endif
                p = curr->task;
                state = p->state;
                if (state & (mode & ~TASK_EXCLUSIVE)) {
#if WAITQUEUE_DEBUG
                        curr->__waker = (long)__builtin_return_address(0);
#endif
                        if (sync)
                                wake_up_process_synchronous(p);
                        else
                                wake_up_process(p);
                        if (state & mode & TASK_EXCLUSIVE)
                                break;
                }
        }
        wq_write_unlock_irqrestore(&q->lock, flags);
out:
        return;
}


The last message before we die is "Freeing unused kernel memory: 48k init".
This is generated from the free_initmem() routine in arch/ppc/mm/init.c.
free_initmem() is called from init() in init/main.c.

static int init(void * unused)
{
        lock_kernel();
        do_basic_setup();

        /*
         * Ok, we have completed the initial bootup, and
         * we're essentially up and running. Get rid of the
         * initmem segments and start the user-mode stuff..
         */
        free_initmem();  <<<<<<<<<<<<<<<-- We get this far w/o problems
        unlock_kernel();

        if (open("/dev/console", O_RDWR, 0) < 0)
                printk("Warning: unable to open an initial console.\n");

        (void) dup(0);
        (void) dup(0);

        /*
         * We try each of these until one succeeds.
         *
         * The Bourne shell can be used instead of init if we are
         * trying to recover a really broken machine.
         */

        if (execute_command)
                execve(execute_command,argv_init,envp_init);
        execve("/sbin/init",argv_init,envp_init);
        execve("/etc/init",argv_init,envp_init);
        execve("/bin/init",argv_init,envp_init);
        execve("/bin/sh",argv_init,envp_init);
        panic("No init found.  Try passing init= option to kernel.");
}


Does anybody have any hints on how I might try to debug this problem?
Options that I have thought about:

- Boot sash instead of init

OK, I have modified the boot parameters to include init=/bin/sash.
With that change I am able to run /bin/sash, but init is still giving me grief.

Note: The root file system is from the MontaVista Hard Hat Linux
      version 1.1.

./ppc_8xx/RPMS/hhl-ppc_8xx-sysvinit-2.77-6.noarch.rpm

Attached TCP/IP interface to motfcc0.
Subnet Mask: 0xff000000
Attaching network interface lo0... done.
Loading... 45680 + 465921
Starting at 0x400000...

loaded at:     00400000 0040B270
board data at: 00FFFFC0 00FFFFE4
relocated to:  00200100 00200124
zimage at:     0040B270 0047CE71
avail ram:     0047D000 01000000

Linux/PPC load: root=/dev/nfs rw nfsroot=126.28.1.117:/target nfsaddrs=126.1.4.5:126.28.1.117::255.0.0.0 init=/bin/sash
Uncompressing Linux...done.
Now booting the kernel
Total memory = 16MB; using 0kB for hash table (at 00000000)
Linux version 2.3.99-pre9 (diekema at dell) (gcc version 2.95.2 19991024 (release)) #45 Sat May 20 21:08:00 EDT 2000
Boot arguments: root=/dev/nfs rw nfsroot=126.28.1.117:/target nfsaddrs=126.1.4.5:126.28.1.117::255.0.0.0 init=/bin/sash
On node 0 totalpages: 4096
zone(0): 4096 pages.
zone(1): 0 pages.
zone(2): 0 pages.
Calibrating delay loop... 164.66 BogoMIPS
Memory: 14736k available (860k kernel code, 416k data, 48k init) [c0000000,c1000000]
Dentry-cache hash table entries: 2048 (order: 2, 16384 bytes)
Buffer-cache hash table entries: 1024 (order: 0, 4096 bytes)
Page-cache hash table entries: 4096 (order: 2, 16384 bytes)
kmem_create: Poisoning requested, but con given - bdev_cache
Inode-cache hash table entries: 1024 (order: 1, 8192 bytes)
kmem_create: Poisoning requested, but con given - inode_cache
POSIX conformance testing by UNIFIX
Linux NET4.0 for Linux 2.3
Based upon Swansea University Computer Society NET3.039
kmem_create: Poisoning requested, but con given - skbuff_head_cache
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP
IP: routing cache hash table of 512 buckets, 4Kbytes
TCP: Hash tables configured (established 1024 bind 1024)
Starting kswapd v1.6
CPM UART driver version 0.01
ttyS00 at 0x0000 is a SMC
ttyS01 at 0x0040 is a SMC
ttyS02 at 0x8100 is a SCC
ttyS03 at 0x8200 is a SCC
pty: 256 Unix98 ptys configured
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
loop: registered device at major 7
loop: enabling 8 loop devices
eth0: SCC ENET Version 0.1, 00:a0:1e:01:04:05
kmem_create: Forcing size word alignment - nfs_fh
Looking up port of RPC 100003/2 on 126.28.1.117
Looking up port of RPC 100005/2 on 126.28.1.117
VFS: Mounted root (nfs filesystem).
Freeing unused kernel memory: 48k init
Stand-alone shell (version 3.4)
> /etc/rc*
+ /sbin/ifconfig lo 127.0.0.1
+
+ mount /proc
+ ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:A0:1E:01:04:05
          inet addr:126.1.4.5  Bcast:126.255.255.255  Mask:255.0.0.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1227 errors:0 dropped:0 overruns:0 frame:0
          TX packets:490 errors:0 dropped:0 overruns:0 carrier:0
          collisions:4 txqueuelen:100
          Base address:0x8000

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:3904  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0

+ mount -a
+ mount -o rsize=8192,wsize=8192,rw,remount /

** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/