xsysace.c driver oops on reboot (PPC405 on Virtex-II Pro)

Mon Jan 28 16:55:01 EST 2008

Hi,

I am using a Xilinx ML-310 board and a BEE2 development system, and
both configurations show the same problem.  I use the System ACE IP
core (opb_sysace version 1.00.c on EDK 9.1.02i).  If I compile my
2.6.24-rc3 kernel (a very recent pull from git.secretlab.ca) with the
"old System ACE" driver from Xilinx, I can issue the reboot command
from busybox and I get a proper reboot.

If I compile my kernel with the new System ACE driver
(drivers/block/xsysace.c), the driver works great and I can read from
and write to the CompactFlash card without problems.  However, when I
issue the reboot command, I get a kernel oops.  Strangely, the oops
does not happen when I issue a shutdown command (kernel halt).  To
investigate further, I built a RAM filesystem and built both System
ACE drivers as loadable modules.

If I don't load any System ACE module and call reboot => oops.
If I load the new System ACE module and call reboot => oops.
If I load the old System ACE module and call reboot => proper reboot (GOOD).
If I load the old System ACE module, unload it, and then call reboot => oops.

So some code in the old System ACE driver makes the reboot work
correctly, and that code seems to be missing from the new driver.
Note that the oops also occurs with no System ACE driver loaded at
all, so the new driver does not cause it; it just does not prevent
it.  My guess is that the hardware IP misbehaves during reboot, for
example by raising an interrupt at a point where the processor can no
longer handle it, and that the old Xilinx driver somehow masks this.

Note that I did have to tweak the new systemACE driver slightly to
make it work on the ML-310 (because of the 8-bit bus of the
SystemACE):
@@ -1126,7 +1126,7 @@ static void __devexit ace_free(struct device *dev)
 static int __devinit ace_probe(struct platform_device *dev)
 {
        unsigned long physaddr = 0;
-       int bus_width = ACE_BUS_WIDTH_16; /* FIXME: should not be hard coded */
+       int bus_width = ACE_BUS_WIDTH_8; /* FIXME: should not be hard coded */
        int id = dev->id;
        int irq = NO_IRQ;

Here is the oops trace I get when no System ACE driver is loaded (I
include the last few console messages before the oops).  This trace
is from my ML-310 board, but the same thing happens on the BEE2:
==
The system is going down NOW!
Sending SIGTERM to all processes
Requesting system reboot
[   25.862440] Restarting system.
[   25.8[   25.866467] Oops: Exception in kernel mode, sig: 8 [#1]
[   25.870382] NIP: c7d21e00 LR: c7d21df0 CTR: 00000001
[   25.875312] REGS: c7d21d40 TRAP: 2001df0   Not tainted (2.6.24-rc3-jsc-seclab-NETWORK-g59558dc1-dirty)
[   25.884630] MSR: c00e873c <EE,IR,DR>  CR: 00000010  XER: 001200d2
[   25.890684] TASK = c7c37ba0[149] 'linuxrc' THREAD: c7d20000
[   25.896034] GPR00: 3c303e5b 20202032 352e3836 32343430 5d2000a0 c00f5858 00000700 7fb78f0c
[   25.904327] GPR08: 00000000 fee1dead 28121969 00000000 c7d21df0 c001f150 07ffdda8 00000000
[   25.912620] GPR16: 00000001 00000000 c0190000 c03cfcfc c000b54c 00000cb2 00000000 00000000
[   25.920914] GPR24: 00000000 ffffffff 00000000 0002d030 00000001 c002dc54 c7d21e84 c7d21e60
[   25.929381] NIP [c7d21e00] 0xc7d21e00
[   25.933009] LR [c7d21df0] 0xc7d21df0
[   25.936550] Call Trace:
[   25.938971] Instruction dump:
[   25.941908] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
[   25.949595] XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
==
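
One thing that can be read off the trace without a debugger: GPR00 to
GPR04 look like ASCII text rather than pointers.  Below is a minimal
user-space sketch (not kernel code) that unpacks them most significant
byte first, since the PPC405 is big-endian.  The names regs_to_ascii
and oops_gpr0_4 are made up for this illustration; the values are
copied from the register dump above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GPR00..GPR04, copied verbatim from the oops trace. */
const uint32_t oops_gpr0_4[5] = {
    0x3c303e5b, 0x20202032, 0x352e3836, 0x32343430, 0x5d2000a0
};

/*
 * Unpack n 32-bit register values into a NUL-terminated string, most
 * significant byte first (big-endian order).  Bytes outside the
 * printable ASCII range are shown as '.'.
 * The caller must provide at least 4 * n + 1 bytes in out.
 */
void regs_to_ascii(const uint32_t *regs, size_t n, char *out)
{
    size_t k = 0;

    assert(regs != NULL && out != NULL);

    for (size_t i = 0; i < n; i++) {
        for (int shift = 24; shift >= 0; shift -= 8) {
            unsigned char c = (unsigned char)((regs[i] >> shift) & 0xff);
            out[k++] = (c >= 0x20 && c < 0x7f) ? (char)c : '.';
        }
    }
    out[k] = '\0';
}
```

With a 21-byte buffer, regs_to_ascii(oops_gpr0_4, 5, buf) gives
"<0>[   25.862440] ..", which is the log-level tag and timestamp of the
"Restarting system." console line.  GPR01, normally the stack pointer,
is part of that text.  GPR09 and GPR10 hold fee1dead and 28121969, the
two magic numbers passed to the reboot() system call.  This would be
consistent with saved register state having been overwritten with
console text, though the trace alone does not prove that.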

Note that I have enabled CONFIG_DEBUG_BUGVERBOSE, CONFIG_DEBUG_INFO
and CONFIG_DEBUG_KERNEL.  However, the addresses reported for NIP and
LR don't seem to match anything in my kernel memory map, and the call
trace is empty.
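
A possible reason the NIP and LR values match nothing in the memory
map is that they may not be kernel text addresses at all.  The sketch
below (plain user-space C, with a hypothetical helper named
in_thread_stack) tests whether an address falls inside the kernel
stack that starts at the THREAD value from the oops.  It assumes 8 KiB
kernel stacks, which I believe is the default on 32-bit PowerPC but
have not verified for this configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * ASSUMPTION: 8 KiB kernel stacks on this 32-bit PowerPC kernel.
 * Adjust if your configuration uses a different THREAD_SIZE.
 */
#define ASSUMED_THREAD_SIZE 0x2000u

/*
 * Return true if addr lies in
 * [thread_base, thread_base + ASSUMED_THREAD_SIZE).
 * thread_base is the THREAD value printed in the oops; kernel stacks
 * are aligned to their size, so a misaligned base is rejected.
 */
bool in_thread_stack(uint32_t addr, uint32_t thread_base)
{
    assert(thread_base % ASSUMED_THREAD_SIZE == 0);

    return addr >= thread_base &&
           (addr - thread_base) < ASSUMED_THREAD_SIZE;
}
```

With THREAD = c7d20000, this returns true for NIP (c7d21e00), LR
(c7d21df0) and REGS (c7d21d40), and false for an address in the
c0000000 range such as c000b54c (GPR20).  So the CPU appears to have
been executing from the task's own kernel stack when the exception
hit, which would explain why no symbol matches.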

I am fairly new to embedded Linux, and especially to debugging
kernel errors.  Have any other ML-310 users run into this problem
before?

The easy workaround is to keep using the old driver from Xilinx, but
the new driver seems faster and is now part of the mainline tree...

Let me know if you have any tips on how I could debug this problem.
Note that I don't currently have a debugging cable for my ML-310, so
I have to rely on print statements for debugging.

Thanks,

Jean-Samuel
-- 
Integrated Microsystems Laboratory
McGill University, Montréal, QC, CANADA
Web Page: http://chaos.ece.mcgill.ca