Help Reqd on Strange kernel panic with PPC-405 running Linux.

Thu Sep 29 02:00:54 EST 2005

Hi All,

I am seeking help with a custom kernel module. If this is not the
appropriate list, I apologize; could you point me to the right one?

I am seeing a strange kernel panic in a tasklet that gets
scheduled by an ISR. The panic happens when the tasklet tries to write
to a memory-mapped register. The tasklet calls a function to do the
write, passing the base address, offset and value as arguments (in
this order). The function adds the base and offset and uses writel()
to do the actual write.
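
For readers who want to picture the function, here is a minimal
userspace sketch of it. It is not the actual module source: the
helper fake_writel() is a hypothetical stand-in for the kernel's
writel(), assumed here to store the value in little-endian byte
order, and the "register block" is just an ordinary byte buffer.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for the kernel's writel(): stores a 32-bit
 * value in little-endian byte order. The real writel() would also be
 * followed by an I/O barrier, which a userspace sketch does not need.
 */
static void fake_writel(uint32_t value, volatile uint8_t *addr)
{
    addr[0] = (uint8_t)(value & 0xff);
    addr[1] = (uint8_t)((value >> 8) & 0xff);
    addr[2] = (uint8_t)((value >> 16) & 0xff);
    addr[3] = (uint8_t)((value >> 24) & 0xff);
}

/* Arguments in the order described above: base, offset, value. */
static void write_to_register(volatile uint8_t *base, uint32_t offset,
                              uint32_t value)
{
    fake_writel(value, base + offset);
}
```

In the real module, base would be the pointer returned by ioremap()
and offset one of the #defined register offsets.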

This usually works fine. But when the panic happens, registers r3
(base) and r4 (offset) hold the same value. r3/base is the expected
ioremapped address, but r4 somehow gets the value of base. This is
strange, as we always use #defined constants for the offset (after the
panic, I verified that the load immediate instruction has the
correct value and is not corrupted). When r3 and r4 are added, we
get an invalid virtual address and the kernel panics. The oops
message clearly shows that r0 is r3 + r4.
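
To illustrate the arithmetic, the sketch below models the 32-bit add.
The base and offset values in the comments are made up for
illustration and are not taken from the oops message.

```c
#include <stdint.h>

/*
 * Models "add r0,r3,r4" on a 32-bit PowerPC: the sum wraps
 * modulo 2^32, as unsigned 32-bit addition does in C.
 */
static uint32_t effective_address(uint32_t base, uint32_t offset)
{
    return base + offset;
}

/*
 * With a hypothetical base of 0xC9000000:
 *   normal case,  offset = 0x10:       0xC9000000 + 0x10       = 0xC9000010
 *   failing case, offset = 0xC9000000: 0xC9000000 + 0xC9000000 = 0x92000000
 * The second result is twice the base, truncated to 32 bits, and does
 * not fall inside the ioremapped region.
 */
```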

This is what the disassembled code looks like.
000005f8 <write_to_register>:
     5f8:   7c 03 22 14     add r0,r3,r4
     5fc:   7c a0 05 2c     stwbrx  r5,r0,r0
     600:   7c 00 06 ac     eieio
     604:   4e 80 00 20     blr

The most puzzling part is that when I modified the code to loop
infinitely if base and offset are equal (see below for the new code),
the panic does not occur at all. With the original code the panic
happens within a few minutes when I increase the frequency of the
interrupts, while with the modified code it runs for hours.

000005f8 <write_to_register>:
     5f8:   7c 03 20 00     cmpw    r3,r4
     5fc:   41 a2 00 00     beq+    5fc <write_to_register+0x4>
     600:   7c 03 22 14     add r0,r3,r4
     604:   7c a0 05 2c     stwbrx  r5,r0,r0
     608:   7c 00 06 ac     eieio
     60c:   4e 80 00 20     blr
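
At the C level, the guarded version corresponds roughly to the
userspace sketch below. The names write_to_register_guarded() and
store_le32() are hypothetical, and store_le32() only stands in for
the kernel's writel().

```c
#include <stdint.h>

/* Hypothetical stand-in for writel(): little-endian 32-bit store. */
static void store_le32(uint32_t value, volatile uint8_t *addr)
{
    addr[0] = (uint8_t)(value & 0xff);
    addr[1] = (uint8_t)((value >> 8) & 0xff);
    addr[2] = (uint8_t)((value >> 16) & 0xff);
    addr[3] = (uint8_t)((value >> 24) & 0xff);
}

static void write_to_register_guarded(volatile uint8_t *base,
                                      uint32_t offset, uint32_t value)
{
    /* Trap the bad case: hang here if offset has taken the value of base. */
    if ((uintptr_t)base == (uintptr_t)offset) {
        for (;;)
            ;   /* spin forever so the state can be inspected */
    }

    store_le32(value, base + offset);
}
```

The spin is written as for (;;) rather than as a loop on the
comparison itself, so that the compiler is not entitled to assume
the loop terminates and drop it.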

I also don't see the problem when I modify the code in either of the
following ways:
1. Add two nops before the add r0,r3,r4 instruction. (With one nop the
   problem still happens.)
2. Inline the register write function.

Basically, I have observed that even the slightest change in the
instruction sequence in and around the call to the register write
function obviates the problem.

In the tasklet this register write routine is called a lot of times,
sometimes in big loops, to program different registers. But the crash
always happens at the same place.

There seems to be no data corruption. I can't rule out stack
corruption, but then why do the nops fix (or at least appear to fix)
the issue?

Could this be due to some error in instruction re-ordering/branch
prediction by the processor?

Hoping that the problem is something obvious to the PPC gurus.
I would appreciate any help/tips on what could be the problem.

I am running MontaVista Linux on a Xilinx PPC405.

Thanks in advance,
Arun.

PS. Please cc me as I am not subscribed to the group.
