dcache BUG()
Gabriel Paubert
paubert at iram.es
Tue May 8 09:06:32 EST 2001
On Mon, 7 May 2001, Eli Chen wrote:
>
> > What version of silicon do you have, and what platform are you using?
>
> I am using the 405GP core, rev D. My tree is based off of the February 26th
> source from MontaVista.
>
> > Is there some simple test I can use to trigger this problem?
>
> Besides Brian's one-liner test, you can try flood pinging your 405GP. I
> have been consistently receiving these error messages after letting it run a
> while:
>
> Freeing alive device (cxxxxxxx), ethx
>
> and
>
> Attempt to release alive inet socket cxxxxxxx
>
> I have also occasionally received other messages, which I have yet to receive
> after changing atomic.h.
Hmm, consider what happens if a down_trylock in an interrupt handler
fails. Actually dec_if_positive will leave a dangling reservation, since
it will skip the stwcx. instruction.
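To make the dangling reservation concrete, here is a toy C model (not kernel code) of a dec_if_positive-style loop. The `reservation` flag and the helpers `load_reserve()`, `store_conditional()` and `reservation_dangling()` are hypothetical stand-ins for lwarx, stwcx. and the CPU state. The model assumes one reservation per CPU and no address check.

```c
#include <stdbool.h>

/* Toy model: a single per-CPU reservation flag (assumption, not real hardware). */
static bool reservation;

/* Stand-in for lwarx: load the word and set the reservation. */
static int load_reserve(const int *p)
{
    reservation = true;
    return *p;
}

/* Stand-in for stwcx.: store only if reserved, and always clear the reservation. */
static bool store_conditional(int *p, int v)
{
    bool ok = reservation;
    if (ok)
        *p = v;
    reservation = false;
    return ok;
}

/* Decrement *count only if the result stays non-negative.
 * Returns the old value minus one in both cases. */
static int dec_if_positive(int *count)
{
    int t;
    do {
        t = load_reserve(count) - 1;
        if (t < 0)
            return t;   /* early exit: the store is skipped, reservation stays set */
    } while (!store_conditional(count, t));
    return t;
}

/* Reports whether a reservation is still pending. */
static bool reservation_dangling(void)
{
    return reservation;
}
```

With a count of 0 the function returns -1 and `reservation_dangling()` reports true. With a count of 1 the store goes through and the reservation is cleared.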
I have not looked at the code for very long, so I may be missing something or
be completely wrong, but I see a stwcx. instruction in transfer_to_handler
which I think is useless, since the handler will always execute a lwarx
before attempting a stwcx., thereby making the state of the reservation at
interrupt entry irrelevant.
On the other hand, when an interrupt handler does a down_trylock and
then returns because it failed, it will leave the reservation active
until returning to the caller (once again, if I did not miss anything in the
return path).
So the sequence of events which can cause corruption is the following:
1) lwarx atomic_var,
   reservation set
2) interrupt taken,
   reservation set
3) stwcx. in interrupt prologue (transfer_to_handler),
   reservation cleared
4) interrupt handler executes, talks to hardware
5) interrupt handler modifies atomic_var,
   reservation set and cleared (hence step 3 was not necessary)
6) down_trylock() fails,
   reservation set
8) interrupt handler returns,
   reservation still set
9) interrupt epilogue restores state and returns between lwarx and stwcx.,
   reservation is still set!
10) stwcx. atomic_var, succeeds, but the variable has been modified in the
    meantime, chaos ensues
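The sequence above can be replayed in a small C simulation. This is only a sketch under stated assumptions, not kernel code: `struct cpu`, `lwarx()`, `stwcx()`, `irq_handler()` and `run_scenario()` are hypothetical names, and the model assumes a single reservation flag where a stwcx. to a different address than the one reserved still succeeds (the worst case, as far as I know the architecture leaves this undefined). `clear_on_exit` selects between the existing prologue stwcx. and clearing the reservation on the return path instead.

```c
#include <stdbool.h>

/* Toy CPU state: one reservation flag, no address tracking (assumption). */
struct cpu {
    bool reservation;
};

/* Stand-in for lwarx: load and set the reservation. */
static int lwarx(struct cpu *c, const int *addr)
{
    c->reservation = true;
    return *addr;
}

/* Stand-in for stwcx.: store if reserved, always clear the reservation. */
static bool stwcx(struct cpu *c, int *addr, int val)
{
    bool ok = c->reservation;
    if (ok)
        *addr = val;
    c->reservation = false;
    return ok;
}

/* Handler: atomic increment (step 5), then a failing trylock (step 6). */
static void irq_handler(struct cpu *c, int *atomic_var, int *sem_count)
{
    int v;
    do {
        v = lwarx(c, atomic_var);
    } while (!stwcx(c, atomic_var, v + 1));

    v = lwarx(c, sem_count);          /* reservation set ...            */
    if (v > 0)
        stwcx(c, sem_count, v - 1);   /* ... but skipped when v <= 0    */
}

/* One atomic increment interrupted once; returns the final value.
 * Two increments happen in total, so the correct answer is 2. */
static int run_scenario(bool clear_on_exit)
{
    struct cpu c = { false };
    int atomic_var = 0, sem_count = 0, dummy = 0, old;
    bool irq_pending = true;

    do {
        old = lwarx(&c, &atomic_var);                   /* step 1 */
        if (irq_pending) {                              /* step 2 */
            irq_pending = false;
            if (!clear_on_exit)
                stwcx(&c, &dummy, 0);                   /* step 3: prologue */
            irq_handler(&c, &atomic_var, &sem_count);   /* steps 4 to 8 */
            if (clear_on_exit)
                stwcx(&c, &dummy, 0);                   /* clear on return instead */
        }
    } while (!stwcx(&c, &atomic_var, old + 1));         /* step 10 */

    return atomic_var;
}
```

In this model `run_scenario(false)` loses the handler's increment, while `run_scenario(true)` forces the interrupted code to retry and both increments survive.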
In short, I think that step 3) should be moved to the epilogue(s),
ret_from_intercept, etc... Note that spin_trylock() could produce the same
effect in step 6), but it's SMP only.
What do you think, am I completely off base?
I try to avoid looking at entry.S/head.S/misc.S and had not done it for a
long time since I think it's dangerous for my mental health, so, once
again, I might be completely wrong.
So I'd suggest the following one-liner:
===== arch/ppc/kernel/entry.S 1.7 vs edited =====
--- 1.7/arch/ppc/kernel/entry.S Fri Apr 13 20:44:42 2001
+++ edited/arch/ppc/kernel/entry.S Tue May 8 01:02:47 2001
@@ -382,6 +382,7 @@
 	CLR_TOP32(r8)
 	mtspr	SPRG2,r8		/* phys exception stack pointer */
 1:
+	stwcx.	r0,0,r1			/* Clear reservation - Gabriel. */
 	lwz	r3,_CTR(r1)
 	lwz	r0,_LINK(r1)
 	mtctr	r3
You can also try to remove the
	li	r22,RESULT
	stwcx.	r22,r22,r21
or similar lines in head.S, head_4xx.S, and head_8xx.S to check that my
theory is correct.
Regards,
Gabriel.
** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/