[PROBLEM] Soft lockup on Linux 2.6.27, 2 patches, Cell/PPC64

Geert Uytterhoeven Geert.Uytterhoeven at sonycom.com
Wed Oct 15 20:46:54 EST 2008


On Wed, 15 Oct 2008, Benjamin Herrenschmidt wrote:
> On Wed, 2008-10-15 at 11:25 +0200, Geert Uytterhoeven wrote:
> > On Wed, 15 Oct 2008, Benjamin Herrenschmidt wrote:
> > > On Tue, 2008-10-14 at 11:32 +0200, Geert Uytterhoeven wrote:
> > > > which points again to smp_call_function_single...
> > > 
> > > Yup, it doesn't bring more information. At this stage, your 'other' CPU
> > > is stuck with interrupts disabled. Hard to tell what's happening without
> > > some HW assist. Do you have ways to trigger a non-maskable interrupt
> > > such as a 0x100 ? That would allow to catch the other guy in xmon and
> > > see what it was doing...
> > 
> > Interrupts are not disabled on the other CPU thread, at least not according to
> > the irqs_disabled() check I added to the printing of the `spinlock lockup'
> > message in __spin_lock_debug().
> > 
> > As the log also said
> > 
> > | hardirqs last  enabled at (5018779): [<c000000000007c1c>] restore+0x1c/0xe4
> > | hardirqs last disabled at (5018780): [<c000000000003600>] decrementer_common+0x100/0x180
> > 
> > I started blinking the LEDs on decrementer interrupts, which do arrive on both
> > CPU threads.
> 
> Hrm, ok I thought the log shows the decrementer interrupt of the thread
> that's still working. If you are confident they are both taking
> interrupts, then there's indeed something to track down.
> 
> > However, I'm a bit puzzled by these `hardirqs last enabled/disabled' messages,
> > as they do indicate interrupts are off...
> 
> Well, at the time of the sample, the other CPU indeed -seems- to be in
> an IRQ disabled section yes. 

This is not really a sample. The hardirqs enable/disable is actually tracked
using the TRACE_{EN,DIS}ABLE_INTS macros.

For the decrementer, the interrupt code is generated by the
STD_EXCEPTION_COMMON_LITE() macro.

Aha, none of the PPC interrupt handlers actually use TRACE_ENABLE_INTS (they do
use TRACE_DISABLE_INTS). So that's why it thinks decrementer_common disabled
interrupts, without enabling them again...
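
To illustrate the effect, here is a minimal user-space sketch of this kind of
bookkeeping. It is not the kernel code: struct irq_trace, take_interrupt() and
looks_disabled() are hypothetical names made up for this model, and the two
trace functions only imitate what the TRACE_{EN,DIS}ABLE_INTS macros are
assumed to record (an event number per enable/disable). The point is only that
if the entry side is traced and the exit side is not, the "last disabled" event
ends up newer than the "last enabled" one, whatever the real interrupt state is.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the traced (not the real) hardirq state. */
struct irq_trace {
	unsigned long event_count;         /* running event number */
	unsigned long last_enabled_event;  /* "hardirqs last  enabled at (N)" */
	unsigned long last_disabled_event; /* "hardirqs last disabled at (N)" */
	int traced_enabled;                /* what the tracker believes */
};

/* Stand-in for TRACE_ENABLE_INTS: record an "enabled" event. */
static void trace_enable_ints(struct irq_trace *t)
{
	t->last_enabled_event = ++t->event_count;
	t->traced_enabled = 1;
}

/* Stand-in for TRACE_DISABLE_INTS: record a "disabled" event. */
static void trace_disable_ints(struct irq_trace *t)
{
	t->last_disabled_event = ++t->event_count;
	t->traced_enabled = 0;
}

/*
 * Model one interrupt. Entry is always traced; the return path is
 * traced only when trace_exit is non-zero. The real hardware state
 * (interrupts enabled again after return) is the same in both cases.
 */
static void take_interrupt(struct irq_trace *t, int trace_exit)
{
	assert(t != NULL);
	trace_disable_ints(t);
	/* ... handler body would run here ... */
	if (trace_exit)
		trace_enable_ints(t);
}

/* What a reader of the two "last enabled/disabled" lines would conclude. */
static int looks_disabled(const struct irq_trace *t)
{
	return t->last_disabled_event > t->last_enabled_event;
}
```

Calling take_interrupt() with trace_exit set to 0 leaves looks_disabled()
returning non-zero after the handler has returned, which matches the pattern in
the log above, where the disable event number (5018780) is one higher than the
enable event number (5018779).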

With kind regards,

Geert Uytterhoeven
Software Architect

Sony Techsoft Centre Europe
The Corporate Village · Da Vincilaan 7-D1 · B-1935 Zaventem · Belgium

Phone:    +32 (0)2 700 8453
Fax:      +32 (0)2 700 8622
E-mail:   Geert.Uytterhoeven at sonycom.com
Internet: http://www.sony-europe.com/

A division of Sony Europe (Belgium) N.V.
VAT BE 0413.825.160 · RPR Brussels
Fortis · BIC GEBABEBB · IBAN BE41293037680010

