Tasks stuck in futex code (in 3.14-rc6)

Davidlohr Bueso davidlohr at hp.com
Thu Mar 20 16:56:59 EST 2014


On Thu, 2014-03-20 at 11:03 +0530, Srikar Dronamraju wrote:
> > > Joy,.. let me look at that with ppc in mind.
> > 
> > OK; so while pretty much all the comments from that patch are utter
> > nonsense (what was I thinking), I cannot actually find a real bug.
> > 
> > But could you try the below which replaces a control dependency with a
> > full barrier. The control flow is plenty convoluted that I think the
> > control barrier isn't actually valid anymore and that might indeed
> > explain the fail.
> > 
> 
> Unfortunately the patch didn't help. Still seeing tasks stuck
> 
> # ps -Ao pid,tt,user,fname,tmout,f,wchan | grep futex
> 14680 pts/0    root     java         - 0 futex_wait_queue_me
> 14797 pts/0    root     java         - 0 futex_wait_queue_me
> # :> /var/log/messages
> # echo t > /proc/sysrq-trigger 
> # grep futex_wait_queue_me /var/log/messages | wc -l 
> 334
> #
> 
> [ 6904.211478] Call Trace:
> [ 6904.211481] [c000000fa1f1b4d0] [0000000000000020] 0x20 (unreliable)
> [ 6904.211486] [c000000fa1f1b6a0] [c000000000015208] .__switch_to+0x1e8/0x330
> [ 6904.211491] [c000000fa1f1b750] [c000000000702f00] .__schedule+0x360/0x8b0
> [ 6904.211495] [c000000fa1f1b9d0] [c000000000147348] .futex_wait_queue_me+0xf8/0x1a0
> [ 6904.211500] [c000000fa1f1ba60] [c0000000001486dc] .futex_wait+0x17c/0x2a0
> [ 6904.211505] [c000000fa1f1bc10] [c00000000014a614] .do_futex+0x254/0xd80
> [ 6904.211510] [c000000fa1f1bd60] [c00000000014b25c] .SyS_futex+0x11c/0x1d0
> [ 6904.238874] [c000000fa1f1be30] [c00000000000a0fc] syscall_exit+0x0/0x7c
> [ 6904.238879] java            S 00003fff825f6044     0 14682  14076 0x00000080
> 
> Is there any other information that I can provide that would help?

This suggests that we missed a wakeup for a task that was adding itself
to the queue in the wait path. The only place where that can happen is
the waker's check of the hb spinlock for pending waiters. In case we
missed some assumption about using the hash bucket spinlock as a way of
detecting waiters (powerpc?), could you revert this commit and try the
original atomic operations variant:

https://lkml.org/lkml/2013/12/19/630
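
For reference, the idea behind an explicit waiter count can be sketched
in userspace C11 roughly as below. This is only an illustration under
stated assumptions, not the kernel code from the patch above: struct
bucket, waiter_should_sleep() and waker_must_scan() are hypothetical
names, and the sequentially consistent atomics stand in for whatever
barriers the real implementation uses.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a futex hash bucket; not the kernel's struct. */
struct bucket {
    atomic_int waiters;   /* tasks that are queued or about to queue */
};

/*
 * Wait side: announce ourselves BEFORE reading the futex word.
 * atomic_fetch_add() defaults to memory_order_seq_cst, so the increment
 * is ordered before the load of *uaddr.
 */
static bool waiter_should_sleep(struct bucket *hb, atomic_int *uaddr,
                                int expected)
{
    atomic_fetch_add(&hb->waiters, 1);
    if (atomic_load(uaddr) != expected) {
        /* Value already changed: back out and do not sleep. */
        atomic_fetch_sub(&hb->waiters, 1);
        return false;
    }
    return true;
}

/*
 * Wake side: called after the futex word has been updated with a
 * seq_cst store. If this returns false, the waker may skip taking
 * the bucket lock because no waiter has announced itself.
 */
static bool waker_must_scan(struct bucket *hb)
{
    return atomic_load(&hb->waiters) > 0;
}
```

With this ordering, either the waiter sees the new futex value and does
not sleep, or the waker sees a non-zero count and goes on to scan the
queue, so a wakeup should not be lost between the two.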


