softirqs are invoked while bottom halves are masked (was: Re: [PATCH] [PATCH] Fix deadlock in af_packet while stressing raw ethernet socket interface)

Wed Jul 13 05:52:04 EST 2011

On Tue, Jul 12, 2011 at 5:27 PM, Eric Dumazet <eric.dumazet at gmail.com> wrote:
> Le mardi 12 juillet 2011 à 14:03 +0200, Ronny Meeus a écrit :
>
>> Sorry for not mentioning we were using a patched kernel.
>> I was not aware that the code involved was patched by the FreeScale
>> patches we applied. The code found in the stack dumps is not
>> implemented in FSL specific files.
>>
>> While reading the code of af_packet I saw that the spin_lock_bh is
>> used in several places while this is not the case in the tpacket_rcv
>> function. Since we had a locking issue in that code, I thought that my
>> patch would be OK.
>> I was not aware that for that specific function (tpacket_rcv) a
>> different lock primitive must be used. A suggestion for improvement:
>> it would be better to document this pre-condition in the code.
>>
>> After doing the change you proposed our code now looks like:
>>
>> >---if (dev->features & NETIF_F_HW_QDISC) {
>> >--->---txq = dev_pick_tx(dev, skb);
>> >--->---local_bh_disable();
>> >--->---rc = dev_hard_start_xmit(skb, dev, txq);
>> >--->---local_bh_enable();
>> >--->---return rc;
>> >---}
>>
>> >---/* Disable soft irqs for various locks below. Also
>> >--- * stops preemption for RCU.
>> >--- */
>> >---rcu_read_lock_bh();
>>
>> but we still see the issue "BUG: sleeping function called from invalid context":
>
> Of course you are if this is the only change you did.
>
>>
>> [   91.015989] BUG: sleeping function called from invalid context at
>> include/linux/skbuff.h:786
>> [   91.117096] in_atomic(): 1, irqs_disabled(): 0, pid: 1865, name: NMTX_T1842
>> [   91.200461] Call Trace:
>> [   91.229672] [ec58bbd0] [c000789c] show_stack+0x78/0x18c (unreliable)
>> [   91.305791] [ec58bc10] [c0022900] __might_sleep+0x100/0x118
>> [   91.372524] [ec58bc20] [c029f8d8] dpa_tx+0x128/0x758
>
>
> Please read again my mail :
>
> I said : "doing GFP_KERNEL allocations in dpa_tx() is wrong, for sure."
>
> I dont have this code, but I suspect it's using : skb_copy(skb,
> GFP_KERNEL)
>
> Just say no, use GFP_ATOMIC instead.
>
> Real question is : why skb_copy() is done, since its slow as hell.
>
>> [   91.431957] [ec58bc80] [c02d78ec] dev_hard_start_xmit+0x424/0x588
>> [   91.504952] [ec58bcc0] [c02d7ab0] dev_queue_xmit+0x60/0x3ac
>> [   91.571692] [ec58bcf0] [c0338d54] packet_sendmsg+0x8c4/0x988
>> [   91.639457] [ec58bd70] [c02c3838] sock_sendmsg+0x90/0xb4
>> [   91.703066] [ec58be40] [c02c4420] sys_sendto+0xdc/0x120
>> [   91.765646] [ec58bf10] [c02c57d0] sys_socketcall+0x148/0x210
>> [   91.833420] [ec58bf40] [c001084c] ret_from_syscall+0x0/0x3c
>> [   91.900153] --- Exception: c01 at 0x4824df00
>> [   91.900157]     LR = 0x4828a030
>>
>
>
>

I have identified the piece of code, it was a call to skb_unshare. I
have changed it into GFP_ATOMIC.

>---if (skb_cloned(skb))
>--->---skb = skb_unshare(skb, GFP_ATOMIC);

After doing this change, I do not see the issue anymore. At least not
for the test I'm doing right now.
After seeing all your comments today, it might be that other issues popup later.

BTW Are there any good sites (or books) that document this part of the
Linux kernel?

Best regards,
Ronny