[NET]: Fix possible dev_deactivate race condition

Fri Oct 19 17:35:19 EST 2007

On Fri, 2007-10-19 at 13:36 +0800, Herbert Xu wrote:
> On Fri, Oct 19, 2007 at 12:20:25PM +0800, Herbert Xu wrote:
> >
> > In fact this bug exists elsewhere too.  For example, the network
> > stack does this in net/sched/sch_generic.c:
> > 
> >         /* Wait for outstanding qdisc_run calls. */
> >       while (test_bit(__LINK_STATE_QDISC_RUNNING, &dev->state))
> >               yield();
> > 
> > This has the same problem as the current synchronize_irq code.
> 

> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index e01d576..b3b7420 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -556,6 +556,7 @@ void dev_deactivate(struct net_device *dev)
>  {
>         struct Qdisc *qdisc;
>         struct sk_buff *skb;
> +       int running;
>  
>         spin_lock_bh(&dev->queue_lock);
>         qdisc = dev->qdisc;
> @@ -571,12 +572,31 @@ void dev_deactivate(struct net_device *dev)
>  
>         dev_watchdog_down(dev);
>  
> -       /* Wait for outstanding dev_queue_xmit calls. */
> +       /* Wait for outstanding qdisc-less dev_queue_xmit calls. */
>         synchronize_rcu();
>  
>         /* Wait for outstanding qdisc_run calls. */
> -       while (test_bit(__LINK_STATE_QDISC_RUNNING, &dev->state))
> -               yield();
> +       do {
> +               while (test_bit(__LINK_STATE_QDISC_RUNNING, &dev->state))
> +                       yield();
> +

Ouch!, is there really no sane locking alternative? Hashed waitqueues
like for the page lock come to mind.

> +               /*
> +                * Double-check inside queue lock to ensure that all effects
> +                * of the queue run are visible when we return.
> +                */
> +               spin_lock_bh(&dev->queue_lock);
> +               running = test_bit(__LINK_STATE_QDISC_RUNNING, &dev->state);
> +               spin_unlock_bh(&dev->queue_lock);
> +
> +               /*
> +                * The running flag should never be set at this point because
> +                * we've already set dev->qdisc to noop_qdisc *inside* the same
> +                * pair of spin locks.  That is, if any qdisc_run starts after
> +                * our initial test it should see the noop_qdisc and then
> +                * clear the RUNNING bit before dropping the queue lock.  So
> +                * if it is set here then we've found a bug.
> +                */
> +       } while (WARN_ON_ONCE(running));
>  }
>  
>  void dev_init_scheduler(struct net_device *dev)