[PATCH] erofs: fix use-after-free of on-stack io[]

Fri Apr 1 22:15:36 AEDT 2022

On Fri, Apr 01, 2022 at 05:36:23PM +0800, Henry King wrote:
> Gao Xiang <hsiangkao at linux.alibaba.com> 于2022年4月1日周五 14:55写道：
> 
> >
> > On Fri, Apr 01, 2022 at 02:33:01PM +0800, Hongyu Jin wrote:
> > > From: Hongyu Jin <hongyu.jin at unisoc.com>
> > >
> > > The root cause is the race as follows(k5.4):
> > > Thread #1                               Thread #2(irq ctx)
> > >
> > > z_erofs_submit_and_unzip()
> > >   struct z_erofs_vle_unzip_io io_A[]
> > >   submit bio A
> > >                                         z_erofs_vle_read_endio() // bio A
> > >                                         z_erofs_vle_unzip_kickoff()
> > >                                         spin_lock_irqsave()
> > >                                         atomic_add_return()
> > >   wait_event()
> > >   [end of function]
> > > z_erofs_submit_and_unzip() // bio B
> > >                                         wake_up_locked(io_A[]) // crash
> > >   struct z_erofs_vle_unzip_io io_B[]
> > >   submit bio B
> > >   wait_event()
> >
> > Thanks, good catch!
> > Yet could you turn the race above into the current function names?
> ok, I will chang it.
> >
> > >
> > > Backtrace in kernel5.4:
> > > [   10.129413] 8<--- cut here ---
> > > [   10.129422] Unable to handle kernel paging request at virtual address eb0454a4
> > > [   10.364157] CPU: 0 PID: 709 Comm: getprop Tainted: G        WC O      5.4.147-ab09225 #1
> > > [   11.556325] [<c01b33b8>] (__wake_up_common) from [<c01b3300>] (__wake_up_locked+0x40/0x48)
> > > [   11.565487] [<c01b3300>] (__wake_up_locked) from [<c044c8d0>] (z_erofs_vle_unzip_kickoff+0x6c/0xc0)
> > > [   11.575438] [<c044c8d0>] (z_erofs_vle_unzip_kickoff) from [<c044c854>] (z_erofs_vle_read_endio+0x16c/0x17c)
> > > [   11.586082] [<c044c854>] (z_erofs_vle_read_endio) from [<c06a80e8>] (clone_endio+0xb4/0x1d0)
> > > [   11.595428] [<c06a80e8>] (clone_endio) from [<c04a1280>] (blk_update_request+0x150/0x4dc)
> > > [   11.604516] [<c04a1280>] (blk_update_request) from [<c06dea28>] (mmc_blk_cqe_complete_rq+0x144/0x15c)
> > > [   11.614640] [<c06dea28>] (mmc_blk_cqe_complete_rq) from [<c04a5d90>] (blk_done_softirq+0xb0/0xcc)
> > > [   11.624419] [<c04a5d90>] (blk_done_softirq) from [<c010242c>] (__do_softirq+0x184/0x56c)
> > > [   11.633419] [<c010242c>] (__do_softirq) from [<c01051e8>] (irq_exit+0xd4/0x138)
> > > [   11.641640] [<c01051e8>] (irq_exit) from [<c010c314>] (__handle_domain_irq+0x94/0xd0)
> > > [   11.650381] [<c010c314>] (__handle_domain_irq) from [<c04fde70>] (gic_handle_irq+0x50/0xd4)
> > > [   11.659641] [<c04fde70>] (gic_handle_irq) from [<c0101b70>] (__irq_svc+0x70/0xb0)
> > >
> > > Signed-off-by: Hongyu Jin <hongyu.jin at unisoc.com>
> > > ---
> > >  fs/erofs/zdata.c | 12 ++++--------
> > >  fs/erofs/zdata.h |  2 +-
> > >  2 files changed, 5 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/fs/erofs/zdata.c b/fs/erofs/zdata.c
> > > index 11c7a1aaebad..4c26faa817a3 100644
> > > --- a/fs/erofs/zdata.c
> > > +++ b/fs/erofs/zdata.c
> > > @@ -782,12 +782,9 @@ static void z_erofs_decompress_kickoff(struct z_erofs_decompressqueue *io,
> > >
> > >       /* wake up the caller thread for sync decompression */
> > >       if (sync) {
> > > -             unsigned long flags;
> > > -
> > > -             spin_lock_irqsave(&io->u.wait.lock, flags);
> > >               if (!atomic_add_return(bios, &io->pending_bios))
> > > -                     wake_up_locked(&io->u.wait);
> > > -             spin_unlock_irqrestore(&io->u.wait.lock, flags);
> > > +                     complete(&io->u.done);
> > > +
> > >               return;
> > >       }
> > >
> > > @@ -1207,7 +1204,7 @@ jobqueue_init(struct super_block *sb,
> > >       } else {
> > >  fg_out:
> > >               q = fgq;
> > > -             init_waitqueue_head(&fgq->u.wait);
> > > +             init_completion(&fgq->u.done);
> > >               atomic_set(&fgq->pending_bios, 0);
> > >       }
> > >       q->sb = sb;
> > > @@ -1370,8 +1367,7 @@ static void z_erofs_runqueue(struct super_block *sb,
> > >               return;
> > >
> > >       /* wait until all bios are completed */
> > > -     io_wait_event(io[JQ_SUBMIT].u.wait,
> > > -                   !atomic_read(&io[JQ_SUBMIT].pending_bios));
> > > +     wait_for_completion_io(&io[JQ_SUBMIT].u.done);
> >
> > Thanks, good catch!
> >
> > What if pending_bios is always 0 (nr_bios == 0), is it possible?
> The pending_bios isn't always 0.  If bio is completed faster before
> io_wait_event() called, the value of pending_bios from 1 to 0,
> when enter io_wait_event(), it will not acquire lock and return immediately.

nope, IMO, if no io submission, we could run into this case.

Thanks,
Gao Xiang

> >
> > Thanks,
> > Gao Xiang