[PATCH] ocxl: Fix concurrent AFU open and device removal
Greg Kurz
groug at kaod.org
Tue Jun 25 01:50:25 AEST 2019
On Mon, 24 Jun 2019 17:39:26 +0200
Frederic Barrat <fbarrat at linux.ibm.com> wrote:
> On 24/06/2019 at 17:24, Greg Kurz wrote:
> > On Mon, 24 Jun 2019 16:41:48 +0200
> > Frederic Barrat <fbarrat at linux.ibm.com> wrote:
> >
> >> If an ocxl device is unbound through sysfs at the same time its AFU is
> >> being opened by a user process, the open code may dereference freed
> >> structures, which can lead to kernel oops messages. You'd have to hit a
> >> tiny time window, but it's possible. It's fairly easy to test by
> >> artificially widening the time window.
> >>
> >> Fix it with a combination of 2 changes:
> >> - when an AFU device is found in the IDR by looking for the device
> >> minor number, we should hold a reference on the device until after the
> >> context is allocated. A reference on the AFU structure is kept when
> >> the context is allocated, so we can release the reference on the
> >> device after the context allocation.
> >> - with the fix above, there's still another even tinier window,
> >> between the time the AFU device is found in the IDR and the reference
> >> on the device is taken. We can fix this one by removing the IDR entry
> >> earlier, when the device setup is removed, instead of waiting for the
> >> device 'release' callback, with proper locking around the IDR.
> >>
> >> Fixes: 75ca758adbaf ("ocxl: Create a clear delineation between ocxl backend & frontend")
> >> Signed-off-by: Frederic Barrat <fbarrat at linux.ibm.com>
> >> ---
> >> mpe: this fixes a commit merged in v5.2-rc1. It's late in the cycle, and I don't think it's that important. If it goes in the next merge window, I would add:
> >> Cc: stable at vger.kernel.org # v5.2
> >>
> >>
> >> drivers/misc/ocxl/file.c | 23 +++++++++++------------
> >> 1 file changed, 11 insertions(+), 12 deletions(-)
> >>
> >> diff --git a/drivers/misc/ocxl/file.c b/drivers/misc/ocxl/file.c
> >> index 2870c25da166..4d1b44de1492 100644
> >> --- a/drivers/misc/ocxl/file.c
> >> +++ b/drivers/misc/ocxl/file.c
> >> @@ -18,18 +18,15 @@ static struct class *ocxl_class;
> >>  static struct mutex minors_idr_lock;
> >>  static struct idr minors_idr;
> >>
> >> -static struct ocxl_file_info *find_file_info(dev_t devno)
> >> +static struct ocxl_file_info *find_and_get_file_info(dev_t devno)
> >>  {
> >>  	struct ocxl_file_info *info;
> >>
> >> -	/*
> >> -	 * We don't declare an RCU critical section here, as our AFU
> >> -	 * is protected by a reference counter on the device. By the time the
> >> -	 * info reference is removed from the idr, the ref count of
> >> -	 * the device is already at 0, so no user API will access that AFU and
> >> -	 * this function can't return it.
> >> -	 */
> >> +	mutex_lock(&minors_idr_lock);
> >>  	info = idr_find(&minors_idr, MINOR(devno));
> >> +	if (info)
> >> +		get_device(&info->dev);
> >> +	mutex_unlock(&minors_idr_lock);
> >>  	return info;
> >>  }
> >>
> >> @@ -58,14 +55,16 @@ static int afu_open(struct inode *inode, struct file *file)
> >>
> >>  	pr_debug("%s for device %x\n", __func__, inode->i_rdev);
> >>
> >> -	info = find_file_info(inode->i_rdev);
> >> +	info = find_and_get_file_info(inode->i_rdev);
> >>  	if (!info)
> >>  		return -ENODEV;
> >>
> >>  	rc = ocxl_context_alloc(&ctx, info->afu, inode->i_mapping);
> >> -	if (rc)
> >> +	if (rc) {
> >> +		put_device(&info->dev);
> >
> > You could have a single call site for put_device() since it's
> > needed for both branches. No big deal.
>
>
> Agreed. Will fix if I end up respinning, but won't if it's the only
> complaint :-)
>
>
>
> >>  		return rc;
> >> -
> >> +	}
> >> +	put_device(&info->dev);
> >>  	file->private_data = ctx;
> >>  	return 0;
> >>  }
> >> @@ -487,7 +486,6 @@ static void info_release(struct device *dev)
> >>  {
> >>  	struct ocxl_file_info *info = container_of(dev, struct ocxl_file_info, dev);
> >>
> >> -	free_minor(info);
> >>  	ocxl_afu_put(info->afu);
> >>  	kfree(info);
> >>  }
> >> @@ -577,6 +575,7 @@ void ocxl_file_unregister_afu(struct ocxl_afu *afu)
> >>
> >>  	ocxl_file_make_invisible(info);
> >>  	ocxl_sysfs_unregister_afu(info);
> >> +	free_minor(info);
> >
> > Since the IDR entry is added by ocxl_file_register_afu(), it seems to make
> > sense to undo that in ocxl_file_unregister_afu(). Out of curiosity, was there
> > any historical reason to do this in info_release() in the first place?
>
>
> Yeah, it makes a lot of sense to remove the IDR entry in
> ocxl_file_unregister_afu(), that's where we undo the device. I wish I
> had noticed during the code reviews.
> I don't think there was any good reason to have it in info_release() in
> the first place. I remember the code went through many iterations to get
> the reference counting on the AFU structure and device done correctly,
> but we let that one slip.
>
> I now think the pre-5.2 ocxl code was also exposed to the 2nd window
> mentioned in the commit log (but the first window is new with the
> refactoring introduced in 5.2-rc1).
>
This calls for two separate patches then IMHO.
> Fred
>
>
>
> >
> > Reviewed-by: Greg Kurz <groug at kaod.org>
> >
> >>  	device_unregister(&info->dev);
> >>  }
> >>
> >
>
More information about the Linuxppc-dev mailing list