[PATCH] vfs: Fix potential circular locking through setxattr() and removexattr()
Christian Brauner
brauner at kernel.org
Wed Jul 24 01:40:00 AEST 2024
On Tue, Jul 23, 2024 at 02:57:46PM GMT, David Howells wrote:
> Jan Kara <jack at suse.cz> wrote:
>
> > Well, it seems like you are trying to get rid of the dependency
> > sb_writers->mmap_sem. But there are other places where this dependency is
> > created, in particular write(2) path is a place where it would be very
> > difficult to get rid of it (you take sb_writers, then do all the work
> > preparing the write and then you copy user data into page cache which
> > may require mmap_sem).
> >
> > ...
> >
> > This is the problematic step - from quite deep in the locking chain holding
> > invalidate_lock and having PG_Writeback set you suddently jump to very outer
> > locking context grabbing sb_writers. Now AFAICT this is not a real deadlock
> > problem because the locks are actually on different filesystems, just
> > lockdep isn't able to see this. So I don't think you will get rid of these
> > lockdep splats unless you somehow manage to convey to lockdep that there's
> > the "upper" fs (AFS in this case) and the "lower" fs (the one behind
> > cachefiles) and their locks are different.
>
> I'm not sure you're correct about that. If you look at the lockdep splat:
>
> > -> #2 (sb_writers#14){.+.+}-{0:0}:
>
> The sb_writers lock is "personalised" to the filesystem type (the "#14"
> annotation) which is set here:
>
> for (i = 0; i < SB_FREEZE_LEVELS; i++) {
> if (__percpu_init_rwsem(&s->s_writers.rw_sem[i],
> sb_writers_name[i],
> &type->s_writers_key[i])) <----
> goto fail;
> }
>
> in fs/super.c.
>
> I think the problem is (1) that on one side, you've got, say, sys_setxattr()
> taking an sb_writers lock and then accessing a userspace buffer, which (a) may
> take mm->mmap_lock and vma->vm_lock and (b) may cause reading or writeback
> from the netfs-based filesystem via an mmapped xattr name buffer].
>
> Then (2) on the other side, you have a read or a write to the network
> filesystem through netfslib which may invoke the cache, which may require
> cachefiles to check the xattr on the cache file and maybe set/remove it -
> which requires the sb_writers lock on the cache filesystem.
>
> So if ->read_folio(), ->readahead() or ->writepages() can ever be called with
> mm->mmap_lock or vma->vm_lock held, netfslib may call down to cachefiles and
> ultimately, it should[*] then take the sb_writers lock on the backing
> filesystem to perform xattr manipulation.
>
> [*] I say "should" because at the moment cachefiles calls vfs_set/removexattr
> functions which *don't* take this lock (which is a bug). Is this an error
> on the part of vfs_set/removexattr()? Should they take this lock
> analogously with vfs_truncate() and vfs_iocb_iter_write()?
It's not bug per se. We have a few vfs_*() functions that don't take
write access iirc. The reason often is that e.g., callers like overlayfs
call vfs_*() helpers in various locations where write access to the
underlying and overlayfs layer is taken some time before calling into
vs_*() helper (see ovl_do_setxattr() during various copy up operation
iirc.).
Fwiw, I suspect that e.g., ecryptfs doesn't claim write access at all to
the underlying filesystem - not just xattrs.
(What would probably be good is if we could add lockdep asserts so that
we get warnings about missing locks taken.)
>
> However, as it doesn't it manages to construct a locking chain via the
> mapping.invalidate_lock, the afs vnode->validate_lock and something in execve
> that I don't exactly follow.
>
>
> I wonder if this is might be deadlockable by a multithreaded process (ie. so
> they share the mm locks) where one thread is writing to a cached file whilst
> another thread is trying to set/remove the xattr on that file.
>
> David
>
More information about the Linux-erofs
mailing list