[PATCH v4 00/54] tree-in-dcache stuff
Samuel Wu
wusamuel at google.com
Tue Jan 27 11:56:42 AEDT 2026
On Mon, Nov 17, 2025 at 9:15 PM Al Viro <viro at zeniv.linux.org.uk> wrote:
>
> Some filesystems use a kinda-sorta controlled dentry refcount leak to pin
> dentries of created objects in dcache (and undo it when removing those).
> Reference is grabbed and not released, but it's not actually _stored_
> anywhere. That works, but it's hard to follow and verify; among other
> things, we have no way to tell _which_ of the increments is intended
> to be an unpaired one. Worse, on removal we need to decide whether
> the reference had already been dropped, which can be non-trivial if
> that removal is on umount and we need to figure out if this dentry is
> pinned due to e.g. unlink() not done. Usually that is handled by using
> kill_litter_super() as ->kill_sb(), but there are open-coded special
> cases of the same (consider e.g. /proc/self).
>
> Things get simpler if we introduce a new dentry flag (DCACHE_PERSISTENT)
> marking those "leaked" dentries. Having it set claims responsibility
> for +1 in refcount.
>
> The end result this series is aiming for:
>
> * get these unbalanced dget() and dput() replaced with new primitives that
> would, in addition to adjusting refcount, set and clear persistency flag.
> * instead of having kill_litter_super() mess with removing the remaining
> "leaked" references (e.g. for all tmpfs files that hadn't been removed
> prior to umount), have the regular shrink_dcache_for_umount() strip
> DCACHE_PERSISTENT of all dentries, dropping the corresponding
> reference if it had been set. After that kill_litter_super() becomes
> an equivalent of kill_anon_super().
>
> Doing that in a single step is not feasible - it would affect too many places
> in too many filesystems. It has to be split into a series.
>
> This work has really started early in 2024; quite a few preliminary pieces
> have already gone into mainline. This chunk is finally getting to the
> meat of that stuff - infrastructure and most of the conversions to it.
>
> Some pieces are still sitting in the local branches, but the bulk of
> that stuff is here.
>
> Compared to v3:
> * fixed a functionfs braino around ffs_epfiles_destroy() (in #40/54,
> used to be #36/50).
> * added fixes for a couple of UAF in functionfs (##36--39); that
> does *NOT* include any fixes for dmabuf bugs Chris posted last week, though.
>
> The branch is -rc5-based; it lives in
> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #work.persistency
> individual patches in followups.
>
> Please, help with review and testing. If nobody objects, in a few days it
> goes into #for-next.
>
> Shortlog:
> fuse_ctl_add_conn(): fix nlink breakage in case of early failure
> tracefs: fix a leak in eventfs_create_events_dir()
> new helper: simple_remove_by_name()
> new helper: simple_done_creating()
> introduce a flag for explicitly marking persistently pinned dentries
> primitives for maintaining persisitency
> convert simple_{link,unlink,rmdir,rename,fill_super}() to new primitives
> convert ramfs and tmpfs
> procfs: make /self and /thread_self dentries persistent
> configfs, securityfs: kill_litter_super() not needed
> convert xenfs
> convert smackfs
> convert hugetlbfs
> convert mqueue
> convert bpf
> convert dlmfs
> convert fuse_ctl
> convert pstore
> convert tracefs
> convert debugfs
> debugfs: remove duplicate checks in callers of start_creating()
> convert efivarfs
> convert spufs
> convert ibmasmfs
> ibmasmfs: get rid of ibmasmfs_dir_ops
> convert devpts
> binderfs: use simple_start_creating()
> binderfs_binder_ctl_create(): kill a bogus check
> convert binderfs
> autofs_{rmdir,unlink}: dentry->d_fsdata->dentry == dentry there
> convert autofs
> convert binfmt_misc
> selinuxfs: don't stash the dentry of /policy_capabilities
> selinuxfs: new helper for attaching files to tree
> convert selinuxfs
> functionfs: don't abuse ffs_data_closed() on fs shutdown
> functionfs: don't bother with ffs->ref in ffs_data_{opened,closed}()
> functionfs: need to cancel ->reset_work in ->kill_sb()
> functionfs: fix the open/removal races
> functionfs: switch to simple_remove_by_name()
> convert functionfs
> gadgetfs: switch to simple_remove_by_name()
> convert gadgetfs
> hypfs: don't pin dentries twice
> hypfs: switch hypfs_create_str() to returning int
> hypfs: swich hypfs_create_u64() to returning int
> convert hypfs
> convert rpc_pipefs
> convert nfsctl
> convert rust_binderfs
> get rid of kill_litter_super()
> convert securityfs
> kill securityfs_recursive_remove()
> d_make_discardable(): warn if given a non-persistent dentry
>
> Diffstat:
> Documentation/filesystems/porting.rst | 7 ++
> arch/powerpc/platforms/cell/spufs/inode.c | 17 ++-
> arch/s390/hypfs/hypfs.h | 6 +-
> arch/s390/hypfs/hypfs_diag_fs.c | 60 ++++------
> arch/s390/hypfs/hypfs_vm_fs.c | 21 ++--
> arch/s390/hypfs/inode.c | 82 +++++--------
> drivers/android/binder/rust_binderfs.c | 121 ++++++-------------
> drivers/android/binderfs.c | 82 +++----------
> drivers/base/devtmpfs.c | 2 +-
> drivers/misc/ibmasm/ibmasmfs.c | 24 ++--
> drivers/usb/gadget/function/f_fs.c | 144 +++++++++++++----------
> drivers/usb/gadget/legacy/inode.c | 49 ++++----
> drivers/xen/xenfs/super.c | 2 +-
> fs/autofs/inode.c | 2 +-
> fs/autofs/root.c | 11 +-
> fs/binfmt_misc.c | 69 ++++++-----
> fs/configfs/dir.c | 10 +-
> fs/configfs/inode.c | 3 +-
> fs/configfs/mount.c | 2 +-
> fs/dcache.c | 111 +++++++++++-------
> fs/debugfs/inode.c | 32 ++----
> fs/devpts/inode.c | 57 ++++-----
> fs/efivarfs/inode.c | 7 +-
> fs/efivarfs/super.c | 5 +-
> fs/fuse/control.c | 38 +++---
> fs/hugetlbfs/inode.c | 12 +-
> fs/internal.h | 1 -
> fs/libfs.c | 52 +++++++--
> fs/nfsd/nfsctl.c | 18 +--
> fs/ocfs2/dlmfs/dlmfs.c | 8 +-
> fs/proc/base.c | 6 +-
> fs/proc/internal.h | 1 +
> fs/proc/root.c | 14 +--
> fs/proc/self.c | 10 +-
> fs/proc/thread_self.c | 11 +-
> fs/pstore/inode.c | 7 +-
> fs/ramfs/inode.c | 8 +-
> fs/super.c | 8 --
> fs/tracefs/event_inode.c | 7 +-
> fs/tracefs/inode.c | 13 +--
> include/linux/dcache.h | 4 +-
> include/linux/fs.h | 6 +-
> include/linux/proc_fs.h | 2 -
> include/linux/security.h | 2 -
> init/do_mounts.c | 2 +-
> ipc/mqueue.c | 12 +-
> kernel/bpf/inode.c | 15 +--
> mm/shmem.c | 38 ++----
> net/sunrpc/rpc_pipe.c | 27 ++---
> security/apparmor/apparmorfs.c | 13 ++-
> security/inode.c | 35 +++---
> security/selinux/selinuxfs.c | 185 +++++++++++++-----------------
> security/smack/smackfs.c | 2 +-
> 53 files changed, 649 insertions(+), 834 deletions(-)
>
> Overview:
>
> First two commits are bugfixes (fusectl and tracefs resp.)
>
> [1/54] fuse_ctl_add_conn(): fix nlink breakage in case of early failure
> [2/54] tracefs: fix a leak in eventfs_create_events_dir()
>
> Next, two commits adding a couple of useful helpers, the next three adding
> the infrastructure and the rest consists of per-filesystem conversions.
>
> [3/54] new helper: simple_remove_by_name()
> [4/54] new helper: simple_done_creating()
> end_creating_path() analogue for internal object creation; unlike
> end_creating_path() no mount is passed to it (or guaranteed to exist, for
> that matter - it might be used during the filesystem setup, before the
> superblock gets attached to any mounts).
>
> Infrastructure:
> [5/54] introduce a flag for explicitly marking persistently pinned dentries
> * introduce the new flag
> * teach shrink_dcache_for_umount() to handle it (i.e. remove
> and drop refcount on anything that survives to umount with that flag
> still set)
> * teach kill_litter_super() that anything with that flag does
> *not* need to be unpinned.
> [6/54] primitives for maintaining persisitency
> * d_make_persistent(dentry, inode) - bump refcount, mark persistent
> and make hashed positive. Return value is a borrowed reference to dentry;
> it can be used until something removes persistency (at the very least,
> until the parent gets unlocked, but some filesystems may have stronger
> exclusion).
> * d_make_discardable() - remove persistency mark and drop reference.
>
> NOTE: at that stage d_make_discardable() does not reject dentries not
> marked persistent - it acts as if the mark been set.
>
> Rationale: less noise in series splitup that way. We want (and on the
> next commit will get) simple_unlink() to do the right thing - remove
> persistency, if it's there. However, it's used by many filesystems.
> We would have either to convert them all at once or split simple_unlink()
> into "want persistent" and "don't want persistent" versions, the latter
> being the old one. In the course of the series almost all callers
> would migrate to the replacement, leaving only two pathological cases
> with the old one. The same goes for simple_rmdir() (two callers left in
> the end), simple_recursive_removal() (all callers gone in the end), etc.
> That's a lot of noise and it's easier to start with d_make_discardable()
> quietly accepting non-persistent dentries, then, in the end, add private
> copies of simple_unlink() and simple_rmdir() for two weird users (configfs
> and apparmorfs) and have those use dput() instead of d_make_discardable().
> At that point we'd be left with all callers of d_make_discardable()
> always passing persistent dentries, allowing to add a warning in it.
>
> [7/54] convert simple_{link,unlink,rmdir,rename,fill_super}() to new primitives
> See above re quietly accepting non-peristent dentries in
> simple_unlink(), simple_rmdir(), etc.
>
> Converting filesystems:
> [8/54] convert ramfs and tmpfs
> [9/54] procfs: make /self and /thread_self dentries persistent
> [10/54] configfs, securityfs: kill_litter_super() not needed
> [11/54] convert xenfs
> [12/54] convert smackfs
> [13/54] convert hugetlbfs
> [14/54] convert mqueue
> [15/54] convert bpf
> [16/54] convert dlmfs
> [17/54] convert fuse_ctl
> [18/54] convert pstore
> [19/54] convert tracefs
> [20/54] convert debugfs
> [21/54] debugfs: remove duplicate checks in callers of start_creating()
> [22/54] convert efivarfs
> [23/54] convert spufs
> [24/54] convert ibmasmfs
> [25/54] ibmasmfs: get rid of ibmasmfs_dir_ops
> [26/54] convert devpts
> [27/54] binderfs: use simple_start_creating()
> [28/54] binderfs_binder_ctl_create(): kill a bogus check
> [29/54] convert binderfs
> [30/54] autofs_{rmdir,unlink}: dentry->d_fsdata->dentry == dentry there
> [31/54] convert autofs
> [32/54] convert binfmt_misc
> [33/54] selinuxfs: don't stash the dentry of /policy_capabilities
> [34/54] selinuxfs: new helper for attaching files to tree
> [35/54] convert selinuxfs
>
> Several functionfs fixes, before converting it, to make life
> simpler for backporting:
> [36/54] functionfs: don't abuse ffs_data_closed() on fs shutdown
> [37/54] functionfs: don't bother with ffs->ref in ffs_data_{opened,closed}()
> [38/54] functionfs: need to cancel ->reset_work in ->kill_sb()
> [39/54] functionfs: fix the open/removal races
>
> ... and back to filesystems conversions:
>
> [40/54] functionfs: switch to simple_remove_by_name()
> [41/54] convert functionfs
> [42/54] gadgetfs: switch to simple_remove_by_name()
> [43/54] convert gadgetfs
> [44/54] hypfs: don't pin dentries twice
> [45/54] hypfs: switch hypfs_create_str() to returning int
> [46/54] hypfs: swich hypfs_create_u64() to returning int
> [47/54] convert hypfs
> [48/54] convert rpc_pipefs
> [49/54] convert nfsctl
> [50/54] convert rust_binderfs
>
> ... and no kill_litter_super() callers remain, so we
> can take it out:
> [51/54] get rid of kill_litter_super()
>
> Followups:
> [52/54] convert securityfs
> That was the last remaining user of simple_recursive_removal()
> that did *not* mark things persistent. Now the only places where
> d_make_discardable() is still called for dentries that are not marked
> persistent are the calls of simple_{unlink,rmdir}() in configfs and
> apparmorfs.
>
> [53/54] kill securityfs_recursive_remove()
> Unused macro...
>
> [54/54] d_make_discardable(): warn if given a non-persistent dentry
>
> At this point there are very few call chains that might lead to
> d_make_discardable() on a dentry that hadn't been made persistent:
> calls of simple_unlink() and simple_rmdir() in configfs and
> apparmorfs.
>
> Both filesystems do pin (part of) their contents in dcache, but
> they are currently playing very unusual games with that. Converting
> them to more usual patterns might be possible, but it's definitely
> going to be a long series of changes in both cases.
>
> For now the easiest solution is to have both stop using simple_unlink()
> and simple_rmdir() - that allows to make d_make_discardable() warn
> when given a non-persistent dentry.
>
> Rather than giving them full-blown private copies (with calls of
> d_make_discardable() replaced with dput()), let's pull the parts of
> simple_unlink() and simple_rmdir() that deal with timestamps and link
> counts into separate helpers (__simple_unlink() and __simple_rmdir()
> resp.) and have those used by configfs and apparmorfs.
>
Hi Al, when I apply this patchset my Pixel 6 no longer enumerates on
lsusb or ADB. It was quite hard to bisect to this point, as this is
non-deterministic and seems to be setup specific. Note, I am using
android-mainline, but my understanding is that this build does not
have any out-of-tree USB patches, and that there are no vendor hooks
in the build.
My apologies as I can't offer any other clues; there are no obviously
bad dmesg logs and I'm still working on narrowing down the exact
commit(s) that started this, but just wanted to send a FYI in case
something stands out as obvious.
Thanks!
Sam
More information about the Linuxppc-dev
mailing list