[PATCH] erofs: don't bother with s_stack_depth increasing for now

Sun Jan 4 21:01:10 AEDT 2026

[+fsdevel][+overlayfs]

On Sun, Jan 4, 2026 at 4:56 AM Gao Xiang <hsiangkao at linux.alibaba.com> wrote:
>
> Hi Amir,
>
> On 2026/1/1 23:52, Amir Goldstein wrote:
> > On Wed, Dec 31, 2025 at 9:42 PM Gao Xiang <hsiangkao at linux.alibaba.com> wrote:
> >>
> >> Previously, commit d53cd891f0e4 ("erofs: limit the level of fs stacking
> >> for file-backed mounts") bumped `s_stack_depth` by one to avoid kernel
> >> stack overflow, but that breaks composefs mounts, which sometimes need
> >> erofs+ovl^2 (and such setups have already been used in production for
> >> quite a long time), since `s_stack_depth` can then reach 3 (i.e.,
> >> FILESYSTEM_MAX_STACK_DEPTH would need to change from 2 to 3).
> >>
> >> After a long discussion on GitHub issues [1] about possible solutions,
> >> one conclusion was that there is no need to support nesting file-backed
> >> mounts (especially if that requires increasing FILESYSTEM_MAX_STACK_DEPTH
> >> to 3). So let's disallow this for now, since loopback devices can always
> >> be used as a fallback.
> >>
> >> Then I considered an alternative quick EROFS fix that addresses composefs
> >> mounts directly for this cycle: since EROFS is the only fs that supports
> >> file-backed mounts, and other stacked fses bump up `s_stack_depth`
> >> themselves (bounded by `FILESYSTEM_MAX_STACK_DEPTH`), just refuse the
> >> mount if the backing inode's `s_stack_depth` != 0 or the backing inode
> >> is from EROFS, instead of bumping EROFS's own depth.
> >>
> >> At least it works for all known file-backed mount use cases (composefs,
> >> containerd, and Android APEX for some Android vendors), and the fix is
> >> self-contained.
> >>
> >> Let's defer increasing FILESYSTEM_MAX_STACK_DEPTH for now.
> >>
> >> Fixes: d53cd891f0e4 ("erofs: limit the level of fs stacking for file-backed mounts")
> >> Closes: https://github.com/coreos/fedora-coreos-tracker/issues/2087 [1]
> >> Closes: https://lore.kernel.org/r/CAFHtUiYv4+=+JP_-JjARWjo6OwcvBj1wtYN=z0QXwCpec9sXtg@mail.gmail.com
> >> Cc: Amir Goldstein <amir73il at gmail.com>
> >> Cc: Alexander Larsson <alexl at redhat.com>
> >> Cc: Christian Brauner <brauner at kernel.org>
> >> Cc: Miklos Szeredi <mszeredi at redhat.com>
> >> Signed-off-by: Gao Xiang <hsiangkao at linux.alibaba.com>
> >> ---
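For readers following the depth arithmetic, the old and proposed checks can be modeled in a few lines of userspace C. This is a sketch under stated assumptions, not the patch itself: `struct sb_model`, `old_erofs_fileio_depth()`, `new_erofs_fileio_depth()`, `ovl_depth()` and `composefs_depth()` are hypothetical names, and overlayfs is simplified to "deepest underlying layer + 1, refused past FILESYSTEM_MAX_STACK_DEPTH".

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the kernel limit discussed in the thread (currently 2). */
#define FILESYSTEM_MAX_STACK_DEPTH 2

/*
 * Hypothetical, simplified stand-in for the two properties the checks
 * look at; this is NOT the real struct super_block.
 */
struct sb_model {
	int s_stack_depth; /* 0 for a plain disk fs such as XFS or ext4 */
	bool is_erofs;     /* backing inode lives on an EROFS instance */
};

/*
 * Old behaviour (d53cd891f0e4), modeled: a file-backed EROFS mount takes
 * the backing depth + 1. Returns the new depth, or -1 if refused.
 */
static int old_erofs_fileio_depth(const struct sb_model *backing)
{
	int depth = backing->s_stack_depth + 1;

	return depth > FILESYSTEM_MAX_STACK_DEPTH ? -1 : depth;
}

/*
 * Proposed behaviour, modeled: EROFS keeps depth 0, but refuses backing
 * files on a stacked fs (s_stack_depth != 0) or on EROFS itself.
 */
static int new_erofs_fileio_depth(const struct sb_model *backing)
{
	if (backing->s_stack_depth != 0 || backing->is_erofs)
		return -1;
	return 0;
}

/* Simplified overlayfs rule (assumption): lower depth + 1, capped. */
static int ovl_depth(int lower_depth)
{
	int depth;

	assert(lower_depth >= 0); /* callers must not pass a refused layer */
	depth = lower_depth + 1;
	return depth > FILESYSTEM_MAX_STACK_DEPTH ? -1 : depth;
}

/*
 * composefs-style erofs + ovl^2 on top of XFS: returns the depth of the
 * outer overlay, or -1 if any layer would be refused.
 */
static int composefs_depth(int (*erofs_depth)(const struct sb_model *))
{
	const struct sb_model xfs = { .s_stack_depth = 0, .is_erofs = false };
	int d = erofs_depth(&xfs);

	if (d < 0)
		return -1;
	d = ovl_depth(d);   /* first overlay, lower = erofs */
	if (d < 0)
		return -1;
	return ovl_depth(d); /* second overlay, lower = first overlay */
}
```

Under these rules, `composefs_depth(old_erofs_fileio_depth)` is refused at the outer overlay (erofs 1, ovl 2, ovl 3 > 2), while `composefs_depth(new_erofs_fileio_depth)` yields 2 (erofs 0, ovl 1, ovl 2), i.e. the erofs+ovl^2 case keeps working without raising the limit.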
> >
> > Acked-by: Amir Goldstein <amir73il at gmail.com>
> >
> > But you forgot to include details of the stack usage analysis you ran
> > with the erofs+ovl^2 setup.
> >
> > I am guessing people will want to see this information before relaxing
> > s_stack_depth in this case.
>
> Sorry, I haven't checked email for the past few days. I'm not sure if
> posting detailed stack traces is useful; how about adding the following
> wording:

I didn't mean detailed stack traces, but you did some tests with the
newly possible setup and measured stack usage < 8K, so I think this is
something worth mentioning.

>
> Note: some observations from evaluating the erofs + ovl^2 setup with
> an XFS backing fs:
>
>   - Regular RW workloads traverse only one overlayfs layer regardless of
>     the value of FILESYSTEM_MAX_STACK_DEPTH, because `upperdir=` cannot
>     point to another overlayfs.  Therefore, for pure RW workloads, the
>     typical stack is always just:
>       overlayfs + upper fs + underlay storage
>
>   - For read-only workloads and the copy-up read part (ovl_splice_read),
>     the difference lies in how many overlays are nested.
>     The stack looks like either:
>       ovl + ovl [+ erofs] + backing fs + underlay storage
>     or
>       ovl [+ erofs] + ext4/xfs + underlay storage
>
>   - The fs reclaim path should be entered only once, so the writeback
>     path will not re-enter.
>
> Sorry about my English, and I'm not sure if it's sufficient (e.g., the
> FUSE passthrough part).  I'll wait for your further input (and other
> acks) before sending this patch upstream.
>

I think most people will have trouble understanding this rationale,
not because of the English, but because of the tech ;)
This is a bit too hand-wavy IMO.

> (Also, btw, I'm not sure whether it's possible to reduce read_iter and
>   splice_read stack usage even further in overlayfs, e.g. by having the
>   top overlayfs resolve the real file/path recursively and handle it
>   directly, since the permission check is already done when opening
>   the file.)

Maybe so, but the LSM permission hook for open is not the same hook
as the permission hook for read/write.

Thanks,
Amir.