fscache vs fanotify behavior difference

Mon Nov 10 18:54:53 AEDT 2025

Hi David,

On 2025/11/10 14:59, David Reiss wrote:
> Hi,
> 
> I've been using erofs with the (deprecated) fscache integration in a project. I recently tried to switch it to use the new fanotify pre-content mechanism, but I'm running into differences in behavior.

In brief, we currently have very limited development resources.
I had intended to demo fanotify hooks in erofs-utils, but there
are always higher-priority tasks on my side (e.g. containerd
improvements and native microVM support).
>
> Here's the basic architecture: It's very similar to a container image distribution use case, with chunk-based deduplication across images. I have erofs images which contain metadata and small inline data. Larger data uses chunk format inodes, and points to chunks in a different "device". The chunked data device is shared by all images.
> 
> With fscache, I use one fsid per image, and one fsid for all of the chunked data. In the read hook for the images, I write the whole erofs image. In the read hook for the data, I fetch just the requested chunk (plus some readahead) and write that to fscache. Once the data is present on disk, fscache just uses it and never sends another read hook.
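To make the chunk-level fetching described above concrete, here is a minimal sketch of how a read request on the shared chunk device might be widened into a chunk-aligned fetch window plus readahead. The names `fetch_window` and `struct byte_range` are hypothetical; they are not taken from styx or erofs-utils. The sketch assumes a power-of-two chunk size, which holds for EROFS chunk-based inodes because the chunk size is derived from a bit shift. It also assumes that readahead is expressed as a whole number of extra chunks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: a half-open byte range [start, end). */
struct byte_range {
    uint64_t start;
    uint64_t end;
};

/*
 * Hypothetical helper: widen a read request [off, off + len) on the shared
 * chunk device to whole chunks, append `ra_chunks` chunks of readahead,
 * and clamp the result to `dev_size`.  chunk_size must be a power of two.
 */
static struct byte_range fetch_window(uint64_t off, uint64_t len,
                                      uint64_t chunk_size, uint64_t ra_chunks,
                                      uint64_t dev_size)
{
    assert(chunk_size != 0 && (chunk_size & (chunk_size - 1)) == 0);

    uint64_t mask = chunk_size - 1;
    struct byte_range r;

    r.start = off & ~mask;              /* round down to the chunk start */
    r.end = (off + len + mask) & ~mask; /* round up to the chunk end */
    r.end += ra_chunks * chunk_size;    /* extra chunks of readahead */

    if (r.end > dev_size)               /* never fetch past the device end */
        r.end = dev_size;
    if (r.start > r.end)                /* request entirely past the end */
        r.start = r.end;
    return r;
}
```

For example, suppose a 4 KiB read arrives at offset 1.5 MiB, with 1 MiB chunks and two chunks of readahead. It becomes a fetch of [1 MiB, 4 MiB), and that range is shortened further if the device is smaller.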

Yes, fscache seems more efficient in this regard, but we've
encountered three main issues with this approach:

  - Since fscache is not part of the EROFS filesystem, and the
    fscache/cachefiles maintainers pay little attention to "EROFS
    over fscache", new features or optimizations are often accepted
    only after long delays.  In addition, fscache is now tied to
    netfs (as per the fscache maintainer's plan), which makes further
    EROFS integration with fscache more awkward.

  - fscache relies on a fixed tree hierarchy, which makes userspace
    programs inflexible.

  - The fscache maintainer intends to remove sparse detection and
    introduce another (possibly incompatible) mechanism, which
    would make this feature even less flexible.

> 
> With fanotify+pre-content, I'm noticing that my pre-content hook is called any time data is not in the page cache, even if the offset being read is already mapped on disk. This kind of defeats the purpose of on-demand fetching if it has to go to userspace for most reads. The goal would be to keep the read path in the kernel and only go to userspace to fetch data that isn't present on disk.

I think your understanding is correct if you hook the underlying
EROFS file and mount that file with a file-backed mount.

  - If the requested data is not in the page cache, the fsnotify
    hook on the underlying file is triggered anyway, which is
    different from the previous fscache approach.

  - The reason is that the kernel can't tell whether the exact range
    of the underlying file being read is already valid, so it simply
    upcalls into userspace unconditionally.  I wonder whether it's
    possible to introduce some BPF hooks to notify userspace only
    conditionally, but I've never found time to look into that.
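The following is for illustration only. Without kernel changes, a userspace handler can at least avoid re-fetching data it already has, but it cannot avoid the upcall itself. The sketch assumes the daemon fills a sparse file. Before going to the network, the daemon calls lseek(2) with SEEK_HOLE to check whether the requested range is already populated. `range_is_populated()` is a hypothetical helper, not an erofs-utils or fanotify API. The result is only as reliable as the filesystem's hole reporting.

```c
#define _GNU_SOURCE /* for SEEK_HOLE */
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of erofs-utils or fanotify): report whether
 * [off, off + len) of a sparse file is fully backed by data, i.e. contains
 * no hole.  Returns 1 if populated, 0 if any part is a hole or lies past
 * EOF, and -1 on other errors (errno set).
 *
 * Caveats: holes are tracked at filesystem block granularity, and on
 * filesystems without hole support lseek() reports the whole file as data.
 */
static int range_is_populated(int fd, off_t off, off_t len)
{
    assert(off >= 0 && len > 0);

    /* SEEK_HOLE returns the start of the first hole at or after `off`;
     * the end of the file always counts as an implicit hole. */
    off_t hole = lseek(fd, off, SEEK_HOLE);
    if (hole < 0)
        return errno == ENXIO ? 0 : -1; /* ENXIO: off is at or past EOF */
    return hole >= off + len;
}
```

A handler would call this with the offset and length from the pre-content event. When the call returns 1, the handler would reply to the event right away instead of fetching.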

In any case, unless we introduce a new in-house kernel caching mechanism
dedicated to EROFS (which would of course be controversial), those
generic "pre-content fanotify hooks" would really help clean up the
overall EROFS codebase.  But again, I've never evaluated those new
hooks, so the fscache interfaces have not been removed yet.

That's the current status.  Also, slightly off topic: the current mature
approach is to use virtual block devices (such as NBD or ublk), if
they meet your requirements.  I know it's not perfect, but at
least it's an alternative for now.

> 
> Could you advise on how to achieve this goal with the new fanotify mechanism?
> 
> If you're interested you can find all the code here: https://github.com/dnr/styx/

Although I'm very interested in this, my time is fragmented: I have
many TODOs of my own, and I also have to resolve our internal
storage/filesystem issues that are unrelated to EROFS.  I do hope to
add official fanotify support, at least in erofs-utils, later, but
that will need more of my personal time.

Thanks,
Gao Xiang

> 
> Thanks,
> David