[PATCH v2 00/20] fscache,erofs: fscache-based demand-read semantics

JeffleXu jefflexu at linux.alibaba.com
Wed Jan 26 16:26:32 AEDT 2022

On 1/26/22 4:27 AM, David Howells wrote:
> David Howells <dhowells at redhat.com> wrote:
>>  (1) Duplicate the cachefiles backend.  You can discard a lot of it, since
>>      much of it is concerned with managing local modifications - which you're
>>      not going to do since you have a R/O filesystem and you're looking at
>>      importing files into the cache externally to the kernel.
> Take the attached as a start.  It's completely untested.  I've stripped out
> anything to do with writing to the cache, making directories, etc. as that can
> probably be delegated to the on-demand creation.  You could drive on-demand
> creation from the points where it would create files.  I've put some "TODO"
> comments in there as markers.

Thanks for your inspiring work. Some questions below.

> You could also strip out everything to do with invalidation and also make it
> just fail if it encounters a file type that it doesn't like or a file that is
> not correctly labelled for a coherency attribute.
> Also, since you aren't intending to write anything or create new files here,
> there's no need to do the space checking - so I've got rid of all that too.
> I've also made it open the backing files read only and got rid of the trimming
> to I/O blocksize for DIO purposes.  The userspace side can take care of that -
> and, besides, you want to have multiple files within a backing file, right?
> You might want to stop it from marking cache *files* in use (but only mark
> directories).  It doesn't matter so much as you aren't going to get coherency
> issues from having multiple writers to the same file.

> You then need to add a file offset member to the erofscache_object struct, set
> that when the backing file is looked up and add it to the file position in
> erofscache_read().  You also need to look at erofscache_prepare_read().  If
> your files are contiguous complete blobs, that can be a lot simpler.

To be honest, I'm not sure I understand your point correctly. Do you
mean that each erofs file has only one chunk (and thus corresponds to
only one backing blob file), so that the netfs lib can work with the
single cookie associated with the netfs file?

By the way, let me explain the blob mapping in erofs further. To
implement deduplication, one erofs file can be divided into multiple
chunks, and these chunks can be distributed over several backing blob
files in an arbitrary pattern (rather than round-robin). Each erofs file
maintains an on-disk map describing the mapping between chunks and
backing blob files, something like an extent map. Thus there's a
many-to-many relationship between erofs files and backing blob files.
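
To make the mapping concrete, here is a minimal userspace sketch of the
lookup. All struct and function names (chunk_map_entry, chunk_map,
chunk_map_lookup, ...) are made up for illustration; this is neither the
erofs on-disk format nor kernel code, and it assumes fixed-size,
power-of-two chunks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of one map entry (not the on-disk layout). */
struct chunk_map_entry {
	uint32_t blob_id;	/* which backing blob holds this chunk */
	uint64_t blob_offset;	/* byte offset of the chunk inside that blob */
};

/* Hypothetical per-file chunk map, indexed by chunk number. */
struct chunk_map {
	uint32_t chunk_bits;	/* chunk size is 1 << chunk_bits */
	size_t nr_chunks;
	const struct chunk_map_entry *entries;
};

struct blob_location {
	uint32_t blob_id;
	uint64_t offset;
};

/*
 * Translate a file position into (blob, offset within blob).
 * Returns 0 on success, -1 if the position lies beyond the map.
 */
static int chunk_map_lookup(const struct chunk_map *map, uint64_t file_pos,
			    struct blob_location *loc)
{
	uint64_t idx, within;

	assert(map->chunk_bits < 64);
	idx = file_pos >> map->chunk_bits;
	within = file_pos & ((UINT64_C(1) << map->chunk_bits) - 1);

	if (idx >= map->nr_chunks)
		return -1;

	loc->blob_id = map->entries[idx].blob_id;
	loc->offset = map->entries[idx].blob_offset + within;
	return 0;
}
```

Note that neighbouring chunks of one file may land in different blobs,
and one blob may hold chunks from many files, which is where the
many-to-many relationship comes from.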

Thus each erofs file can correspond to multiple cookies in this way,
i.e. one 'struct netfs_i_context' can correspond to multiple cookies.
Besides, the mapping between chunks and backing blob files is
implemented entirely in the upper fs (i.e. erofs), so I have no idea how
we can "do the blob mapping in the backend" [1]. So I don't think we can
use the netfs lib **directly** even with this R/O fscache backend
implemented. Please correct me if I've misunderstood.
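
As a rough illustration of why a single cookie is not enough, the sketch
below splits one read on an erofs file into per-chunk sub-reads, each of
which may target a different blob (and hence a different cookie). The
names (chunk_ref, sub_read, split_read) are hypothetical, and fixed-size
chunks are assumed; it is a userspace sketch, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical map entry: where one chunk of the file lives. */
struct chunk_ref {
	uint32_t blob_id;
	uint64_t blob_offset;
};

/* One piece of the original read, aimed at a single blob. */
struct sub_read {
	uint32_t blob_id;
	uint64_t blob_offset;
	uint64_t len;
};

/*
 * Split the file range [pos, pos + len) at chunk boundaries.
 * Returns the number of sub-reads written to @out, or -1 if the range
 * runs past the map, @out is too small, or @chunk_size is zero.
 */
static int split_read(const struct chunk_ref *chunks, size_t nr_chunks,
		      uint64_t chunk_size, uint64_t pos, uint64_t len,
		      struct sub_read *out, size_t max_out)
{
	size_t n = 0;

	assert(chunks != NULL && out != NULL);
	if (chunk_size == 0)
		return -1;

	while (len > 0) {
		uint64_t idx = pos / chunk_size;
		uint64_t within = pos % chunk_size;
		uint64_t part = chunk_size - within;	/* bytes left in chunk */

		if (idx >= nr_chunks || n >= max_out)
			return -1;
		if (part > len)
			part = len;

		out[n].blob_id = chunks[idx].blob_id;
		out[n].blob_offset = chunks[idx].blob_offset + within;
		out[n].len = part;
		n++;

		pos += part;
		len -= part;
	}
	return (int)n;
}
```

A read that crosses two chunk boundaries yields three sub-reads, and
nothing prevents them from pointing at two or three different blobs.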


Besides, IMHO implementing a new R/O backend may be quite challenging,
since it involves quite a lot of code duplication. I know it's just an
initial draft, but I'm not sure it's worth it.

