[PATCH RFC 2/4] erofs: introduce page cache share feature

Gao Xiang hsiangkao at linux.alibaba.com
Sat Jul 5 07:06:04 AEST 2025


Hi Christian,

On 2025/7/3 20:23, Christian Brauner wrote:

...

> diff --git a/fs/erofs/pagecache_share.c b/fs/erofs/pagecache_share.c
> new file mode 100644
> index 000000000000..309b33cc6c30
> --- /dev/null
> +++ b/fs/erofs/pagecache_share.c
> @@ -0,0 +1,204 @@
> +// SPDX-License-Identifier: GPL-2.0-or-later
> +/*
> + * Copyright (C) 2024, Alibaba Cloud
> + */
> +#include <linux/xxhash.h>
> +#include <linux/refcount.h>
> +#include "pagecache_share.h"
> +#include "internal.h"
> +#include "xattr.h"
> +
> +#define PCS_FPRT_IDX	4
> +#define PCS_FPRT_NAME	"erofs.fingerprint"
> +#define PCS_FPRT_MAXLEN (sizeof(size_t) + 1024)

One thing I told Hongzhen to work on is that I really
don't like a hardcoded xattr name like this.

Because EROFS can store common long xattr prefixes, see:
https://docs.kernel.org/filesystems/erofs.html#long-extended-attribute-name-prefixes

So it would be nice to just record a name prefix in the
on-disk superblock so that users can use their own xattr
names for this purpose.
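
To illustrate the idea, here is a rough userspace sketch, not
actual EROFS code.  `pcs_sb_info`, `fprt_prefix_id` and
`pcs_is_fingerprint()` are all made-up names, and treating an
exact match against the selected prefix as "this is the
fingerprint xattr" is only an assumption for the example:

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the long xattr name prefix table of an image. */
struct pcs_prefix_table {
	const char *const *prefixes;	/* e.g. "trusted.overlay.metacopy" */
	unsigned int count;
};

/* Hypothetical stand-in for the in-memory superblock info. */
struct pcs_sb_info {
	struct pcs_prefix_table table;
	int fprt_prefix_id;		/* -1: no fingerprint xattr configured */
};

/*
 * Return 1 if @name is the fingerprint xattr selected by the superblock,
 * 0 otherwise (including when the recorded id is unset or out of range).
 */
static int pcs_is_fingerprint(const struct pcs_sb_info *sbi, const char *name)
{
	assert(sbi && name);

	if (sbi->fprt_prefix_id < 0 ||
	    (unsigned int)sbi->fprt_prefix_id >= sbi->table.count)
		return 0;
	return !strcmp(name, sbi->table.prefixes[sbi->fprt_prefix_id]);
}
```

With something along these lines, the xattr name comes from the
image rather than from a #define in the kernel, so different
users could pick different names.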

For example, users could use the "overlay.metacopy" xattr
as the page cache sharing fingerprint to identify different
inodes if the overlayfs fsverity feature is on, see:
https://docs.kernel.org/filesystems/overlayfs.html#fs-verity-support

But if you don't have time to dig into the EROFS
internals here, you could just leave this as-is.  I can
handle it myself.

> +

...

> +}
> +
> +/*
> + * TODO: Hm, could we leverage our fancy new backing file infrastructure
> + * as for overlayfs and fuse?

If some code can be lifted up into a VFS helper, that would
be very helpful, just as the backing file infrastructure was
lifted from overlayfs.

But I'm not sure if it's really needed for now anyway,
because it's EROFS-specific, and I only maintain and
can only speak for EROFS.

> + */
> +static struct file *erofs_pcs_alloc_file(struct file *file,
> +					 struct inode *ano_inode)
> +{
> +	struct file *ano_file;
> +
> +	ano_file = alloc_file_pseudo(ano_inode, erofs_pcs_mnt, "[erofs_pcs_f]",
> +				     O_RDONLY, &erofs_file_fops);
> +	file_ra_state_init(&ano_file->f_ra, file->f_mapping);
> +	ano_file->private_data = EROFS_I(file_inode(file));
> +	return ano_file;
> +}
> +

...

> +
> +/*
> + * TODO: Amir, you've got some experience in this area due to overlayfs
> + * and fuse. Does that work?
> + */



Hi Amir,

I do think it will work.  If you have a chance, please
take a quick look too.

It's quite similar to overlayfs; the difference is that the real
inodes are not in some other fs, but are anon inodes from a pseudo
sb which is shared across the whole host so that containers can
share page cache.
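
As a toy userspace model of that arrangement (not kernel code;
every name below is hypothetical, and a fixed-size array stands
in for the pseudo sb's inode lookup), inodes from different
mounts that carry the same fingerprint end up on one shared
object:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PCS_MODEL_MAX	16	/* arbitrary capacity for this toy model */

/* Toy stand-in for an anon inode living in the host-wide pseudo sb. */
struct pcs_anon_inode {
	char fingerprint[64];
	unsigned int refcount;	/* number of per-mount inodes sharing it */
};

/* Toy stand-in for the pseudo sb: a single table for the whole host. */
static struct pcs_anon_inode pcs_table[PCS_MODEL_MAX];

/*
 * Return the shared object for @fprt, taking a reference.  A free slot is
 * claimed on first use; NULL means the table is full or @fprt is too long.
 */
static struct pcs_anon_inode *pcs_model_get(const char *fprt)
{
	struct pcs_anon_inode *free_slot = NULL;
	int i;

	assert(fprt);
	if (strlen(fprt) >= sizeof(pcs_table[0].fingerprint))
		return NULL;

	for (i = 0; i < PCS_MODEL_MAX; i++) {
		struct pcs_anon_inode *ai = &pcs_table[i];

		if (!ai->refcount) {
			if (!free_slot)
				free_slot = ai;
		} else if (!strcmp(ai->fingerprint, fprt)) {
			ai->refcount++;		/* same content: share it */
			return ai;
		}
	}
	if (free_slot) {
		strcpy(free_slot->fingerprint, fprt);
		free_slot->refcount = 1;
	}
	return free_slot;
}

/* Drop one reference; the slot becomes reusable once nobody shares it. */
static void pcs_model_put(struct pcs_anon_inode *ai)
{
	if (ai && ai->refcount)
		ai->refcount--;
}
```

Two containers opening files with identical fingerprints would
get the same object back here, which is the property that lets
them share page cache.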

> +static int erofs_pcs_mmap(struct file *file, struct vm_area_struct *vma)
> +{
> +	struct file *ano_file = file->private_data;
> +
> +	vma_set_file(vma, ano_file);
> +	vma->vm_ops = &generic_file_vm_ops;
> +	return 0;
> +}
> +
> +const struct file_operations erofs_pcs_file_fops = {
> +	.open		= erofs_pcs_file_open,
> +	/*
> +	 * TODO: Why doesn't .llseek require similar treatment as
> +	 * .read_iter?
> +	 */

I don't know of any specific reason, because it was written
by Hongzhen.

Hongzhen is still at work until the end of the month;
I hope he can answer some of these questions.

> +	.llseek		= generic_file_llseek,
> +	.read_iter	= erofs_pcs_file_read_iter,
> +	.mmap		= erofs_pcs_mmap,
> +	.release	= erofs_pcs_file_release,
> +	.get_unmapped_area = thp_get_unmapped_area,
> +	.splice_read	= filemap_splice_read,
> +};
> diff --git a/fs/erofs/pagecache_share.h b/fs/erofs/pagecache_share.h
> new file mode 100644
> index 000000000000..b8111291cf79
> --- /dev/null
> +++ b/fs/erofs/pagecache_share.h
> @@ -0,0 +1,20 @@
> +/* SPDX-License-Identifier: GPL-2.0-or-later */
> +/*
> + * Copyright (C) 2024, Alibaba Cloud
> + */


BTW, this header seems too small; maybe just fold its
contents into internal.h.

Thanks,
Gao Xiang

> +#ifndef __EROFS_PAGECACHE_SHARE_H
> +#define __EROFS_PAGECACHE_SHARE_H
> +
> +#include <linux/fs.h>
> +#include <linux/mount.h>
> +#include <linux/rwlock.h>
> +#include <linux/mutex.h>
> +#include "internal.h"
> +
> +int erofs_pcs_init_mnt(void);
> +void erofs_pcs_free_mnt(void);
> +void erofs_pcs_fill_inode(struct inode *inode);
> +
> +extern const struct vm_operations_struct generic_file_vm_ops;
> +
> +#endif
> 


