[PATCH RFC 0/4] erofs: allow page cache sharing
Hongzhen Luo
hongzhen at linux.alibaba.com
Sat Jul 5 10:51:45 AEST 2025
On 2025/7/3 20:23, Christian Brauner wrote:
> Hey!
>
> This series is originally from Hongzhen. I'm picking it back up because
> support for page cache sharing is pretty important for container and
> service workloads that want to make use of erofs images. The main
> obstacle currently is the inability to share page cache contents between
> different erofs superblocks.
>
> I think the mechanism that Hongzhen came up with is decent and will
> remove one final obstacle.
>
> However, I have not worked in this area in meaningful ways before so to
> an experienced page cache person this might all look like a little kid
> doodling on a piece of paper.
>
> One obvious question mark I have is around mmap. The current
> implementation mimics what overlayfs is doing and I'm not sure that
> it's correct or even necessary to mimic overlayfs behavior here at all.
>
> Anyway, I would really appreciate the help!
Hi Christian,

Glad to hear you're interested in my previous patch series, and please
forgive my delayed response: I was swamped with other tasks and am only
catching up now that it's the weekend. Due to a change in my work, I can
no longer continue driving this patch series upstream.

This version of the series seems to be outdated, and some of its
implementation is quite hacky. Please take a look at the latest RFC
patch series (v6) instead:
https://lore.kernel.org/all/20250301145002.2420830-1-hongzhen@linux.alibaba.com/

> [Background]
> ============
> Currently, reading files with different paths (or names) but the same
> content will consume multiple copies of the page cache, even if the
> content of these page caches is the same. For example, reading identical
> files (e.g., *.so files) from two different minor versions of container
> images will cost multiple copies of the same page cache, since different
> containers have different mount points. Therefore, sharing the page cache
> for files with the same content can save memory.
>
> [Implementation]
> ================
> This introduces the page cache share feature in erofs. During the mkfs
> phase, the file content is hashed and the hash value is stored in the
> `trusted.erofs.fingerprint` extended attribute. Inodes of files with the
> same `trusted.erofs.fingerprint` are mapped to the same anonymous inode
> (indicated by the `ano_inode` field). When a read request occurs, the
> anonymous inode serves as a "container" whose page cache is shared. The
> actual operations involving the iomap are carried out by the original
> inode which is mapped to the anonymous inode.
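
To make the mapping described above concrete, here is a minimal
userspace sketch of "same fingerprint, same shared inode". It is a toy
model, not the erofs kernel code: `FP_LEN`, `MAX_SHARED`,
`struct shared_inode`, `struct model_inode`, `shared_get()` and
`shared_put()` are all hypothetical names chosen for illustration, and
the 32-byte fingerprint size is an assumption. Only the `ano_inode`
field name comes from the cover letter.

```c
/* Toy userspace model of fingerprint-based sharing; NOT kernel code. */
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define FP_LEN     32   /* assumed fingerprint size (e.g. a 256-bit digest) */
#define MAX_SHARED 64   /* arbitrary table size for this sketch */

/* Stand-in for the anonymous inode whose page cache is shared. */
struct shared_inode {
    unsigned char fp[FP_LEN];
    int refcount;       /* number of real inodes currently mapped here */
    int in_use;
};

/* Stand-in for a per-superblock inode that points at the shared one. */
struct model_inode {
    const char *path;
    struct shared_inode *ano_inode;
};

static struct shared_inode table[MAX_SHARED];

/*
 * Look up the shared inode for a fingerprint, creating it on first use.
 * Returns NULL when the table is full.
 */
static struct shared_inode *shared_get(const unsigned char fp[FP_LEN])
{
    struct shared_inode *free_slot = NULL;

    for (size_t i = 0; i < MAX_SHARED; i++) {
        if (table[i].in_use) {
            if (memcmp(table[i].fp, fp, FP_LEN) == 0) {
                table[i].refcount++;
                return &table[i];
            }
        } else if (!free_slot) {
            free_slot = &table[i];
        }
    }
    if (!free_slot)
        return NULL;
    memcpy(free_slot->fp, fp, FP_LEN);
    free_slot->refcount = 1;
    free_slot->in_use = 1;
    return free_slot;
}

/* Drop one reference; the slot is recycled when the last user goes away. */
static void shared_put(struct shared_inode *s)
{
    if (!s)
        return;
    assert(s->refcount > 0);
    if (--s->refcount == 0)
        s->in_use = 0;
}

#ifdef PCS_SKETCH_MAIN
int main(void)
{
    unsigned char fp[FP_LEN] = { 0xaa };    /* made-up digest */
    struct model_inode a = { "/image-v1/lib/libfoo.so", shared_get(fp) };
    struct model_inode b = { "/image-v2/lib/libfoo.so", shared_get(fp) };

    printf("shared: %s\n", a.ano_inode == b.ano_inode ? "yes" : "no");
    shared_put(a.ano_inode);
    shared_put(b.ano_inode);
    return 0;
}
#endif
```

Building with -DPCS_SKETCH_MAIN gives a small demo in which two paths
with the same made-up fingerprint resolve to the same shared object. In
the real series the I/O is still issued through the original inode; the
sketch only models the lookup and reference counting.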
>
> [Effect]
> ========
> I conducted experiments on two aspects across two different minor versions of
> container images:
>
> 1. reading all files in two different minor versions of container images
>
> 2. running workloads or using the default entrypoint within the containers^[1]
>
> Below is the memory usage for reading all files in two different minor
> versions of container images:
>
> +-------------------+------------------+-------------+---------------+
> |       Image       | Page Cache Share | Memory (MB) |    Memory     |
> |                   |                  |             | Reduction (%) |
> +-------------------+------------------+-------------+---------------+
> |                   |        No        |     241     |       -       |
> |       redis       +------------------+-------------+---------------+
> |   7.2.4 & 7.2.5   |       Yes        |     163     |      33%      |
> +-------------------+------------------+-------------+---------------+
> |                   |        No        |     872     |       -       |
> |     postgres      +------------------+-------------+---------------+
> |    16.1 & 16.2    |       Yes        |     630     |      28%      |
> +-------------------+------------------+-------------+---------------+
> |                   |        No        |    2771     |       -       |
> |    tensorflow     +------------------+-------------+---------------+
> |  1.11.0 & 2.11.1  |       Yes        |    2340     |      16%      |
> +-------------------+------------------+-------------+---------------+
> |                   |        No        |     926     |       -       |
> |       mysql       +------------------+-------------+---------------+
> |  8.0.11 & 8.0.12  |       Yes        |     735     |      21%      |
> +-------------------+------------------+-------------+---------------+
> |                   |        No        |     390     |       -       |
> |       nginx       +------------------+-------------+---------------+
> |   7.2.4 & 7.2.5   |       Yes        |     219     |      44%      |
> +-------------------+------------------+-------------+---------------+
> |      tomcat       |        No        |     924     |       -       |
> | 10.1.25 & 10.1.26 +------------------+-------------+---------------+
> |                   |       Yes        |     474     |      49%      |
> +-------------------+------------------+-------------+---------------+
>
> Additionally, the table below shows the runtime memory usage of the
> container:
>
> +-------------------+------------------+-------------+---------------+
> |       Image       | Page Cache Share | Memory (MB) |    Memory     |
> |                   |                  |             | Reduction (%) |
> +-------------------+------------------+-------------+---------------+
> |                   |        No        |     35      |       -       |
> |       redis       +------------------+-------------+---------------+
> |   7.2.4 & 7.2.5   |       Yes        |     28      |      20%      |
> +-------------------+------------------+-------------+---------------+
> |                   |        No        |     149     |       -       |
> |     postgres      +------------------+-------------+---------------+
> |    16.1 & 16.2    |       Yes        |     95      |      37%      |
> +-------------------+------------------+-------------+---------------+
> |                   |        No        |    1028     |       -       |
> |    tensorflow     +------------------+-------------+---------------+
> |  1.11.0 & 2.11.1  |       Yes        |     930     |      10%      |
> +-------------------+------------------+-------------+---------------+
> |                   |        No        |     155     |       -       |
> |       mysql       +------------------+-------------+---------------+
> |  8.0.11 & 8.0.12  |       Yes        |     132     |      15%      |
> +-------------------+------------------+-------------+---------------+
> |                   |        No        |     25      |       -       |
> |       nginx       +------------------+-------------+---------------+
> |   7.2.4 & 7.2.5   |       Yes        |     20      |      20%      |
> +-------------------+------------------+-------------+---------------+
> |      tomcat       |        No        |     186     |       -       |
> | 10.1.25 & 10.1.26 +------------------+-------------+---------------+
> |                   |       Yes        |     98      |      48%      |
> +-------------------+------------------+-------------+---------------+
>
> It can be observed that when reading all the files in the image, the reduced
> memory usage varies from 16% to 49%, depending on the specific image.
> Additionally, the container's runtime memory usage reduction ranges from 10%
> to 48%.
>
> [1] Below are the workloads used for these images:
> - redis: redis-benchmark
> - postgres: sysbench
> - tensorflow: app.py of tensorflow.python.platform
> - mysql: sysbench
> - nginx: wrk
> - tomcat: default entrypoint
>
> Signed-off-by: Christian Brauner <brauner at kernel.org>
> ---
> Hongzhen Luo (4):
>       erofs: move `struct erofs_anon_fs_type` to super.c
>       erofs: introduce page cache share feature
>       erofs: apply the page cache share feature
>       erofs: introduce .fadvise for page cache share
>
>  fs/erofs/Kconfig           |  10 ++
>  fs/erofs/Makefile          |   1 +
>  fs/erofs/data.c            |  67 +++++++++++
>  fs/erofs/fscache.c         |  13 ---
>  fs/erofs/inode.c           |  15 ++-
>  fs/erofs/internal.h        |  11 ++
>  fs/erofs/pagecache_share.c | 281 +++++++++++++++++++++++++++++++++++++++++++++
>  fs/erofs/pagecache_share.h |  22 ++++
>  fs/erofs/super.c           |  62 ++++++++++
>  fs/erofs/zdata.c           |  32 ++++++
>  10 files changed, 500 insertions(+), 14 deletions(-)
> ---
> base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
> change-id: 20250703-work-erofs-pcs-f6f3d0722401
>