[PATCH RFC 0/4] erofs: allow page cache sharing

Hongzhen Luo hongzhen at linux.alibaba.com
Sat Jul 5 10:51:45 AEST 2025


On 2025/7/3 20:23, Christian Brauner wrote:
> Hey!
>
> This series is originally from Hongzhen. I'm picking it back up because
> support for page cache sharing is pretty important for container and
> service workloads that want to make use of erofs images. The main
> obstacle currently is the inability to share page cache contents between
> different erofs superblocks.
>
> I think the mechanism that Hongzhen came up with is decent and will
> remove one final obstacle.
>
> However, I have not worked in this area in meaningful ways before so to
> an experienced page cache person this might all look like a little kid
> doodling on a piece of paper.
>
> One obvious question mark I have is around mmap. The current
> implementation mimics what overlayfs is doing and I'm not sure that
> it's correct or even necessary to mimic overlayfs behavior here at all.
>
> Anyway, I would really appreciate the help!

Hi Christian,

Glad to hear you're interested in my previous patches, and please
forgive my delayed response; I was swamped with other tasks and am only
catching up now that it's the weekend. Due to a change in my work, I can
no longer continue driving this patch series upstream.

This version of the patch series seems to be outdated, and parts of its
implementation are quite hacky. Please take a look at the latest RFC
patch series (v6) instead:
https://lore.kernel.org/all/20250301145002.2420830-1-hongzhen@linux.alibaba.com/


> [Background]
> ============
> Currently, reading files with different paths (or names) but the same
> content will consume multiple copies of the page cache, even if the
> content of these page caches is the same. For example, reading identical
> files (e.g., *.so files) from two different minor versions of container
> images will cost multiple copies of the same page cache, since different
> containers have different mount points. Therefore, sharing the page cache
> for files with the same content can save memory.
>
> [Implementation]
> ================
> This introduces the page cache share feature in erofs. During the mkfs
> phase, the file content is hashed and the hash value is stored in the
> `trusted.erofs.fingerprint` extended attribute. Inodes of files with the
> same `trusted.erofs.fingerprint` are mapped to the same anonymous inode
> (indicated by the `ano_inode` field). When a read request occurs, the
> anonymous inode serves as a "container" whose page cache is shared. The
> actual operations involving the iomap are carried out by the original
> inode which is mapped to the anonymous inode.
>
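
To make the mapping described above concrete, here is a minimal
userspace sketch, not the code from the patches. It models only the
lookup step: files whose fingerprint strings match end up pointing at
the same shared object. Only the `ano_inode` field name comes from the
cover letter; the struct names, the fixed-size table, and the
`lookup_or_create()` and `attach()` helpers are hypothetical, and the
real implementation would work on kernel inodes and xattrs rather than
C strings.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_SHARED 16   /* hypothetical table size for this sketch */
#define FP_LEN     32   /* hypothetical maximum fingerprint length */

/* Stand-in for the anonymous inode whose page cache is shared. */
struct anon_inode_model {
	char fingerprint[FP_LEN];
	int refcount;           /* number of file inodes mapped to it */
};

/* Stand-in for a regular erofs inode on some mounted image. */
struct file_inode_model {
	const char *path;
	struct anon_inode_model *ano_inode;
};

static struct anon_inode_model shared_table[MAX_SHARED];
static size_t shared_count;

/* Return the shared object for this fingerprint, creating it if needed. */
static struct anon_inode_model *lookup_or_create(const char *fingerprint)
{
	struct anon_inode_model *anon;
	size_t i;

	for (i = 0; i < shared_count; i++) {
		if (strcmp(shared_table[i].fingerprint, fingerprint) == 0) {
			shared_table[i].refcount++;
			return &shared_table[i];
		}
	}

	assert(shared_count < MAX_SHARED);
	assert(strlen(fingerprint) < FP_LEN);

	anon = &shared_table[shared_count++];
	strcpy(anon->fingerprint, fingerprint);
	anon->refcount = 1;
	return anon;
}

/* Map a file inode to the shared object that matches its fingerprint. */
static void attach(struct file_inode_model *inode, const char *path,
		   const char *fingerprint)
{
	inode->path = path;
	inode->ano_inode = lookup_or_create(fingerprint);
}
```

With this model, attaching two files from different images with the same
fingerprint yields the same `ano_inode` pointer, while a file with a
different fingerprint gets its own.
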
> [Effect]
> ========
> I conducted experiments on two aspects across two different minor versions of
> container images:
>
> 1. reading all files in two different minor versions of container images
>
> 2. run workloads or use the default entrypoint within the containers^[1]
>
> Below is the memory usage for reading all files in two different minor
> versions of container images:
>
> +-------------------+------------------+-------------+---------------+
> |       Image       | Page Cache Share | Memory (MB) |    Memory     |
> |                   |                  |             | Reduction (%) |
> +-------------------+------------------+-------------+---------------+
> |                   |        No        |     241     |       -       |
> |       redis       +------------------+-------------+---------------+
> |   7.2.4 & 7.2.5   |        Yes       |     163     |      33%      |
> +-------------------+------------------+-------------+---------------+
> |                   |        No        |     872     |       -       |
> |      postgres     +------------------+-------------+---------------+
> |    16.1 & 16.2    |        Yes       |     630     |      28%      |
> +-------------------+------------------+-------------+---------------+
> |                   |        No        |     2771    |       -       |
> |     tensorflow    +------------------+-------------+---------------+
> |  1.11.0 & 2.11.1  |        Yes       |     2340    |      16%      |
> +-------------------+------------------+-------------+---------------+
> |                   |        No        |     926     |       -       |
> |       mysql       +------------------+-------------+---------------+
> |  8.0.11 & 8.0.12  |        Yes       |     735     |      21%      |
> +-------------------+------------------+-------------+---------------+
> |                   |        No        |     390     |       -       |
> |       nginx       +------------------+-------------+---------------+
> |   7.2.4 & 7.2.5   |        Yes       |     219     |      44%      |
> +-------------------+------------------+-------------+---------------+
> |       tomcat      |        No        |     924     |       -       |
> | 10.1.25 & 10.1.26 +------------------+-------------+---------------+
> |                   |        Yes       |     474     |      49%      |
> +-------------------+------------------+-------------+---------------+
>
> Additionally, the table below shows the runtime memory usage of the
> container:
>
> +-------------------+------------------+-------------+---------------+
> |       Image       | Page Cache Share | Memory (MB) |    Memory     |
> |                   |                  |             | Reduction (%) |
> +-------------------+------------------+-------------+---------------+
> |                   |        No        |      35     |       -       |
> |       redis       +------------------+-------------+---------------+
> |   7.2.4 & 7.2.5   |        Yes       |      28     |      20%      |
> +-------------------+------------------+-------------+---------------+
> |                   |        No        |     149     |       -       |
> |      postgres     +------------------+-------------+---------------+
> |    16.1 & 16.2    |        Yes       |      95     |      37%      |
> +-------------------+------------------+-------------+---------------+
> |                   |        No        |     1028    |       -       |
> |     tensorflow    +------------------+-------------+---------------+
> |  1.11.0 & 2.11.1  |        Yes       |     930     |      10%      |
> +-------------------+------------------+-------------+---------------+
> |                   |        No        |     155     |       -       |
> |       mysql       +------------------+-------------+---------------+
> |  8.0.11 & 8.0.12  |        Yes       |     132     |      15%      |
> +-------------------+------------------+-------------+---------------+
> |                   |        No        |      25     |       -       |
> |       nginx       +------------------+-------------+---------------+
> |   7.2.4 & 7.2.5   |        Yes       |      20     |      20%      |
> +-------------------+------------------+-------------+---------------+
> |       tomcat      |        No        |     186     |       -       |
> | 10.1.25 & 10.1.26 +------------------+-------------+---------------+
> |                   |        Yes       |      98     |      48%      |
> +-------------------+------------------+-------------+---------------+
>
> When reading all the files in the images, memory usage is reduced by
> 16% to 49%, depending on the specific image. The containers' runtime
> memory usage is reduced by 10% to 48%.
>
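
For anyone re-checking the tables, the reduction column can be
reproduced with the small helper below. This is a sketch under one
assumption that the cover letter does not state: the percentages appear
to be rounded up to the next whole percent, which is what matches every
row in both tables. The function name is hypothetical.

```c
#include <assert.h>

/*
 * Percentage by which memory usage drops from `without_mb` (sharing
 * disabled) to `with_mb` (sharing enabled), rounded up to the next
 * whole percent using integer arithmetic.
 */
static unsigned int reduction_pct_ceil(unsigned int without_mb,
				       unsigned int with_mb)
{
	unsigned int saved;

	assert(without_mb > 0 && with_mb <= without_mb);

	saved = without_mb - with_mb;
	return (saved * 100 + without_mb - 1) / without_mb;
}
```

For the redis row of the first table, `reduction_pct_ceil(241, 163)`
evaluates to 33, whereas rounding to the nearest percent would give 32.
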
> [1] Below are the workloads used for these images:
> 	- redis: redis-benchmark
> 	- postgres: sysbench
> 	- tensorflow: app.py of tensorflow.python.platform
> 	- mysql: sysbench
> 	- nginx: wrk
> 	- tomcat: default entrypoint
>
> Signed-off-by: Christian Brauner <brauner at kernel.org>
> ---
> Hongzhen Luo (4):
>        erofs: move `struct erofs_anon_fs_type` to super.c
>        erofs: introduce page cache share feature
>        erofs: apply the page cache share feature
>        erofs: introduce .fadvise for page cache share
>
>   fs/erofs/Kconfig           |  10 ++
>   fs/erofs/Makefile          |   1 +
>   fs/erofs/data.c            |  67 +++++++++++
>   fs/erofs/fscache.c         |  13 ---
>   fs/erofs/inode.c           |  15 ++-
>   fs/erofs/internal.h        |  11 ++
>   fs/erofs/pagecache_share.c | 281 +++++++++++++++++++++++++++++++++++++++++++++
>   fs/erofs/pagecache_share.h |  22 ++++
>   fs/erofs/super.c           |  62 ++++++++++
>   fs/erofs/zdata.c           |  32 ++++++
>   10 files changed, 500 insertions(+), 14 deletions(-)
> ---
> base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
> change-id: 20250703-work-erofs-pcs-f6f3d0722401
>