<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p><br>
</p>
<div class="moz-cite-prefix">On 2025/7/3 20:23, Christian Brauner
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:20250703-work-erofs-pcs-v1-0-0ce1f6be28ee@kernel.org">
<pre wrap="" class="moz-quote-pre">Hey!
This series is originally from Hongzhen. I'm picking it back up because
support for page cache sharing is pretty important for container and
service workloads that want to make use of erofs images. The main
obstacle currently is the inability to share page cache contents between
different erofs superblocks.
I think the mechanism that Hongzhen came up with is decent and will
remove one final obstacle.
However, I have not worked in this area in meaningful ways before so to
an experienced page cache person this might all look like a little kid
doodling on a piece of paper.
One obvious question mark I have is around mmap. The current
implementation mimics what overlayfs is doing and I'm not sure that
it's correct or even necessary to mimic overlayfs behavior here at all.
Anyway, I would really appreciate the help!</pre>
</blockquote>
<br>
<p data-spm-anchor-id="idealab.2ef5001f.0.i834.5a4f3d33JHqciL"
style="box-sizing: border-box; font-size: 16px; margin: 7px 0px; line-height: 20px; font-weight: 400; color: rgb(13, 18, 57); font-family: "PingFang SC", "Helvetica Neue", Helvetica, Arial, Tahoma, "Microsoft YaHei"; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: normal; background-color: rgb(255, 255, 255); text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial; word-break: break-all;">Hi
Christian, glad to hear you're interested in my previous patches,
and please forgive my delayed response; I was swamped with other tasks
and am only catching up now that it's the weekend. Because of a change
in my work, I can no longer continue driving this patch series
upstream.</p>
<p data-spm-anchor-id="idealab.2ef5001f.0.i834.5a4f3d33JHqciL"
style="box-sizing: border-box; font-size: 16px; margin: 7px 0px; line-height: 20px; font-weight: 400; color: rgb(13, 18, 57); font-family: "PingFang SC", "Helvetica Neue", Helvetica, Arial, Tahoma, "Microsoft YaHei"; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: normal; background-color: rgb(255, 255, 255); text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial; word-break: break-all;"><br>
</p>
<p data-spm-anchor-id="idealab.2ef5001f.0.i834.5a4f3d33JHqciL"
style="box-sizing: border-box; font-size: 16px; margin: 7px 0px; line-height: 20px; font-weight: 400; color: rgb(13, 18, 57); font-family: "PingFang SC", "Helvetica Neue", Helvetica, Arial, Tahoma, "Microsoft YaHei"; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: normal; background-color: rgb(255, 255, 255); text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial; word-break: break-all;">This
is an older version of the series, and parts of its implementation
are quite hacky. Please take a look at the latest RFC patch series (v6):<br
style="box-sizing: border-box; color: rgb(13, 18, 57); font-family: "PingFang SC", "Helvetica Neue", Helvetica, Arial, Tahoma, "Microsoft YaHei"; font-size: 16px; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: normal; background-color: rgb(255, 255, 255); text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial;">
<a class="moz-txt-link-freetext" href="https://lore.kernel.org/all/20250301145002.2420830-1-hongzhen@linux.alibaba.com/">https://lore.kernel.org/all/20250301145002.2420830-1-hongzhen@linux.alibaba.com/</a></p>
<p data-spm-anchor-id="idealab.2ef5001f.0.i834.5a4f3d33JHqciL"
style="box-sizing: border-box; font-size: 16px; margin: 7px 0px; line-height: 20px; font-weight: 400; color: rgb(13, 18, 57); font-family: "PingFang SC", "Helvetica Neue", Helvetica, Arial, Tahoma, "Microsoft YaHei"; font-style: normal; font-variant-ligatures: normal; font-variant-caps: normal; letter-spacing: normal; orphans: 2; text-align: start; text-indent: 0px; text-transform: none; widows: 2; word-spacing: 0px; -webkit-text-stroke-width: 0px; white-space: normal; background-color: rgb(255, 255, 255); text-decoration-thickness: initial; text-decoration-style: initial; text-decoration-color: initial; word-break: break-all;"><br>
</p>
<blockquote type="cite"
cite="mid:20250703-work-erofs-pcs-v1-0-0ce1f6be28ee@kernel.org">
<pre wrap="" class="moz-quote-pre">
[Background]
============
Currently, reading files with different paths (or names) but the same
content will consume multiple copies of the page cache, even if the
content of these page caches is the same. For example, reading identical
files (e.g., *.so files) from two different minor versions of container
images will cost multiple copies of the same page cache, since different
containers have different mount points. Therefore, sharing the page cache
for files with the same content can save memory.
[Implementation]
================
This introduces the page cache share feature in erofs. During the mkfs
phase, the file content is hashed and the hash value is stored in the
`trusted.erofs.fingerprint` extended attribute. Inodes of files with the
same `trusted.erofs.fingerprint` are mapped to the same anonymous inode
(indicated by the `ano_inode` field). When a read request occurs, the
anonymous inode serves as a "container" whose page cache is shared. The
actual operations involving the iomap are carried out by the original
inode which is mapped to the anonymous inode.
[Effect]
========
I ran two experiments, each across two different minor versions of a
container image:
1. reading all files in both versions of the image
2. running workloads or the default entrypoint within the containers^[1]
Below is the memory usage for reading all files in two different minor
versions of container images:
+-------------------+------------------+-------------+---------------+
| Image             | Page Cache Share | Memory (MB) | Memory        |
|                   |                  |             | Reduction (%) |
+-------------------+------------------+-------------+---------------+
|                   | No               | 241         | -             |
| redis             +------------------+-------------+---------------+
| 7.2.4 & 7.2.5     | Yes              | 163         | 33%           |
+-------------------+------------------+-------------+---------------+
|                   | No               | 872         | -             |
| postgres          +------------------+-------------+---------------+
| 16.1 & 16.2       | Yes              | 630         | 28%           |
+-------------------+------------------+-------------+---------------+
|                   | No               | 2771        | -             |
| tensorflow        +------------------+-------------+---------------+
| 1.11.0 & 2.11.1   | Yes              | 2340        | 16%           |
+-------------------+------------------+-------------+---------------+
|                   | No               | 926         | -             |
| mysql             +------------------+-------------+---------------+
| 8.0.11 & 8.0.12   | Yes              | 735         | 21%           |
+-------------------+------------------+-------------+---------------+
|                   | No               | 390         | -             |
| nginx             +------------------+-------------+---------------+
| 7.2.4 & 7.2.5     | Yes              | 219         | 44%           |
+-------------------+------------------+-------------+---------------+
| tomcat            | No               | 924         | -             |
| 10.1.25 & 10.1.26 +------------------+-------------+---------------+
|                   | Yes              | 474         | 49%           |
+-------------------+------------------+-------------+---------------+
Additionally, the table below shows the runtime memory usage of the
container:
+-------------------+------------------+-------------+---------------+
| Image             | Page Cache Share | Memory (MB) | Memory        |
|                   |                  |             | Reduction (%) |
+-------------------+------------------+-------------+---------------+
|                   | No               | 35          | -             |
| redis             +------------------+-------------+---------------+
| 7.2.4 & 7.2.5     | Yes              | 28          | 20%           |
+-------------------+------------------+-------------+---------------+
|                   | No               | 149         | -             |
| postgres          +------------------+-------------+---------------+
| 16.1 & 16.2       | Yes              | 95          | 37%           |
+-------------------+------------------+-------------+---------------+
|                   | No               | 1028        | -             |
| tensorflow        +------------------+-------------+---------------+
| 1.11.0 & 2.11.1   | Yes              | 930         | 10%           |
+-------------------+------------------+-------------+---------------+
|                   | No               | 155         | -             |
| mysql             +------------------+-------------+---------------+
| 8.0.11 & 8.0.12   | Yes              | 132         | 15%           |
+-------------------+------------------+-------------+---------------+
|                   | No               | 25          | -             |
| nginx             +------------------+-------------+---------------+
| 7.2.4 & 7.2.5     | Yes              | 20          | 20%           |
+-------------------+------------------+-------------+---------------+
| tomcat            | No               | 186         | -             |
| 10.1.25 & 10.1.26 +------------------+-------------+---------------+
|                   | Yes              | 98          | 48%           |
+-------------------+------------------+-------------+---------------+
When reading all the files in the image, memory usage drops by 16% to
49%, depending on the image. The containers' runtime memory usage drops
by 10% to 48%.
[1] Below are the workloads for these images:
- redis: redis-benchmark
- postgres: sysbench
- tensorflow: app.py of tensorflow.python.platform
- mysql: sysbench
- nginx: wrk
- tomcat: default entrypoint
Signed-off-by: Christian Brauner <a class="moz-txt-link-rfc2396E" href="mailto:brauner@kernel.org">&lt;brauner@kernel.org&gt;</a>
---
Hongzhen Luo (4):
      erofs: move `struct erofs_anon_fs_type` to super.c
      erofs: introduce page cache share feature
      erofs: apply the page cache share feature
      erofs: introduce .fadvise for page cache share

 fs/erofs/Kconfig           |  10 ++
 fs/erofs/Makefile          |   1 +
 fs/erofs/data.c            |  67 +++++++++++
 fs/erofs/fscache.c         |  13 ---
 fs/erofs/inode.c           |  15 ++-
 fs/erofs/internal.h        |  11 ++
 fs/erofs/pagecache_share.c | 281 +++++++++++++++++++++++++++++++++++++++++++++
 fs/erofs/pagecache_share.h |  22 ++++
 fs/erofs/super.c           |  62 ++++++++++
 fs/erofs/zdata.c           |  32 ++++++
 10 files changed, 500 insertions(+), 14 deletions(-)
---
base-commit: 19272b37aa4f83ca52bdf9c16d5d81bdd1354494
change-id: 20250703-work-erofs-pcs-f6f3d0722401
</pre>
</blockquote>
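<p>For anyone new to the series, the idea in the [Implementation]
section quoted above can be sketched roughly in user space as follows.
This is only an illustration, not the kernel code: the names
AnonInode, FileInode and registry are hypothetical, SHA-256 is an
assumed stand-in for whatever hash mkfs actually stores in
trusted.erofs.fingerprint, and the hash is computed at open time here
only because the sketch has no mkfs step.</p>
<pre>
```python
import hashlib


class AnonInode:
    """Hypothetical stand-in for the shared anonymous inode.

    It owns the single page cache that every file with the same
    fingerprint reads through.
    """

    def __init__(self, fingerprint):
        self.fingerprint = fingerprint
        self.page_cache = {}  # page index to page bytes
        self.fills = 0        # how many pages were read from backing data


class FileInode:
    """Hypothetical stand-in for a per-superblock erofs inode."""

    def __init__(self, path, data, registry):
        self.path = path
        self._data = data  # backing content, private to this inode
        # Assumption: in the real design the fingerprint is computed by
        # mkfs and read back from the xattr, not hashed at open time.
        fingerprint = hashlib.sha256(data).hexdigest()
        # Inodes with the same fingerprint map to the same anonymous inode.
        self.anon = registry.setdefault(fingerprint, AnonInode(fingerprint))

    def read_page(self, index, page_size=4096):
        cache = self.anon.page_cache
        if index not in cache:
            # Cache miss: the original inode does the actual I/O and the
            # result lands in the shared cache.
            start = index * page_size
            cache[index] = self._data[start:start + page_size]
            self.anon.fills += 1
        return cache[index]
```
</pre>
<p>With one shared registry, two FileInode objects created from
identical bytes under different paths end up with the same AnonInode,
so reading page 0 through both fills the cache only once.</p>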
</body>
</html>