[PATCH v2] erofs-utils: mkfs: Implement 'dsunit' alignment on blobdev

Friendy Su friendy.su at sony.com
Fri Aug 22 18:42:41 AEST 2025


Set a proper 'dsunit' so that file bodies are aligned on huge page
boundaries on the blobdev, where 'dsunit' * 'blocksize' = huge page size
(2M).

When mmap()ing a file on a filesystem mounted with dax=always, huge page
alignment lets the kernel map one huge page (2M) per page fault instead
of one normal page (4K) per page fault.

This greatly improves mmap() performance by reducing the number of page
faults triggered.

With deduplication in mind, 'chunksize' should not be smaller than
'dsunit', so that data still aligns on 'dsunit' after deduplication.

Signed-off-by: Friendy Su <friendy.su at sony.com>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo at sony.com>
Reviewed-by: Daniel Palmer <daniel.palmer at sony.com>
---
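
The arithmetic described above can be sketched in isolation as follows.
This is only an illustration, not the code in the patch: align_up(),
dsunit_bytes() and dsunit_ignored() are hypothetical helper names, and the
values used with them (4096-byte blocks, dsunit=512) are the ones from the
examples in this patch.

```c
#include <stdint.h>

/* Hypothetical helper: round 'off' up to the next multiple of 'align' (> 0). */
static uint64_t align_up(uint64_t off, uint64_t align)
{
	return (off + align - 1) / align * align;
}

/* Hypothetical helper: alignment in bytes for a dsunit given in blocks. */
static uint64_t dsunit_bytes(unsigned int dsunit, unsigned int blksiz)
{
	return (uint64_t)dsunit * blksiz;
}

/*
 * Hypothetical helper mirroring the option check: returns nonzero when a
 * chunk size is set and one chunk covers fewer blocks than dsunit, in which
 * case dsunit would be ignored.
 */
static int dsunit_ignored(unsigned int chunkbits, unsigned int blkszbits,
			  unsigned int dsunit)
{
	return chunkbits && dsunit &&
	       (1u << (chunkbits - blkszbits)) < dsunit;
}
```

With 4096-byte blocks and dsunit=512 the alignment is 2 MiB, so a blob
offset of 0x1234 would be rounded up to 0x200000. A 4 KiB chunk size
(chunkbits=12) would cause dsunit to be ignored, while a 2 MiB chunk size
(chunkbits=21) would not.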
 lib/blobchunk.c  | 15 +++++++++++++++
 man/mkfs.erofs.1 | 15 +++++++++++++++
 mkfs/main.c      | 13 +++++++++++++
 3 files changed, 43 insertions(+)

diff --git a/lib/blobchunk.c b/lib/blobchunk.c
index bbc69cf..e47afb5 100644
--- a/lib/blobchunk.c
+++ b/lib/blobchunk.c
@@ -309,6 +309,21 @@ int erofs_blob_write_chunked_file(struct erofs_inode *inode, int fd,
 	minextblks = BLK_ROUND_UP(sbi, inode->i_size);
 	interval_start = 0;
 
+	/* Align file on 'dsunit' */
+	if (sbi->bmgr->dsunit > 1) {
+		off_t off = lseek(blobfile, 0, SEEK_CUR);
+
+		erofs_dbg("Try to round up 0x%llx to align on %d blocks (dsunit)",
+				off | 0ULL, sbi->bmgr->dsunit);
+		off = roundup(off, sbi->bmgr->dsunit * erofs_blksiz(sbi));
+		if (lseek(blobfile, off, SEEK_SET) != off) {
+			ret = -errno;
+			erofs_err("lseek to blobdev 0x%llx error", off | 0ULL);
+			goto err;
+		}
+		erofs_dbg("Aligned on 0x%llx", off | 0ULL);
+	}
+
 	for (pos = 0; pos < inode->i_size; pos += len) {
 #ifdef SEEK_DATA
 		off_t offset = lseek(fd, pos + startoff, SEEK_DATA);
diff --git a/man/mkfs.erofs.1 b/man/mkfs.erofs.1
index 63f7a2f..9075522 100644
--- a/man/mkfs.erofs.1
+++ b/man/mkfs.erofs.1
@@ -168,6 +168,21 @@ the output filesystem, with no leading /.
 .TP
 .BI "\-\-dsunit=" #
 Align all data block addresses to multiples of #.
+
+If \fBdsunit\fR and \fBchunksize\fR are both set, \fBdsunit\fR will be ignored
+if it is bigger than \fBchunksize\fR.
+
+This keeps data aligned after deduplication.
+If \fBdsunit\fR is bigger, it spans several chunks.
+
+E.g. \fBblock-size\fR=4096, \fBdsunit\fR=512 (2M), \fBchunksize\fR=4096
+
+Once one chunk is deduplicated, the chunks after it are no longer aligned.
+For the best performance, it is recommended to set \fBdsunit\fR to the same
+size as \fBchunksize\fR.
+
+E.g. \fBblock-size\fR=4096, \fBdsunit\fR=512 (2M), \fBchunksize\fR=$((4096*512))
+
 .TP
 .BI "\-\-exclude-path=" path
 Ignore file that matches the exact literal path.
diff --git a/mkfs/main.c b/mkfs/main.c
index 30804d1..fcb2b89 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -1098,6 +1098,19 @@ static int mkfs_parse_options_cfg(int argc, char *argv[])
 		return -EINVAL;
 	}
 
+	/*
+	 * Once data is aligned on dsunit, chunksize must be equal to or larger
+	 * than dsunit to keep that alignment after deduplication.
+	 * If chunksize is smaller than dsunit, e.g. chunksize=4k, dsunit=2M,
+	 * once a chunk is deduplicated, all data thereafter will be unaligned,
+	 * so ignore dsunit in that case.
+	 */
+	if (cfg.c_chunkbits && dsunit && 1u << (cfg.c_chunkbits - g_sbi.blkszbits) < dsunit) {
+		erofs_warn("chunksize %u bytes is smaller than dsunit %u blocks, ignoring dsunit",
+				1u << cfg.c_chunkbits, dsunit);
+		dsunit = 0;
+	}
+
 	if (pclustersize_packed) {
 		if (pclustersize_packed < erofs_blksiz(&g_sbi) ||
 		    pclustersize_packed % erofs_blksiz(&g_sbi)) {
-- 
2.34.1
