[PATCH v3] erofs-utils: mkfs: Implement 'dsunit' alignment on blobdev

Friendy Su friendy.su at sony.com
Sat Aug 23 18:34:53 AEST 2025


Set a proper 'dsunit' so that file bodies on the blobdev are aligned
on huge pages, where 'dsunit' * 'blocksize' = huge page size (2M).
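
As a worked example of the relation above, here is a minimal sketch. The
helper name is hypothetical and not part of erofs-utils. The dsunit that
matches a 2M huge page is the huge page size divided by the block size,
e.g. 512 for 4K blocks:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not an erofs-utils API): the 'dsunit' value, in
 * filesystem blocks, for which dsunit * blocksize == hugepage_size.
 */
static unsigned int dsunit_for_hugepage(uint64_t hugepage_size,
					unsigned int blocksize)
{
	/* the huge page must be a whole number of blocks */
	assert(blocksize != 0 && hugepage_size % blocksize == 0);
	return (unsigned int)(hugepage_size / blocksize);
}
```

With 4096-byte blocks, dsunit_for_hugepage(2u << 20, 4096) is 512, which
matches the dsunit=512 (2M) example in the man page change.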

When mmap()ing a file on a filesystem mounted with dax=always, huge
page alignment lets the kernel map a 2M huge page per page fault
instead of a normal 4K page.

This greatly improves mmap() performance by reducing the number of
page faults triggered (up to 512x fewer for 2M pages versus 4K pages).
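
To put the reduction in numbers, here is a simplified model. It assumes
that every fault maps exactly one page, that the whole range is touched,
and that no prefaulting happens. Under those assumptions, mapping 1 GiB
takes 262144 faults with 4K pages but only 512 with 2M huge pages:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified model: number of page faults needed to touch every byte of
 * a 'len'-byte mapping when each fault maps one 'page_size' page.
 */
static uint64_t faults_to_map(uint64_t len, uint64_t page_size)
{
	assert(page_size != 0);
	/* a partial trailing page still costs one fault */
	return (len + page_size - 1) / page_size;
}
```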

Considering deduplication, 'chunksize' should not be smaller than
'dsunit' * 'blocksize', so that data stays aligned on dsunit after
deduplication. Otherwise, mkfs warns and skips the alignment.
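
The deduplication argument can be checked with a small model. This is a
sketch under stated assumptions, and the function names are hypothetical.
Each file starts at a blob offset already rounded up to
unit = dsunit * blocksize. A deduplicated chunk is not written, so the
blob offset does not advance. Huge page mapping needs each written chunk's
file offset and blob offset to agree modulo unit. dsunit_fits_chunk()
mirrors the dsunit <= chunksize condition used in the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors the patch's check: dsunit (in blocks) must fit in one chunk. */
static bool dsunit_fits_chunk(unsigned int chunkbits, unsigned int blkszbits,
			      unsigned int dsunit)
{
	return dsunit > 1 && dsunit <= 1u << (chunkbits - blkszbits);
}

/*
 * Hypothetical model: a file of 'nchunks' chunks is written starting at
 * blob offset 'base' (already a multiple of 'unit').  dup[i] marks chunk i
 * as deduplicated, i.e. it reuses an existing chunk and is not written.
 * Returns true if every written chunk keeps file offset == blob offset
 * modulo 'unit', which is what huge page mapping requires.
 */
static bool stays_aligned(uint64_t base, uint64_t chunksize, uint64_t unit,
			  const bool *dup, unsigned int nchunks)
{
	uint64_t blobpos = base;	/* next free byte in the blob */
	unsigned int i;

	assert(unit != 0);
	for (i = 0; i < nchunks; i++) {
		uint64_t filepos = (uint64_t)i * chunksize;

		if (dup[i])
			continue;	/* nothing written, blobpos unchanged */
		if (filepos % unit != blobpos % unit)
			return false;
		blobpos += chunksize;
	}
	return true;
}
```

With chunksize=4K and a 2M unit, a single deduplicated chunk makes every
later chunk misaligned, while with chunksize=2M the layout stays aligned
however many chunks are deduplicated.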

Signed-off-by: Friendy Su <friendy.su at sony.com>
Reviewed-by: Yuezhang Mo <Yuezhang.Mo at sony.com>
Reviewed-by: Daniel Palmer <daniel.palmer at sony.com>
---
 lib/blobchunk.c  | 18 ++++++++++++++++++
 man/mkfs.erofs.1 | 15 +++++++++++++++
 mkfs/main.c      | 12 ++++++++++++
 3 files changed, 45 insertions(+)

diff --git a/lib/blobchunk.c b/lib/blobchunk.c
index bbc69cf..69c70e9 100644
--- a/lib/blobchunk.c
+++ b/lib/blobchunk.c
@@ -309,6 +309,24 @@ int erofs_blob_write_chunked_file(struct erofs_inode *inode, int fd,
 	minextblks = BLK_ROUND_UP(sbi, inode->i_size);
 	interval_start = 0;
 
+	/*
+	 * If dsunit <= chunksize, deduplication cannot break the alignment.
+	 */
+	if (sbi->bmgr->dsunit > 1 &&
+	    sbi->bmgr->dsunit <= 1u << (chunkbits - sbi->blkszbits)) {
+		off_t off = lseek(blobfile, 0, SEEK_CUR);
+
+		if (off >= 0)
+			off = roundup(off, sbi->bmgr->dsunit * erofs_blksiz(sbi));
+		if (off < 0 || lseek(blobfile, off, SEEK_SET) != off) {
+			ret = -errno;
+			erofs_err("failed to seek blobdev to 0x%llx", (unsigned long long)off);
+			goto err;
+		}
+		erofs_dbg("Align /%s on block #%llu (0x%llx)", erofs_fspath(inode->i_srcpath),
+			  (unsigned long long)erofs_blknr(sbi, off), (unsigned long long)off);
+	}
+
 	for (pos = 0; pos < inode->i_size; pos += len) {
 #ifdef SEEK_DATA
 		off_t offset = lseek(fd, pos + startoff, SEEK_DATA);
diff --git a/man/mkfs.erofs.1 b/man/mkfs.erofs.1
index 63f7a2f..9075522 100644
--- a/man/mkfs.erofs.1
+++ b/man/mkfs.erofs.1
@@ -168,6 +168,21 @@ the output filesystem, with no leading /.
 .TP
 .BI "\-\-dsunit=" #
 Align all data block addresses to multiples of #.
+
+If \fB\-\-chunksize\fR is also set, \fBdsunit\fR is ignored when
+\fBdsunit\fR multiplied by the block size exceeds \fBchunksize\fR.
+
+This keeps data aligned after deduplication. If one \fBdsunit\fR spans
+several chunks, e.g.
+
+\fBblock-size\fR=4096, \fBdsunit\fR=512 (2M), \fBchunksize\fR=4096
+
+then once a single chunk is deduplicated, all chunks after it are no
+longer aligned. For the best performance, set \fBchunksize\fR to
+\fBdsunit\fR multiplied by the block size, e.g.
+
+\fBblock-size\fR=4096, \fBdsunit\fR=512 (2M), \fBchunksize\fR=$((4096*512))
+
 .TP
 .BI "\-\-exclude-path=" path
 Ignore file that matches the exact literal path.
diff --git a/mkfs/main.c b/mkfs/main.c
index 30804d1..5ca098b 100644
--- a/mkfs/main.c
+++ b/mkfs/main.c
@@ -1098,6 +1098,18 @@ static int mkfs_parse_options_cfg(int argc, char *argv[])
 		return -EINVAL;
 	}
 
+	/*
+	 * To keep data aligned on dsunit after deduplication, chunksize must
+	 * not be smaller than dsunit. Otherwise, e.g. with chunksize=4k and
+	 * dsunit=2M, all data after the first deduplicated chunk would become
+	 * unaligned, so warn here; the alignment is then skipped when file
+	 * data is written.
+	 */
+	if (cfg.c_chunkbits && dsunit &&
+	    1u << (cfg.c_chunkbits - g_sbi.blkszbits) < dsunit)
+		erofs_warn("chunksize %u bytes is smaller than dsunit %u blocks, ignoring dsunit",
+			   1u << cfg.c_chunkbits, dsunit);
+
 	if (pclustersize_packed) {
 		if (pclustersize_packed < erofs_blksiz(&g_sbi) ||
 		    pclustersize_packed % erofs_blksiz(&g_sbi)) {
-- 
2.34.1
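
The alignment step added to erofs_blob_write_chunked_file() can be
illustrated in user space. This is a minimal sketch, and align_write_pos()
is a hypothetical name, not an erofs-utils function. It rounds the current
write position up to the next multiple of the alignment and seeks there,
checking both lseek() calls so that a failed SEEK_CUR is never rounded
into a seek to offset 0:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>	/* open() and O_* flags, for callers creating the blob file */
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: move the write position of 'fd' forward to the
 * next multiple of 'align' bytes (e.g. dsunit * blocksize).  Returns the
 * new offset, or -errno if either lseek() fails (e.g. -ESPIPE on a pipe).
 */
static off_t align_write_pos(int fd, uint64_t align)
{
	off_t off = lseek(fd, 0, SEEK_CUR);

	assert(align != 0);
	if (off < 0)
		return -errno;
	/* round up; an already aligned offset is left unchanged */
	off = (off_t)(((uint64_t)off + align - 1) / align * align);
	if (lseek(fd, off, SEEK_SET) < 0)
		return -errno;
	return off;
}
```

Seeking past the end of the file leaves a hole, so the next write() lands
on the aligned offset and the skipped range reads back as zeros.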



More information about the Linux-erofs mailing list