[PATCH v2] erofs-utils: mkfs: Implement 'dsunit' alignment on blobdev

Friendy.Su at sony.com Friendy.Su at sony.com
Fri Aug 22 19:05:55 AEST 2025


Hi, Gao,

> It should be

>        if (sbi->bmgr->dsunit >= 1u << (cfg.c_chunkbits - g_sbi.blkszbits)) {

>        }

In main.c, dsunit is reset to 0 whenever this warning is issued:

+       if (cfg.c_chunkbits && dsunit && 1u << (cfg.c_chunkbits - g_sbi.blkszbits) < dsunit) {
+               erofs_warn("chunksize %u bytes is smaller than dsunit %u blocks, ignore dsunit !",
+                               1u << cfg.c_chunkbits, dsunit);
+               dsunit = 0;
+       }

So in that case sbi->bmgr->dsunit is already 0 by the time we get here.
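
To make this concrete, here is a minimal standalone sketch of the same check with example numbers. The helper name effective_dsunit() is hypothetical (it is not part of erofs-utils), and fprintf() stands in for erofs_warn():

```c
#include <stdio.h>

/*
 * Hypothetical helper mirroring the check in mkfs/main.c.
 * Returns the dsunit (in blocks) that stays in effect after the check:
 * 0 if the chunk size (in blocks) is smaller than dsunit, else dsunit.
 */
static unsigned int effective_dsunit(unsigned int chunkbits,
				     unsigned int blkszbits,
				     unsigned int dsunit)
{
	if (chunkbits && dsunit &&
	    (1u << (chunkbits - blkszbits)) < dsunit) {
		/* stand-in for erofs_warn() */
		fprintf(stderr,
			"chunksize %u bytes is smaller than dsunit %u blocks, ignore dsunit\n",
			1u << chunkbits, dsunit);
		return 0;
	}
	return dsunit;
}
```

With blocksize=4096 (blkszbits=12) and dsunit=512, chunksize=4096 (chunkbits=12) gives 1 block < 512, so dsunit becomes 0; chunksize=2M (chunkbits=21) gives 512 blocks, so dsunit is kept.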



________________________________________
From: Gao Xiang <hsiangkao at linux.alibaba.com>
Sent: Friday, August 22, 2025 16:55
To: Su, Friendy; linux-erofs at lists.ozlabs.org
Cc: Mo, Yuezhang; Palmer, Daniel (SGC)
Subject: Re: [PATCH v2] erofs-utils: mkfs: Implement 'dsunit' alignment on blobdev

Hi Friendy,

On 2025/8/22 16:42, Friendy Su wrote:
> Set proper 'dsunit' to let file body align on huge page on blobdev,
>
> where 'dsunit' * 'blocksize' = huge page size (2M).
>
> When do mmap() a file mounted with dax=always, aligning on huge page
> makes kernel map huge page(2M) per page fault exception, compared with
> mapping normal page(4K) per page fault.
>
> This greatly improves mmap() performance by reducing times of page
> fault being triggered.
>
> Considering deduplication, 'chunksize' should not be smaller than
> 'dsunit', then after deduplication, still align on dsunit.
>
> Signed-off-by: Friendy Su <friendy.su at sony.com>
> Reviewed-by: Yuezhang Mo <Yuezhang.Mo at sony.com>
> Reviewed-by: Daniel Palmer <daniel.palmer at sony.com>
> ---
>   lib/blobchunk.c  | 15 +++++++++++++++
>   man/mkfs.erofs.1 | 15 +++++++++++++++
>   mkfs/main.c      | 13 +++++++++++++
>   3 files changed, 43 insertions(+)
>
> diff --git a/lib/blobchunk.c b/lib/blobchunk.c
> index bbc69cf..e47afb5 100644
> --- a/lib/blobchunk.c
> +++ b/lib/blobchunk.c
> @@ -309,6 +309,21 @@ int erofs_blob_write_chunked_file(struct erofs_inode *inode, int fd,
>       minextblks = BLK_ROUND_UP(sbi, inode->i_size);
>       interval_start = 0;
>
> +     /* Align file on 'dsunit' */
> +     if (sbi->bmgr->dsunit > 1) {

It should be

        if (sbi->bmgr->dsunit >= 1u << (cfg.c_chunkbits - g_sbi.blkszbits)) {

        }

?


> +             off_t off = lseek(blobfile, 0, SEEK_CUR);
> +
> +             erofs_dbg("Try to round up 0x%llx to align on %d blocks (dsunit)",
> +                             off, sbi->bmgr->dsunit);
> +             off = roundup(off, sbi->bmgr->dsunit * erofs_blksiz(sbi));
> +             if (lseek(blobfile, off, SEEK_SET) != off) {
> +                     ret = -errno;
> +                     erofs_err("lseek to blobdev 0x%llx error", off);
> +                     goto err;
> +             }
> +             erofs_dbg("Aligned on 0x%llx", off);

Could we combine these two debugging messages into one?
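
For illustration only, a minimal sketch of what a single combined message might look like. The helper name align_to_dsunit() is hypothetical, fprintf() stands in for erofs_dbg(), and the caller is assumed to guarantee dsunit > 0:

```c
#include <stdio.h>

/*
 * Hypothetical helper: round a blob offset up to the next multiple of
 * dsunit * blksz bytes and report both values in one debug message.
 * Assumes dsunit > 0 and blksz > 0.
 */
static unsigned long long align_to_dsunit(unsigned long long off,
					  unsigned int dsunit,
					  unsigned int blksz)
{
	unsigned long long unit = (unsigned long long)dsunit * blksz;
	unsigned long long aligned = (off + unit - 1) / unit * unit;

	/* stand-in for erofs_dbg(): one message instead of two */
	fprintf(stderr,
		"Round up 0x%llx to 0x%llx to align on %u blocks (dsunit)\n",
		off, aligned, dsunit);
	return aligned;
}
```

For example, with dsunit=512 and a 4096-byte block size, an offset of 0x1000 is rounded up to 0x200000, while an offset that is already 2M-aligned is left unchanged.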

> +     }
> +
>       for (pos = 0; pos < inode->i_size; pos += len) {
>   #ifdef SEEK_DATA
>               off_t offset = lseek(fd, pos + startoff, SEEK_DATA);
> diff --git a/man/mkfs.erofs.1 b/man/mkfs.erofs.1
> index 63f7a2f..9075522 100644
> --- a/man/mkfs.erofs.1
> +++ b/man/mkfs.erofs.1
> @@ -168,6 +168,21 @@ the output filesystem, with no leading /.
>   .TP
>   .BI "\-\-dsunit=" #
>   Align all data block addresses to multiples of #.
> +
> +If \fBdsunit\fR and \fBchunksize\fR are both set, \fBdsunit\fR will be ignored
> +if it is bigger than \fBchunksize\fR.
> +
> +This is for keeping alignment after deduplication.
> +If \fBdsunit\fR is bigger, it contains several chunks,
> +
> +E.g. \fBblock-size\fR=4096, \fBdsunit\fR=512 (2M), \fBchunksize\fR=4096
> +
> +Once 1 chunk is deduplicated, the chunks thereafter will not be aligned any
> +longer. In order to achieve the best performance, recommend to set \fBdsunit\fR
> +same as \fBchunksize\fR.
> +
> +E.g. \fBblock-size\fR=4096, \fBdsunit\fR=512 (2M), \fBchunksize\fR=$((4096*512))
> +
>   .TP
>   .BI "\-\-exclude-path=" path
>   Ignore file that matches the exact literal path.
> diff --git a/mkfs/main.c b/mkfs/main.c
> index 30804d1..fcb2b89 100644
> --- a/mkfs/main.c
> +++ b/mkfs/main.c
> @@ -1098,6 +1098,19 @@ static int mkfs_parse_options_cfg(int argc, char *argv[])
>               return -EINVAL;
>       }
>
> +     /*
> +      * once align data on dsunit, in order to keep alignment after deduplication
> +      * chunksize should be equal to or bigger than dsunit.
> +      * if chunksize is smaller than dsunit, e.g. chunksize=4k, dsunit=2M,
> +      * once a chunk is deduplicated, all data thereafter will be unaligned.
> +      * so ignore dsunit under such case.
> +      */
> +     if (cfg.c_chunkbits && dsunit && 1u << (cfg.c_chunkbits - g_sbi.blkszbits) < dsunit) {
> +             erofs_warn("chunksize %u bytes is smaller than dsunit %u blocks, ignore dsunit !",
> +                             1u << cfg.c_chunkbits, dsunit);

Is a tab not 8 spaces here? The indentation seems misaligned.

Thanks,
Gao Xiang


