[PATCH v1] erofs-utils: mkfs: Implement 'dsunit' alignment on blobdev

Wed Aug 20 19:00:14 AEST 2025

Hi, Gao, 

Thanks for your review !

> As for this patch, what if the inode itself is
> chunk-deduplicated, could we apply this if the inode
> only has one new chunk instead at least for now?

Do you mean inode has 3 chunks, chunk#2 and chunk#3 duplicate chunck#1?

This patch only makes the 1st chunk exactly written to blobdev aligned on dsunit. Deduplicated chunks will not be written to blobdev.

In example above, chunk#1 is written to dsunit aligned block addr, chunk#2,#3 are not written. The next file will be written from next dsunit alignment addr.

Best Regards
Friendy Su

________________________________________
From: Gao Xiang <hsiangkao at linux.alibaba.com>
Sent: Wednesday, August 20, 2025 15:44
To: Su, Friendy; linux-erofs at lists.ozlabs.org
Cc: Mo, Yuezhang; Palmer, Daniel (SGC)
Subject: Re: [PATCH v1] erofs-utils: mkfs: Implement 'dsunit' alignment on blobdev

Hi Friendly, On 2025/8/20 15: 23, Friendy Su wrote: > Set proper 'dsunit' to let file body align on huge page on blobdev, > > where 'dsunit' * 'blocksize' = huge page size (2M). > > When do mmap() a file mounted with dax=always,

Hi Friendly,

On 2025/8/20 15:23, Friendy Su wrote:
> Set proper 'dsunit' to let file body align on huge page on blobdev,
>
> where 'dsunit' * 'blocksize' = huge page size (2M).
>
> When do mmap() a file mounted with dax=always, aligning on huge page
> makes kernel map huge page(2M) per page fault exception, compared with
> mapping normal page(4K) per page fault.
>
> This greatly improves mmap() performance by reducing times of page
> fault being triggered.
>
> Signed-off-by: Friendy Su <friendy.su at sony.com>
> Reviewed-by: Yuezhang Mo <Yuezhang.Mo at sony.com>
> Reviewed-by: Daniel Palmer <daniel.palmer at sony.com>

Sigh, thanks for the patch! Actually the blob chunk implementation
(this file) is now an implementation debt since:

  1) In principle, chunks (and deduplicated pclusters) should
     across blobs (considering the main device is also a blob);

  2) Each blob should have its own space allocation context
     which is independent to the in-memory chunk records...

My mid-term plan tends to use the current metabox way
(i.e. use buffer management for all extra blobs too..)

> ---
>   lib/blobchunk.c | 15 +++++++++++++++
>   1 file changed, 15 insertions(+)
>
> diff --git a/lib/blobchunk.c b/lib/blobchunk.c
> index bbc69cf..e47afb5 100644
> --- a/lib/blobchunk.c
> +++ b/lib/blobchunk.c
> @@ -309,6 +309,21 @@ int erofs_blob_write_chunked_file(struct erofs_inode *inode, int fd,
>       minextblks = BLK_ROUND_UP(sbi, inode->i_size);
>       interval_start = 0;
>
> +     /* Align file on 'dsunit' */
> +     if (sbi->bmgr->dsunit > 1) {
> +             off_t off = lseek(blobfile, 0, SEEK_CUR);
> +
> +             erofs_dbg("Try to round up 0x%llx to align on %d blocks (dsunit)",
> +                             off, sbi->bmgr->dsunit);
> +             off = roundup(off, sbi->bmgr->dsunit * erofs_blksiz(sbi));
> +             if (lseek(blobfile, off, SEEK_SET) != off) {
> +                     ret = -errno;
> +                     erofs_err("lseek to blobdev 0x%llx error", off);
> +                     goto err;
> +             }
> +             erofs_dbg("Aligned on 0x%llx", off);
> +     }

But since you have use case on the current chunk
approach, so...

As for this patch, what if the inode itself is
chunk-deduplicated, could we apply this if the inode
only has one new chunk instead at least for now?

Thanks,
Gao Xiang