[PATCH v4] erofs-utils: mkfs: support fragment deduplication

Yue Hu zbestahu at gmail.com
Fri Dec 2 22:10:42 AEDT 2022


On Thu, 01 Dec 2022 20:49:46 +0800
"Yue Hu" <huyue2 at coolpad.com> wrote:

> Add a missing change:
> - generate a ctx for the duplicate fragment during compression.
> 
> On Thu,  1 Dec 2022 19:16:15 +0800
> Yue Hu <zbestahu at gmail.com> wrote:
> 
> > From: Yue Hu <huyue2 at coolpad.com>
> > 
> > Fragment deduplication was not supported when this feature was
> > introduced. Let's support it now.
> > 
> > We intend to dedupe fragments before compression, so that duplicate
> > parts are not written into the packed inode.
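
The dedupe-before-packing idea above can be pictured with a minimal C sketch. It is an illustration under stated assumptions, not the patch's code: the packed inode is modelled as an in-memory buffer, fragments are matched only as whole, byte-identical blobs using a 32-bit FNV-1a hash plus memcmp(), and every name here (struct packed, frag_pack(), frag_release()) is hypothetical rather than erofs-utils API. A duplicate returns the offset of the earlier copy, so nothing new is appended.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: exact-match fragment dedupe before data is
 * appended to a packed inode. Names are illustrative only. */
#define FRAG_NBUCKETS 256

struct frag_entry {
	uint32_t hash;			/* FNV-1a over the whole fragment */
	size_t off, len;		/* location inside the packed buffer */
	struct frag_entry *next;	/* bucket chain */
};

struct packed {
	unsigned char *buf;		/* stands in for the packed inode */
	size_t size, cap;
	struct frag_entry *buckets[FRAG_NBUCKETS];
};

static uint32_t fnv1a(const unsigned char *p, size_t len)
{
	uint32_t h = 2166136261u;

	while (len--) {
		h ^= *p++;
		h *= 16777619u;
	}
	return h;
}

/* Return the offset of @data in the packed buffer, appending it only
 * if no byte-identical fragment was packed before; -1 on OOM. */
static long frag_pack(struct packed *pk, const void *data, size_t len)
{
	uint32_t h;
	struct frag_entry **head, *e;

	assert(len > 0);		/* empty fragments are not packed */
	h = fnv1a(data, len);
	head = &pk->buckets[h % FRAG_NBUCKETS];

	for (e = *head; e; e = e->next)
		if (e->hash == h && e->len == len &&
		    !memcmp(pk->buf + e->off, data, len))
			return (long)e->off;	/* duplicate: reuse it */

	if (pk->size + len > pk->cap) {	/* grow the buffer */
		size_t ncap = pk->cap ? pk->cap : 64;
		unsigned char *nbuf;

		while (ncap < pk->size + len)
			ncap *= 2;
		nbuf = realloc(pk->buf, ncap);
		if (!nbuf)
			return -1;
		pk->buf = nbuf;
		pk->cap = ncap;
	}
	e = malloc(sizeof(*e));
	if (!e)
		return -1;
	memcpy(pk->buf + pk->size, data, len);
	*e = (struct frag_entry){ .hash = h, .off = pk->size,
				  .len = len, .next = *head };
	*head = e;
	pk->size += len;
	return (long)e->off;
}

static void frag_release(struct packed *pk)
{
	for (size_t i = 0; i < FRAG_NBUCKETS; i++) {
		struct frag_entry *e = pk->buckets[i];

		while (e) {
			struct frag_entry *next = e->next;

			free(e);
			e = next;
		}
		pk->buckets[i] = NULL;
	}
	free(pk->buf);
	pk->buf = NULL;
	pk->size = pk->cap = 0;
}
```

For example, packing "hello", "world" and then "hello" again yields offsets 0, 5 and 0, and the buffer holds only 10 bytes. The patch itself may pick candidates differently (its changelog mentions a tofcrc value), so treat the hashing choice here as illustrative only.
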
> > 
> > Image sizes for the Linux 5.10.1 + 5.10.87 source code, before and
> > after this patch:
> > 
> > [before]
> >  32k pcluster + T0 + lz4hc,12 + fragment		450M
> >  64k pcluster + T0 + lz4hc,12 + fragment		434M
> > 128k pcluster + T0 + lz4hc,12 + fragment		426M
> >  32k pcluster + T0 + lz4hc,12 + fragment + dedupe	368M
> >  64k pcluster + T0 + lz4hc,12 + fragment + dedupe	380M
> > 128k pcluster + T0 + lz4hc,12 + fragment + dedupe	395M
> > 
> > [after]
> >  32k pcluster + T0 + lz4hc,12 + fragment		311M
> >  64k pcluster + T0 + lz4hc,12 + fragment		295M
> > 128k pcluster + T0 + lz4hc,12 + fragment		287M
> >  32k pcluster + T0 + lz4hc,12 + fragment + dedupe	286M
> >  64k pcluster + T0 + lz4hc,12 + fragment + dedupe	281M
> > 128k pcluster + T0 + lz4hc,12 + fragment + dedupe	278M
> > 
> > For comparison, SquashFS results (SquashFS uses lz4hc level 12 by default):
> > 
> >  32k block + lz4hc		332M
> >  64k block + lz4hc		304M
> > 128k block + lz4hc		283M
> > 256k block + lz4hc		273M
> > 256k block + lz4hc + noI	278M
> > 
> > Suggested-by: Gao Xiang <hsiangkao at linux.alibaba.com>
> > Signed-off-by: Yue Hu <huyue2 at coolpad.com>
> > ---
> > v4:
> > - rename identifiers, including tofcrc/new_fragmentsize
> > - move fixup into ctx
> > - use may_fixing to check whether to pack the fragment
> > - move sb/inode flag + 64bits case from erofs_pack_fragments() to new
> >   helper erofs_fragments_commit()
> > - move recompress ahead of may_inline case when compressing succeeds
> > - update commit message/code comments
> > - note that decompression fails when ztailpacking is enabled at the
> >   same time; this still needs some time to debug

There is no need to handle the may_inline case if we find a duplicate fragment:

-               bool may_inline = (cfg.c_ztailpacking && final);
+               bool may_inline = (cfg.c_ztailpacking && final &&
+                                  !inode->fragment_size);

This should be included in v5.

> > 
> > v3:



More information about the Linux-erofs mailing list