[RFC PATCH] erofs-utils: mkfs: introduce multi-thread compression
zhaoyifan at sjtu.edu.cn
Sun Aug 20 12:44:06 AEST 2023
> -----Original Message-----
> From: Gao Xiang <hsiangkao at linux.alibaba.com>
> Sent: Sunday, August 20, 2023 2:41 AM
> To: Yifan Zhao <zhaoyifan at sjtu.edu.cn>; linux-erofs at lists.ozlabs.org
> Subject: Re: [RFC PATCH] erofs-utils: mkfs: introduce multi-thread
> compression
>
> Hi Yifan,
>
> On 2023/8/20 02:01, Yifan Zhao wrote:
> > This patch introduces multi-thread compression to accelerate image
> > packaging.
> > ---
> > Hi all:
> >
> > This is a very imperfect patch not ready for merging, and any suggestions
> > would be appreciated!
> > If it's on track, I'd like to follow up on that.
> >
> > The inefficiency of EROFS compressed image creation is a much
> > criticized problem, and this patch attempts to address it by creating
> > multiple threads to run the compression algorithm in parallel.
>
> Many thanks if you have time to follow up on that.
>
> Yet due to the release process timing, erofs-utils 1.7 will be released in
> about a month, so I think multithreaded compression will land as part of
> erofs-utils v1.8.
>
OK.
> >
> > Specifically, each input file over 16MB is split into segments, and
> > each thread compresses a segment as if it were a separate file.
> > Finally, the main thread merges all the compressed segments into one file.
> > This process does not involve any data contention.
> >
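To make the scheme above concrete, here is a minimal sketch in C of "split into segments, compress each in its own thread, merge in order". It is not the patch's code: `seg_job`, `seg_worker` and `compress_segmented` are hypothetical names, byte-wise RLE stands in for lz4/lz4hc, and it spawns one thread per segment purely for brevity.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-segment job; none of these names come from erofs-utils. */
struct seg_job {
	const unsigned char *in;	/* start of this segment in the input */
	size_t in_len;			/* segment length in bytes */
	unsigned char *out;		/* private output buffer (2 * in_len bytes) */
	size_t out_len;			/* bytes written by the worker */
};

/* Stand-in compressor: byte-wise RLE emitting (count, value) pairs. */
static size_t rle_compress(const unsigned char *in, size_t n,
			   unsigned char *out)
{
	size_t i = 0, o = 0;

	while (i < n) {
		size_t run = 1;

		while (i + run < n && run < 255 && in[i + run] == in[i])
			run++;
		out[o++] = (unsigned char)run;
		out[o++] = in[i];
		i += run;
	}
	return o;
}

/* Each worker touches only its own job, so no locking is needed. */
static void *seg_worker(void *arg)
{
	struct seg_job *job = arg;

	job->out_len = rle_compress(job->in, job->in_len, job->out);
	return NULL;
}

/*
 * Compress `in` in segments of at most seg_size bytes, one thread per
 * segment, then concatenate the results in segment order.
 * Returns a malloc'ed buffer (caller frees) or NULL on failure.
 */
static unsigned char *compress_segmented(const unsigned char *in, size_t len,
					 size_t seg_size, size_t *out_len)
{
	size_t nseg, i, started = 0, total = 0, off = 0;
	struct seg_job *jobs;
	pthread_t *tids;
	unsigned char *merged = NULL;
	int failed = 0;

	if (!len || !seg_size)
		return NULL;
	nseg = (len + seg_size - 1) / seg_size;
	jobs = calloc(nseg, sizeof(*jobs));
	tids = calloc(nseg, sizeof(*tids));
	if (!jobs || !tids)
		goto out;
	for (i = 0; i < nseg; i++) {
		jobs[i].in = in + i * seg_size;
		jobs[i].in_len = (i == nseg - 1) ? len - i * seg_size : seg_size;
		jobs[i].out = malloc(2 * jobs[i].in_len);
		if (!jobs[i].out ||
		    pthread_create(&tids[i], NULL, seg_worker, &jobs[i])) {
			failed = 1;
			break;
		}
		started++;
	}
	/* Always join whatever was started, even on the failure path. */
	for (i = 0; i < started; i++) {
		pthread_join(tids[i], NULL);
		total += jobs[i].out_len;
	}
	if (!failed && (merged = malloc(total))) {
		for (i = 0; i < nseg; i++) {
			memcpy(merged + off, jobs[i].out, jobs[i].out_len);
			off += jobs[i].out_len;
		}
		assert(off == total);
		*out_len = total;
	}
out:
	if (jobs)
		for (i = 0; i < nseg; i++)
			free(jobs[i].out);
	free(jobs);
	free(tids);
	return merged;
}
```

A caller would pass the whole file buffer plus a segment size (16MB in the patch description) and write out the returned buffer. Since every segment is compressed as if it were a separate file, matches cannot cross segment boundaries, so the result may be slightly larger than a single-pass run.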
> > Current issues:
> > 1. For each large file, we create and destroy a batch of worker threads,
> >    causing unnecessary overhead.
> >    Moreover, each worker thread's context is a global variable, making
> >    the binary bigger.
> >    In the future, we can pre-create worker threads when the program
> >    starts running.
> >    Worker threads serve as consumers and the main thread that makes
> >    the compression request is the producer.
>
> I'd suggest we use (or enhance?)
> https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git/commit/?id=e5b83309b199966cc757cb095d1ff1ebd0923b3e
>
> as a start?
I missed this and I think it's a great base, thanks!
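For reference, the pre-created worker model from item 1 could look roughly like the sketch below: workers are started once, block on a condition variable, and consume requests that the main thread produces. All names here (`struct pool`, `pool_submit`, and so on) are hypothetical and are not taken from the commit linked above or from any erofs-utils API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical names throughout; this is not the erofs-utils API. */
struct work {
	void (*fn)(void *arg);
	void *arg;
	struct work *next;
};

struct pool {
	pthread_mutex_t lock;
	pthread_cond_t more;	/* signalled on new work or shutdown */
	struct work *head, *tail;
	int shutdown;
	unsigned int nworkers;
	pthread_t *workers;
};

/* Consumer: sleep until work arrives, run it, repeat until shutdown. */
static void *pool_worker(void *arg)
{
	struct pool *p = arg;

	for (;;) {
		struct work *w;

		pthread_mutex_lock(&p->lock);
		while (!p->head && !p->shutdown)
			pthread_cond_wait(&p->more, &p->lock);
		w = p->head;
		if (!w) {	/* queue drained and shutdown requested */
			pthread_mutex_unlock(&p->lock);
			return NULL;
		}
		p->head = w->next;
		if (!p->head)
			p->tail = NULL;
		pthread_mutex_unlock(&p->lock);
		w->fn(w->arg);
		free(w);
	}
}

/* Start the workers once; returns 0 if at least one worker is running. */
static int pool_init(struct pool *p, unsigned int nworkers)
{
	unsigned int i;

	pthread_mutex_init(&p->lock, NULL);
	pthread_cond_init(&p->more, NULL);
	p->head = p->tail = NULL;
	p->shutdown = 0;
	p->nworkers = 0;
	p->workers = calloc(nworkers, sizeof(*p->workers));
	if (!p->workers)
		return -1;
	for (i = 0; i < nworkers; i++)
		if (pthread_create(&p->workers[i], NULL, pool_worker, p))
			break;
	p->nworkers = i;
	return i ? 0 : -1;
}

/* Producer side: append one request to the tail and wake one worker. */
static int pool_submit(struct pool *p, void (*fn)(void *), void *arg)
{
	struct work *w = malloc(sizeof(*w));

	assert(fn != NULL);
	if (!w)
		return -1;
	*w = (struct work){ .fn = fn, .arg = arg, .next = NULL };
	pthread_mutex_lock(&p->lock);
	if (p->tail)
		p->tail->next = w;
	else
		p->head = w;
	p->tail = w;
	pthread_cond_signal(&p->more);
	pthread_mutex_unlock(&p->lock);
	return 0;
}

/* Let the workers drain the queue, then join them and release resources. */
static void pool_shutdown(struct pool *p)
{
	unsigned int i;

	pthread_mutex_lock(&p->lock);
	p->shutdown = 1;
	pthread_cond_broadcast(&p->more);
	pthread_mutex_unlock(&p->lock);
	for (i = 0; i < p->nworkers; i++)
		pthread_join(p->workers[i], NULL);
	free(p->workers);
}

/* Example task: atomically bump the counter that arg points to. */
static void count_task(void *arg)
{
	atomic_fetch_add((atomic_int *)arg, 1);
}
```

In mkfs terms, `pool_init()` would run once at startup, each segment would become one `pool_submit()` call, and `pool_shutdown()` would run before exit. That removes the per-file thread creation cost, and per-worker context can live in the request instead of in globals.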
>
> > 2. Fragment/Dedupe together with other advanced features are not
> >    fully tested due to my poor knowledge of the compression process.
> >    Not sure if they work well with multithreading.
>
> I have a preliminary design for Fragment/Dedupe; we can discuss the details
> later if you'd like to spend more time on this, thanks! ;)
>
OK.
> > 3. There is a lot of code redundancy between the
> >    erofs_write_compressed_file() and
> >    erofs_write_compressed_file_single() functions.
> >    I don't want to break the original single-threaded execution logic,
> >    but erofs_write_compressed_file() has a high complexity and
> >    my failed attempt to merge the two functions made matters worse.
> >    I'm not sure if we should merge them together or keep two different
> >    function entries for single and multi-threaded compression.
> I think we need to merge these eventually.
OK.
>
> >
> > Performance:
> > Even with this naive patch, we still see a performance gain because
> > the baseline is slow, especially for lz4hc.
> > 1. Packing time of an Arch Linux container image [1] provided by
> >    @wszqkzqk [2].
> >    lz4  : 8s (multi-thread) vs. 10s (single-thread)
> >    lz4hc: 48s (multi-thread) vs. 54s (single-thread)
> > 2. Packing time of the Linux v6.4 git repository (with several ~GB git
> >    object files).
> >    lz4  : 14s (multi-thread) vs. 23s (single-thread)
> >    lz4hc: 49s (multi-thread) vs. 212s (single-thread)
>
> That is reasonable anyway, but in order to make multi-threaded support
> better, some code needs to be refactored first.
>
> Actually I have some cleanup patches on hand to prepare for multithreaded
> support, but again, I will apply them after 1.7 is released.
>
OK.
> >
> > BTW, is there any format file (e.g., .clang-format) available for me to
> > format erofs-utils project?
>
> Not yet, erofs-utils follows the Linux kernel coding style. Would you mind
> submitting a patch for this?
>
I'll give it a try.
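As a starting point for that patch, a `.clang-format` along the following lines might approximate the kernel coding style (tabs, 8-column indent, 80-column limit, Linux brace placement). This is an untested sketch: the option names are standard clang-format options, but the chosen values are my assumption about what fits erofs-utils and would need checking against the existing code.

```yaml
# Hypothetical .clang-format approximating Linux kernel style
BasedOnStyle: LLVM
IndentWidth: 8
TabWidth: 8
UseTab: Always
ColumnLimit: 80
BreakBeforeBraces: Linux
IndentCaseLabels: false
AllowShortFunctionsOnASingleLine: None
AllowShortIfStatementsOnASingleLine: false
SortIncludes: false
```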
> Thanks,
> Gao Xiang
Thanks,
Yifan Zhao