[RFC PATCH] erofs-utils: mkfs: introduce multi-thread compression

zhaoyifan at sjtu.edu.cn zhaoyifan at sjtu.edu.cn
Sun Aug 20 12:44:06 AEST 2023


> -----Original Message-----
> From: Gao Xiang <hsiangkao at linux.alibaba.com>
> Sent: Sunday, August 20, 2023 2:41 AM
> To: Yifan Zhao <zhaoyifan at sjtu.edu.cn>; linux-erofs at lists.ozlabs.org
> Subject: Re: [RFC PATCH] erofs-utils: mkfs: introduce multi-thread
> compression
> 
> Hi Yifan,
> 
> On 2023/8/20 02:01, Yifan Zhao wrote:
> > This patch introduces multi-threaded compression to accelerate image
> > packaging.
> > ---
> > Hi all:
> >
> > This is a very imperfect patch, not ready for merging, and any
> > suggestions would be appreciated!
> > If it's on track, I'd like to follow up on that.
> >
> > The inefficiency of EROFS compressed image creation is a much
> > criticized problem, and this patch attempts to address it by creating
> > multiple threads to run the compression algorithm in parallel.
> 
> Many thanks if you have time to follow up on that.
> 
> Yet due to the release process timing, erofs-utils 1.7 will be released in
> about a month, so I think multithreaded compression will be supported as
> part of erofs-utils v1.8.
> 

OK.

> >
> > Specifically, each input file over 16MB is split into segments, and
> > each thread compresses a segment as if it were a separate file.
> > Finally, the main thread merges all the compressed segments into one file.
> > This process does not involve any data contention.
> >
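
To make the split concrete, the segment arithmetic can be sketched as below. This is only an illustration, not code from the patch: it assumes the 16MB threshold is also used as the segment size, and the names SEGMENT_SIZE, struct segment, segment_count() and segment_at() are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the 16MB threshold doubles as the per-segment size. */
#define SEGMENT_SIZE	(16ULL << 20)

struct segment {
	uint64_t offset;	/* start of the segment within the file */
	uint64_t length;	/* number of bytes in the segment */
};

/* Number of segments needed to cover a file of `filesize` bytes. */
static unsigned int segment_count(uint64_t filesize)
{
	if (filesize <= SEGMENT_SIZE)
		return 1;	/* not over the threshold: no split */
	return (unsigned int)((filesize + SEGMENT_SIZE - 1) / SEGMENT_SIZE);
}

/* Byte range of segment `idx`; only the last one may be shorter. */
static struct segment segment_at(uint64_t filesize, unsigned int idx)
{
	struct segment seg;

	assert(idx < segment_count(filesize));
	seg.offset = (uint64_t)idx * SEGMENT_SIZE;
	seg.length = filesize - seg.offset;
	if (seg.length > SEGMENT_SIZE)
		seg.length = SEGMENT_SIZE;
	return seg;
}
```

Under these assumptions a 40MB file yields three segments of 16MB, 16MB and 8MB. Since the ranges never overlap, each worker can read and compress its own range without locking.
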
> > Current issues:
> > 1.	For each large file, we create and destroy a batch of worker threads,
> > 	causing unnecessary overhead.
> > 	Moreover, each worker thread's context is a global variable, making
> > 	the binary bigger.
> > 	In the future, we can pre-create worker threads when the program
> > 	starts running.
> > 	Worker threads serve as consumers and the main thread that makes
> > 	the compression request is the producer.
> 
> I'd suggest we use (or enhance?)
> https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git/commit/?id=e5b83309b199966cc757cb095d1ff1ebd0923b3e
> 
> as a start?

I missed this and I think it's a great base, thanks!
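
For reference, the producer/consumer arrangement I described in issue 1 could look roughly like the sketch below. It is not the code from the commit linked above, only a minimal pthread-based illustration of pre-created workers fed from a bounded queue. All names (struct workpool, workpool_init(), workpool_submit(), workpool_destroy(), count_task()) and the limits QUEUE_CAP and MAX_WORKERS are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP	64	/* assumed bound on pending requests */
#define MAX_WORKERS	16	/* assumed upper limit on threads */

struct task {
	void (*fn)(void *);
	void *arg;
};

struct workpool {
	pthread_mutex_t lock;
	pthread_cond_t not_empty;	/* signalled when a task is queued */
	pthread_cond_t not_full;	/* signalled when a slot frees up */
	struct task queue[QUEUE_CAP];	/* ring buffer of pending tasks */
	unsigned int head, count;
	bool stopping;
	pthread_t workers[MAX_WORKERS];
	unsigned int nworkers;
};

/* Consumer: take one task at a time until told to stop. */
static void *worker_main(void *p)
{
	struct workpool *wp = p;

	for (;;) {
		struct task t;

		pthread_mutex_lock(&wp->lock);
		while (!wp->count && !wp->stopping)
			pthread_cond_wait(&wp->not_empty, &wp->lock);
		if (!wp->count) {	/* stopping and queue drained */
			pthread_mutex_unlock(&wp->lock);
			return NULL;
		}
		t = wp->queue[wp->head];
		wp->head = (wp->head + 1) % QUEUE_CAP;
		wp->count--;
		pthread_cond_signal(&wp->not_full);
		pthread_mutex_unlock(&wp->lock);
		t.fn(t.arg);		/* run the task outside the lock */
	}
}

/* Create the workers once, at program start. Returns 0 on success. */
static int workpool_init(struct workpool *wp, unsigned int nworkers)
{
	unsigned int i;

	if (!nworkers || nworkers > MAX_WORKERS)
		return -1;
	pthread_mutex_init(&wp->lock, NULL);
	pthread_cond_init(&wp->not_empty, NULL);
	pthread_cond_init(&wp->not_full, NULL);
	wp->head = wp->count = wp->nworkers = 0;
	wp->stopping = false;
	for (i = 0; i < nworkers; i++) {
		if (pthread_create(&wp->workers[i], NULL, worker_main, wp))
			break;		/* carry on with fewer workers */
		wp->nworkers++;
	}
	return wp->nworkers ? 0 : -1;
}

/* Producer: queue one request, blocking while the queue is full. */
static void workpool_submit(struct workpool *wp, void (*fn)(void *), void *arg)
{
	pthread_mutex_lock(&wp->lock);
	assert(!wp->stopping);
	while (wp->count == QUEUE_CAP)
		pthread_cond_wait(&wp->not_full, &wp->lock);
	wp->queue[(wp->head + wp->count) % QUEUE_CAP] = (struct task){ fn, arg };
	wp->count++;
	pthread_cond_signal(&wp->not_empty);
	pthread_mutex_unlock(&wp->lock);
}

/* Let the workers drain the queue, then join them. */
static void workpool_destroy(struct workpool *wp)
{
	unsigned int i;

	pthread_mutex_lock(&wp->lock);
	wp->stopping = true;
	pthread_cond_broadcast(&wp->not_empty);
	pthread_mutex_unlock(&wp->lock);
	for (i = 0; i < wp->nworkers; i++)
		pthread_join(wp->workers[i], NULL);
}

/* Hypothetical stand-in for "compress one segment": counts calls. */
static void count_task(void *arg)
{
	atomic_fetch_add((atomic_int *)arg, 1);
}
```

The main thread would call workpool_init() once, submit one task per segment, and call workpool_destroy() before exit, so no threads are created or destroyed per file. The context lives in a caller-provided struct rather than in globals.
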

> 
> > 2.	Fragment/Dedupe together with other advanced features are not
> > 	fully tested due to my poor knowledge of the compression process.
> > 	Not sure if they work well with multithreading.
> 
> I have a preliminary design for Fragment/Dedupe; we could discuss the
> details later if you'd like to spend more time on this, thanks! ;)
> 

OK.

> > 3.	There is a lot of code redundancy between the
> > 	erofs_write_compressed_file() and
> > 	erofs_write_compressed_file_single() functions.
> > 	I don't want to break the original single-threaded execution logic,
> > 	but erofs_write_compressed_file() has a high complexity and
> > 	my failed attempt to merge the two functions makes the matter
> > 	worse.
> > 	I'm not sure if we should merge them together or keep two different
> > 	function entries for single and multi-threaded compression.
>
> I think we need to merge these eventually.

OK.

> 
> >
> > Performance:
> > 	Despite the naive patch, we still see a performance gain due to the
> > 	poor baseline performance, especially for lz4hc.
> > 	1. Packing time of an Arch Linux container image [1] provided by
> > 	   @wszqkzqk [2].
> > 		lz4  : 8s (multi-thread) vs. 10s (single-thread)
> > 		lz4hc: 48s (multi-thread) vs. 54s (single-thread)
> > 	2. Packing time of the Linux v6.4 git repository (with several ~GB
> > 	   git object files).
> > 		lz4  : 14s (multi-thread) vs. 23s (single-thread)
> > 		lz4hc: 49s (multi-thread) vs. 212s (single-thread)
> 
> That is reasonable anyway, but in order to make multi-threaded support
> better, some code needs to be refactored first.
> 
> Actually I have some cleanup patches on hand to prepare for multithreaded
> support, but again, I will apply these after 1.7 is released.
>

OK.

> >
> > BTW, is there any format file (e.g., .clang-format) available for me to
> > format the erofs-utils project?
> 
> Not yet. erofs-utils follows the Linux kernel coding style; would you mind
> submitting a patch for this?
> 

I'll give it a try.

> Thanks,
> Gao Xiang


Thanks, 
Yifan Zhao




