[RFC PATCH] erofs-utils: mkfs: introduce multi-thread compression
Gao Xiang
hsiangkao at linux.alibaba.com
Sun Aug 20 04:41:18 AEST 2023
Hi Yifan,
On 2023/8/20 02:01, Yifan Zhao wrote:
> This patch introduces multi-thread compression to accelerate image
> packaging.
> ---
> Hi all:
>
> This is a very imperfect patch not ready for merging, and any suggestions would be appreciated!
> If it's on track, I'd like to follow up on that.
>
> The inefficiency of EROFS compressed image creation is a much criticized problem,
> and this patch attempts to address it by creating multiple threads
> to run the compression algorithm in parallel.
Many thanks if you have time to follow up on that.
However, due to release timing, erofs-utils 1.7 will be released
in about a month, so I think multithreaded support will land as
part of erofs-utils v1.8.
>
> Specifically, each input file over 16MB is split into segments,
> and each thread compresses a segment as if it were a separate file.
> Finally, the main thread merges all the compressed segments into one file.
> This process does not involve any data contention.
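
The segment scheme described above can be sketched roughly as follows. This is only an illustration, not the code from the patch: `toy_compress()` is a hypothetical run-length stand-in for a real compressor such as lz4, and `compress_segmented()`, `struct segment_job` and the one-thread-per-segment policy are assumptions made for the example. Each worker writes only to its own private buffer, so no locking is needed until the main thread merges the results in order.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a real compressor: emits (run length, byte)
 * pairs. dst must hold at least 2 * len bytes. Returns bytes written. */
static size_t toy_compress(const unsigned char *src, size_t len,
			   unsigned char *dst)
{
	size_t in = 0, out = 0;

	while (in < len) {
		unsigned char b = src[in];
		size_t run = 1;

		while (in + run < len && src[in + run] == b && run < 255)
			run++;
		dst[out++] = (unsigned char)run;
		dst[out++] = b;
		in += run;
	}
	return out;
}

struct segment_job {
	const unsigned char *src;	/* read-only slice of the input */
	size_t len;
	unsigned char *dst;		/* private output buffer */
	size_t dst_len;
};

static void *segment_worker(void *arg)
{
	struct segment_job *job = arg;

	job->dst_len = toy_compress(job->src, job->len, job->dst);
	return NULL;
}

/* Compress src in independent segments of seg_size bytes, one thread per
 * segment, then concatenate the results in order into out (capacity
 * 2 * len). Returns the total compressed size, or 0 on failure. */
size_t compress_segmented(const unsigned char *src, size_t len,
			  size_t seg_size, unsigned char *out)
{
	size_t nseg, i, started = 0, total = 0;
	struct segment_job *jobs;
	pthread_t *tids;
	int failed = 0;

	if (!len || !seg_size)
		return 0;
	nseg = (len + seg_size - 1) / seg_size;
	jobs = calloc(nseg, sizeof(*jobs));
	tids = calloc(nseg, sizeof(*tids));
	if (!jobs || !tids) {
		free(jobs);
		free(tids);
		return 0;
	}
	for (i = 0; i < nseg; i++) {
		size_t off = i * seg_size;

		jobs[i].src = src + off;
		jobs[i].len = len - off < seg_size ? len - off : seg_size;
		jobs[i].dst = malloc(2 * jobs[i].len);
		if (!jobs[i].dst || pthread_create(&tids[i], NULL,
						   segment_worker, &jobs[i])) {
			free(jobs[i].dst);
			failed = 1;
			break;
		}
		started++;
	}
	/* Merge step: the main thread joins workers in segment order. */
	for (i = 0; i < started; i++) {
		pthread_join(tids[i], NULL);
		if (!failed) {
			assert(jobs[i].dst_len <= 2 * jobs[i].len);
			memcpy(out + total, jobs[i].dst, jobs[i].dst_len);
			total += jobs[i].dst_len;
		}
		free(jobs[i].dst);
	}
	free(jobs);
	free(tids);
	return failed ? 0 : total;
}
```

One design consequence worth noting: because segments are compressed independently, a run or match cannot cross a segment boundary, so smaller segments generally trade compression ratio for parallelism. With this toy encoder, `"aaaabbbb"` compresses to 4 bytes with 4-byte segments but to 8 bytes with 3-byte segments.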
>
> Current issues:
> 1. For each large file, we create and destroy a batch of worker threads, causing unnecessary overhead.
> Moreover, each worker thread's context is a global variable, making the binary bigger.
> In the future, we can pre-create worker threads when the program starts running.
> Worker threads serve as consumers and the main thread that makes the compression request is the producer.
Could we use (or enhance?)
https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git/commit/?id=e5b83309b199966cc757cb095d1ff1ebd0923b3e
as a start?
> 2. Fragment/Dedupe together with other advanced features are not fully tested
> due to my poor knowledge of the compression process. Not sure if they work well with multithreading.
I have a preliminary design for Fragment/Dedupe; we could discuss the details
later if you'd like to spend more time on this, thanks! ;)
> 3. There is a lot of code redundancy between the
> erofs_write_compressed_file() and erofs_write_compressed_file_single() functions.
> I don't want to break the original single-threaded execution logic,
> but erofs_write_compressed_file() has a high complexity and
> my failed attempt to merge the two functions makes the matter worse.
> I'm not sure if we should merge them together or keep two different function entries for single and multi-threaded compression.
I think we need to merge these eventually.
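
For issue 1 above (pre-created workers as consumers, with the main thread as producer), a minimal sketch of such a pool might look like the following. It is written independently of the commit linked earlier, whose contents are not reproduced here; `struct pool`, `pool_init()`, `pool_submit()`, `pool_shutdown()` and the demo task `add_task()` are all hypothetical names chosen for the example, not erofs-utils APIs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct work {
	void (*fn)(void *arg);
	void *arg;
	struct work *next;
};

struct pool {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	struct work *head, *tail;	/* FIFO queue of pending work */
	int shutdown;
	unsigned int nthreads;
	pthread_t *threads;
};

/* Consumer: sleep until work arrives, run it outside the lock. */
static void *pool_worker(void *p)
{
	struct pool *pool = p;

	for (;;) {
		struct work *w;

		pthread_mutex_lock(&pool->lock);
		while (!pool->head && !pool->shutdown)
			pthread_cond_wait(&pool->cond, &pool->lock);
		w = pool->head;
		if (!w) {	/* shutdown requested and queue drained */
			pthread_mutex_unlock(&pool->lock);
			return NULL;
		}
		pool->head = w->next;
		if (!pool->head)
			pool->tail = NULL;
		pthread_mutex_unlock(&pool->lock);
		w->fn(w->arg);
		free(w);
	}
}

/* Create the workers once, at program start. Returns 0 on success. */
int pool_init(struct pool *pool, unsigned int nthreads)
{
	unsigned int i;

	pool->head = pool->tail = NULL;
	pool->shutdown = 0;
	pool->nthreads = 0;
	pool->threads = calloc(nthreads, sizeof(*pool->threads));
	if (!pool->threads)
		return -1;
	pthread_mutex_init(&pool->lock, NULL);
	pthread_cond_init(&pool->cond, NULL);
	for (i = 0; i < nthreads; i++) {
		if (pthread_create(&pool->threads[i], NULL, pool_worker, pool))
			break;
		pool->nthreads++;
	}
	if (!pool->nthreads) {
		pthread_cond_destroy(&pool->cond);
		pthread_mutex_destroy(&pool->lock);
		free(pool->threads);
		return -1;
	}
	return 0;
}

/* Producer: queue one request and wake one idle worker. */
int pool_submit(struct pool *pool, void (*fn)(void *), void *arg)
{
	struct work *w = malloc(sizeof(*w));

	assert(fn);
	if (!w)
		return -1;
	w->fn = fn;
	w->arg = arg;
	w->next = NULL;
	pthread_mutex_lock(&pool->lock);
	if (pool->tail)
		pool->tail->next = w;
	else
		pool->head = w;
	pool->tail = w;
	pthread_cond_signal(&pool->cond);
	pthread_mutex_unlock(&pool->lock);
	return 0;
}

/* Let workers drain the queue, then join them and release resources. */
void pool_shutdown(struct pool *pool)
{
	unsigned int i;

	pthread_mutex_lock(&pool->lock);
	pool->shutdown = 1;
	pthread_cond_broadcast(&pool->cond);
	pthread_mutex_unlock(&pool->lock);
	for (i = 0; i < pool->nthreads; i++)
		pthread_join(pool->threads[i], NULL);
	pthread_cond_destroy(&pool->cond);
	pthread_mutex_destroy(&pool->lock);
	free(pool->threads);
}

/* Hypothetical demo task: atomically bump a counter passed via arg. */
static void add_task(void *arg)
{
	atomic_fetch_add((atomic_long *)arg, 1);
}
```

With a pool like this, the per-file cost drops from creating and destroying a batch of threads to one `pool_submit()` call per segment, and the per-worker context can live in the heap-allocated request rather than in global variables.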
>
> Performance:
> Even with this naive patch, we still see a performance gain because of the poor baseline performance, especially for lz4hc.
> 1. Packing time of an Arch Linux container image [1] provided by @wszqkzqk [2].
> lz4 : 8s(multi-thread) v.s. 10s(single-thread)
> lz4hc: 48s(multi-thread) v.s. 54s(single-thread)
> 2. Packing time of Linux v6.4 git repository (with several ~GB git object files).
> lz4 : 14s(multi-thread) v.s. 23s(single-thread)
> lz4hc: 49s(multi-thread) v.s. 212s(single-thread)
That is reasonable anyway, but in order to make multi-threaded support
better, some code needs to be refactored first.
Actually I have some cleanup patches on hand to prepare for multithreaded
support, but again, I will apply these after 1.7 is released.
>
> BTW, is there any format file (e.g., .clang-format) available for me to format erofs-utils project?
Not yet. erofs-utils follows the Linux kernel coding style; would you mind
submitting a patch for this?
Thanks,
Gao Xiang