[RFC PATCH] erofs-utils: mkfs: introduce multi-thread compression

Guo Xuenan guoxuenan at huawei.com
Mon Aug 21 15:11:22 AEST 2023


Hi Xiang,

Is there a development branch for multi-threaded compression?
Then we could work together to make it better.

Thanks
Xuenan
On 2023/8/20 2:41, Gao Xiang wrote:
> Hi Yifan,
>
> On 2023/8/20 02:01, Yifan Zhao wrote:
>> This patch introduces multi-threaded compression to accelerate image
>> packaging.
>> ---
>> Hi all:
>>
>> This is a very imperfect patch that is not ready for merging, and any
>> suggestions would be appreciated!
>> If it's on track, I'd like to follow up on it.
>>
>> The inefficiency of EROFS compressed image creation is a much-criticized
>> problem, and this patch attempts to address it by creating multiple
>> threads to run the compression algorithm in parallel.
>
> Many thanks if you have time to follow up on that.
>
> However, due to the release timing, erofs-utils 1.7 will be released
> in about a month, so I think multithreaded support will land as
> part of erofs-utils v1.8.
>
>>
>> Specifically, each input file over 16MB is split into segments, and each
>> thread compresses a segment as if it were a separate file. Finally, the
>> main thread merges all the compressed segments into one file. This
>> process does not involve any data contention.
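The segment scheme described above can be illustrated with a minimal C sketch. It is not the patch's code: a toy run-length encoder (`rle_compress()`, hypothetical) stands in for lz4/lz4hc, the segment size is a hypothetical 4 bytes rather than anything derived from the 16MB threshold so the result is easy to check by hand, and one POSIX thread is spawned per segment. Each worker writes only to its own buffer, so no locking is needed; the main thread joins the workers and concatenates their output in segment order.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical tiny segment size so the example is easy to verify. */
#define SEG_SIZE 4
#define MAX_SEGS 64

struct seg_job {
	const unsigned char *in;	/* start of this segment */
	size_t in_len;			/* bytes in this segment */
	unsigned char *out;		/* buffer owned by this worker only */
	size_t out_len;			/* bytes the worker produced */
};

/* Toy stand-in for lz4: emits (count, byte) pairs, runs capped at 255. */
static size_t rle_compress(const unsigned char *in, size_t len,
			   unsigned char *out)
{
	size_t i = 0, o = 0;

	while (i < len) {
		unsigned char b = in[i];
		size_t run = 1;

		while (i + run < len && in[i + run] == b && run < 255)
			run++;
		out[o++] = (unsigned char)run;
		out[o++] = b;
		i += run;
	}
	return o;
}

/* Worker: compresses one segment as if it were a separate file. */
static void *seg_worker(void *arg)
{
	struct seg_job *job = arg;

	job->out_len = rle_compress(job->in, job->in_len, job->out);
	return NULL;
}

/*
 * Split @in into SEG_SIZE-byte segments, compress them in parallel and
 * merge the results into @out in segment order; returns bytes written.
 * @out must hold at least 2 * @len bytes (the RLE worst case).
 */
size_t parallel_compress(const unsigned char *in, size_t len,
			 unsigned char *out)
{
	size_t nsegs = (len + SEG_SIZE - 1) / SEG_SIZE, total = 0, i;
	struct seg_job jobs[MAX_SEGS];
	pthread_t tids[MAX_SEGS];

	assert(nsegs <= MAX_SEGS);
	for (i = 0; i < nsegs; i++) {
		size_t off = i * SEG_SIZE;

		jobs[i].in = in + off;
		jobs[i].in_len = len - off < SEG_SIZE ? len - off : SEG_SIZE;
		jobs[i].out = malloc(2 * jobs[i].in_len);
		if (!jobs[i].out ||
		    pthread_create(&tids[i], NULL, seg_worker, &jobs[i]))
			abort();	/* error handling omitted in this sketch */
	}
	/* Main thread: wait for each worker, then append its output. */
	for (i = 0; i < nsegs; i++) {
		pthread_join(tids[i], NULL);
		memcpy(out + total, jobs[i].out, jobs[i].out_len);
		total += jobs[i].out_len;
		free(jobs[i].out);
	}
	return total;
}
```

In this sketch, compressing `"aaaaaa"` yields two runs (4 + 2) instead of one, because a run cannot cross a segment boundary: independent segments trade a little compression ratio for parallelism.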
>>
>> Current issues:
>> 1. For each large file, we create and destroy a batch of worker threads,
>>    causing unnecessary overhead. Moreover, each worker thread's context
>>    is a global variable, making the binary bigger.
>>    In the future, we can pre-create worker threads when the program
>>    starts running. Worker threads serve as consumers, and the main
>>    thread that makes the compression requests is the producer.
>
> Could we use (or enhance?)
> https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git/commit/?id=e5b83309b199966cc757cb095d1ff1ebd0923b3e
> as a start?
>
>> 2. Fragment/Dedupe, together with other advanced features, are not
>>    fully tested due to my poor knowledge of the compression process.
>>    I'm not sure whether they work well with multithreading.
>
> I have a preliminary design of Fragment/Dedupe; we could talk about the
> details later if you'd like to spend more time on this, thanks! ;)
>
>> 3. There is a lot of code redundancy between the
>>    erofs_write_compressed_file() and erofs_write_compressed_file_single()
>>    functions. I don't want to break the original single-threaded
>>    execution logic, but erofs_write_compressed_file() is highly complex,
>>    and my failed attempt to merge the two functions made matters worse.
>>    I'm not sure whether we should merge them or keep two different
>>    function entries for single- and multi-threaded compression.
> I think we need to merge them eventually.
>
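Issue 1 above proposes pre-creating worker threads once, with the main thread as the producer of compression requests and the workers as consumers. The commit Gao links is suggested as a starting point; what follows is not that code but an independent minimal sketch of the producer/consumer pattern, assuming POSIX threads, a fixed-size ring buffer of jobs, and hypothetical names (`struct workqueue`, `workqueue_add()`, `double_int()`, `QUEUE_CAP`, `NWORKERS`).

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAP 8	/* hypothetical queue depth */
#define NWORKERS 4	/* hypothetical number of pre-created workers */

struct work {
	void (*fn)(void *arg);	/* e.g. "compress this segment" */
	void *arg;		/* per-job context instead of a global */
};

struct workqueue {
	struct work items[QUEUE_CAP];	/* ring buffer */
	size_t head, count;
	bool shutdown;
	pthread_mutex_t lock;
	pthread_cond_t not_empty, not_full;
	pthread_t workers[NWORKERS];
};

/* Consumer: created once, runs jobs until shutdown and queue is empty. */
static void *worker_main(void *arg)
{
	struct workqueue *wq = arg;

	for (;;) {
		struct work w;

		pthread_mutex_lock(&wq->lock);
		while (!wq->count && !wq->shutdown)
			pthread_cond_wait(&wq->not_empty, &wq->lock);
		if (!wq->count) {	/* shut down with nothing left */
			pthread_mutex_unlock(&wq->lock);
			return NULL;
		}
		w = wq->items[wq->head];
		wq->head = (wq->head + 1) % QUEUE_CAP;
		wq->count--;
		pthread_cond_signal(&wq->not_full);
		pthread_mutex_unlock(&wq->lock);
		w.fn(w.arg);	/* run the job outside the lock */
	}
}

int workqueue_init(struct workqueue *wq)
{
	int i;

	wq->head = wq->count = 0;
	wq->shutdown = false;
	pthread_mutex_init(&wq->lock, NULL);
	pthread_cond_init(&wq->not_empty, NULL);
	pthread_cond_init(&wq->not_full, NULL);
	for (i = 0; i < NWORKERS; i++)
		if (pthread_create(&wq->workers[i], NULL, worker_main, wq))
			return -1;	/* cleanup omitted in this sketch */
	return 0;
}

/* Producer (main thread): blocks while the queue is full. */
void workqueue_add(struct workqueue *wq, void (*fn)(void *), void *arg)
{
	pthread_mutex_lock(&wq->lock);
	assert(!wq->shutdown);	/* no new work after destroy */
	while (wq->count == QUEUE_CAP)
		pthread_cond_wait(&wq->not_full, &wq->lock);
	wq->items[(wq->head + wq->count) % QUEUE_CAP] =
		(struct work){ .fn = fn, .arg = arg };
	wq->count++;
	pthread_cond_signal(&wq->not_empty);
	pthread_mutex_unlock(&wq->lock);
}

/* Let workers drain the queue, then join them and free resources. */
void workqueue_destroy(struct workqueue *wq)
{
	int i;

	pthread_mutex_lock(&wq->lock);
	wq->shutdown = true;
	pthread_cond_broadcast(&wq->not_empty);
	pthread_mutex_unlock(&wq->lock);
	for (i = 0; i < NWORKERS; i++)
		pthread_join(wq->workers[i], NULL);
	pthread_mutex_destroy(&wq->lock);
	pthread_cond_destroy(&wq->not_empty);
	pthread_cond_destroy(&wq->not_full);
}

/* Hypothetical example job: doubles the int it is given. */
void double_int(void *arg)
{
	*(int *)arg *= 2;
}
```

The bounded queue provides backpressure: if the compressors fall behind, `workqueue_add()` blocks the main thread instead of buffering an unbounded amount of input. Because each job carries its own `arg`, per-job compression state can live in the work item rather than in global variables, which is the binary-size concern raised in the same item.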
>>
>> Performance:
>>    Even with this naive patch, we still see a performance gain, since the
>>    baseline performance is poor, especially for lz4hc.
>>    1. Packing time of an Arch Linux container image [1] provided by
>>       @wszqkzqk [2]:
>>         lz4  : 8s (multi-thread) vs. 10s (single-thread)
>>         lz4hc: 48s (multi-thread) vs. 54s (single-thread)
>>    2. Packing time of the Linux v6.4 git repository (with several ~GB
>>       git object files):
>>         lz4  : 14s (multi-thread) vs. 23s (single-thread)
>>         lz4hc: 49s (multi-thread) vs. 212s (single-thread)
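Expressed as speedups, $S = T_{\text{single}} / T_{\text{multi}}$, the reported times work out as follows (plain arithmetic on the figures above, rounded to two decimals where not exact):

```latex
\begin{aligned}
S_{\text{Arch, lz4}}    &= 10/8   = 1.25 \\
S_{\text{Arch, lz4hc}}  &= 54/48  = 1.125 \\
S_{\text{Linux, lz4}}   &= 23/14  \approx 1.64 \\
S_{\text{Linux, lz4hc}} &= 212/49 \approx 4.33
\end{aligned}
```

The largest gain is for lz4hc on the Linux tree, which fits the note about the slow lz4hc baseline and is plausibly helped by the several ~GB object files, which exceed the 16MB threshold and are therefore split into many segments.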
>
> That is reasonable anyway, but in order to make multi-threaded support
> better, some code needs to be refactored first.
>
> Actually, I have some cleanup patches on hand to prepare for multithreaded
> support, but again, I will apply them after 1.7 is released.
>
>>
>> BTW, is there a format file (e.g., .clang-format) available that I can
>> use to format the erofs-utils project?
>
> Not yet; erofs-utils follows the Linux kernel coding style. Would you mind
> submitting a patch for this?
>
> Thanks,
> Gao Xiang
>
-- 
Guo Xuenan [OS Kernel Lab]
-----------------------------
Email: guoxuenan at huawei.com


