[RFC PATCH] erofs-utils: mkfs: introduce multi-thread compression
Guo Xuenan
guoxuenan at huawei.com
Mon Aug 21 15:11:22 AEST 2023
Hi Xiang,
Is there a development branch for multi-threaded compression?
If so, we could work together to make it better.
Thanks,
Xuenan
On 2023/8/20 2:41, Gao Xiang wrote:
> Hi Yifan,
>
> On 2023/8/20 02:01, Yifan Zhao wrote:
>> This patch introduces multi-thread compression to accelerate image
>> packaging.
>> ---
>> Hi all:
>>
>> This is a very imperfect patch not ready for merging, and any
>> suggestions would be appreciated!
>> If it's on track, I'd like to follow up on that.
>>
>> The inefficiency of EROFS compressed image creation is a much
>> criticized problem,
>> and this patch attempts to address it by creating multiple threads
>> to run the compression algorithm in parallel.
>
> Many thanks if you could have time following on that.
>
> However, given the release timing (erofs-utils 1.7 will be released in
> about a month), I think multithreaded support will be part of
> erofs-utils v1.8.
>
>>
>> Specifically, each input file over 16MB is split into segments,
>> and each thread compresses a segment as if it were a separate file.
>> Finally, the main thread merges all the compressed segments into one
>> file.
>> This process does not involve any data contention.
>>
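
For readers following the thread, the splitting step described above could be sketched roughly as follows. This is not taken from the patch: `struct segment`, `split_into_segments()`, `SEG_THRESHOLD` and the `seg_size` parameter are hypothetical names, the 16MB threshold is assumed to mean 16 MiB, and the actual segment size used by the patch is not stated here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: "16MB" in the description means 16 MiB. */
#define SEG_THRESHOLD	(16ULL << 20)

/* Hypothetical descriptor for one independently compressed range. */
struct segment {
	uint64_t offset;	/* start of the range within the input file */
	uint64_t length;	/* number of bytes in the range */
};

/*
 * Fill segs[] with contiguous, non-overlapping ranges that cover
 * [0, file_size).  Files not larger than SEG_THRESHOLD are kept as a
 * single segment.  Returns the number of segments written, or 0 if
 * max_segs is too small to hold them all.
 */
static size_t split_into_segments(uint64_t file_size, uint64_t seg_size,
				  struct segment *segs, size_t max_segs)
{
	uint64_t off = 0;
	size_t n = 0;

	assert(seg_size > 0);

	if (file_size <= SEG_THRESHOLD) {
		if (max_segs < 1)
			return 0;
		segs[0].offset = 0;
		segs[0].length = file_size;
		return 1;
	}

	while (off < file_size) {
		uint64_t len = file_size - off;

		if (n == max_segs)
			return 0;
		if (len > seg_size)
			len = seg_size;
		segs[n].offset = off;
		segs[n].length = len;
		n++;
		off += len;
	}
	return n;
}
```

Since the ranges do not overlap, each one could be handed to a separate worker without shared state, which matches the "no data contention" claim above; the merge step on the main thread is not shown.
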
>> Current issues:
>> 1. For each large file, we create and destroy a batch of worker
>> threads, causing unnecessary overhead.
>> Moreover, each worker thread's context is a global variable,
>> making the binary bigger.
>> In the future, we can pre-create worker threads when the program
>> starts running.
>> Worker threads serve as consumers and the main thread that makes
>> the compression request is the producer.
>
> I'd suggest using (or enhancing?)
> https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git/commit/?id=e5b83309b199966cc757cb095d1ff1ebd0923b3e
> as a start.
>
>> 2. Fragment/Dedupe together with other advanced features are not
>> fully tested
>> due to my poor knowledge of the compression process. Not sure if
>> they work well with multithreading.
>
> I have a preliminary design for Fragment/Dedupe; we could discuss it
> in more detail later if you'd like to spend more time on this,
> thanks! ;)
>
>> 3. There is a lot of code redundancy between the
>> erofs_write_compressed_file() and
>> erofs_write_compressed_file_single() functions.
>> I don't want to break the original single-threaded execution logic,
>> but erofs_write_compressed_file() has a high complexity and
>> my failed attempt to merge the two functions makes the matter worse.
>> I'm not sure if we should merge them together or keep two
>> different function entries for single and multi-threaded compression.
> I think we need to merge these eventually.
>
>>
>> Performance:
>> Despite the naive patch, we still see performance gain due to the
>> poor baseline performance especially for lz4hc.
>> 1. Packing time of an Arch Linux container image [1] provided by
>> @wszqkzqk [2].
>> lz4 : 8s (multi-thread) vs. 10s (single-thread)
>> lz4hc: 48s (multi-thread) vs. 54s (single-thread)
>> 2. Packing time of Linux v6.4 git repository (with several ~GB
>> git object files).
>> lz4 : 14s (multi-thread) vs. 23s (single-thread)
>> lz4hc: 49s (multi-thread) vs. 212s (single-thread)
>
> That is reasonable anyway, but in order to make multi-threaded support
> better, some code needs to be refactored first.
>
> Actually I have some cleanup patches on hand to prepare for
> multithreaded support, but again, I will apply these after 1.7 is released.
>
>>
>> BTW, is there any format file (e.g., .clang-format) available for me
>> to format erofs-utils project?
>
> Not yet. erofs-utils follows the Linux kernel coding style; would you
> mind submitting a patch for this?
>
> Thanks,
> Gao Xiang
>
--
Guo Xuenan [OS Kernel Lab]
-----------------------------
Email: guoxuenan at huawei.com