[PATCH] fsck.erofs: introduce multi-threaded decompression PoC with pcluster batching

Gao Xiang hsiangkao at linux.alibaba.com
Tue Mar 3 02:52:28 AEDT 2026


Hi Nithurshen,

On 2026/3/2 15:32, Nithurshen wrote:
> This is a Proof of Concept to introduce a scalable, multi-threaded
> decompression framework into fsck.erofs to reduce extraction time.
> 
> Baseline Profiling:
> Using the Linux 6.7 kernel source packed with LZ4HC (4K pclusters),
> perf showed a strictly synchronous execution path. The main thread
> spent ~52% of its time in LZ4_decompress_safe, heavily blocked by
> synchronous I/O (~32% in el0_svc/vfs_read).
> 
> First Iteration (Naive Workqueue):
> A standard producer-consumer workqueue overlapping compute with pwrite()
> suffered massive scheduling overhead. For 4KB LZ4 clusters, workers
> spent ~44% of CPU time spinning on __arm64_sys_futex and try_to_wake_up.
> 
> Current PoC (Dynamic Pcluster Batching):
> To eliminate lock contention, this patch introduces a batching context.
> Instead of queuing 1 pcluster per task, the main thread collects an
> array of sequential pclusters (Z_EROFS_PCLUSTER_BATCH_SIZE = 32) before
> submitting a single erofs_work unit.
> 
> Results:
> - Scheduling overhead (futex) dropped significantly.
> - Workers stay cache-hot, decompressing 32 blocks per wakeup.
> - LZ4_decompress_safe is successfully offloaded to background cores
>    (~18.8% self-execution time), completely decoupled from main thread I/O.
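
For readers following the thread, the batching idea in the quoted description can be sketched roughly as below. This is only an illustration under stated assumptions, not the code from the patch: `struct pcluster_desc`, `struct batch_ctx`, `batch_add()` and `batch_submit()` are hypothetical names, and the submit step just counts work units instead of handing them to a real worker pool. Only the batch size of 32 comes from the quoted text.

```c
#include <assert.h>
#include <stddef.h>

/* Mirrors Z_EROFS_PCLUSTER_BATCH_SIZE = 32 from the quoted description. */
#define BATCH_SIZE 32

/* Hypothetical descriptor for one pcluster; the fields are illustrative. */
struct pcluster_desc {
	unsigned long long in_off;	/* offset of the compressed data */
	unsigned int in_len;		/* compressed length */
	unsigned int out_len;		/* decompressed length */
};

/* Hypothetical batching context owned by the main (producer) thread. */
struct batch_ctx {
	struct pcluster_desc items[BATCH_SIZE];
	size_t count;			/* pclusters collected so far */
	size_t submitted_units;		/* work units handed off */
	size_t submitted_items;		/* pclusters covered by those units */
};

/*
 * Stand-in for queuing one work unit. A real implementation would copy
 * or hand over the batch to a worker and wake it once per batch.
 */
static void batch_submit(struct batch_ctx *ctx)
{
	if (!ctx->count)
		return;
	ctx->submitted_units++;
	ctx->submitted_items += ctx->count;
	ctx->count = 0;
}

/* Collect one pcluster; submit only when the batch is full. */
static void batch_add(struct batch_ctx *ctx, struct pcluster_desc pc)
{
	assert(ctx->count < BATCH_SIZE);
	ctx->items[ctx->count++] = pc;
	if (ctx->count == BATCH_SIZE)
		batch_submit(ctx);
}
```

With this sketch, adding 70 pclusters and then calling `batch_submit()` once more to flush the partial batch produces 3 work units rather than 70, which is the reduction in wakeups that the description attributes to batching.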

Glad to see the improvements. I think there is still more room
for improvement, though.

There is also some follow-up work to do. I'm busy for the next
couple of days, but I will publish a formal GSoC page later.

Thanks,
Gao Xiang

