[RFC] fsck.erofs: design discussion for multi-threaded extraction
Utkal Singh
singhutkal015 at gmail.com
Sun Mar 22 16:57:52 AEDT 2026
Hi Gao Xiang and EROFS community,
I have been contributing to erofs-utils since early March [1]. I would like to discuss a design for multi-threaded extraction in fsck.erofs and get feedback before writing more code.
Current state:
- fsck.erofs is strictly single-threaded; erofs_verify_inode_data()
  serializes all decompression inside erofsfsck_check_inode()
- lib/workqueue.c already provides erofs_alloc_workqueue(),
  erofs_queue_work(), and erofs_destroy_workqueue(), used by mkfs.erofs
  for multi-threaded compression (cfg.c_mt_workers, --workers=#)
- The fragment cache was introduced in 1.8.5 (lib/fragments.c)
Proposed design:
Since each EROFS pcluster can be decompressed independently of the
others, decompression can be parallelized, while file creation and
metadata application (chown, chmod, utimensat, xattrs) stay serialized
in the main thread.
Pipeline sketch:
  Main thread: inode walk -> erofs_queue_work() -> collect result
               -> write output + apply metadata
  Worker N:    erofs_verify_inode_data() for one file
I plan to reuse the existing erofs_workqueue infrastructure and
follow the --workers=# convention already used in mkfs.erofs.
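
To make the pipeline above concrete, here is a minimal standalone sketch using plain pthreads as a stand-in for erofs_workqueue. Everything in it (struct job, struct pipeline, verify_one(), run_pipeline(), NR_WORKERS, MAX_INFLIGHT) is a hypothetical name for illustration and not existing erofs-utils code. verify_one() is only a placeholder for the per-file decompress and verify step. The in-flight window is one possible way to bound memory, which is what Q1 below asks about.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NR_WORKERS   4	/* hypothetical: stand-in for a --workers=# value */
#define MAX_INFLIGHT 8	/* hypothetical: bound on claimed-but-unfinalized files */

struct job {			/* hypothetical: one regular file */
	unsigned int nid;	/* input: inode to verify */
	int err;		/* output: verification result */
	int done;		/* set by a worker, read by the main thread */
};

struct pipeline {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	struct job *jobs;
	size_t nr_jobs;
	size_t next;		/* next job a worker may claim */
	size_t consumed;	/* jobs already finalized by the main thread */
};

/* Placeholder for the per-file decompress + verify step. */
static int verify_one(unsigned int nid)
{
	return nid % 7 == 0 ? -1 : 0;	/* pretend some inodes are corrupt */
}

static void *worker(void *arg)
{
	struct pipeline *p = arg;

	pthread_mutex_lock(&p->lock);
	for (;;) {
		/* stay inside the in-flight window to cap memory use */
		while (p->next < p->nr_jobs &&
		       p->next >= p->consumed + MAX_INFLIGHT)
			pthread_cond_wait(&p->cond, &p->lock);
		if (p->next >= p->nr_jobs)
			break;
		struct job *j = &p->jobs[p->next++];
		pthread_mutex_unlock(&p->lock);

		int err = verify_one(j->nid);	/* runs in parallel */

		pthread_mutex_lock(&p->lock);
		j->err = err;
		j->done = 1;
		pthread_cond_broadcast(&p->cond);
	}
	pthread_mutex_unlock(&p->lock);
	return NULL;
}

/* Finalizes files in walk order; returns the number of failed files. */
int run_pipeline(struct job *jobs, size_t nr_jobs)
{
	struct pipeline p = { .jobs = jobs, .nr_jobs = nr_jobs };
	pthread_t tid[NR_WORKERS];
	int failed = 0;

	pthread_mutex_init(&p.lock, NULL);
	pthread_cond_init(&p.cond, NULL);
	for (int i = 0; i < NR_WORKERS; i++)
		pthread_create(&tid[i], NULL, worker, &p);

	for (size_t i = 0; i < nr_jobs; i++) {
		pthread_mutex_lock(&p.lock);
		while (!jobs[i].done)
			pthread_cond_wait(&p.cond, &p.lock);
		p.consumed = i + 1;
		assert(p.consumed <= p.next);
		pthread_cond_broadcast(&p.cond);	/* window moved forward */
		pthread_mutex_unlock(&p.lock);

		/* serialized part: write output, then chown/chmod/utimensat/xattrs */
		if (jobs[i].err)
			failed++;
	}
	for (int i = 0; i < NR_WORKERS; i++)
		pthread_join(tid[i], NULL);
	pthread_cond_destroy(&p.cond);
	pthread_mutex_destroy(&p.lock);
	return failed;
}
```

The main thread consumes results strictly in inode-walk order, so output and metadata are applied exactly as in the single-threaded path. The cost is that one slow file can stall the window behind it.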
Design questions I would appreciate guidance on:
Q1. Is the existing erofs_workqueue sufficient for fsck, or should
    max_jobs be bounded more tightly to control memory pressure for
    large images?
Q2. For fragment-deduplicated files (fragment cache from 1.8.5),
    should workers share a mutex around fragment reads, or should
    fragment reads remain in the main thread?
Q3. Is per-file the right parallelism granularity, or would
    per-pcluster be better for large single compressed files?
Q4. Should fsck follow --workers=# (matching mkfs) or use -T#?
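
To illustrate the first option in Q2, here is a rough sketch of workers sharing a mutex around fragment reads. It is standalone pthread code, not the lib/fragments.c implementation: struct frag_cache, frag_cache_init(), frag_load(), frag_read(), struct frag_req and frag_worker() are all hypothetical names, and the fragment data is a hard-coded string standing in for the decompressed packed data.

```c
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical shared cache for decompressed fragment data. */
struct frag_cache {
	pthread_mutex_t lock;
	char data[64];		/* stand-in for the decompressed fragments */
	size_t len;
	int loaded;
	int nr_loads;		/* how many times the slow path ran */
};

void frag_cache_init(struct frag_cache *fc)
{
	memset(fc, 0, sizeof(*fc));
	pthread_mutex_init(&fc->lock, NULL);
}

/* Placeholder for reading + decompressing the fragment data once. */
static void frag_load(struct frag_cache *fc)
{
	static const char packed[] = "tail-A|tail-B|tail-C";

	memcpy(fc->data, packed, sizeof(packed) - 1);
	fc->len = sizeof(packed) - 1;
	fc->nr_loads++;
}

/* Copy [off, off + len) of the fragment data into buf; 0 on success. */
int frag_read(struct frag_cache *fc, char *buf, size_t off, size_t len)
{
	int ret = -1;

	pthread_mutex_lock(&fc->lock);
	if (!fc->loaded) {		/* first caller pays for the load */
		frag_load(fc);
		fc->loaded = 1;
	}
	if (off <= fc->len && len <= fc->len - off) {
		memcpy(buf, fc->data + off, len);
		ret = 0;
	}
	pthread_mutex_unlock(&fc->lock);
	return ret;
}

/* Hypothetical per-file request handled by one worker thread. */
struct frag_req {
	struct frag_cache *fc;
	size_t off;
	char out[8];
	int err;
};

void *frag_worker(void *arg)
{
	struct frag_req *r = arg;

	r->err = frag_read(r->fc, r->out, r->off, 6);
	return NULL;
}
```

Holding the lock across the copy keeps the sketch simple but serializes all fragment readers. If the cached data is immutable once loaded, the lock could cover only the load. The second option in Q2 avoids the lock entirely by keeping these reads in the main thread.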
[1] https://lore.kernel.org/linux-erofs/CAGSu4WPCYtq-+hVc-tg_A4u3a3zxnizx7ui7QSO0R8V1DirJSg@mail.gmail.com/
Thanks,
Utkal Singh