[RFC] fsck.erofs: multi-threaded data verification/extraction
Utkal Singh
singhutkal015 at gmail.com
Mon Mar 23 03:11:15 AEDT 2026
Hi,
This RFC describes a design for multi-threaded verification/extraction
in fsck.erofs and asks for feedback before I send any code.
Problem
-------
fsck.erofs processes all inodes sequentially. When --extract is used
on a large compressed image, z_erofs_decompress() dominates CPU time
while other cores sit idle.
Why a naive per-inode thread fails
-----------------------------------
erofsfsck_dirent_iter() mutates a single global fsckcfg.extract_path[]
buffer in-place, advancing extract_pos on descent and restoring it on
return. Any two concurrent callbacks would corrupt each other's path
state. In addition, fsckcfg.corrupted, fsckcfg.physical_blocks, and
fsckcfg.logical_blocks are written without synchronisation.
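
To make the hazard concrete, here is a minimal sketch of the append-then-truncate pattern described above. The names extract_path, extract_pos, path_descend() and path_ascend() are simplified stand-ins chosen for illustration, not the actual fsck code. It only shows why one shared buffer forces the walk to stay serial.

```c
#include <assert.h>
#include <string.h>

#define PATH_BUF_SIZE 4096

/* Simplified stand-ins for fsckcfg.extract_path[] and fsckcfg.extract_pos. */
static char extract_path[PATH_BUF_SIZE] = "/out";
static size_t extract_pos = 4;

/*
 * Append "/name" to the single shared buffer and return the previous
 * length so the caller can undo the change later.
 * Returns (size_t)-1 if the result would not fit.
 */
static size_t path_descend(const char *name)
{
	size_t prev = extract_pos;
	size_t len = strlen(name);

	if (prev + 1 + len >= PATH_BUF_SIZE)
		return (size_t)-1;
	extract_path[prev] = '/';
	memcpy(extract_path + prev + 1, name, len + 1); /* copies the NUL too */
	extract_pos = prev + 1 + len;
	return prev;
}

/* Truncate the shared buffer back to its length before the descent. */
static void path_ascend(size_t prev)
{
	assert(prev <= extract_pos);
	extract_path[prev] = '\0';
	extract_pos = prev;
}
```

If a second thread called path_descend() between one thread's descend and ascend, both would read and write the same extract_pos, and the restored length would no longer match what either of them appended.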
Proposed two-phase design
--------------------------
Phase 1 (serial): Standard DFS walk, unchanged. mkdir(), symlink
creation, and hardlink table updates all happen here. For each regular
file, record (nid, strdup(path)) in a work list.
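
A rough sketch of what the Phase 1 work list could look like. struct extract_job, struct job_list and the helper names are hypothetical, and uint64_t stands in for the nid type. The point is only that each job owns a private copy of its path, so later changes to the shared path buffer cannot affect it.

```c
#define _POSIX_C_SOURCE 200809L /* for strdup() */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical job record: one per regular file found during the walk. */
struct extract_job {
	uint64_t nid;  /* inode number of the regular file */
	char *path;    /* private copy of the extraction path */
};

struct job_list {
	struct extract_job *jobs;
	size_t count, cap;
};

/* Record (nid, strdup(path)). Returns 0 on success, -1 if out of memory. */
static int job_list_add(struct job_list *l, uint64_t nid, const char *path)
{
	char *copy;

	assert(l && path);
	if (l->count == l->cap) {
		size_t ncap = l->cap ? l->cap * 2 : 64;
		struct extract_job *n = realloc(l->jobs, ncap * sizeof(*n));

		if (!n)
			return -1;
		l->jobs = n;
		l->cap = ncap;
	}
	copy = strdup(path);
	if (!copy)
		return -1;
	l->jobs[l->count].nid = nid;
	l->jobs[l->count].path = copy;
	l->count++;
	return 0;
}

/* Free every path copy and the array itself. */
static void job_list_free(struct job_list *l)
{
	size_t i;

	for (i = 0; i < l->count; i++)
		free(l->jobs[i].path);
	free(l->jobs);
	l->jobs = NULL;
	l->count = l->cap = 0;
}
```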
Phase 2 (parallel): lib/workqueue.c dispatches the work list. Each
worker carries a per-thread struct erofs_verify_ctx (raw/buffer
pointers) allocated via the on_start/on_exit TLS hooks already used
by mkfs. Workers call erofs_verify_inode_data() with their own ctx;
no buffer state is shared between threads. The workqueue is destroyed
only after every job has completed, so no worker is still running when
fsck exits.
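
For illustration, the worker model can be sketched with plain pthreads instead of lib/workqueue.c. This is not the workqueue API: struct verify_ctx, struct job_pool, worker() and run_jobs() are hypothetical names, and the "job" is a placeholder that only marks itself done. It shows the two properties that matter: each thread allocates and frees its own buffers, and every job is claimed by exactly one worker.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define MAX_WORKERS 16
#define CTX_BUFSZ   4096

/* Hypothetical stand-in for struct erofs_verify_ctx. */
struct verify_ctx {
	char *raw;
	char *buffer;
};

/* Shared job description; `next` is the only field workers contend on. */
struct job_pool {
	unsigned char *done; /* done[i] is set by the worker that ran job i */
	size_t njobs;
	size_t next;         /* index of the next unclaimed job */
};

static void *worker(void *arg)
{
	struct job_pool *p = arg;
	struct verify_ctx ctx;
	size_t i;

	/* Plays the role of an on_start hook: thread-private buffers. */
	ctx.raw = malloc(CTX_BUFSZ);
	ctx.buffer = malloc(CTX_BUFSZ);

	while (ctx.raw && ctx.buffer) {
		/* Claim one index; no two workers can receive the same one. */
		i = __atomic_fetch_add(&p->next, 1, __ATOMIC_RELAXED);
		if (i >= p->njobs)
			break;
		/* Placeholder for the real per-file verification using &ctx. */
		ctx.raw[0] = ctx.buffer[0] = (char)i;
		p->done[i] = 1;
	}

	/* Plays the role of an on_exit hook. */
	free(ctx.raw);
	free(ctx.buffer);
	return NULL;
}

/* Run all jobs on nworkers threads and wait for every one of them. */
static int run_jobs(unsigned char *done, size_t njobs, unsigned int nworkers)
{
	pthread_t tids[MAX_WORKERS];
	struct job_pool p = { .done = done, .njobs = njobs, .next = 0 };
	unsigned int i, started = 0;

	assert(done != NULL || njobs == 0);
	if (nworkers == 0 || nworkers > MAX_WORKERS)
		return -1;
	for (i = 0; i < nworkers; i++) {
		if (pthread_create(&tids[i], NULL, worker, &p))
			break;
		started++;
	}
	for (i = 0; i < started; i++)
		pthread_join(tids[i], NULL);
	return started ? 0 : -1;
}
```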
I confirmed that erofs_map_blocks() and erofs_verify_xattr() in fsck
are already re-entrant: they use caller-local struct erofs_map_blocks
with its own erofs_buf. The only non-re-entrant state is in fsckcfg
and the hardlink table, both handled in Phase 1.
Shared state plan
------------------
fsckcfg.corrupted         --> __atomic_store_n()
fsckcfg.physical_blocks   --> per-worker counter, merged at join
fsckcfg.logical_blocks    --> same
erofsfsck_link_hashtable  --> Phase 1 only, no locking needed
fsckcfg.extract_path      --> per-job strdup() at dispatch time
fd exhaustion: workers open() their own fd, use it, then close() it.
Maximum concurrent open fds equals nworkers (typically 4-16), not
the total file count.
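
The first three rows of the plan above can be sketched as follows. cfg is a stand-in for the three fsckcfg fields, struct worker_state is hypothetical, and the per-job block counts and the "job 5 is corrupted" condition are made up purely so the example has something to count. Counters are plain per-thread fields that the main thread sums after pthread_join(), so only the flag needs an atomic store.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define NWORKERS 4

/* Stand-in for the shared fsckcfg fields named in the plan. */
static struct {
	bool corrupted;
	uint64_t physical_blocks;
	uint64_t logical_blocks;
} cfg;

/* Hypothetical per-worker state; only its owner writes it before join. */
struct worker_state {
	unsigned int first_job, njobs;
	uint64_t physical_blocks;
	uint64_t logical_blocks;
};

static void *worker(void *arg)
{
	struct worker_state *st = arg;
	unsigned int i;

	assert(st != NULL);
	for (i = st->first_job; i < st->first_job + st->njobs; i++) {
		/* Made-up accounting: 1 physical and 2 logical blocks per job. */
		st->physical_blocks += 1;
		st->logical_blocks += 2;
		/* Made-up failure: pretend job 5 found corruption. */
		if (i == 5)
			__atomic_store_n(&cfg.corrupted, true, __ATOMIC_RELAXED);
	}
	return NULL;
}

/* Start the workers, join them, then merge their private counters. */
static int run_and_merge(unsigned int jobs_per_worker)
{
	pthread_t tids[NWORKERS];
	struct worker_state st[NWORKERS] = {0};
	unsigned int i, started = 0;

	for (i = 0; i < NWORKERS; i++) {
		st[i].first_job = i * jobs_per_worker;
		st[i].njobs = jobs_per_worker;
		if (pthread_create(&tids[i], NULL, worker, &st[i]))
			break;
		started++;
	}
	for (i = 0; i < started; i++) {
		/* pthread_join() orders the worker's writes before this read. */
		pthread_join(tids[i], NULL);
		cfg.physical_blocks += st[i].physical_blocks;
		cfg.logical_blocks += st[i].logical_blocks;
	}
	return started == NWORKERS ? 0 : -1;
}
```

A relaxed store is enough here because the flag only ever goes from false to true and is read after the join.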
Scope for v1
-------------
MT only activates when --extract is passed (check_decomp = true).
Without --extract the decode loop is skipped entirely and the default
path is completely unchanged. Directory creation, symlinks, hardlinks,
and packed inode verification remain serial.
Preparatory patch
------------------
I plan to send a small refactoring patch:
fsck: make erofs_verify_inode_data() buffer ownership explicit
It moves raw/buffer ownership into a caller-owned struct
erofs_verify_ctx with no behaviour change. This is the only structural
prerequisite for the MT implementation.
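
As a rough idea of the shape, and not the actual patch, a caller-owned context could look like the sketch below. The field names and the verify_ctx_reserve()/verify_ctx_release() helpers are assumptions for illustration. The caller (a worker thread, or the serial path) creates one context, lets the verify routine grow the buffers on demand, and releases it once at the end.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical layout; field names are assumptions, not a posted patch. */
struct verify_ctx {
	char *raw;          /* compressed input buffer */
	size_t raw_size;
	char *buffer;       /* decompressed output buffer */
	size_t buffer_size;
};

/* Grow one buffer to at least `need` bytes; it never shrinks. */
static int grow(char **buf, size_t *size, size_t need)
{
	char *n;

	if (need <= *size)
		return 0;
	n = realloc(*buf, need);
	if (!n)
		return -1;
	*buf = n;
	*size = need;
	return 0;
}

/* Ensure both buffers are large enough for the next extent. */
static int verify_ctx_reserve(struct verify_ctx *ctx, size_t raw_need,
			      size_t buffer_need)
{
	assert(ctx != NULL);
	if (grow(&ctx->raw, &ctx->raw_size, raw_need))
		return -1;
	return grow(&ctx->buffer, &ctx->buffer_size, buffer_need);
}

/* Free both buffers; the context can be reused afterwards. */
static void verify_ctx_release(struct verify_ctx *ctx)
{
	assert(ctx != NULL);
	free(ctx->raw);
	free(ctx->buffer);
	ctx->raw = ctx->buffer = NULL;
	ctx->raw_size = ctx->buffer_size = 0;
}
```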
Would appreciate feedback on the two-phase model and the shared-state
plan before I proceed.
Thanks,
Utkal Singh