[RFC] fsck.erofs: multi-threaded data verification/extraction

Mon Mar 23 03:11:15 AEDT 2026

Hi,

This RFC describes a design for multi-threaded verification/extraction
in fsck.erofs and asks for feedback before I send any code.

Problem
-------
fsck.erofs processes all inodes sequentially. When --extract is used
on a large compressed image, z_erofs_decompress() dominates CPU time
while other cores sit idle.

Why a naive per-inode thread fails
-----------------------------------
erofsfsck_dirent_iter() mutates a single global fsckcfg.extract_path[]
buffer in-place, advancing extract_pos on descent and restoring it on
return. Any two concurrent callbacks would corrupt each other's path
state. Additionally, fsckcfg.corrupted, fsckcfg.physical_blocks, and
fsckcfg.logical_blocks are written without synchronisation.

Proposed two-phase design
--------------------------
Phase 1 (serial): The existing DFS walk, unchanged except that regular
file data is deferred instead of being processed inline. mkdir(),
symlink creation, and hardlink table updates all happen here. For each
regular file, record (nid, strdup(path)) in a work list.
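
To make the Phase 1 bookkeeping concrete, here is a minimal sketch of
such a work list. The names struct extract_job, struct extract_joblist,
joblist_add() and joblist_free() are hypothetical and do not exist in
erofs-utils; the point is only that each job owns a private copy of the
path, so later changes to the shared path buffer cannot affect it.

```c
#define _POSIX_C_SOURCE 200809L	/* for strdup() */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical job record; not an existing erofs-utils type. */
struct extract_job {
	uint64_t nid;	/* inode number of the regular file */
	char *path;	/* private copy of the extraction path */
};

struct extract_joblist {
	struct extract_job *jobs;
	size_t nr, cap;
};

/* Snapshot (nid, path) during the serial walk. Returns 0, or -1 on ENOMEM. */
static int joblist_add(struct extract_joblist *l, uint64_t nid,
		       const char *path)
{
	char *copy;

	if (l->nr == l->cap) {
		size_t cap = l->cap ? l->cap * 2 : 64;
		struct extract_job *tmp = realloc(l->jobs, cap * sizeof(*tmp));

		if (!tmp)
			return -1;
		l->jobs = tmp;
		l->cap = cap;
	}
	copy = strdup(path);
	if (!copy)
		return -1;
	l->jobs[l->nr].nid = nid;
	l->jobs[l->nr].path = copy;
	l->nr++;
	return 0;
}

/* Free every path copy and the array, leaving an empty list. */
static void joblist_free(struct extract_joblist *l)
{
	for (size_t i = 0; i < l->nr; i++)
		free(l->jobs[i].path);
	free(l->jobs);
	memset(l, 0, sizeof(*l));
}
```

In this sketch the walk would call joblist_add() with the current
contents of the path buffer at the point where file data is handled
inline today.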

Phase 2 (parallel): lib/workqueue.c dispatches the work list. Each
worker carries a per-thread struct erofs_verify_ctx (raw/buffer
pointers) allocated via the on_start/on_exit TLS hooks already used
by mkfs. Workers call erofs_verify_inode_data() with their own ctx;
no buffer state is shared between threads. The main thread destroys
the workqueue only after all jobs have completed, so no worker is
still running when fsck.erofs exits.
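
The shape of Phase 2 can be sketched with plain pthreads. This is not
the lib/workqueue.c API; struct verify_ctx, struct pool, worker() and
run_pool() are hypothetical stand-ins, the 4096-byte buffers are
arbitrary, and the actual verification call is replaced by a
placeholder. What it shows is per-thread scratch buffers set up at
thread start, torn down at thread exit, and a join before results are
read.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

#define NWORKERS 4

/* Hypothetical stand-in for struct erofs_verify_ctx. */
struct verify_ctx {
	char *raw;	/* compressed input scratch */
	char *buffer;	/* decompressed output scratch */
};

struct pool {
	atomic_size_t next;	/* index of the next unclaimed job */
	size_t njobs;
	int *done;		/* done[i] is set by the worker that ran job i */
};

static void *worker(void *arg)
{
	struct pool *p = arg;
	/* "on_start": buffers private to this thread */
	struct verify_ctx ctx = { malloc(4096), malloc(4096) };
	size_t i;

	if (ctx.raw && ctx.buffer) {
		/* claim jobs one at a time until the list is exhausted */
		while ((i = atomic_fetch_add(&p->next, 1)) < p->njobs)
			p->done[i] = 1;	/* placeholder for the real work */
	}
	/* "on_exit" */
	free(ctx.raw);
	free(ctx.buffer);
	return NULL;
}

/* Runs njobs placeholder jobs and returns how many were completed. */
static size_t run_pool(size_t njobs)
{
	struct pool p = { .njobs = njobs };
	pthread_t tid[NWORKERS];
	size_t completed = 0;
	int started = 0;

	atomic_init(&p.next, 0);
	p.done = calloc(njobs ? njobs : 1, sizeof(*p.done));
	if (!p.done)
		return 0;
	for (int i = 0; i < NWORKERS; i++) {
		if (pthread_create(&tid[started], NULL, worker, &p))
			break;
		started++;
	}
	/* join first; only then is it safe to read done[] */
	for (int i = 0; i < started; i++)
		pthread_join(tid[i], NULL);
	for (size_t i = 0; i < njobs; i++)
		completed += p.done[i];
	free(p.done);
	return completed;
}
```

Each job index is claimed by exactly one worker through the atomic
counter, so no two threads ever write the same done[] slot.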

I confirmed that erofs_map_blocks() and erofs_verify_xattr() in fsck
are already re-entrant: each call uses a caller-local struct
erofs_map_blocks with its own erofs_buf. The only non-re-entrant state
is in fsckcfg and the hardlink table; both are covered by the shared
state plan below.

Shared state plan
------------------
  fsckcfg.corrupted         --> __atomic_store_n()
  fsckcfg.physical_blocks   --> per-worker counter, merged at join
  fsckcfg.logical_blocks    --> per-worker counter, merged at join
  erofsfsck_link_hashtable  --> Phase 1 only, no locking needed
  fsckcfg.extract_path      --> per-job strdup() when queued in Phase 1
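
A small sketch of the first three rows, using the GCC/Clang
__atomic_store_n() builtin named above. struct shared_totals,
struct worker_stats, mark_corrupted() and merge_stats() are
hypothetical names. The relaxed memory order is my assumption: the flag
only ever goes from false to true and is read by the main thread after
the join, which already orders the accesses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the shared fsckcfg fields discussed above. */
struct shared_totals {
	bool corrupted;
	uint64_t physical_blocks;
	uint64_t logical_blocks;
};

/* Private to one worker; nobody else reads it before the join. */
struct worker_stats {
	uint64_t physical_blocks;
	uint64_t logical_blocks;
};

/* Any worker may call this at any time; the flag is never cleared. */
static void mark_corrupted(struct shared_totals *t)
{
	__atomic_store_n(&t->corrupted, true, __ATOMIC_RELAXED);
}

/* Main thread only, after every worker has been joined. */
static void merge_stats(struct shared_totals *t,
			const struct worker_stats *w, size_t nworkers)
{
	for (size_t i = 0; i < nworkers; i++) {
		t->physical_blocks += w[i].physical_blocks;
		t->logical_blocks += w[i].logical_blocks;
	}
}
```

Keeping the block counters per worker avoids an atomic add on every
extent; the only cross-thread write left in the hot path is the rare
corruption flag.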

fd exhaustion: each worker open()s its own fd, uses it, then close()s
it before taking the next job. The maximum number of concurrently open
fds therefore equals nworkers (typically 4-16), not the total file
count.
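
The fd lifetime described here could look roughly like the helper
below. write_one_file() is a hypothetical name, and the open() flags
and mode are illustrative rather than what fsck.erofs uses today. The
fd exists only for the duration of one call, so a worker never holds
more than one.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Write len bytes to path; returns 0 or a negative errno value. */
static int write_one_file(const char *path, const void *data, size_t len)
{
	const char *p = data;
	int fd, ret = 0;

	fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -errno;
	while (len) {
		ssize_t n = write(fd, p, len);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			ret = -errno;
			break;
		}
		p += n;		/* short write: advance and loop */
		len -= n;
	}
	/* always close, but keep the first error if there was one */
	if (close(fd) < 0 && !ret)
		ret = -errno;
	return ret;
}
```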

Scope for v1
-------------
MT activates only when --extract is passed (check_decomp = true).
Without --extract the decode loop is skipped entirely and the default
path is completely unchanged. Directory creation, symlinks, hardlinks,
and packed inode verification remain serial.

Preparatory patch
------------------
I plan to send a small refactoring patch:

  fsck: make erofs_verify_inode_data() buffer ownership explicit

It moves raw/buffer ownership into a caller-owned struct
erofs_verify_ctx with no behaviour change. This is the only structural
prerequisite for the MT implementation.
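
As an illustration of what caller-owned buffers might mean in practice,
here is a sketch. Only the raw/buffer pair comes from the description
above; the struct name verify_ctx, the size fields, and the helpers
ctx_reserve() and ctx_release() are hypothetical and may not match the
actual patch.

```c
#include <stdlib.h>

/* Hypothetical layout; stands in for struct erofs_verify_ctx. */
struct verify_ctx {
	char *raw, *buffer;
	size_t raw_size, buffer_size;
};

/*
 * Grow one scratch buffer to at least `need` bytes. Old contents are
 * discarded. Returns 0, or -1 on allocation failure (buffer untouched).
 */
static int ctx_reserve(char **buf, size_t *size, size_t need)
{
	char *tmp;

	if (need <= *size)
		return 0;
	tmp = malloc(need);
	if (!tmp)
		return -1;
	free(*buf);
	*buf = tmp;
	*size = need;
	return 0;
}

/* Free both buffers and reset the context to its empty state. */
static void ctx_release(struct verify_ctx *ctx)
{
	free(ctx->raw);
	free(ctx->buffer);
	*ctx = (struct verify_ctx){ 0 };
}
```

With this shape the serial path would keep a single ctx on the stack of
its caller, and the MT path would keep one per worker, with no change
to the verification logic itself.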

Would appreciate feedback on the two-phase model and the shared-state
plan before I proceed.

Thanks,
Utkal Singh