[GSoC 2026] Introduction & Proposal Discussion — fsck.erofs Multi-threaded Decompression

Yash Gupta yashgupta9437 at gmail.com
Wed Apr 1 01:03:08 AEDT 2026


Dear Yifan, Chunhai, and Gao Xiang,

I am Yash Gupta, a second-year MCA student at Chandigarh University, India
(IST, UTC+5:30). I am writing to introduce myself ahead of submitting my
GSoC 2026 proposal for the "Multi-threaded Decompression Support in
fsck.erofs" project.

I have studied the three problems documented openly in the erofs-utils
README — single-threaded extraction, slow fragment decompression, and
missing xattr/ACL restoration — and my proposal addresses all three
directly.

Preparation I have done so far:

- Built erofs-utils from source on Fedora 41 and Debian 13
- Profiled fsck.erofs --extract on sample images using perf and confirmed
the single-threaded CPU bottleneck firsthand
- Traced the pcluster decompression call path end-to-end through fsck/ and
lib/
- Reviewed the mkfs.erofs thread-pool implementation as a reference for
pool design
- Read the containerd EROFS snapshotter documentation to understand how
EROFS layer blobs are consumed downstream
- Subscribed to linux-erofs at lists.ozlabs.org and reviewed the last three
months of patches, including commit 2ce4b18 (xattr crash fix in the rebuild
path)

My proposed technical approach:

- A fixed-size pthreads pool (default: nproc, overridable via -j N) with an
MPMC work queue
- Pre-allocated, non-overlapping output buffers per inode, so worker threads
need no locks at write time and do not coordinate with each other during
decompression
- A reader-writer-locked hash-map cache keyed by pcluster disk block
address to eliminate redundant fragment re-decompression
- xattr and ACL restoration via lsetxattr(), built on the stabilized
2ce4b18 base, covering SELinux labels, file capabilities, and POSIX ACL
round-trips
- ThreadSanitizer (TSan) validation on every patch before sending upstream
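
To make the non-overlapping buffer idea concrete, here is a minimal sketch in C. It is not erofs-utils code: struct job, struct pool, run_jobs() and NR_WORKERS are hypothetical names, memset() stands in for the real decompressor, and a mutex-guarded job cursor stands in for the MPMC queue. The only point it shows is that the lock covers job hand-out, while each worker writes its own slice of the output without any lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define NR_WORKERS 4	/* assumption: the real pool would default to nproc */

/* Hypothetical job: one unit of work with a private output slice. */
struct job {
	size_t out_off;		/* start of this job's slice in the buffer */
	size_t out_len;		/* slice length; slices must never overlap */
	unsigned char fill;	/* stand-in for the decompressed payload */
};

struct pool {
	pthread_mutex_t lock;	/* guards next_job only, never the output */
	size_t next_job;
	size_t nr_jobs;
	const struct job *jobs;
	unsigned char *out;	/* pre-allocated output buffer */
};

static void *worker(void *arg)
{
	struct pool *p = arg;

	for (;;) {
		size_t i;

		/* Take the next job index under the lock. */
		pthread_mutex_lock(&p->lock);
		i = p->next_job;
		if (i < p->nr_jobs)
			p->next_job++;
		pthread_mutex_unlock(&p->lock);

		if (i >= p->nr_jobs)
			return NULL;

		/* Lock-free write: this slice belongs to this job alone. */
		memset(p->out + p->jobs[i].out_off, p->jobs[i].fill,
		       p->jobs[i].out_len);
	}
}

/* Runs all jobs on NR_WORKERS threads; returns 0 on success, -1 on error. */
static int run_jobs(const struct job *jobs, size_t nr_jobs, unsigned char *out)
{
	struct pool p = { .next_job = 0, .nr_jobs = nr_jobs,
			  .jobs = jobs, .out = out };
	pthread_t tids[NR_WORKERS];
	int started = 0, err = 0, i;

	if (pthread_mutex_init(&p.lock, NULL))
		return -1;
	for (; started < NR_WORKERS; started++) {
		if (pthread_create(&tids[started], NULL, worker, &p)) {
			err = -1;
			break;
		}
	}
	/* Join only the threads that were actually created. */
	for (i = 0; i < started; i++)
		pthread_join(tids[i], NULL);
	pthread_mutex_destroy(&p.lock);
	return err;
}
```

A caller would fill an array of struct job with disjoint (out_off, out_len) ranges and pass it to run_jobs() along with the output buffer. The correctness of the lock-free write rests entirely on those ranges being disjoint, which is the property TSan should help confirm.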

I have practical experience with POSIX pthreads, producer-consumer queues,
and reader-writer locks from coursework and personal projects. I am
familiar with git format-patch, git send-email, Linux kernel coding style,
and checkpatch.pl.

I plan to post baseline benchmark numbers to the mailing list during the
community bonding period before writing any code, and to send incremental
patch series for review throughout the summer rather than a single large
batch at the end.

My proposal draft is attached. I would greatly appreciate any early
feedback — particularly on the thread-pool design, the fragment cache
approach, and whether the 12-week timeline is realistic from your
perspective.

Thank you for your time and for maintaining such a well-documented project.

Best regards,
Yash Gupta
MCA · Chandigarh University · India
github.com/developer-yashgupta
linkedin.com/in/developer-yash
yashgupta9437 at gmail.com
IST (UTC +5:30)