<div dir="ltr"><p>Hi Utkal,</p><p>Thanks again for the detailed explanation and for pointing me to the RFC — it really helped clarify the bigger picture.</p><p>I spent some time going through the relevant parts of the code and your comments made a lot more sense in that context. I see now that while pcluster-level parallelism is valid, the main challenge is making the surrounding infrastructure safe before introducing concurrency.</p><p>In particular, I hadn’t fully accounted for:</p><ul><li><p>the lseek() + read() pattern in erofs_read_one_data() and why switching to pread() is necessary for correctness,</p></li><li><p>the lack of synchronization in erofs_iget()/erofs_iput(), which could lead to refcount races,</p></li><li><p>and the implications of using an unbounded workqueue on large images.</p></li></ul><p>Your point about backpressure was especially helpful — I’m now considering a bounded queue or a semaphore-based approach to ensure the producer doesn’t get too far ahead of the workers.</p><p>I also revisited the design with this in mind, and the two-phase model (serial traversal + parallel data verification/extraction) makes a lot more sense now, especially for isolating shared state like fsckcfg and path handling.</p><p>I’ll continue refining the proposal with these constraints in mind and go deeper into io.c, inode.c, and workqueue.c to make sure the design is correct before thinking about actual parallel execution.</p><p>Thanks again for taking the time to explain this — it was very helpful.</p><p>Regards,<br>Deepak Pathik</p></div><br><div class="gmail_quote gmail_quote_container"><div dir="ltr" class="gmail_attr">On Mon, Mar 30, 2026 at 1:50 AM Utkal Singh <<a href="mailto:singhutkal015@gmail.com">singhutkal015@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Sun, Mar 29, 2026 at 6:47 PM, Deepak Pathik wrote:<br>
> for LZMA-compressed images, are pclusters in fsck.erofs always<br>
> fixed-size and independently decompressible at the userspace level,<br>
> or are there cases where a pcluster depends on the state left by a<br>
> previous one?<br>
<br>
Hi Deepak,<br>
<br>
To answer your LZMA question: yes, each pcluster is independently<br>
decompressible by design. You can verify this directly in<br>
lib/decompress.c — z_erofs_decompress_lzma() calls lzma_stream_decoder()<br>
and lzma_end() within a single invocation, with no persistent lzma_stream<br>
across calls. The same holds for ZSTD and deflate. The on-disk format<br>
enforces this: no pcluster depends on decompressor state from a<br>
previous one.<br>
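If it helps to see that independence concretely, here is a toy Python<br>
sketch using the stdlib lzma module (illustrative only — the real path<br>
is the C code in lib/decompress.c): each chunk is compressed as a<br>
complete, self-contained stream, so decompression order is irrelevant.<br>

```python
import lzma

# Model each pcluster as an independently compressed chunk: a complete
# LZMA stream with its own header and no carried-over coder state.
chunks = [b"first pcluster payload", b"second pcluster payload"]
compressed = [lzma.compress(c) for c in chunks]

# Decompress in reverse order -- it works because no chunk depends on
# decompressor state left behind by a previous one.
restored = [lzma.decompress(blob) for blob in reversed(compressed)]
assert restored == list(reversed(chunks))
```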
<br>
The parallelism boundary you identified is correct. The deeper issue<br>
is one level up: erofs_check_inode() is called sequentially in the<br>
dispatch loop in fsck/main.c, and each call may decompress many<br>
pclusters per inode. Inode-level dispatch is simpler than<br>
pcluster-level because it avoids output ordering constraints.<br>
<br>
One thing worth thinking through before wiring erofs_workqueue into<br>
the fsck path: the existing queue in lib/workqueue.c is an unbounded<br>
producer queue built for mkfs compression workloads. On a 34,000+<br>
inode image, it will accumulate all inode descriptors in memory before<br>
workers can drain it. Backpressure — either a bounded queue or a<br>
semaphore on the existing one — matters here.<br>
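The semaphore variant can be sketched like this (Python stdlib, purely<br>
illustrative — the real queue is the C code in lib/workqueue.c): a<br>
counting semaphore caps how many items the producer may have in flight,<br>
so it blocks instead of accumulating every descriptor up front.<br>

```python
import threading
import queue

MAX_IN_FLIGHT = 4
slots = threading.Semaphore(MAX_IN_FLIGHT)  # producer-side backpressure
work = queue.Queue()                        # stand-in for the unbounded queue
done = []

def worker():
    while True:
        item = work.get()
        if item is None:        # sentinel: no more work
            break
        done.append(item * item)  # stand-in for per-inode verification
        slots.release()           # free a slot only once the item drains

t = threading.Thread(target=worker)
t.start()

for inode in range(100):
    slots.acquire()   # blocks once MAX_IN_FLIGHT items are outstanding
    work.put(inode)
work.put(None)
t.join()
```

(queue.Queue(maxsize=N) gives the same effect in one step; the separate<br>
semaphore maps more directly onto bolting backpressure onto an existing<br>
unbounded queue without changing its internals.)<br>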
<br>
Two paths in the surrounding infrastructure also need fixing before<br>
concurrent dispatch is correct:<br>
<br>
- erofs_read_one_data() in lib/io.c: lseek()+read() on a shared fd<br>
is not atomic; concurrent callers race on the shared file offset, so<br>
one thread's read can land at another thread's seek position.<br>
pread(2) takes the offset explicitly and fixes it cleanly.<br>
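The difference is easy to demonstrate (Python sketch; os.pread is a<br>
thin wrapper over pread(2)): the offset travels as an argument, so the<br>
shared file offset is never touched.<br>

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.write(fd, b"0123456789")

# lseek()+read() is two steps, so another thread can move the shared
# offset between them. pread() is one call with an explicit offset and
# leaves the fd's file offset alone.
tail = os.pread(fd, 4, 6)   # 4 bytes at offset 6
head = os.pread(fd, 4, 0)   # unaffected by the previous call

os.close(fd)
os.unlink(path)
```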
<br>
- erofs_iget()/erofs_iput() in lib/inode.c: ref-count mutations<br>
without synchronisation. Concurrent iput() can double-free.<br>
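A minimal model of the fix (toy Python; the real counter lives in<br>
struct erofs_inode on the C side — class and field names here are<br>
invented for illustration): decrement and test under a lock so exactly<br>
one caller observes the count hitting zero and takes the free path.<br>

```python
import threading

class Inode:
    def __init__(self):
        self.refcount = 1
        self.lock = threading.Lock()
        self.freed = 0            # stand-in for actually freeing the inode

    def iget(self):
        with self.lock:
            self.refcount += 1

    def iput(self):
        # Without the lock, two threads can both read 1, both decrement,
        # and both take the free path -- the double-free.
        with self.lock:
            self.refcount -= 1
            if self.refcount == 0:
                self.freed += 1

inode = Inode()
for _ in range(99):
    inode.iget()                  # refcount is now 100

threads = [threading.Thread(target=inode.iput) for _ in range(100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
assert inode.refcount == 0 and inode.freed == 1
```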
<br>
I sent an RFC on March 22 covering this design if it is useful context:<br>
<br>
<a href="https://lore.kernel.org/linux-erofs/CAGSu4WNBdB30K61xoUCi3FB9QR081fNh-1hoX1z2TZMk0nGpHQ@mail.gmail.com/" rel="noreferrer" target="_blank">https://lore.kernel.org/linux-erofs/CAGSu4WNBdB30K61xoUCi3FB9QR081fNh-1hoX1z2TZMk0nGpHQ@mail.gmail.com/</a><br>
<br>
Happy to discuss further on the list.<br>
<br>
Regards,<br>
Utkal Singh<br>
<br>
<br>
On Sun, 29 Mar 2026 at 18:47, Deepak Pathik <<a href="mailto:deepakpathik2005@gmail.com" target="_blank">deepakpathik2005@gmail.com</a>> wrote:<br>
><br>
> Hi,<br>
><br>
> I'm Deepak Pathik, a second-year B.Tech student applying for the GSoC 2026 project on multi-threaded decompression support in fsck.erofs.<br>
><br>
> While reading through the source, I traced the decompression path in erofs_verify_inode_data() and noticed that z_erofs_decompress() operates on a locally scoped struct z_erofs_decompress_req with its own input and output buffers — no shared mutable state between calls. My plan is to wire the existing erofs_workqueue (already used in lib/compress.c for mkfs.erofs) into the fsck extraction path at the pcluster level, with pwrite() for position-based output writes to avoid ordering locks.<br>
><br>
> One thing I wanted to confirm before finalizing my proposal: for LZMA-compressed images, are pclusters in fsck.erofs always fixed-size and independently decompressible at the userspace level, or are there cases where a pcluster depends on the state left by a previous one? I want to make sure I'm not understating the LZMA case in my design.<br>
><br>
> I've drafted a proposal and would be happy to share it for early feedback if that's useful.<br>
><br>
> Thanks,<br>
> Deepak Pathik<br>
> <a href="https://github.com/deepakpathik" rel="noreferrer" target="_blank">https://github.com/deepakpathik</a><br>
> <a href="mailto:deepakpathik2005@gmail.com" target="_blank">deepakpathik2005@gmail.com</a><br>
</blockquote></div>