Re: [GSoC 2026] Multi-threaded decompression for fsck.erofs — design question on z_erofs_decompress() parallelism

Deepak Pathik deepakpathik2005 at gmail.com
Mon Mar 30 19:30:44 AEDT 2026


Hi Utkal,

Thanks again for the detailed explanation and for pointing me to the RFC —
it really helped clarify the bigger picture.

I spent some time going through the relevant parts of the code and your
comments made a lot more sense in that context. I see now that while
pcluster-level parallelism is valid, the main challenge is making the
surrounding infrastructure safe before introducing concurrency.

In particular, I hadn’t fully accounted for:

   - the lseek() + read() pattern in erofs_read_one_data() and why switching
     to pread() is necessary for correctness,
   - the lack of synchronization in erofs_iget()/erofs_iput(), which could
     lead to refcount races,
   - and the implications of using an unbounded workqueue on large images.

Your point about backpressure was especially helpful — I’m now considering
a bounded queue or a semaphore-based approach to ensure the producer
doesn’t get too far ahead of the workers.
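
To check my understanding of the bounded-queue option, here is a minimal
sketch using a mutex and two condition variables. Everything in it (struct
bqueue, the bq_* functions, the capacity of 8, and the integer payload) is
made up for illustration; it is not lib/workqueue.c and does not reflect how
erofs_workqueue is structured.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define BQ_CAP 8	/* assumed bound; a real value would need tuning */

struct bqueue {
	pthread_mutex_t lock;
	pthread_cond_t not_full, not_empty;
	int items[BQ_CAP];	/* ring buffer of pending work */
	size_t head, count, high_water;
	int closed;
};

void bq_init(struct bqueue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->not_full, NULL);
	pthread_cond_init(&q->not_empty, NULL);
	q->head = q->count = q->high_water = 0;
	q->closed = 0;
}

/* Producer side: blocks while the queue is full (the backpressure). */
void bq_push(struct bqueue *q, int item)
{
	pthread_mutex_lock(&q->lock);
	while (q->count == BQ_CAP)
		pthread_cond_wait(&q->not_full, &q->lock);
	q->items[(q->head + q->count) % BQ_CAP] = item;
	if (++q->count > q->high_water)
		q->high_water = q->count;
	pthread_cond_signal(&q->not_empty);
	pthread_mutex_unlock(&q->lock);
}

/* Worker side: returns 1 with *item set, or 0 once closed and drained. */
int bq_pop(struct bqueue *q, int *item)
{
	int ok = 0;

	pthread_mutex_lock(&q->lock);
	while (q->count == 0 && !q->closed)
		pthread_cond_wait(&q->not_empty, &q->lock);
	if (q->count > 0) {
		*item = q->items[q->head];
		q->head = (q->head + 1) % BQ_CAP;
		q->count--;
		ok = 1;
		pthread_cond_signal(&q->not_full);
	}
	pthread_mutex_unlock(&q->lock);
	return ok;
}

/* Tell workers that no more items will arrive. */
void bq_close(struct bqueue *q)
{
	pthread_mutex_lock(&q->lock);
	q->closed = 1;
	pthread_cond_broadcast(&q->not_empty);
	pthread_mutex_unlock(&q->lock);
}

struct bq_worker_arg { struct bqueue *q; long sum; };

static void *bq_worker(void *arg)
{
	struct bq_worker_arg *w = arg;
	int v;

	while (bq_pop(w->q, &v))
		w->sum += v;	/* stand-in for verifying one inode */
	return NULL;
}

/* Usage sketch: one producer, two workers, items 1..1000. */
long bq_demo(void)
{
	struct bqueue q;
	struct bq_worker_arg w[2] = { { &q, 0 }, { &q, 0 } };
	pthread_t t[2];
	int i;

	bq_init(&q);
	for (i = 0; i < 2; i++) {
		int rc = pthread_create(&t[i], NULL, bq_worker, &w[i]);

		assert(rc == 0);
		(void)rc;
	}
	for (i = 1; i <= 1000; i++)
		bq_push(&q, i);
	bq_close(&q);
	for (i = 0; i < 2; i++)
		pthread_join(t[i], NULL);
	if (q.high_water > BQ_CAP)
		return -1;	/* the bound was violated */
	return w[0].sum + w[1].sum;	/* 500500 if nothing was lost */
}
```

With this shape, memory held by pending work is capped at BQ_CAP entries
regardless of how many inodes the image contains, at the cost of the
traversal thread stalling whenever the workers fall behind.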

I also revisited the design with this in mind, and the two-phase model
(serial traversal + parallel data verification/extraction) makes a lot more
sense now, especially for isolating shared state like fsckcfg and path
handling.
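
A toy version of the two-phase shape, as I currently picture it, is below.
All names (struct work_item, phase1_collect(), phase2_worker(), the fixed
item and worker counts) are placeholders of mine rather than anything in
fsck/main.c, and the "verification" is just a counter bump. The point is
only that phase 1 is the sole writer of the item list, and phase 2 workers
claim disjoint items through an atomic index.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NR_ITEMS   64
#define NR_WORKERS 4

struct work_item {
	unsigned int nid;	/* stand-in for an inode number */
	unsigned int visits;	/* times a worker processed this item */
};

struct phase2_ctx {
	struct work_item *items;
	size_t nr;
	atomic_size_t next;	/* index of the next unclaimed item */
};

/*
 * Phase 1: serial traversal. In a real tool this is where shared state
 * (configuration, path buffers) would be touched, by one thread only.
 */
size_t phase1_collect(struct work_item *items, size_t max)
{
	size_t i;

	for (i = 0; i < max; i++) {
		items[i].nid = (unsigned int)i + 1;
		items[i].visits = 0;
	}
	return max;
}

/* Phase 2: each worker claims indices until the list is exhausted. */
static void *phase2_worker(void *arg)
{
	struct phase2_ctx *ctx = arg;

	for (;;) {
		size_t i = atomic_fetch_add(&ctx->next, 1);

		if (i >= ctx->nr)
			break;
		/* Index i belongs to this worker alone, so no lock is needed. */
		ctx->items[i].visits++;
	}
	return NULL;
}

/* Usage sketch: returns how many items were processed exactly once. */
int two_phase_demo(void)
{
	static struct work_item items[NR_ITEMS];
	struct phase2_ctx ctx;
	pthread_t tids[NR_WORKERS];
	int i, once = 0;

	ctx.items = items;
	ctx.nr = phase1_collect(items, NR_ITEMS);
	atomic_init(&ctx.next, 0);

	for (i = 0; i < NR_WORKERS; i++) {
		int rc = pthread_create(&tids[i], NULL, phase2_worker, &ctx);

		assert(rc == 0);
		(void)rc;
	}
	for (i = 0; i < NR_WORKERS; i++)
		pthread_join(tids[i], NULL);
	for (i = 0; i < NR_ITEMS; i++)
		once += (items[i].visits == 1);
	return once;
}
```

This version materializes the whole item list before any worker starts, so
on its own it does not address the memory concern; it only isolates the
shared state, and would still need a bounded hand-off between the phases.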

I’ll continue refining the proposal with these constraints in mind and go
deeper into io.c, inode.c, and workqueue.c to make sure the design is
correct before thinking about actual parallel execution.

Thanks again for taking the time to explain this — it was very helpful.

Regards,
Deepak Pathik

On Mon, Mar 30, 2026 at 1:50 AM Utkal Singh <singhutkal015 at gmail.com> wrote:

> On Sun, Mar 29, 2026 at 6:47 PM, Deepak Pathik wrote:
> > for LZMA-compressed images, are pclusters in fsck.erofs always
> > fixed-size and independently decompressible at the userspace level,
> > or are there cases where a pcluster depends on the state left by a
> > previous one?
>
> Hi Deepak,
>
> To answer your LZMA question: yes, each pcluster is independently
> decompressible by design. You can verify this directly in
> lib/decompress.c — z_erofs_decompress_lzma() calls lzma_stream_decoder()
> and lzma_end() within a single invocation, with no persistent lzma_stream
> across calls. The same holds for ZSTD and deflate. The on-disk format
> enforces this: no pcluster depends on decompressor state from a
> previous one.
>
> The parallelism boundary you identified is correct. The deeper issue
> is one level up: erofs_check_inode() is called sequentially in the
> dispatch loop in fsck/main.c, and each call may decompress many
> pclusters per inode. Inode-level dispatch is simpler than
> pcluster-level because it avoids output ordering constraints.
>
> One thing worth thinking through before wiring erofs_workqueue into
> the fsck path: the existing queue in lib/workqueue.c is an unbounded
> producer queue built for mkfs compression workloads. On a 34,000+
> inode image, it will accumulate all inode descriptors in memory before
> workers can drain it. Backpressure — either a bounded queue or a
> semaphore on the existing one — matters here.
>
> Two paths in the surrounding infrastructure also need fixing before
> concurrent dispatch is correct:
>
>   - erofs_read_one_data() in lib/io.c: lseek()+read() on a shared fd
>     is a TOCTOU race under concurrent calls. pread(2) fixes it cleanly.
>
>   - erofs_iget()/erofs_iput() in lib/inode.c: ref-count mutations
>     without synchronisation. Concurrent iput() can double-free.
>
> I sent an RFC on March 22 covering this design if it is useful context:
>
> https://lore.kernel.org/linux-erofs/CAGSu4WNBdB30K61xoUCi3FB9QR081fNh-1hoX1z2TZMk0nGpHQ@mail.gmail.com/
>
> Happy to discuss further on the list.
>
> Regards,
> Utkal Singh
>
>
> On Sun, 29 Mar 2026 at 18:47, Deepak Pathik <deepakpathik2005 at gmail.com>
> wrote:
> >
> > Hi,
> >
> > I'm Deepak Pathik, a second-year B.Tech student applying for the GSoC
> > 2026 project on multi-threaded decompression support in fsck.erofs.
> >
> > While reading through the source, I traced the decompression path in
> > erofs_verify_inode_data() and noticed that z_erofs_decompress() operates
> > on a locally scoped struct z_erofs_decompress_req with its own input and
> > output buffers, with no shared mutable state between calls. My plan is
> > to wire the existing erofs_workqueue (already used in lib/compress.c for
> > mkfs.erofs) into the fsck extraction path at the pcluster level, with
> > pwrite() for position-based output writes to avoid ordering locks.
> >
> > One thing I wanted to confirm before finalizing my proposal: for
> > LZMA-compressed images, are pclusters in fsck.erofs always fixed-size
> > and independently decompressible at the userspace level, or are there
> > cases where a pcluster depends on the state left by a previous one? I
> > want to make sure I'm not understating the LZMA case in my design.
> >
> > I've drafted a proposal and would be happy to share it for early
> > feedback if that's useful.
> >
> > Thanks,
> > Deepak Pathik
> > https://github.com/deepakpathik
> > deepakpathik2005 at gmail.com
>