Practical Limit on EROFS lcluster size

Gao Xiang hsiangkao at linux.alibaba.com
Tue Dec 21 02:09:11 AEDT 2021


On Mon, Dec 20, 2021 at 10:02:13AM -0500, Kelvin Zhang wrote:
> On Mon, Dec 20, 2021 at 9:53 AM Gao Xiang <hsiangkao at linux.alibaba.com> wrote:
> >
> > Hi Kelvin,
> >
> > On Mon, Dec 20, 2021 at 08:45:42AM -0500, Kelvin Zhang wrote:
> > > Hi Gao,
> > >     I was playing with large pcluster sizes recently, I noticed a
> > > quirk about EROFS. In summary, logical cluster size has a practical
> > > limit of 8MB. Here's why:
> > >
> > >    Looking at https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git/tree/lib/compress.c?h=experimental&id=564adb0a852b38a1790db20516862fc31bca314d#n92
> > > , line 92, we see the following code:
> > >
> > > if (d0 == 1 && erofs_sb_has_big_pcluster()) {
> > >         type = Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD;
> > >         di.di_u.delta[0] = cpu_to_le16(ctx->compressedblks |
> > >                 Z_EROFS_VLE_DI_D0_CBLKCNT); // This line
> > >         di.di_u.delta[1] = cpu_to_le16(d1);
> > > } else if (d0) {
> > >         type = Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD;
> > >
> > >         di.di_u.delta[0] = cpu_to_le16(d0);  // and this line
> > >         di.di_u.delta[1] = cpu_to_le16(d1);
> > > }
> > >
> > > When a compressed index has type NOHEAD, delta[0] stores d0(distance
> > > to head block). But The 11th bit of d0 is also used as a flag bit to
> > > indicate that d0 stores the pcluster size. This means d0 cannot exceed
> > > Z_EROFS_VLE_DI_D0_CBLKCNT(2048), or else the parser will incorrectly
> > > interpret d0 as pcluster size, rather than distance to head block.
> > >     Is this an intentional design choice? It's not necessarily bad,
> > > but it's something I think is worth documenting in code.
> >
> > Thanks for this great insight! Actually on-disk EROFS format doesn't
> > have such limitation by design, since if it looks back to the delta0
> > lcluster and it's still a NONHEAD lcluster, it will look back with
> > new delta0 again until finding the final HEAD lcluster.
> >
> > But I'm not sure if mkfs code can handle > 8MiB lcluster properly yet,
> > without modification since lcluster size is strictly limited with
> > https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git/tree/include/erofs/compress.h#n14
> > EROFS_CONFIG_COMPR_MAX_SZ * 2
> 
> Right, the current lcluster buffer is on stack, and has size
> EROFS_CONFIG_COMPR_MAX_SZ*2.
> I was working on a patch that moves lcluster buffer to heap and
> increase it to way beyond 900KB.
> With large pclusters it make sense to have a larger lcluster limit as
> well, or else users wouldn't
> be able to take full advantage of large pclusters.

Okay, make sense.

> Currently this fails during writing compressed indices. As
> write_compacted_indexes
> https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git/tree/lib/compress.c?h=experimental&id=564adb0a852b38a1790db20516862fc31bca314d#n313
> would mistakenly interpret a large delta0 value as
> Z_EROFS_VLE_DI_D0_CBLKCNT, resulting in
> sanity check failure later on. Let me see if I can fix that... I
> haven't  read the kernel code so I'm not
> sure what the kernel is going to do with a large delta0 value which
> happens to have the
> Z_EROFS_VLE_DI_D0_CBLKCNT bit set.

I think kernel lookback function works as expected, it just looks back
with delta0 hints until finding the HEAD lcluster to get the whole
extent information.. But yeah, we might need to actually test it if
possible.

(Although my current top priority stuff now is ztailpacking feature..)

Thanks,
Gao Xiang

> 
> >
> > Yeah, I have to admit the current document might not be so detailed,
> > partially due to my somewhat bad English written speed, and limited
> > time...
> >
> > Thanks,
> > Gao Xiang
> >
> > >
> > >
> > > --
> > > Sincerely,
> > >
> > > Kelvin Zhang
> 
> 
> 
> -- 
> Sincerely,
> 
> Kelvin Zhang


More information about the Linux-erofs mailing list