Practical Limit on EROFS lcluster size

Kelvin Zhang zhangkelvin at google.com
Tue Dec 21 02:02:13 AEDT 2021


On Mon, Dec 20, 2021 at 9:53 AM Gao Xiang <hsiangkao at linux.alibaba.com> wrote:
>
> Hi Kelvin,
>
> On Mon, Dec 20, 2021 at 08:45:42AM -0500, Kelvin Zhang wrote:
> > Hi Gao,
> >     I was playing with large pcluster sizes recently, I noticed a
> > quirk about EROFS. In summary, logical cluster size has a practical
> > limit of 8MB. Here's why:
> >
> >    Looking at https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git/tree/lib/compress.c?h=experimental&id=564adb0a852b38a1790db20516862fc31bca314d#n92
> > , line 92, we see the following code:
> >
> > if (d0 == 1 && erofs_sb_has_big_pcluster()) {
> >         type = Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD;
> >         di.di_u.delta[0] = cpu_to_le16(ctx->compressedblks |
> >                 Z_EROFS_VLE_DI_D0_CBLKCNT); // This line
> >         di.di_u.delta[1] = cpu_to_le16(d1);
> > } else if (d0) {
> >         type = Z_EROFS_VLE_CLUSTER_TYPE_NONHEAD;
> >
> >         di.di_u.delta[0] = cpu_to_le16(d0);  // and this line
> >         di.di_u.delta[1] = cpu_to_le16(d1);
> > }
> >
> > When a compressed index has type NOHEAD, delta[0] stores d0(distance
> > to head block). But The 11th bit of d0 is also used as a flag bit to
> > indicate that d0 stores the pcluster size. This means d0 cannot exceed
> > Z_EROFS_VLE_DI_D0_CBLKCNT(2048), or else the parser will incorrectly
> > interpret d0 as pcluster size, rather than distance to head block.
> >     Is this an intentional design choice? It's not necessarily bad,
> > but it's something I think is worth documenting in code.
>
> Thanks for this great insight! Actually on-disk EROFS format doesn't
> have such limitation by design, since if it looks back to the delta0
> lcluster and it's still a NONHEAD lcluster, it will look back with
> new delta0 again until finding the final HEAD lcluster.
>
> But I'm not sure if mkfs code can handle > 8MiB lcluster properly yet,
> without modification since lcluster size is strictly limited with
> https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git/tree/include/erofs/compress.h#n14
> EROFS_CONFIG_COMPR_MAX_SZ * 2

Right, the current lcluster buffer is on stack, and has size
EROFS_CONFIG_COMPR_MAX_SZ*2.
I was working on a patch that moves lcluster buffer to heap and
increase it to way beyond 900KB.
With large pclusters it make sense to have a larger lcluster limit as
well, or else users wouldn't
be able to take full advantage of large pclusters.
Currently this fails during writing compressed indices. As
write_compacted_indexes
https://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs-utils.git/tree/lib/compress.c?h=experimental&id=564adb0a852b38a1790db20516862fc31bca314d#n313
would mistakenly interpret a large delta0 value as
Z_EROFS_VLE_DI_D0_CBLKCNT, resulting in
sanity check failure later on. Let me see if I can fix that... I
haven't  read the kernel code so I'm not
sure what the kernel is going to do with a large delta0 value which
happens to have the
Z_EROFS_VLE_DI_D0_CBLKCNT bit set.

>
> Yeah, I have to admit the current document might not be so detailed,
> partially due to my somewhat bad English written speed, and limited
> time...
>
> Thanks,
> Gao Xiang
>
> >
> >
> > --
> > Sincerely,
> >
> > Kelvin Zhang



-- 
Sincerely,

Kelvin Zhang


More information about the Linux-erofs mailing list