EROFS roadmap update

Sat Oct 9 17:11:51 AEDT 2021

Hi folks,

As you may have noticed, I'm working on several different things for
the upcoming 5.16 Linux kernel, including multiple device support for
multi-layer container images (for runc and kata-like containers) and
LZMA algorithm support (I'll send out the formal patchset this week).

Here is the EROFS roadmap in the near/mid term as far as I know:

Container use cases:
 - Multiple device/blob support (v5.16, me);

 - Other work in progress (our whole team, mainly in the form of the
   new RAFS v6 format, which is an EROFS-compatible format for the
   nydus [1] container image service, later).

Embedded device use cases:
 - LZMA compression support, specifically MicroLZMA (v5.16, me with
   Lasse's kind help/support), with complete in-place I/O and
   overlapped decompression (for embedded boards, or as a secondary
   auxiliary algorithm within a file to complement specific access
   patterns):
   https://git.kernel.org/pub/scm/linux/kernel/git/xiang/linux.git -b erofs/lzma

 - Tail-packing inline for compressed files (AFAIK, Yue Hu is
   currently working on this new feature);

 - LZ4 range dictionary support (v5.xx?), which works by separating a
   file into several sub-file segments and adding an external
   dictionary for each segment (e.g. a 4KiB dictionary for a 2MiB
   segment). I can see the benefits for specific datasets and have
   some demo compressor code for this, for example:
     enwik9			1000000000
     enwik9_4k.erofs.img	 558346240
     enwik9_4k.dict.erofs.img	 449683456 (2MiB segs with 8KiB dicts);
     enwik9_4k.dict.erofs.img	 400093184 (1MiB segs with 32KiB dicts);
     ...

   https://github.com/hsiangkao/erofs-utils.git -b experimental-dictdemo

   I'd like to find potential volunteers who are also interested in
   this kind of work, to optimize compression ratios for specific
   data patterns (note that it's not a free lunch: you need to keep
   the whole dictionary in memory before decompressing any data in
   the corresponding range; also, as far as I have observed, it
   doesn't work for all datasets [compared with LZMA], and the
   dictionary build time is relatively slow);

 - Multi-threaded compression for mkfs, including file-level parallel
   compression and sub-file-level parallel compression. File-level
   parallel compression is straightforward; the sub-file-level
   approach is quite similar to range dictionaries: separate each
   file into several segments (e.g. 16MiB) and compress each segment
   individually in parallel;
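
As a rough illustration of the sub-file level approach only (this is
not the mkfs.erofs implementation): the Python sketch below cuts the
data into fixed-size segments and compresses each one independently in
a thread pool. zlib is used purely as a stand-in for the real
compressors, and names such as compress_segments are hypothetical.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

SEGMENT_SIZE = 16 * 1024 * 1024  # 16MiB, as in the example above


def split_segments(data, segment_size=SEGMENT_SIZE):
    """Cut the data into fixed-size segments (the last may be shorter)."""
    return [data[off:off + segment_size]
            for off in range(0, len(data), segment_size)]


def compress_segments(data, segment_size=SEGMENT_SIZE, workers=4):
    """Compress every segment independently, in parallel.

    zlib is only a stand-in here (assumption); the point is that no
    compressor state is shared between segments.
    """
    segments = split_segments(data, segment_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns the results in segment order.
        return list(pool.map(zlib.compress, segments))


def decompress_segments(blobs):
    """Each segment decodes on its own, so any one can be read alone."""
    return b"".join(zlib.decompress(blob) for blob in blobs)
```

Since the segments share no state, the trade-off is the same as with
range dictionaries: matches cannot cross a segment boundary, so
smaller segments give more parallelism but usually a worse ratio.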

Others:
 - dump.erofs (AFAIK, Wang Qi / Guo Xuenan are working on this?)
   https://lore.kernel.org/r/OSZP286MB07097AE45E9D391B0049F661B2A89@OSZP286MB0709.JPNP286.PROD.OUTLOOK.COM
 - partial page up-to-date support and corresponding read interface;
 - code cleanup / simplification;
 - etc.

[1] https://github.com/dragonflyoss/image-service

Thanks,
Gao Xiang