EROFS roadmap update
Gao Xiang
xiang at kernel.org
Sat Oct 9 17:11:51 AEDT 2021
Hi folks,
As you may have noticed, I'm working on several different things for the
upcoming 5.16 Linux kernel, including multiple device support for
multi-layer container images for runc and Kata-like containers, and
LZMA algorithm support (I'll send out the formal patchset this week).
Here is the EROFS roadmap in the near/mid term as far as I know:
Container use cases:
- Multiple device/blob support (v5.16, me);
- Other work in progress (our whole team, mainly working on the new
RAFS v6 format, which is an EROFS-compatible format for the nydus [1]
container image service later.)
Embedded device use cases:
- LZMA compression support, specifically MicroLZMA (v5.16, me with
Lasse's kind help/support), with complete in-place I/O and overlapped
decompression (for embedded boards, or as a secondary auxiliary
algorithm in a file as a complement for specific access patterns):
https://git.kernel.org/pub/scm/linux/kernel/git/xiang/linux.git -b erofs/lzma
- Tail-packing inline for compressed files (AFAIK, Yue Hu is
currently working on this new feature);
- LZ4 range dictionary support (v5.xx?), which works by separating a
file into several sub-file segments and adding an external dictionary
for each segment (a 4KiB dictionary for a 2MiB segment, for example).
I can see the benefits for specific datasets and have some demo
compressor code for this, for example:
enwik9                     1000000000
enwik9_4k.erofs.img         558346240
enwik9_4k.dict.erofs.img    449683456 (2MiB segs with 8KiB dicts);
enwik9_4k.dict.erofs.img    400093184 (1MiB segs with 32KiB dicts);
...
https://github.com/hsiangkao/erofs-utils.git -b experimental-dictdemo
I'd like to find some potential volunteers who might also be
interested in this kind of work to optimize compression ratios for
specific data patterns (Note that it's not a free lunch, since you
need to keep the whole dictionary in memory before decompressing any
data in the specific range. Again, it doesn't work for all datasets
[compared with LZMA] as far as I observed, and the dictionary build
time is relatively slow);
- Multi-threaded compression for mkfs, including file-level parallel
compression and sub-file-level parallel compression. File-level
parallel compression is straightforward, and the sub-file-level
approach is quite similar to range dictionaries: separate each file
into several segments (e.g. 16MiB) and compress each segment
individually in parallel;
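
To make the segment idea behind the last two items more concrete, here
is a minimal sketch in Python. It is not the erofs-utils
implementation: zlib's preset dictionary (zdict) stands in for an LZ4
external dictionary, threads stand in for whatever mkfs would really
use, and every name and size in it (SEGMENT_SIZE, DICT_SIZE,
build_dictionary, ...) is hypothetical. The dictionary builder is a
placeholder that just takes the head of each segment.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

SEGMENT_SIZE = 2 * 1024 * 1024  # hypothetical sub-file segment size
DICT_SIZE = 8 * 1024            # hypothetical per-segment dictionary size


def split_segments(data, segment_size=SEGMENT_SIZE):
    """Cut data into fixed-size segments (the last one may be shorter)."""
    return [data[i:i + segment_size] for i in range(0, len(data), segment_size)]


def build_dictionary(segment, dict_size=DICT_SIZE):
    """Placeholder builder: use the head of the segment as its dictionary.

    A real builder would pick strings that repeat across the segment,
    which is the slow part mentioned above.
    """
    return segment[:dict_size]


def compress_segment(segment):
    """Compress one segment against its own external dictionary."""
    dictionary = build_dictionary(segment)
    comp = zlib.compressobj(level=6, zdict=dictionary)
    return dictionary, comp.compress(segment) + comp.flush()


def decompress_segment(dictionary, payload):
    """The whole dictionary must be available before any data is decoded."""
    decomp = zlib.decompressobj(zdict=dictionary)
    return decomp.decompress(payload) + decomp.flush()


def compress_parallel(data, segment_size=SEGMENT_SIZE, workers=4):
    """Compress all segments independently; map() keeps segment order."""
    segments = split_segments(data, segment_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(compress_segment, segments))
```

Because each segment carries its own dictionary and shares no state
with its neighbours, any single segment can be decompressed on its own,
e.g. decompress_segment(*packed[3]) for the fourth segment of
packed = compress_parallel(data). That independence is also what makes
the sub-file-level parallel compression possible.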
Others:
- dump.erofs (AFAIK, Wang Qi / Guo Xuenan are working on this?)
https://lore.kernel.org/r/OSZP286MB07097AE45E9D391B0049F661B2A89@OSZP286MB0709.JPNP286.PROD.OUTLOOK.COM
- partial page up-to-date support and corresponding read interface;
- code cleanup / simplification;
- etc.
[1] https://github.com/dragonflyoss/image-service
Thanks,
Gao Xiang