[RFC PATCH 03/24] erofs: add Errno in Rust

Thu Sep 26 21:23:26 AEST 2024

On 2024/9/26 19:01, Ariel Miculas via Linux-erofs wrote:
> On 24/09/26 06:46, Gao Xiang wrote:

...

>>
>>>
>>>>
>>>>                          Total Size (MiB)  Average layer size (MiB)  Saved / 766.1 MiB
>>>> Compressed OCI (tar.gz)  282.5             28.3                      63%
>>>> Uncompressed OCI (tar)   766.1             76.6                      0%
>>>> Uncompressed EROFS       109.5             11.0                      86%
>>>> EROFS (DEFLATE,9,32k)    46.4              4.6                       94%
>>>> EROFS (LZ4HC,12,64k)     54.2              5.4                       93%
>>>>
>>>> I don't know which compression algorithm you are using (maybe Zstd?),
>>>> but the result is
>>>>     EROFS (LZ4HC,12,64k)  54.2
>>>>     PuzzleFS compressed   53?
>>>>     EROFS (DEFLATE,9,32k) 46.4
>>>>
>>>> I could rerun with EROFS + Zstd, but it should be smaller. This feature
>>>> has been supported since Linux 6.1, thanks.
>>>
>>> The average layer size is very impressive for EROFS, great work.
>>> However, if we multiply the average layer size by 10, we get the total
>>> size (5.4 MiB * 10 ~ 54.2 MiB), whereas for PuzzleFS, we see that while
>>> the average layer size is 30 MiB (for the compressed case), the unified
>>> size is only 53 MiB. So this tells me there's blob sharing between the
>>> different versions of Ubuntu Jammy with PuzzleFS, but there's no sharing
>>> with EROFS (what I'm talking about is deduplication across the multiple
>>> versions of Ubuntu Jammy and not within one single version).
>>
>> Don't get me wrong, but I don't think you got the point.
>>
>> First, what you asked was `I'm referring specifically to this
>> comment: "EROFS already supports variable-sized chunks + CDC"`,
>> so I clearly answered with the result of compressed data global
>> deduplication with CDC.
>>
>> Here both EROFS and SquashFS compress 10 Ubuntu images into
>> one image for a fair comparison to show the benefit of CDC, so
> 
> It might be a fair comparison, but that's not how container images are
> distributed. You're trying to argue that I should just use EROFS and I'm

First, OCI layers are distributed just as I described.

For example, I could introduce some common blobs that keep
chunks as a chunk dictionary.  Then each image would be
just an index, and all data would be deduplicated.  That
is also how Nydus works.
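
As a rough illustration of the chunk-dictionary idea (a sketch only,
not EROFS or Nydus code; `ChunkDict`, `ChunkRef` and the in-memory blob
layout are hypothetical names made up for this example), each image is
reduced to an index of references into one shared blob:

```rust
use std::collections::HashMap;

/// Location of a chunk inside the shared blob (hypothetical layout).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ChunkRef {
    offset: usize,
    len: usize,
}

/// Shared chunk dictionary: one blob that stores each unique chunk once.
#[derive(Default)]
struct ChunkDict {
    blob: Vec<u8>,
    by_content: HashMap<Vec<u8>, ChunkRef>,
}

impl ChunkDict {
    /// Store `chunk` if it has not been seen yet; return where it lives.
    fn intern(&mut self, chunk: &[u8]) -> ChunkRef {
        if let Some(r) = self.by_content.get(chunk) {
            return *r;
        }
        let r = ChunkRef { offset: self.blob.len(), len: chunk.len() };
        self.blob.extend_from_slice(chunk);
        self.by_content.insert(chunk.to_vec(), r);
        r
    }

    /// Add an image; the result is just an index into the shared blob.
    fn add_image(&mut self, chunks: &[&[u8]]) -> Vec<ChunkRef> {
        chunks.iter().map(|c| self.intern(c)).collect()
    }

    /// Reassemble an image's data from its index.
    fn read_image(&self, index: &[ChunkRef]) -> Vec<u8> {
        index
            .iter()
            .flat_map(|r| self.blob[r.offset..r.offset + r.len].iter().copied())
            .collect()
    }
}
```

Two images that share most of their chunks then add only their
differing chunks to the blob, while every read still goes to the same
single backing blob.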

> showing you that EROFS doesn't currently support the functionality
> provided by PuzzleFS: the deduplication across multiple images.

No, EROFS also supports external devices/blobs that keep a lot
of chunks (as a dictionary to share data among images), but
clearly there is an upper limit.

But PuzzleFS just treats each individual chunk as a separate
file, which unavoidably causes "open an arbitrary number of
files on reading, even in page fault context".
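
To make the difference concrete, here is a small sketch (hypothetical
types, not taken from either filesystem) that counts how many distinct
backing files a single read has to touch.  `backing_id` stands for a
per-chunk file in the one-file-per-chunk layout, or for a shared
blob/device in the blob layout:

```rust
use std::collections::HashSet;

/// One chunk of a file's data and the backing file that stores it.
struct Extent {
    file_offset: u64,
    len: u64,
    backing_id: u32, // hypothetical identifier of the backing file/blob
}

/// Number of distinct backing files touched by a read of
/// [start, start + size).
fn backing_files_for_read(extents: &[Extent], start: u64, size: u64) -> usize {
    let end = start + size;
    extents
        .iter()
        // Keep only the extents that overlap the requested range.
        .filter(|e| e.file_offset < end && e.file_offset + e.len > start)
        .map(|e| e.backing_id)
        .collect::<HashSet<_>>()
        .len()
}
```

With one file per chunk the count grows with the number of chunks the
read spans, whereas with a bounded set of shared blobs it can never
exceed the number of blobs.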

> 
>> I believe they are basically equal to your `Unified size`s, so
>> the result is
>>
>> 			Your unified size
>> 	EROFS (LZ4HC,12,64k)  54.2
>> 	PuzzleFS compressed   53?
>> 	EROFS (DEFLATE,9,32k) 46.4
>>
>> That is why I used your 53 unified size to show EROFS is much
>> smaller than PuzzleFS.
>>
>> The reason why EROFS and SquashFS don't have `Total Size`s
>> is just because we cannot store every individual chunk in a
>> separate file.
> 
> Well storing individual chunks into separate files is the entire point
> of PuzzleFS.
> 
>>
>> Currently, I see no reason to open arbitrary kernel files
>> (maybe hundreds at once due to the large folio feature) in the
>> page fault context.  If I modified the `mkfs.erofs` tool, I could
>> give some similar numbers, but I don't want to waste time on that
>> now because of `open arbitrary kernel files in the page fault context`.
>>
>> As I said, if PuzzleFS eventually upstreams some work to open kernel
>> files in the page fault context, I will definitely work out the same
>> feature for EROFS soon, but currently I don't do that just
>> because it's very controversial and no in-tree kernel filesystem
>> does that.
> 
> The PuzzleFS kernel filesystem driver is still in an early POC stage, so
> there's still a lot more work to be done.

I suggest that you first ask the FS/MM folks about this ("open
kernel files when reading in the page fault context").

If they say "no", I suggest you don't waste any more time on this.

Thanks,
Gao Xiang