[RFCv2] erofs-utils:code for detecting and tracking holes in uncompressed sparse files.

Gao Xiang gaoxiang25 at huawei.com
Tue Dec 24 22:15:53 AEDT 2019


On Tue, Dec 24, 2019 at 04:15:47PM +0530, Pratik Shinde wrote:
> Hi Gao,
> 
> No no. What I am saying is: in the current code (excluding all my changes)
> the block lookup happens in constant time. With only a hole list it

Not only lookup but also other interfaces such as fiemap; that is why it
is called flat mode and is the fast path.

> won't be O(1) time; rather, we have to traverse the hole list (say, with
> a binary search).
> What I don't understand is: what is the purpose of tracking data extents?
> Hope you get it.

The plain and inline modes are called flat modes, which are the most
common case for regular files and directories. You can see that it's the
fastest path for most file accesses (minimum metadata).
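
To make the cost difference concrete, here is a minimal sketch contrasting
the two lookups. Everything in it is an assumption for illustration: the
struct hole type, the NO_BLOCK marker, the prefix-sum array skipped_before
and both function names are hypothetical, and it assumes that a
hole-list-only layout would store just the data blocks back to back. It is
not the EROFS on-disk format or the code in this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration; not the EROFS on-disk format. */
struct hole {
	uint32_t start;	/* first logical block of the hole */
	uint32_t len;	/* number of blocks in the hole, must be > 0 */
};

#define NO_BLOCK UINT32_MAX	/* returned when the logical block is a hole */

/* Flat mode: data blocks are contiguous, so the mapping is one addition. */
static uint32_t flat_lookup(uint32_t base_blkaddr, uint32_t lblk)
{
	return base_blkaddr + lblk;
}

/*
 * Hole-list-only mode: binary search over a sorted, non-overlapping hole
 * list, O(log n) instead of O(1).
 * skipped_before[i] is the total number of hole blocks in holes[0..i-1].
 */
static uint32_t hole_list_lookup(const struct hole *holes,
				 const uint32_t *skipped_before, size_t nr,
				 uint32_t base_blkaddr, uint32_t lblk)
{
	size_t lo = 0, hi = nr;

	/* find the number of holes whose start is <= lblk */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (holes[mid].start <= lblk)
			lo = mid + 1;
		else
			hi = mid;
	}
	if (lo == 0)
		return base_blkaddr + lblk;	/* before the first hole */

	const struct hole *h = &holes[lo - 1];

	assert(h->len > 0);
	if (lblk - h->start < h->len)
		return NO_BLOCK;		/* lblk falls inside a hole */

	/* subtract every hole block that precedes lblk */
	return base_blkaddr + lblk - (skipped_before[lo - 1] + h->len);
}
```

Note that even this sketch needs the hole count and the list itself to be
read before the first lookup, which is the extra metadata the flat modes
avoid.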

There are 3 main reasons why we don't extend the flat modes but introduce
a new sparse mode instead:
 1) it lets us introduce a completely new, enhanced extent table (or
    later a B+-tree);
 2) we don't even know how many holes are in the file if we only read
    the inode base metadata; some extra header (no matter whether it is
    an extent header or a hole header) needs to be read in advance;
 3) backward compatibility with old kernels needs to be considered: not
    all files are sparse, and the non-sparse ones need to keep working
    properly, while the files that are sparse need to be blocked from
    being accessed by old kernels.

Note that i_format exists for exactly this purpose, so we can introduce
a sparse mode with an enhanced on-disk representation (but with more
metadata read amplification than the flat modes).
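
As a rough sketch of how a layout field in i_format can keep old readers
away from sparse files while leaving flat files untouched: the bit
positions, the LAYOUT_* values (in particular LAYOUT_SPARSE) and the
function names below are hypothetical and chosen only for illustration;
they are not taken from the EROFS headers.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Hypothetical bit layout and values, for illustration only. */
#define LAYOUT_SHIFT	1
#define LAYOUT_MASK	0x7

enum layout {
	LAYOUT_FLAT_PLAIN  = 0,
	LAYOUT_FLAT_INLINE = 2,
	LAYOUT_SPARSE      = 5,	/* hypothetical value for the new mode */
};

/* Pack a version bit and a layout value into an i_format word. */
static uint16_t make_i_format(unsigned int version, unsigned int layout)
{
	assert(version <= 1 && layout <= LAYOUT_MASK);
	return (uint16_t)(version | (layout << LAYOUT_SHIFT));
}

static unsigned int inode_layout(uint16_t i_format)
{
	return (i_format >> LAYOUT_SHIFT) & LAYOUT_MASK;
}

/*
 * A reader that only knows the flat modes: flat inodes keep working,
 * anything else (including a future sparse mode) is refused.
 */
static int old_reader_check(uint16_t i_format)
{
	switch (inode_layout(i_format)) {
	case LAYOUT_FLAT_PLAIN:
	case LAYOUT_FLAT_INLINE:
		return 0;
	default:
		return -EOPNOTSUPP;
	}
}
```

With a scheme like this, only the inodes that actually use the new layout
become unreadable for an old reader; the rest of the image is unaffected.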

So files without holes should keep using the flat modes (fast path), and
only then do we consider the slow path, i.e. the upcoming sparse mode.

The purpose of tracking data extents is that we could then use them
for deduplication, repeated data or data redirection. A hole can only
be zeroes, though.
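
A minimal sketch of why data extents are more general than a hole list:
each extent carries its own physical start, so two logical ranges can
point at the same blocks (dedupe or repeated data), while a hole is
simply a range that no extent covers. The struct extent type, NO_BLOCK
and extent_lookup() are hypothetical names for illustration, not the
planned on-disk format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_BLOCK UINT32_MAX	/* no extent covers the block: reads as zeroes */

/* Hypothetical extent record, for illustration only. */
struct extent {
	uint32_t lstart;	/* first logical block */
	uint32_t len;		/* length in blocks, must be > 0 */
	uint32_t pstart;	/* first physical block; may be shared */
};

/* Binary search over extents sorted by lstart and non-overlapping. */
static uint32_t extent_lookup(const struct extent *exts, size_t nr,
			      uint32_t lblk)
{
	size_t lo = 0, hi = nr;

	/* find the number of extents whose lstart is <= lblk */
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (exts[mid].lstart <= lblk)
			lo = mid + 1;
		else
			hi = mid;
	}
	if (lo == 0)
		return NO_BLOCK;	/* hole before the first extent */

	const struct extent *e = &exts[lo - 1];

	assert(e->len > 0);
	if (lblk - e->lstart >= e->len)
		return NO_BLOCK;	/* gap between extents is a hole */

	return e->pstart + (lblk - e->lstart);
}
```

For example, with extents {0, 2, 100} and {4, 2, 100}, logical blocks 0
and 4 both resolve to physical block 100, which a hole list alone cannot
express.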

Thanks,
Gao Xiang

> 
> --Pratik.
> 

