Problem in EROFS: Not able to read the files after mount
Gao Xiang
hsiangkao at aol.com
Fri Mar 20 22:16:36 AEDT 2020
Hi Saumya,
On Fri, Mar 20, 2020 at 01:30:39PM +0530, Saumya Panda wrote:
> Hi Gao,
> I am trying to evaluate EROFS on my device. Right now SquashFS is used
> for the system files, so I am comparing EROFS with SquashFS. On my
> device, with the environment below, I see EROFS is 3 times faster than
> SquashFS 128k for sequential reads (I used enwik8 (100MB) as the test
> file). Your test results show it close to SquashFS 128k. How is EROFS so
> fast for sequential reads? I also tested it on a SUSE VM with low memory
> (425MB free) and EROFS is pretty fast there too.
>
> Also, can you tell me how to run FIO on a directory instead of individual files?
> fio -filename=$i -rw=read -bs=4k -name=seqbench
Thanks for your detailed message.
First, I can't think of a way to run FIO on a directory directly. Also,
some of the numbers below still look strange to me.
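That said, a shell loop over the regular files works as a simple workaround
(just a sketch; adjust the path and the job options to your setup):

    # run one sequential-read fio job per regular file in the directory
    for f in /mnt/erofs_test/*; do
        [ -f "$f" ] || continue      # skip subdirectories and special files
        fio --name=seqbench --rw=read --bs=4k --filename="$f"
    done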
Honestly, I don't want to leave a lot of (possibly aggressive) public
comments comparing one filesystem with another, such as EROFS vs. Squashfs
(or ext4 vs. f2fs). But some existing materials have already made such
comparisons; if you have some extra time, you could read through the
following references about EROFS (although some parts are outdated):
[1] https://static.sched.com/hosted_files/kccncosschn19chi/ce/EROFS%20file%20system_OSS2019_Final.pdf
[2] https://www.usenix.org/system/files/atc19-gao.pdf
The reason I feel this way is that (objectively, I think) people have their
own judgement about, and attachment to, each of these projects. Still, here
are some hints as to why EROFS behaves well (compared with Squashfs, though
I really want to avoid such contentious comparisons):
o EROFS has carefully designed critical paths, such as the async
decompression path; that partly answers your question about sequential
read behavior;
o EROFS has well-designed compression metadata (the EROFS compacted
index). Each logical compressed block takes only 2 bytes of metadata on
average (high information entropy, so there is no need to compress the
compacted indexes again), and it supports random reads with no dependence
on previous metadata; see the back-of-envelope sketch after this list. In
contrast, the on-disk metadata of Squashfs doesn't support random access
(and the metadata itself may even be compressed), which means you either
have to cache more metadata in memory for random reads or put up with its
poor metadata random-access performance. Some hints: see the on-disk
blocklist, the index cache and read_blocklist();
o EROFS is the first filesystem to use fixed-sized output compression.
With fixed-sized output compression, EROFS can easily implement in-place
decompression (or at least in-place I/O), which means it doesn't need to
allocate extra physical pages in most cases; that reduces the likelihood
of memory reclaim/compaction and keeps as much useful file-backed page
cache as possible;
o EROFS has a well-designed on-disk directory format that supports random
access within directories, unlike current Squashfs;
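To give a feel for what that 2-byte average means, here is a
back-of-envelope sketch (assuming 4KiB logical blocks; the 100MB size is
just your enwik8 test file):

    # compacted index overhead for a 100 MB file at ~2 bytes per 4 KiB block
    echo $(( (100 * 1024 * 1024 / 4096) * 2 ))    # => 51200 bytes (~50 KiB)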
In a word, I don't think the current on-disk Squashfs format is well
designed for the long term. Put differently, EROFS is a completely
different thing in its principles, its on-disk format and its runtime
implementation.
By the way, the previous link
https://blog.csdn.net/scnutiger/article/details/102507596
was _not_ written by me. I just noticed it by chance; I think it was
written by a Chinese kernel developer at some other Android vendor.
Also, FIO cannot benchmark all cases, and a heavy memory workload is not
completely equivalent to a genuinely low-memory environment either.
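If you want to emulate a low-memory box more directly, one option is to cap
the benchmark's memory with a memory cgroup before running fio (a rough
sketch only, assuming cgroup v2 is mounted at /sys/fs/cgroup; the 256MiB
cap and the paths are example values):

    # put the current shell (and its children, including fio) under a 256 MiB cap
    mkdir /sys/fs/cgroup/fio-lowmem
    echo $((256 * 1024 * 1024)) > /sys/fs/cgroup/fio-lowmem/memory.max
    echo $$ > /sys/fs/cgroup/fio-lowmem/cgroup.procs
    fio --name=randbench --rw=randread --bs=4k --filename=/mnt/erofs_test/enwik8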
Still, my FIO test script for benchmarking different filesystems is
available for reference:
https://github.com/erofs/erofs-openbenchmark/blob/master/fio-benchmark.sh
Personally, I think it's reasonable.
It makes more sense to use a purpose-designed dynamic workload model;
Huawei internally uses several well-designed light/heavy workloads to
benchmark the whole system.
In addition, I noticed many complaints about Squashfs, e.g.:
https://forum.snapcraft.io/t/squashfs-is-a-terrible-storage-format/9466
I don't want to comment on that whole article. But for such runtime
workloads, I'd suggest trying EROFS instead and seeing whether it performs
better (compared with any configuration of Squashfs+lz4).
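If you try that, both images should at least use the same compressor and a
comparable block/cluster size. A rough sketch of such a setup (the paths
are examples; please double-check the flags against the mkfs.erofs and
mksquashfs man pages):

    mkfs.erofs -zlz4hc erofs.img rootfs/                  # EROFS, LZ4HC
    mksquashfs rootfs/ sq.img -comp lz4 -Xhc -b 131072    # Squashfs, LZ4HC, 128 KiB blocks
    mount -t erofs    -o loop erofs.img /mnt/erofs
    mount -t squashfs -o loop sq.img    /mnt/squashfs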
There is a lot of ongoing work, but I've been really busy recently. Once
LZMA support and larger compression clusters are implemented, I think EROFS
will be even more useful, but they need to be designed carefully first to
avoid adding complexity to the whole solution.
Sorry about my English; I hope this is of some help.
Thanks,
Gao Xiang
>
> Test on Embedded Device:
>
> Total Memory: 5.5 GB
> Free Memory: 1515 MB
> No Swap
>
> $: /fio/erofs_test]$ free -m
>                total        used        free      shared  buff/cache   available
> Mem:            5384        2315        1515        1378        1553        1592
> Swap:              0           0           0
>
>                  Seq Read              Rand Read
> SquashFS 4k      51.8MB/s  1931msec    45.7MB/s  2187msec
> SquashFS 128k    116MB/s    861msec    14MB/s     877msec
> SquashFS 1M      124MB/s    805msec    119MB/s    837msec
> EROFS 4k         658MB/s    152msec    103MB/s    974msec
>
>
> Test on Suse VM:
>
> Total Memory: 1.5 GB
> Free Memory: 425 MB
> No Swap
>
> localhost:/home/saumya/Documents/erofs_test # free -m
>                total        used        free      shared  buff/cache   available
> Mem:            1436         817         425           5         192         444
> Swap:              0           0           0
>
>                  Seq Read              Rand Read
> SquashFS 4k      30.7MB/s  3216msec    9333kB/s  10715msec
> SquashFS 128k    318MB/s    314msec    5946kB/s  16819msec
> EROFS 4k         469MB/s    213msec    11.9MB/s   8414msec
>
>
> On Wed, Jan 29, 2020 at 10:30 AM Gao Xiang <hsiangkao at aol.com> wrote:
>
> > On Wed, Jan 29, 2020 at 09:43:37AM +0530, Saumya Panda wrote:
> > >
> > > localhost:~> fio --name=randread --ioengine=libaio --iodepth=16
> > > --rw=randread --bs=4k --direct=0 --size=512M --numjobs=4 --runtime=240
> > > --group_reporting --filename=/mnt/enwik9_erofs/enwik9
> > >
> > > randread: (g=0): rw=randread, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T)
> > > 4096B-4096B, ioengine=libaio, iodepth=16
> >
> > And I don't think such a configuration is useful for calculating read
> > amplification, since you eventually read 100% of the file and use
> > multiple threads with no memory limitation (all the compressed data will
> > be cached, so the total amount read is the compressed size).
> >
> > I have no idea what you want to get from such a comparison between EROFS
> > and Squashfs. A larger block size behaves much like bulk readahead. If
> > you benchmark uncompressed filesystems, you will notice that such
> > filesystems cannot reach such high 100% randread numbers.
> >
> > Thanks,
> > Gao Xiang
> >
> >
>
> --
> Thanks,
> Saumya Prakash Panda