[PATCH v7 3/3] erofs-utils: optimize buffer allocation logic

Li GuiFu bluce.lee at aliyun.com
Sun Feb 7 02:29:40 AEDT 2021



On 2021/1/23 1:11, Gao Xiang via Linux-erofs wrote:
> From: Hu Weiwen <sehuww at mail.scut.edu.cn>
> 
> When using EROFS to pack our dataset which consists of millions of
> files, mkfs.erofs is very slow compared with mksquashfs.
> 
> The bottleneck is the `erofs_balloc' and `erofs_mapbh' functions, which
> iterate over all previously allocated buffer blocks, making the
> complexity of the algorithm O(N^2) where N is the number of files.
> 
> With this patch:
> 
> * global `last_mapped_block' is maintained to avoid a full scan in
> the `erofs_mapbh' function.
> 
> * global `mapped_buckets' maintains a list of already mapped buffer
> blocks for each type and for each possible number of used bytes in the
> last EROFS_BLKSIZ. Then it is used to identify the most suitable blocks
> in future `erofs_balloc' calls, avoiding a full scan. Note that
> not-mapped (and the last mapped) blocks can be expanded, so we deal
> with them separately.
> 
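
For readers following the thread, the bucket idea described above can be
sketched roughly as follows. This is only an illustration under stated
assumptions, not the code from lib/cache.c: `struct bblock', `buckets',
`bucket_insert', `bucket_take_best' and the `BLKSIZ' constant are
hypothetical names, and the per-type dimension mentioned in the commit
message is left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>

#define BLKSIZ 4096	/* assumption: stand-in for EROFS_BLKSIZ */

/* Hypothetical, simplified buffer block: only what the sketch needs. */
struct bblock {
	unsigned int used;	/* bytes used in its last block, 0..BLKSIZ-1 */
	struct bblock *next;	/* next block in the same bucket */
};

/* buckets[u] chains mapped blocks whose last block has exactly u bytes used */
static struct bblock *buckets[BLKSIZ];

/* Put a mapped block into the bucket matching its tail usage. */
static void bucket_insert(struct bblock *bb)
{
	assert(bb->used < BLKSIZ);
	bb->next = buckets[bb->used];
	buckets[bb->used] = bb;
}

/*
 * Best fit: take the fullest block that still has room for `need' bytes.
 * At most BLKSIZ bucket heads are inspected, no matter how many blocks
 * have been allocated so far.
 */
static struct bblock *bucket_take_best(unsigned int need)
{
	int used;

	assert(need > 0 && need <= BLKSIZ);
	for (used = (int)(BLKSIZ - need); used >= 0; used--) {
		struct bblock *bb = buckets[used];

		if (bb) {
			buckets[used] = bb->next;	/* unlink from bucket */
			bb->next = NULL;
			return bb;
		}
	}
	return NULL;	/* nothing fits; the caller would allocate a new block */
}
```

In this sketch a caller would add the new data to the returned block,
update `used', and call bucket_insert() again, so each allocation costs
a bounded number of steps rather than a walk over every earlier block.
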
> When I test it with the ImageNet dataset (1.33M files, 147GiB), it takes
> about 4 hours. Most of the time is spent on IO.
> 
> Cc: Huang Jianan <jnhuang95 at gmail.com>
> Signed-off-by: Hu Weiwen <sehuww at mail.scut.edu.cn>
> Signed-off-by: Gao Xiang <hsiangkao at aol.com>
> ---
>  include/erofs/cache.h |   1 +
>  lib/cache.c           | 105 ++++++++++++++++++++++++++++++++++++------
>  2 files changed, 93 insertions(+), 13 deletions(-)
> 

It looks good.
Reviewed-by: Li Guifu <bluce.lee at aliyun.com>

Thanks,

