[PATCH v7 3/3] erofs-utils: optimize buffer allocation logic
Li GuiFu
bluce.lee at aliyun.com
Sun Feb 7 02:29:40 AEDT 2021
On 2021/1/23 1:11, Gao Xiang via Linux-erofs wrote:
> From: Hu Weiwen <sehuww at mail.scut.edu.cn>
>
> When using EROFS to pack our dataset which consists of millions of
> files, mkfs.erofs is very slow compared with mksquashfs.
>
> The bottleneck is the `erofs_balloc' and `erofs_mapbh' functions, which
> iterate over all previously allocated buffer blocks, making the
> complexity of the algorithm O(N^2) where N is the number of files.
>
> With this patch:
>
> * global `last_mapped_block' is maintained to avoid full scan in
> `erofs_mapbh' function.
>
> * global `mapped_buckets' maintains a list of already mapped buffer
> blocks for each type and for each possible used bytes in the last
> EROFS_BLKSIZ. Then it is used to identify the most suitable blocks in
> future `erofs_balloc', avoiding full scan. Note that not-mapped (and the
> last mapped) blocks can be expanded, so we deal with them separately.
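
For readers of the archive, the bucket idea in the second bullet can be
sketched roughly as below. This is only an illustration, not the code from
the patch: `struct blk', `buckets', `bucket_add', `bucket_take', `place' and
the 4096-byte `BLKSIZ' are all hypothetical names and values, and the
per-type dimension of `mapped_buckets' is left out to keep it short.

```c
#include <assert.h>
#include <stddef.h>

#define BLKSIZ 4096	/* assumed block size, stands in for EROFS_BLKSIZ */

/* hypothetical, simplified stand-in for a mapped buffer block */
struct blk {
	size_t used;		/* bytes already used in its last block */
	struct blk *next;	/* next block in the same bucket */
};

/* buckets[u] chains the blocks that currently have exactly u bytes used */
static struct blk *buckets[BLKSIZ + 1];

static void bucket_add(struct blk *b)
{
	assert(b->used <= BLKSIZ);
	b->next = buckets[b->used];
	buckets[b->used] = b;
}

/*
 * Unlink and return the fullest block that still has `need' bytes free,
 * or NULL if there is none. Only bucket heads are inspected, so the cost
 * is bounded by BLKSIZ and does not grow with the number of blocks.
 */
static struct blk *bucket_take(size_t need)
{
	size_t u;

	if (need == 0 || need > BLKSIZ)
		return NULL;
	/* visits u = BLKSIZ - need, ..., 1, 0 */
	for (u = BLKSIZ - need + 1; u-- > 0; ) {
		struct blk *b = buckets[u];

		if (b) {
			buckets[u] = b->next;
			b->next = NULL;
			return b;
		}
	}
	return NULL;
}

/* reserve `need' bytes in the best-fitting block and re-file the block */
static struct blk *place(size_t need)
{
	struct blk *b = bucket_take(need);

	if (b) {
		b->used += need;
		bucket_add(b);
	}
	return b;
}
```

With two blocks filed at 4000 and 1000 used bytes, place(96) picks the
4000-byte one first because it is the tightest fit, and a second place(96)
falls through to the 1000-byte one. A caller would allocate a fresh block
when place() returns NULL.
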
>
> When I test it with ImageNet dataset (1.33M files, 147GiB), it takes
> about 4 hours. Most time is spent on IO.
>
> Cc: Huang Jianan <jnhuang95 at gmail.com>
> Signed-off-by: Hu Weiwen <sehuww at mail.scut.edu.cn>
> Signed-off-by: Gao Xiang <hsiangkao at aol.com>
> ---
> include/erofs/cache.h | 1 +
> lib/cache.c | 105 ++++++++++++++++++++++++++++++++++++------
> 2 files changed, 93 insertions(+), 13 deletions(-)
>
It looks good.
Reviewed-by: Li Guifu <bluce.lee at aliyun.com>
Thanks,