[PATCH 0/2] erofs-utils: improve performance of mkfs with dedupe

Sandeep Dhavale dhavale at google.com
Fri May 24 07:01:29 AEST 2024


I got a report from an AOSP user that mkfs.erofs becomes very slow when
the dedupe option is enabled. For example, creating an 8GB uncompressed
erofs image went from 36 seconds to 27 minutes with dedupe enabled.
After profiling mkfs.erofs on sample data, I observed that the extra
time was spent in erofs_blob_exit(). Further debugging showed that the
real inefficiency was in hashmap_iter_first(), which always starts
scanning for the first element from tablepos = 0, so every lookup
rescans the buckets that earlier removals have already emptied.

The following patches solve this by
- creating a helper function to disable hashmap shrinking
- using hashmap_iter_next() to avoid scanning from 0; as rehashing is
  disabled, the iteration is guaranteed to go through all the elements
  even while doing hashmap_remove().
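
To illustrate the cost difference, here is a minimal sketch using a toy
chained hash table. It is not the erofs-utils hashmap API: every toy_*
name is hypothetical, and the table never resizes, which stands in for
"shrinking disabled". It counts how many buckets are inspected when
draining the table by restarting from bucket 0 each time versus keeping
a cursor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy chained hash table; hypothetical, NOT the erofs-utils hashmap. */
struct toy_entry {
	struct toy_entry *next;
	unsigned int key;
};

struct toy_map {
	struct toy_entry **table;
	size_t tablesize;
	size_t probes;		/* buckets inspected while draining */
};

static void toy_init(struct toy_map *m, size_t tablesize)
{
	m->table = calloc(tablesize, sizeof(*m->table));
	assert(m->table);
	m->tablesize = tablesize;
	m->probes = 0;
}

static void toy_add(struct toy_map *m, unsigned int key)
{
	struct toy_entry *e = malloc(sizeof(*e));

	assert(e);
	e->key = key;
	e->next = m->table[key % m->tablesize];
	m->table[key % m->tablesize] = e;
}

/* Unlink and return the first entry found at or after bucket *pos. */
static struct toy_entry *toy_pop_from(struct toy_map *m, size_t *pos)
{
	for (; *pos < m->tablesize; (*pos)++) {
		struct toy_entry *e = m->table[*pos];

		m->probes++;
		if (e) {
			m->table[*pos] = e->next;
			return e;
		}
	}
	return NULL;
}

/* Restart the scan from bucket 0 before every removal. */
static size_t toy_drain_restart(struct toy_map *m)
{
	struct toy_entry *e;
	size_t freed = 0, pos;

	for (pos = 0; (e = toy_pop_from(m, &pos)); pos = 0) {
		free(e);
		freed++;
	}
	return freed;
}

/* Keep the cursor between removals; safe because the table never resizes. */
static size_t toy_drain_cursor(struct toy_map *m)
{
	struct toy_entry *e;
	size_t freed = 0, pos = 0;

	while ((e = toy_pop_from(m, &pos))) {
		free(e);
		freed++;
	}
	return freed;
}

/* Fill n buckets with one entry each, drain, return buckets inspected. */
static size_t toy_count_probes(size_t n, int restart)
{
	struct toy_map m;
	size_t freed, i;

	toy_init(&m, n);
	for (i = 0; i < n; i++)
		toy_add(&m, (unsigned int)i);
	freed = restart ? toy_drain_restart(&m) : toy_drain_cursor(&m);
	assert(freed == n);
	free(m.table);
	return m.probes;
}
```

Comparing toy_count_probes(n, 1) with toy_count_probes(n, 0) shows the
restart strategy inspecting on the order of n*n/2 buckets while the
cursor strategy inspects about 2*n, which matches the shape of the
slowdown described above.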

Test results now show improvements of up to an order of magnitude for
larger filesystem sizes.

You can verify the improvements with the steps below:

$ mkdir fs_data
$ dd if=/dev/urandom of=fs_data/random_file.bin bs=1M count=8192
$ time mkfs.erofs --chunksize=4096 erofs_dedupe.img fs_data

fs_size  Before   After   Improvement
1G       23s      7s      3.2x
2G       81s      15s     5.4x
4G       272s     31s     8.77x
8G       1252s    61s     20.52x

Thanks,
Sandeep

Sandeep Dhavale (2):
  erofs-utils: lib: provide helper to disable hashmap shrinking
  erofs-utils: lib: improve freeing hashmap in erofs_blob_exit()

 include/erofs/hashmap.h | 4 ++++
 lib/blobchunk.c         | 8 +++++++-
 2 files changed, 11 insertions(+), 1 deletion(-)

-- 
2.45.1.288.g0e0cd299f1-goog


