[PATCH 2/2] erofs-utils: lib: improve freeing hashmap in erofs_blob_exit()

Sandeep Dhavale dhavale at google.com
Fri May 24 17:07:48 AEST 2024


On Thu, May 23, 2024 at 10:28 PM Gao Xiang <hsiangkao at linux.alibaba.com> wrote:
>
> Hi Sandeep,
>
Hi Gao,
> On 2024/5/24 05:01, Sandeep Dhavale wrote:
> > Depending on size of the filesystem being built there can be huge number
> > of elements in the hashmap. Currently we call hashmap_iter_first() in
> > while loop to iterate and free the elements. However technically
> > correct, this is inefficient in 2 aspects.
> >
> > - As we are iterating the elements for removal, we do not need overhead of
> > rehashing.
> > - Second part which contributes hugely to the performance is using
> > hashmap_iter_first() as it starts scanning from index 0 throwing away
> > the previous successful scan. For sparsely populated hashmap this becomes
> > O(n^2) in worst case.
> >
> > Lets fix this by disabling hashmap shrink which avoids rehashing
> > and use hashmap_iter_next() which is now guaranteed to iterate over
> > all the elements while removing while avoiding the performance pitfalls
> > of using hashmap_iter_first().
> >
> > Test with random data shows performance improvement as:
> >
> > fs_size  Before   After
> > 1G     23s      7s
> > 2G     81s      15s
> > 4G     272s     31s
> > 8G     1252s    61s
>
> Sigh.. BTW, in the long term, I guess we might need to
> find a better hashmap implementation (with MIT or BSD
> license) instead of this one.
>
Ok, sounds good. I have not seen any more problems (yet). But will be
good to see if something better is available.
> >
> > Signed-off-by: Sandeep Dhavale <dhavale at google.com>> ---
> >   lib/blobchunk.c | 8 +++++++-
> >   1 file changed, 7 insertions(+), 1 deletion(-)
> >
> > diff --git a/lib/blobchunk.c b/lib/blobchunk.c
> > index 645bcc1..8082aa4 100644
> > --- a/lib/blobchunk.c
> > +++ b/lib/blobchunk.c
> > @@ -548,11 +548,17 @@ void erofs_blob_exit(void)
> >       if (blobfile)
> >               fclose(blobfile);
> >
> > -     while ((e = hashmap_iter_first(&blob_hashmap, &iter))) {
> > +     /* Disable hashmap shrink, effectively disabling rehash.
> > +      * This way we can iterate over entire hashmap efficiently
> > +      * and safely by using hashmap_iter_next() */
>
> I will fix up the comment style manually, otherwise it
> looks good to me...
>
Ok, thanks!

> Reviewed-by: Gao Xiang <hsiangkao at linux.alibaba.com>
>
> Thanks,
> Gao Xiang


More information about the Linux-erofs mailing list