[PATCH] erofs: add a global page pool for lz4 decompression
Chunhai Guo
guochunhai at vivo.com
Tue Jan 2 17:08:56 AEDT 2024
On 2023/12/31 9:09, Gao Xiang wrote:
>
> On 2023/12/29 12:48, Chunhai Guo wrote:
>>> Hi Chunhai,
>>>
>>> On 2023/12/28 21:00, Chunhai Guo wrote:
>>>> Using a global page pool for LZ4 decompression significantly reduces
>>>> the time spent on page allocation in low memory scenarios.
>>>>
>>>> The table below shows the reduction in time spent on page allocation
>>>> for
>>>> LZ4 decompression when using a global page pool.
>>>> The results were obtained from multi-app launch benchmarks on ARM64
>>>> Android devices running the 5.15 kernel.
>>>> +--------------+---------------+--------------+---------+
>>>> | | w/o page pool | w/ page pool | diff |
>>>> +--------------+---------------+--------------+---------+
>>>> | Average (ms) | 3434 | 21 | -99.38% |
>>>> +--------------+---------------+--------------+---------+
>>>>
>>>> Based on the benchmark logs, it appears that 256 pages are sufficient
>>>> for most cases, but this can be adjusted as needed. Additionally,
>>>> turning on CONFIG_EROFS_FS_DEBUG will simplify the tuning process.
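Since the utils.c hunk that implements the pool is not quoted in this excerpt, below is a minimal sketch of what the global pool side might look like. The list/lock layout and helper bodies are assumptions for illustration; only erofs_global_page_pool_init() and the 256-page figure come from the patch itself.

/*
 * Illustrative sketch only, not the actual utils.c code: a global pool of
 * pre-allocated pages protected by a spinlock. The struct layout and the
 * EROFS_GLOBAL_POOL_PAGES name are assumptions; only
 * erofs_global_page_pool_init() is named in the quoted diff.
 */
#include <linux/list.h>
#include <linux/mm.h>
#include <linux/spinlock.h>

#define EROFS_GLOBAL_POOL_PAGES	256	/* default suggested by the benchmark logs */

static struct {
	struct list_head pages;		/* cached free pages, linked via page->lru */
	unsigned int nrpages;
	spinlock_t lock;
} erofs_page_pool = {
	.pages = LIST_HEAD_INIT(erofs_page_pool.pages),
	.lock = __SPIN_LOCK_UNLOCKED(erofs_page_pool.lock),
};

void erofs_global_page_pool_init(void)
{
	/* fill the pool up front; a partially filled pool still works */
	while (erofs_page_pool.nrpages < EROFS_GLOBAL_POOL_PAGES) {
		struct page *page = alloc_page(GFP_KERNEL);

		if (!page)
			break;
		spin_lock(&erofs_page_pool.lock);
		list_add(&page->lru, &erofs_page_pool.pages);
		++erofs_page_pool.nrpages;
		spin_unlock(&erofs_page_pool.lock);
	}
}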
>>> Thanks for the patch. I have some questions:
>>> - what pcluster sizes are you using? 4k or more?
>> We currently use a 4k pcluster size.
>>
>>> - what detailed configuration are you using for the multi-app
>>> launch workload? Such as CPU / Memory / the number of apps.
>> We ran the benchmark on Android devices with the following configuration.
>> In the benchmark, we launched 16 frequently-used apps, and the camera app
>> was the last one in each round. The results in the table above were
>> obtained from the launching process of the camera app.
>> CPU: 8 cores
>> Memory: 8GB
> Is it the accumulated time of the camera app across all rounds or the average
> time of the camera app for each round?
It's the average time of the camera app for each round.
>>>> This patch currently only supports the LZ4 decompressor; other
>>>> decompressors will be supported in the next step.
>>>>
>>>> Signed-off-by: Chunhai Guo <guochunhai at vivo.com>
>>>> ---
>>>> fs/erofs/compress.h | 1 +
>>>> fs/erofs/decompressor.c | 42 ++++++++++++--
>>>> fs/erofs/internal.h | 5 ++
>>>> fs/erofs/super.c | 1 +
>>>> fs/erofs/utils.c | 121 ++++++++++++++++++++++++++++++++++++++++
>>>> 5 files changed, 165 insertions(+), 5 deletions(-)
>>>>
>>>> diff --git a/fs/erofs/compress.h b/fs/erofs/compress.h
>>>> index 279933e007d2..67202b97d47b 100644
>>>> --- a/fs/erofs/compress.h
>>>> +++ b/fs/erofs/compress.h
>>>> @@ -31,6 +31,7 @@ struct z_erofs_decompressor {
>>>> /* some special page->private (unsigned long, see below) */
>>>> #define Z_EROFS_SHORTLIVED_PAGE (-1UL << 2)
>>>> #define Z_EROFS_PREALLOCATED_PAGE (-2UL << 2)
>>>> +#define Z_EROFS_POOL_PAGE (-3UL << 2)
>>>>
>>>> /*
>>>>     * For all pages in a pcluster, page->private should be one of
>>>>
>>>> diff --git a/fs/erofs/decompressor.c b/fs/erofs/decompressor.c
>>>> index d08a6ee23ac5..41b34f01416f 100644
>>>> --- a/fs/erofs/decompressor.c
>>>> +++ b/fs/erofs/decompressor.c
>>>> @@ -54,6 +54,7 @@ static int z_erofs_load_lz4_config(struct super_block *sb,
>>>> sbi->lz4.max_distance_pages = distance ?
>>>> DIV_ROUND_UP(distance, PAGE_SIZE) + 1 :
>>>> LZ4_MAX_DISTANCE_PAGES;
>>>> + erofs_global_page_pool_init();
>>>> return erofs_pcpubuf_growsize(sbi->lz4.max_pclusterblks);
>>>> }
>>>>
>>>> @@ -111,15 +112,42 @@ static int z_erofs_lz4_prepare_dstpages(struct z_erofs_lz4_decompress_ctx *ctx,
>>>> victim = availables[--top];
>>>> get_page(victim);
>>>> } else {
>>>> -                    victim = erofs_allocpage(pagepool,
>>>> +                    victim = erofs_allocpage_for_decmpr(pagepool,
>>>>                                       GFP_KERNEL | __GFP_NOFAIL);
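The body of erofs_allocpage_for_decmpr() lives in the utils.c hunk, which is not quoted above. A plausible sketch of its pool-first behaviour, reusing the illustrative erofs_page_pool layout from the earlier sketch (the function body below is an assumption; only the function name, Z_EROFS_POOL_PAGE, and the fallback to erofs_allocpage() appear in the patch):

/*
 * Sketch only: grab a page from the global pool when one is cached,
 * otherwise fall back to the ordinary allocation path. Tagging the page
 * with Z_EROFS_POOL_PAGE lets the release path hand it back to the pool
 * instead of freeing it.
 */
struct page *erofs_allocpage_for_decmpr(struct page **pagepool, gfp_t gfp)
{
	struct page *page = NULL;

	spin_lock(&erofs_page_pool.lock);
	if (!list_empty(&erofs_page_pool.pages)) {
		page = list_first_entry(&erofs_page_pool.pages,
					struct page, lru);
		list_del(&page->lru);
		--erofs_page_pool.nrpages;
	}
	spin_unlock(&erofs_page_pool.lock);

	if (page) {
		set_page_private(page, Z_EROFS_POOL_PAGE);
		return page;
	}
	/* pool exhausted: behave exactly like plain erofs_allocpage() */
	return erofs_allocpage(pagepool, gfp);
}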
>>> For each read request, the extreme case here will be 15 pages for a 64k LZ4 sliding window (60k = 64k - 4k). You could reduce
>>> the LZ4 sliding window to save more pages, with a slight compression ratio loss.
>> OK, we will do the test on this. However, based on the data we have, 97% of
>> the compressed pages that have been read can be decompressed into fewer than
>> 4 pages, so we do not put too much hope in this.
> Yes, but I'm not sure whether just 3% of the compressed data accounts for the
> majority of the latency. It'd be better to try it out anyway.
OK, we will do the test on this.
>
>>> Or, __GFP_NOFAIL here is actually unnecessary, since we could bail out if allocation fails for readahead requests
>>> and only keep it for __read requests__. I have some plan to do
>>> this, but it's too close to the next merge window, so I plan to work this out for Linux 6.9.
>> This sounds great. It looks like another optimization related to this
>> case.
>>
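The bail-out idea above would roughly mean dropping __GFP_NOFAIL for readahead I/O and letting the failure propagate. A tiny illustration follows; the helper and the readahead flag are assumptions, not part of the patch:

/*
 * Illustration only: readahead requests may simply fail under memory
 * pressure, while synchronous reads keep the nofail behaviour. The
 * 'readahead' flag is assumed to be plumbed down from the I/O path.
 */
static struct page *alloc_dst_page(struct page **pagepool, bool readahead)
{
	gfp_t gfp = readahead ? GFP_KERNEL : GFP_KERNEL | __GFP_NOFAIL;

	/* a NULL return (readahead only) lets the caller bail out early */
	return erofs_allocpage(pagepool, gfp);
}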
>>> Anyway, I'm not saying a mempool is not a good idea, but I tend to reserve as little memory as possible if there is some other way to mitigate the same workload, since reserving memory is not free (it means 1 MiB of memory will only ever be used for this). Even if we do a mempool, I wonder if we could unify pcpubuf and the mempool together to make a better pool.
>> I totally agree with your opinion. We use 256 pages for the worst-case
>> scenario, and 1 MiB is acceptable on 8GB devices. However, for 95% of
>> scenarios, 64 pages are sufficient and more acceptable for other devices.
>> And you are right, I will create a patch to unify the pcpubuf and mempool
>> in the next step.
> Anyway, if a global mempool is really needed, I'd like to add
> a new sysfs interface to change this value (0 by default).
> Also, you could reuse some of the shortlived-page interfaces for the global
> pool rather than introduce another type of pages.
OK. I will make another patch for this.
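For reference, the kind of sysfs knob suggested above could look roughly like the sketch below. The attribute name, the reserved_pages variable, and the wiring into the existing erofs sysfs code are all assumptions for illustration; the only point taken from the discussion is that the value defaults to 0.

/*
 * Illustrative sketch of a sysfs knob for the reserved pool size.
 * Everything here (names, placement) is hypothetical; a real patch
 * would integrate with fs/erofs/sysfs.c.
 */
#include <linux/kobject.h>
#include <linux/sysfs.h>
#include <linux/kstrtox.h>

static unsigned long erofs_reserved_pages;	/* default: 0, nothing reserved */

static ssize_t reserved_pages_show(struct kobject *kobj,
				   struct kobj_attribute *attr, char *buf)
{
	return sysfs_emit(buf, "%lu\n", erofs_reserved_pages);
}

static ssize_t reserved_pages_store(struct kobject *kobj,
				    struct kobj_attribute *attr,
				    const char *buf, size_t count)
{
	unsigned long nr;
	int err = kstrtoul(buf, 0, &nr);

	if (err)
		return err;
	/* grow or shrink the global pool to the requested size here */
	erofs_reserved_pages = nr;
	return count;
}

static struct kobj_attribute reserved_pages_attr =
	__ATTR(reserved_pages, 0644, reserved_pages_show, reserved_pages_store);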
Thanks,
>
> Thanks,
> Gao Xiang