[PATCH v2] erofs: relaxed temporary buffers allocation on readahead

Fri Jan 26 14:49:45 AEDT 2024

On 2024/1/26 11:42, Chunhai Guo wrote:
> On 2024/1/26 10:47, Gao Xiang wrote:
>> [You don't often get email from hsiangkao at linux.alibaba.com. Visit https://aka.ms/LearnAboutSenderIdentification to learn why this is important]
>>
>> On 2024/1/26 10:41, Chunhai Guo wrote:
>>> On 2024/1/22 15:42, Chunhai Guo wrote:
>>>> On 2024/1/22 12:37, Gao Xiang wrote:
>>>>>
>>>>> On 2024/1/22 11:49, Chunhai Guo wrote:
>>>>>> On 2024/1/22 10:07, Gao Xiang wrote:
>>>>>>>
>>>>>>> On 2024/1/20 22:55, Chunhai Guo wrote:
>>>>>>>> Even with inplace decompression, sometimes extra temporary buffers are
>>>>>>>> still needed for decompression.  In low-memory scenarios, it would be
>>>>>>>> better to try to allocate with GFP_NOWAIT on readahead first. That can
>>>>>>>> help reduce the time spent on page allocation under memory pressure.
>>>>>>>>
>>>>>>>> There is an average reduction of 21% in page allocation time under
>>>>>>> It would be better to add a table to show the absolute numbers too
>>>>>>> (like what you did in the global pool commit).  If possible, there
>>>>>>> is no need to send an updated version for this; just reply with the
>>>>>>> updated commit message and I will update the commit manually.
>>>>>> The table below shows detailed numbers. The reduction I mentioned before
>>>>>> was not accurate enough. Please help correct the improvement from 21% to
>>>>>> 20.21%.
>>>>>>
>>>>>>
>>>>>> +--------------+----------------+---------------+---------+
>>>>>> |              | w/o GFP_NOWAIT | w/ GFP_NOWAIT |  diff   |
>>>>>> +--------------+----------------+---------------+---------+
>>>>>> | Average (ms) |     3364       |      2684     | -20.21% |
>>>>>> +--------------+----------------+---------------+---------+
>>>>> Was this tested without the 16k sliding window change?
>>>>> https://lore.kernel.org/linux-erofs/69711d55-f7a2-420b-9ba8-fa2921f66a4c@vivo.com
>>>> The result was measured with the 64k sliding window.
>>>>
>>>>> Could you benchmark these two optimizations together to
>>>>> show the extreme optimized case without a global pool?
>>>>> With a new table if possible? I will add this to
>>>>> the commit message too.
>>>> OK. I will reply to this email when the benchmark is finished.
>>> The benchmark has been completed and the table below shows that there is
>>> an average 52.14% reduction in page allocation time with these two
>>> optimizations.
>>>
>>> +--------------+----------------+---------------+---------+
>>> |              |   64k window   |  16k window   |         |
>>> |              | w/o GFP_NOWAIT | w/ GFP_NOWAIT |  diff   |
>>> +--------------+----------------+---------------+---------+
>>> | Average (ms) |     3364       |      1610     | -52.14% |
>>> +--------------+----------------+---------------+---------+
>>>
>>> The table below summarizes the results of these three benchmarks.
>>>
>>> +--------------+----------------+----------------+---------------+---------------+
>>> |              |   64k window   |   16k window   |   64k window  |   16k window  |
>>> |              | w/o GFP_NOWAIT | w/o GFP_NOWAIT | w/ GFP_NOWAIT | w/ GFP_NOWAIT |
>>> +--------------+----------------+----------------+---------------+---------------+
>>> | Average (ms) |     3364       |      2079      |      2684     |      1610     |
>>> +--------------+----------------+----------------+---------------+---------------+
>>> |     diff     |                |     -38.19%    |     -20.21%   |     -52.14%   |
>>> +--------------+----------------+----------------+---------------+---------------+
>>
>> The tables show up as a mess on my side; could you just list the
>> numbers so I can refine this?
> 
> Sorry, there might be some issues with my email client. Here are the
> numerical results (average page allocation time, in ms):
>       64k window w/o GFP_NOWAIT : 3364
>       16k window w/o GFP_NOWAIT : 2079, diff: -38.19%
>       64k window w/  GFP_NOWAIT : 2684, diff: -20.21%
>       16k window w/  GFP_NOWAIT : 1610, diff: -52.14%
> 
> Image size comparison:
>       64k: 9117044 KB
>       16k: 9113096 KB

That is with 4k pcluster, yes?  I guess the overall image size
won't be affected much, but it even seems to get smaller. :-)

I think this optimization would be helpful to everyone without
any extra memory reservation (which will also be good for much
lower-end devices), so let me revise the commit for formal
submission.

Thanks,
Gao Xiang
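
The "diff" figures in this thread can be re-derived from the four averages listed above. The snippet below is only a small sketch for checking that arithmetic; it assumes every diff is taken relative to the "64k window w/o GFP_NOWAIT" baseline, and the names `averages_ms` and `relative_change` are illustrative, not taken from the patch.

```python
# Average page allocation times (ms) as listed in the thread.
averages_ms = {
    "64k window w/o GFP_NOWAIT": 3364,  # assumed baseline for every diff
    "16k window w/o GFP_NOWAIT": 2079,
    "64k window w/  GFP_NOWAIT": 2684,
    "16k window w/  GFP_NOWAIT": 1610,
}


def relative_change(baseline, value):
    """Return the change of `value` relative to `baseline`, in percent."""
    return (value - baseline) / baseline * 100.0


baseline = averages_ms["64k window w/o GFP_NOWAIT"]
diffs = {
    name: relative_change(baseline, ms) for name, ms in averages_ms.items()
}

for name, pct in diffs.items():
    print(f"{name}: {averages_ms[name]} ms, diff: {pct:+.2f}%")
```

Rounded to two decimals this gives about -38.20%, -20.21% and -52.14% for the three non-baseline configurations; the first differs from the -38.19% listed in the thread only by rounding versus truncation.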
