Re: [PATCH v2] erofs: relaxed temporary buffers allocation on readahead

Fri Jan 26 14:56:51 AEDT 2024

> -----Original Message-----
> From: Gao Xiang <hsiangkao at linux.alibaba.com>
> Sent: January 26, 2024 11:50
> To: Chunhai Guo <guochunhai at vivo.com>; xiang at kernel.org
> Cc: chao at kernel.org; huyue2 at coolpad.com; jefflexu at linux.alibaba.com;
> linux-erofs at lists.ozlabs.org
> Subject: Re: [PATCH v2] erofs: relaxed temporary buffers allocation on readahead
> 
> On 2024/1/26 11:42, Chunhai Guo wrote:
> > On 2024/1/26 10:47, Gao Xiang wrote:
> >>
> >> On 2024/1/26 10:41, Chunhai Guo wrote:
> >>> On 2024/1/22 15:42, Chunhai Guo wrote:
> >>>> On 2024/1/22 12:37, Gao Xiang wrote:
> >>>>>
> >>>>> On 2024/1/22 11:49, Chunhai Guo wrote:
> >>>>>> On 2024/1/22 10:07, Gao Xiang wrote:
> >>>>>>>
> >>>>>>> On 2024/1/20 22:55, Chunhai Guo wrote:
> >>>>>>>> Even with inplace decompression, sometimes extra temporary
> >>>>>>>> buffers are still needed for decompression.  In low-memory
> >>>>>>>> scenarios, it would be better to try to allocate with
> >>>>>>>> GFP_NOWAIT on readahead first. That can help reduce the time
> >>>>>>>> spent on page allocation under memory pressure.
> >>>>>>>>
> >>>>>>>> There is an average reduction of 21% in page allocation time
> >>>>>>>> under
> >>>>>>> It would be better to add a table to show the absolute numbers
> >>>>>>> too (like what you did in the global pool commit.)  If it's
> >>>>>>> possible, there is no need to send an updated version for this,
> >>>>>>> just reply with the updated commit message and I will update the
> >>>>>>> commit manually.
> >>>>>> The table below shows detailed numbers. The reduction I mentioned
> >>>>>> before was not accurate enough. Please help correct the
> >>>>>> improvement from 21% to 20.21%.
> >>>>>>
> >>>>>>
> >>>>>> +--------------+----------------+---------------+---------+
> >>>>>> |              | w/o GFP_NOWAIT | w/ GFP_NOWAIT |  diff   |
> >>>>>> +--------------+----------------+---------------+---------+
> >>>>>> | Average (ms) |     3364       |      2684     | -20.21% |
> >>>>>> +--------------+----------------+---------------+---------+
> >>>>> Was it tested without the 16k sliding window change?
> >>>>> https://lore.kernel.org/linux-erofs/69711d55-f7a2-420b-9ba8-fa2921f66a4c@vivo.com
> >>>> The result was obtained with the 64k sliding window.
> >>>>
> >>>>> Could you benchmark these two optimizations together to show the
> >>>>> extreme optimized case without a global pool?
> >>>>> With a new table if possible? I will add this to the commit
> >>>>> message too.
> >>>> OK. I will reply to this email when the benchmark is finished.
> >>> The benchmark has been completed and the table below shows that
> >>> there is an average 52.14% reduction in page allocation time with
> >>> these two optimizations.
> >>>
> >>> +--------------+----------------+---------------+---------+
> >>> |              |   64k window   |   16k window  |         |
> >>> |              | w/o GFP_NOWAIT | w/ GFP_NOWAIT |  diff   |
> >>> +--------------+----------------+---------------+---------+
> >>> | Average (ms) |     3364       |      1610     | -52.14% |
> >>> +--------------+----------------+---------------+---------+
> >>>
> >>> The table below summarizes the results of these three benchmarks.
> >>>
> >>> +--------------+----------------+----------------+---------------+---------------+
> >>> |              |   64k window   |   16k window   |   64k window  |   16k window  |
> >>> |              | w/o GFP_NOWAIT | w/o GFP_NOWAIT | w/ GFP_NOWAIT | w/ GFP_NOWAIT |
> >>> +--------------+----------------+----------------+---------------+---------------+
> >>> | Average (ms) |     3364       |      2079      |      2684     |      1610     |
> >>> +--------------+----------------+----------------+---------------+---------------+
> >>> |     diff     |                |     -38.19%    |     -20.21%   |     -52.14%   |
> >>> +--------------+----------------+----------------+---------------+---------------+
> >>
> >> The tables show up garbled; could you just list the numbers so I
> >> can refine this?
> >
> > Sorry that there might be some issues with my email client. Here are
> > the numerical results below.
> >       64k window w/o GFP_NOWAIT : 3364
> >       16k window w/o GFP_NOWAIT : 2079, diff: -38.19%
> >       64k window w/  GFP_NOWAIT : 2684, diff: -20.21%
> >       16k window w/  GFP_NOWAIT : 1610, diff: -52.14%
> >
> > Image size comparison:
> >       64k: 9117044 KB
> >       16k: 9113096 KB
> 
> That is with 4k pcluster, yes?  I guess the overall image size won't be greatly
> affected, but it seems to be getting even smaller. :-)

Yes, this is with 4k pcluster.

Thanks,

> 
> I think this optimization would be helpful to everyone without any extra memory
> reservation (which will also be good for much lower-end devices), let me
> revise the commit for formal submission.
> 
> Thanks,
> Gao Xiang
> 
> >
> > Thanks,
> >
> >>
> >> Thanks,
> >> Gao Xiang
> >
> >
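
The relative differences quoted in this thread can be rechecked from the four averages. Below is a minimal sketch, assuming the baseline is the 64k window without GFP_NOWAIT and that each diff is (value - baseline) / baseline. The function and variable names are illustrative and do not come from the patch. Printed values are rounded to two decimals, so the last digit may differ from a figure that was truncated instead.

```python
# Average page allocation time in ms, as listed in the thread.
BASELINE_MS = 3364  # 64k window w/o GFP_NOWAIT (assumed baseline)

RESULTS_MS = {
    "16k window w/o GFP_NOWAIT": 2079,
    "64k window w/ GFP_NOWAIT": 2684,
    "16k window w/ GFP_NOWAIT": 1610,
}


def percent_change(baseline: float, value: float) -> float:
    """Relative change of value against baseline, in percent.

    A negative result means less time was spent than in the baseline.
    """
    return (value - baseline) / baseline * 100.0


def report() -> dict:
    """Map each configuration name to its percent change vs. the baseline."""
    return {name: percent_change(BASELINE_MS, ms) for name, ms in RESULTS_MS.items()}


if __name__ == "__main__":
    for name, diff in report().items():
        print(f"{name}: {diff:.2f}%")
```

Keeping the baseline as a single named constant makes it easy to recompute the diffs against a different reference configuration.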