[v3 0/9] parallelized "struct page" zeroing
Pasha Tatashin
pasha.tatashin at oracle.com
Thu May 11 01:01:40 AEST 2017
On 05/10/2017 10:57 AM, Michal Hocko wrote:
> On Wed 10-05-17 09:42:22, Pasha Tatashin wrote:
>>>
>>> Well, I didn't object to this particular part. I was mostly concerned
>>> about
>>> http://lkml.kernel.org/r/1494003796-748672-4-git-send-email-pasha.tatashin@oracle.com
>>> and the "zero" argument for other functions. I guess we can do without
>>> that. I _think_ that we should simply _always_ initialize the page at the
>>> __init_single_page time rather than during the allocation. That would
>>> require dropping __GFP_ZERO for non-memblock allocations. Or do you
>>> think we could regress for single threaded initialization?
>>>
>>
>> Hi Michal,
>>
>> That's exactly right. I am worried that we will regress when there is no
>> parallelized initialization of "struct pages" if we unconditionally do
>> memset() in __init_single_page(). The overhead of calling memset() on
>> small chunks (64 bytes) may cause a regression; this is why I opted to
>> zero this metadata only in the parallelized case. This way, we are guaranteed to
>> see great improvements from this change without having regressions on
>> platforms and builds that do not support parallelized initialization of
>> "struct pages".
>
> Have you measured that? I do not think it would be super hard to
> measure. I would be quite surprised if this added much if anything at
> all as the whole struct page should be in the cache line already. We do
> set reference count and other struct members. Almost nobody should be
> looking at our page at this time and stealing the cache line. On the
> other hand a large memcpy will basically wipe everything away from the
> cpu cache. Or am I missing something?
>
Perhaps you are right, and I will measure on x86. But I suspect the hit can
become unacceptable on some platforms: there is the overhead of calling a
function, even if it is leaf-optimized, and there is overhead inside
memset() to check the alignment of the size and address, the type of setting
(zeroing vs. non-zeroing), etc., and that adds up quickly.
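
To make the two options concrete, here is a rough user-space sketch in C, not
kernel code. Everything in it is an assumption made for illustration: struct
fake_page is a hypothetical 64-byte stand-in for "struct page", and
set_fields(), init_bulk_zero(), init_per_page_zero() and strategies_agree()
are made-up names. It only shows that both approaches leave the same memory
contents; which one is faster still has to be measured on each platform.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical 64-byte stand-in for "struct page"; not the kernel layout. */
struct fake_page {
	unsigned long flags;
	int refcount;
	int mapcount;
	unsigned char pad[64 - sizeof(unsigned long) - 2 * sizeof(int)];
};
static_assert(sizeof(struct fake_page) == 64, "sketch assumes 64-byte entries");

/* Set the few members that initialization writes anyway. */
static void set_fields(struct fake_page *p, unsigned long flags)
{
	p->flags = flags;
	p->refcount = 1;
	p->mapcount = -1;
}

/* Option A: one large memset() over the whole array, then set the members. */
static void init_bulk_zero(struct fake_page *map, size_t n)
{
	memset(map, 0, n * sizeof(*map));
	for (size_t i = 0; i < n; i++)
		set_fields(&map[i], i);
}

/* Option B: zero each 64-byte entry right before its members are set. */
static void init_per_page_zero(struct fake_page *map, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		memset(&map[i], 0, sizeof(map[i]));
		set_fields(&map[i], i);
	}
}

/*
 * Returns 1 if both options leave identical memory, 0 if they differ,
 * and -1 if an allocation failed.
 */
static int strategies_agree(size_t n)
{
	struct fake_page *a = malloc(n * sizeof(*a));
	struct fake_page *b = malloc(n * sizeof(*b));
	int same = -1;

	if (a && b) {
		/* Start from different garbage so that zeroing matters. */
		memset(a, 0xaa, n * sizeof(*a));
		memset(b, 0x55, n * sizeof(*b));
		init_bulk_zero(a, n);
		init_per_page_zero(b, n);
		same = memcmp(a, b, n * sizeof(*a)) == 0;
	}
	free(a);
	free(b);
	return same;
}
```

Calling strategies_agree(1024) should return 1. Note that a compiler may
expand a fixed-size 64-byte memset() inline, so a user-space build like this
does not settle the per-call overhead question for every platform.
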
Pasha