[v3 0/9] parallelized "struct page" zeroing
Pasha Tatashin
pasha.tatashin at oracle.com
Sat May 13 03:24:52 AEST 2017
On 05/12/2017 12:57 PM, David Miller wrote:
> From: Pasha Tatashin <pasha.tatashin at oracle.com>
> Date: Thu, 11 May 2017 16:59:33 -0400
>
>> We should either keep memset() only for deferred struct pages as what
>> I have in my patches.
>>
>> Another option is to add a new function struct_page_clear() which
>> would default to memset() and to something else on platforms that
>> decide to optimize it.
>>
>> On SPARC it would call STBIs, and we would do one membar call after
>> all "struct pages" are initialized.
>
> No membars will be performed for single individual page struct clear,
> the cutoff to use the STBI is larger than that.
>
Right now it is larger, but what I suggested is to add a new optimized
routine just for this case, which would do STBI for 64 bytes but without
a membar (do the membar at the end of memmap_init_zone() and
deferred_init_memmap()):
#define struct_page_clear(page) \
__asm__ __volatile__( \
"stxa %%g0, [%0]%2\n" \
"stxa %%g0, [%0 + %1]%2\n" \
: /* No output */ \
: "r" (page), "r" (0x20), "i"(ASI_BLK_INIT_QUAD_LDD_P))
And insert it into __init_single_page() instead of memset().
The final result is 4.01s/T, which is even faster than the current 4.97s/T.
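
For illustration only, here is a minimal portable sketch of the idea: a struct_page_clear() that defaults to memset(), plus a single barrier issued once after the whole range is cleared. It is not the actual patch. struct fake_page, struct_page_clear_done() and init_page_range() are hypothetical names, the 64-byte size is an assumption, and a C11 fence stands in for the SPARC membar.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel's struct page; 64 bytes is assumed. */
struct fake_page {
	unsigned long words[64 / sizeof(unsigned long)];
};

/*
 * Generic fallback: plain memset(). An architecture could define its own
 * struct_page_clear() first (for example with barrier-free block-init
 * stores) and this default would then be skipped.
 */
#ifndef struct_page_clear
#define struct_page_clear(page) memset((page), 0, sizeof(*(page)))
#endif

/*
 * Hypothetical hook for the one trailing barrier. A C11 full fence is
 * used here only as a portable placeholder for an arch-specific membar.
 */
#ifndef struct_page_clear_done
#define struct_page_clear_done() atomic_thread_fence(memory_order_seq_cst)
#endif

/* Clear n pages, then issue a single barrier for the whole range. */
static size_t init_page_range(struct fake_page *pages, size_t n)
{
	size_t i;

	assert(pages != NULL || n == 0);

	for (i = 0; i < n; i++)
		struct_page_clear(&pages[i]);	/* no barrier per page */

	struct_page_clear_done();		/* one barrier at the end */
	return i;
}
```

The point of the split is that the per-page cost stays at the bare stores, while ordering is paid for once per range rather than once per struct page.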
Pasha