[v3 0/9] parallelized "struct page" zeroing
David Miller
davem at davemloft.net
Sat May 13 03:37:42 AEST 2017
From: Pasha Tatashin <pasha.tatashin at oracle.com>
Date: Fri, 12 May 2017 13:24:52 -0400
> Right now it is larger, but what I suggested is to add a new optimized
> routine just for this case, which would do STBI for 64 bytes but
> without a membar (do the membar at the end of memmap_init_zone() and
> deferred_init_memmap()):
>
> #define struct_page_clear(page) \
> __asm__ __volatile__( \
> "stxa %%g0, [%0]%2\n" \
> "stxa %%g0, [%0 + %1]%2\n" \
> : /* No output */ \
> : "r" (page), "r" (0x20), "i"(ASI_BLK_INIT_QUAD_LDD_P))
>
> And insert it into __init_single_page() instead of memset().
>
> The final result is 4.01s/T which is even faster compared to current
> 4.97s/T
Ok, indeed, that would work.
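
For readers without sparc64 hardware, the shape of the idea (clear each
entry with plain stores, and pay for a barrier only once per range rather
than once per entry) can be sketched in portable C. This is only an
illustration under stated assumptions, not the proposed kernel patch:
struct fake_page, fake_page_clear() and fake_memmap_init() are hypothetical
names, the 64-byte size is taken from the quoted message, and a C11
atomic_thread_fence() stands in for the sparc64 membar. Ordinary stores
also do not reproduce the block-init behaviour of STBI.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct page, assumed to be 64 bytes. */
struct fake_page {
	uint64_t words[8];
};

/*
 * Clear one entry with plain stores and no barrier. This plays the role of
 * struct_page_clear() in the quoted message, minus the sparc64 block-init ASI.
 */
static inline void fake_page_clear(struct fake_page *page)
{
	for (size_t i = 0; i < 8; i++)
		page->words[i] = 0;
}

/*
 * Clear a whole range, then issue a single fence at the end. This mirrors
 * the suggestion to do the membar once, at the end of memmap_init_zone()
 * and deferred_init_memmap(), instead of once per struct page.
 */
static void fake_memmap_init(struct fake_page *pages, size_t count)
{
	assert(pages != NULL || count == 0);

	for (size_t i = 0; i < count; i++)
		fake_page_clear(&pages[i]);

	/* One barrier for the whole range (assumed analogue of membar). */
	atomic_thread_fence(memory_order_seq_cst);
}
```

A caller would pass the array and its length, e.g.
fake_memmap_init(pages, nr_pages). The point of the split is that the
per-entry helper stays barrier-free, so the cost of ordering is paid once
per range.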