[v3 0/9] parallelized "struct page" zeroing

Michal Hocko mhocko at kernel.org
Thu Jun 1 18:46:09 AEST 2017


On Wed 31-05-17 23:35:48, Pasha Tatashin wrote:
> >OK, so why can't we make zero_struct_page 8x 8B stores? Other arches
> >would do memset. You said it would be slower, but would that be
> >measurable? I am sorry to be so persistent here, but I would be much
> >happier if this didn't depend on the deferred initialization. If this is
> >absolutely a no-go, then I can live with that of course.
> 
> Hi Michal,
> 
> This is actually a very good idea. I just did some measurements, and it
> looks like performance is very good.
> 
> Here is single-thread performance data from a SPARC-M7 with 3312G of memory:
> 
> Current:
> memset() in the memblock allocator takes: 8.83s
> __init_single_page() takes: 8.63s
> 
> Option 1:
> memset() in __init_single_page() takes: 61.09s (as we discussed, because of
> membar overhead, memset should really be optimized to use STBI only when the
> size is 1 page or bigger).
> 
> Option 2:
> 8 stores (stx) in __init_single_page(): 8.525s!
> 
> So, even for single-thread performance, we can double the initialization
> speed of "struct page" on SPARC by removing memset() from memblock and
> using 8 stx in __init_single_page(). It appears we never miss L1 in
> __init_single_page() after the initial 8 stx.
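A quick back-of-the-envelope check of the "double" claim, using only the single-thread timings quoted above. It assumes the two "Current" costs are sequential and simply add up; they were measured separately, so the ratio is an approximation. The macro and function names are made up for this sketch.

```c
/*
 * Single-thread timings from the SPARC-M7 measurements, in seconds.
 * Assumption: the two "Current" costs are sequential and simply add up.
 */
#define MEMBLOCK_MEMSET_S   8.83   /* memset() in the memblock allocator */
#define INIT_SINGLE_PAGE_S  8.63   /* __init_single_page() today */
#define OPTION2_S           8.525  /* 8 stx inside __init_single_page() */

/* Total time spent zeroing and initializing struct pages today. */
double current_total(void)
{
	return MEMBLOCK_MEMSET_S + INIT_SINGLE_PAGE_S;   /* about 17.46 s */
}

/* How many times faster Option 2 is than the current scheme. */
double option2_speedup(void)
{
	return current_total() / OPTION2_S;             /* about 2.05x */
}
```

17.46s / 8.525s comes out at roughly 2.05, which is consistent with "double the initialization speed".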

OK, that is good to hear, and it actually matches my understanding that
writes to a single cacheline should not add much overhead.
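For illustration, a minimal userspace sketch of the two zeroing styles being compared. struct fake_page and both helper names are hypothetical. The 64-byte size is an assumption chosen so that the eight 8-byte stores cover exactly one 64-byte cache line; the real struct page layout varies with architecture and config.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for "struct page": 64 bytes, i.e. eight 8-byte
 * words. The real struct page size depends on architecture and config.
 */
struct fake_page {
	uint64_t words[8];
};

_Static_assert(sizeof(struct fake_page) == 64, "expected a 64-byte page struct");

/* Option 2 style: eight explicit 8-byte stores, no call into memset(). */
static inline void zero_fake_page_stores(struct fake_page *p)
{
	uint64_t *w = p->words;

	w[0] = 0;
	w[1] = 0;
	w[2] = 0;
	w[3] = 0;
	w[4] = 0;
	w[5] = 0;
	w[6] = 0;
	w[7] = 0;
}

/* Generic fallback that other architectures could keep using. */
static inline void zero_fake_page_memset(struct fake_page *p)
{
	memset(p, 0, sizeof(*p));
}
```

An optimizing compiler is free to merge the eight stores into wider stores, or even turn them back into a memset() call, so a real version would need its generated code checked on the target (for example, that it really emits plain stx on SPARC).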

Thanks!
-- 
Michal Hocko
SUSE Labs


More information about the Linuxppc-dev mailing list