[RFC PATCH 0/5] powerpc/mm/slice: improve slice speed and stack use

Christophe LEROY christophe.leroy at c-s.fr
Tue Feb 13 02:02:23 AEDT 2018



On 10/02/2018 at 09:11, Nicholas Piggin wrote:
> This series intends to improve performance and reduce stack
> consumption in the slice allocation code. It does it by keeping slice
> masks in the mm_context rather than compute them for each allocation,
> and by reducing bitmaps and slice_masks from stacks, using pointers
> instead where possible.
> 
> checkstack.pl gives, before:
> 0x00000de4 slice_get_unmapped_area [slice.o]:           656
> 0x00001b4c is_hugepage_only_range [slice.o]:            512
> 0x0000075c slice_find_area_topdown [slice.o]:           416
> 0x000004c8 slice_find_area_bottomup.isra.1 [slice.o]:   272
> 0x00001aa0 slice_set_range_psize [slice.o]:             240
> 0x00000a64 slice_find_area [slice.o]:                   176
> 0x00000174 slice_check_fit [slice.o]:                   112
> 
> after:
> 0x00000d70 slice_get_unmapped_area [slice.o]:           320
> 0x000008f8 slice_find_area [slice.o]:                   144
> 0x00001860 slice_set_range_psize [slice.o]:             144
> 0x000018ec is_hugepage_only_range [slice.o]:            144
> 0x00000750 slice_find_area_bottomup.isra.4 [slice.o]:   128
> 
> The benchmark in https://github.com/linuxppc/linux/issues/49 gives, before:
> $ time ./slicemask
> real	0m20.712s
> user	0m5.830s
> sys	0m15.105s
> 
> after:
> $ time ./slicemask
> real	0m13.197s
> user	0m5.409s
> sys	0m7.779s

Hi,

I tested your series on an 8xx, on top of patch
https://patchwork.ozlabs.org/patch/871675/

I don't get a result as significant as yours, but there is some
improvement anyway:

ITERATION 500000

Before:

root@vgoip:~# time ./slicemask
real    0m 33.26s
user    0m 1.94s
sys     0m 30.85s

After:
root@vgoip:~# time ./slicemask
real    0m 29.69s
user    0m 2.11s
sys     0m 27.15s

The most significant improvement is obtained with the first patch of your series alone:
root@vgoip:~# time ./slicemask
real    0m 30.85s
user    0m 1.80s
sys     0m 28.57s
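
To put the numbers side by side, the relative reduction in sys time can
be worked out from the figures quoted in this thread. A minimal sketch
in C; the helper names reduction_pct() and print_sys_reduction() are
made up for illustration and are not part of the series or of the
benchmark:

```c
#include <stdio.h>

/*
 * Hypothetical helper (not from the series): percentage by which
 * `after` is lower than `before`.
 */
static double reduction_pct(double before, double after)
{
	return (before - after) / before * 100.0;
}

/*
 * Print the sys-time reduction for each pair of runs.
 * All values are sys times in seconds, copied from this thread.
 */
static void print_sys_reduction(void)
{
	printf("cover letter, full series: %.1f%%\n",
	       reduction_pct(15.105, 7.779));
	printf("8xx, full series:          %.1f%%\n",
	       reduction_pct(30.85, 27.15));
	printf("8xx, first patch only:     %.1f%%\n",
	       reduction_pct(30.85, 28.57));
}
```

Calling print_sys_reduction() from main() gives roughly 48.5% for the
cover-letter run, 12.0% for the full series on the 8xx, and 7.4% for
the first patch alone on the 8xx, so the first patch accounts for a
bit more than half of the sys-time saving seen here.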

I had to modify your series a bit; if you are interested, I can post the changes.

Christophe


> 
> Thanks,
> Nick
> 
> Nicholas Piggin (5):
>    powerpc/mm/slice: pass pointers to struct slice_mask where possible
>    powerpc/mm/slice: implement a slice mask cache
>    powerpc/mm/slice: implement slice_check_range_fits
>    powerpc/mm/slice: Use const pointers to cached slice masks where
>      possible
>    powerpc/mm/slice: use the dynamic high slice size to limit bitmap
>      operations
> 
>   arch/powerpc/include/asm/book3s/64/mmu.h |  20 +-
>   arch/powerpc/mm/slice.c                  | 302 +++++++++++++++++++------------
>   2 files changed, 204 insertions(+), 118 deletions(-)
> 

