[PATCH 19/31] powerpc/mm: Convert 4k hash insert to C

Aneesh Kumar K.V aneesh.kumar at linux.vnet.ibm.com
Tue Sep 29 18:13:43 AEST 2015

Benjamin Herrenschmidt <benh at kernel.crashing.org> writes:

> On Mon, 2015-09-21 at 12:10 +0530, Aneesh Kumar K.V wrote:
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar at linux.vnet.ibm.com>
>> ---
>>  arch/powerpc/mm/Makefile        |   3 +
>>  arch/powerpc/mm/hash64_64k.c    | 204 +++++++++++++++++++++
>>  arch/powerpc/mm/hash_low_64.S   | 380 ----------------------------------------
>>  arch/powerpc/mm/hash_utils_64.c |   4 +-
>>  4 files changed, 210 insertions(+), 381 deletions(-)
>>  create mode 100644 arch/powerpc/mm/hash64_64k.c
> Did you check if there was any measurable performance difference ?

I looked at the performance numbers with and without the patch and don't see
much impact. We do have a path length increase (I measured this using
systemsim):

Path length __hash_page_4k
with patch: 196
without patch: 142

Path length __hash_page_64k
with patch: 219
without patch: 154
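
As a rough sanity check on the figures above, the increase can be worked out
per function. This is only an illustrative sketch: the helper name
path_length_delta is hypothetical and not part of the patch, and the numbers
are simply the systemsim path lengths quoted above.

```python
# Path lengths (instruction counts) as quoted above from systemsim.
path_lengths = {
    "__hash_page_4k": {"with_patch": 196, "without_patch": 142},
    "__hash_page_64k": {"with_patch": 219, "without_patch": 154},
}


def path_length_delta(with_patch, without_patch):
    """Return (absolute increase, percent increase) of the path length.

    Hypothetical helper for illustration only.
    """
    delta = with_patch - without_patch
    return delta, 100.0 * delta / without_patch


for name, counts in path_lengths.items():
    delta, pct = path_length_delta(counts["with_patch"], counts["without_patch"])
    print(f"{name}: +{delta} instructions ({pct:.1f}%)")
```

That works out to 54 extra instructions for __hash_page_4k and 65 for
__hash_page_64k.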

But even with a path length increase of roughly 50 to 65 instructions, we don't
see any impact when running a workload. I tried a kernel build test.

With THP enabled (which is the default) we see an improvement. I haven't fully
investigated the reason; it could be due to reduced contention on the ptl lock.
__hash_thp_page is already C code.

THP enabled:

make -j64 vmlinux modules
With fix:
real    1m35.509s
user    56m8.565s
sys     4m34.973s

real    1m32.174s
user    57m2.336s
sys     4m39.142s

Without fix:
real    1m37.703s
user    58m50.783s
sys     7m52.440s

real    1m37.890s
user    57m55.445s
sys     7m50.501s

THP disabled:

make -j64 vmlinux modules 
With fix:
real    1m37.197s
user    58m28.672s
sys     7m58.188s

real    1m44.638s
user    58m37.551s
sys     7m53.960s

Without fix:
real    1m41.224s
user    58m46.944s
sys     7m49.714s

real    1m42.585s
user    59m14.019s
sys     7m52.714s
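
To compare the runs above more easily, the `time` output can be converted to
seconds and averaged per configuration. The following is a minimal sketch under
stated assumptions: the helpers to_seconds and mean_seconds are hypothetical
names, the data is copied from the results above, and two runs per
configuration are too few to say anything about variance.

```python
import re

# (real, user, sys) for each run, copied from the results above.
runs = {
    ("THP enabled", "with fix"): [
        ("1m35.509s", "56m8.565s", "4m34.973s"),
        ("1m32.174s", "57m2.336s", "4m39.142s"),
    ],
    ("THP enabled", "without fix"): [
        ("1m37.703s", "58m50.783s", "7m52.440s"),
        ("1m37.890s", "57m55.445s", "7m50.501s"),
    ],
    ("THP disabled", "with fix"): [
        ("1m37.197s", "58m28.672s", "7m58.188s"),
        ("1m44.638s", "58m37.551s", "7m53.960s"),
    ],
    ("THP disabled", "without fix"): [
        ("1m41.224s", "58m46.944s", "7m49.714s"),
        ("1m42.585s", "59m14.019s", "7m52.714s"),
    ],
}

COLUMNS = {"real": 0, "user": 1, "sys": 2}


def to_seconds(stamp):
    """Convert a value such as '1m35.509s' to seconds (hypothetical helper)."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time value: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def mean_seconds(samples, column):
    """Average one column ('real', 'user' or 'sys') over all runs, in seconds."""
    values = [to_seconds(sample[COLUMNS[column]]) for sample in samples]
    return sum(values) / len(values)


for (thp, variant), samples in runs.items():
    real = mean_seconds(samples, "real")
    sys_time = mean_seconds(samples, "sys")
    print(f"{thp:12s} {variant:12s} real={real:7.2f}s sys={sys_time:7.2f}s")
```

Averaged this way, sys time with THP enabled drops from roughly 471 seconds
without the fix to roughly 277 seconds with it, while the THP-disabled runs
stay within a few seconds of each other.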


