[PATCH V2 04/68] powerpc/mm: Use big endian page table for book3s 64
Anton Blanchard
anton at samba.org
Mon May 30 09:08:33 AEST 2016
Hi Ben,
> That is surprising, do we have any idea what specifically increases
> the overhead so significantly? Does gcc know about ldbrx/stdbrx? I
> notice in our io.h for example we still do manual ld/std + swap
> because old processors didn't know these, we should fix that for
> CONFIG_POWER8 (or is it POWER7 that brought these?).
The futex overhead seems to come from __get_user_pages_fast():
ld r11,0(r6)
...
rldicl r8,r11,32,32
rotlwi r28,r11,24
rlwimi r28,r11,8,8,15
rotlwi r6,r8,24
rlwimi r28,r11,8,24,31
rlwimi r6,r8,8,8,15
rlwimi r6,r8,8,24,31
rldicr r28,r28,32,31
or r28,r28,r6
cmpdi cr7,r28,0
beq cr7,2428
That's a whole lot of work just to check if a pte is zero. I assume
the reason gcc can't replace this with a single byte-reversed load
(ldbrx) is that we access the pte via the READ_ONCE() macro: that
expands to a volatile access, and gcc won't combine a volatile load
with the byte swap that follows it.
I see the same issue in unmap_page_range(), __hash_page_64K() and
handle_mm_fault().
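Roughly, the C being compiled looks like this (a simplified sketch,
not the exact gup code):

	pte_t pte = READ_ONCE(*ptep);	/* volatile load of the BE pte */

	if (!pte_val(pte))		/* pte_val() hides a be64_to_cpu() */
		return 0;

Because the load is volatile, gcc keeps it separate from the swap, so
we get the rotlwi/rlwimi sequence above instead of a single ldbrx. A
hypothetical helper (name made up, just to illustrate) could do the
byte-reversed load in inline asm so gcc never sees a separate swap:

	static inline unsigned long be64_load_swapped(__be64 *p)
	{
		unsigned long ret;

		/* ldbrx: load doubleword byte-reversed, indexed form */
		__asm__ __volatile__("ldbrx %0,0,%1"
				     : "=r" (ret)
				     : "r" (p), "m" (*p));
		return ret;
	}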
The other issue I see is when we access a pte via larx/stcx: there is
no byte-reversed form of those instructions, so we have no choice but
to byte swap the value manually. I see that in __hash_page_64K()
(a simplified C sketch of the pattern follows the listing):
rldicl r28,r30,32,32
rotlwi r0,r30,24
rlwimi r0,r30,8,8,15
rotlwi r10,r28,24
rlwimi r0,r30,8,24,31
rlwimi r10,r28,8,8,15
rlwimi r10,r28,8,24,31
rldicr r0,r0,32,31
or r0,r0,r10
hwsync
ldarx r12,0,r6
cmpd r12,r11
bne- c00000000004fad0
stdcx. r0,0,r6
bne- c00000000004fab8
hwsync
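In C the pattern is roughly the sketch below (simplified, not the
real __hash_page_64K() source; the helper name is made up). The
expected and new values have to be converted to big endian in
registers before the reservation loop, and the old value swapped back
on the way out (hwsync in the listing is just the plain sync
mnemonic):

	static inline unsigned long pte_cmpxchg_be(__be64 *ptep,
						   unsigned long old,
						   unsigned long new)
	{
		__be64 prev;
		__be64 be_old = cpu_to_be64(old);
		__be64 be_new = cpu_to_be64(new);	/* the rotlwi/rlwimi above */

		__asm__ __volatile__(
		"	sync\n"
		"1:	ldarx	%0,0,%3\n"
		"	cmpd	%0,%1\n"
		"	bne-	2f\n"
		"	stdcx.	%2,0,%3\n"
		"	bne-	1b\n"
		"2:	sync\n"
			: "=&r" (prev)
			: "r" (be_old), "r" (be_new), "r" (ptep)
			: "cc", "memory");

		return be64_to_cpu(prev);
	}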
Anton