[PATCH V2 04/68] powerpc/mm: Use big endian page table for book3s 64
mpe at ellerman.id.au
Mon May 30 13:42:44 AEST 2016
On Mon, 2016-05-30 at 09:08 +1000, Anton Blanchard via Linuxppc-dev wrote:
> > That is surprising, do we have any idea what specifically increases
> > the overhead so significantly? Does gcc know about ldbrx/stdbrx? I
> > notice in our io.h for example we still do manual ld/std + swap
> > because old processors didn't know these, we should fix that for
> > CONFIG_POWER8 (or is it POWER7 that brought these?).
> The futex issue seems to be __get_user_pages_fast():
> ld r11,0(r6)
> rldicl r8,r11,32,32
> rotlwi r28,r11,24
> rlwimi r28,r11,8,8,15
> rotlwi r6,r8,24
> rlwimi r28,r11,8,24,31
> rlwimi r6,r8,8,8,15
> rlwimi r6,r8,8,24,31
> rldicr r28,r28,32,31
> or r28,r28,r6
> cmpdi cr7,r28,0
> beq cr7,2428
> That's a whole lot of work just to check if a pte is zero. I assume
> the reason gcc can't replace this with a byte-reversed load is that
> we access the pte via the READ_ONCE() macro.
Did I mention we need a bswap instruction?
We can possibly improve some of them by doing the comparison on the raw value,
e.g. see hash__pte_same().
Is the above from pgd_none()?