[PATCH V2 04/68] powerpc/mm: Use big endian page table for book3s 64

Michael Ellerman mpe at ellerman.id.au
Mon May 30 13:42:44 AEST 2016

On Mon, 2016-05-30 at 09:08 +1000, Anton Blanchard via Linuxppc-dev wrote:
> > That is surprising, do we have any idea what specifically increases
> > the overhead so significantly ? Does gcc know about ldbrx/stdbrx ? I
> > notice in our io.h for example we still do manual ld/std + swap
> > because old processors didn't know these, we should fix that for
> > CONFIG_POWER8 (or is it POWER7 that brought these ?).
> The futex issue seems to be __get_user_pages_fast():
>         ld      r11,0(r6)
>         ...
>         rldicl  r8,r11,32,32
>         rotlwi  r28,r11,24
>         rlwimi  r28,r11,8,8,15
>         rotlwi  r6,r8,24
>         rlwimi  r28,r11,8,24,31
>         rlwimi  r6,r8,8,8,15
>         rlwimi  r6,r8,8,24,31
>         rldicr  r28,r28,32,31
>         or      r28,r28,r6
>         cmpdi   cr7,r28,0
>         beq     cr7,2428
> That's a whole lot of work just to check if a pte is zero. I assume
> the reason gcc can't replace this with a byte reversed load is that
> we access the pte via the READ_ONCE() macro.

Did I mention we need a bswap instruction?

We can possibly improve some of these cases by doing the comparison on the raw
(unswapped) value, e.g. see hash__pte_same().

Is the disassembly quoted above from pgd_none()?
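
To make the raw-value comparison concrete, here is a minimal sketch in plain C. It is not the kernel code: pte_be_t, be64_to_cpu_sketch() and the pte_*_raw/pte_none_swapped helpers are hypothetical names made up for illustration, and __builtin_bswap64 is the GCC/Clang builtin. The point is only that a test against zero, or an equality test between two entries stored in the same byte order, gives the same answer on the unswapped value, so the swap can be skipped.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a page table entry stored big endian in memory. */
typedef struct { uint64_t raw_be; } pte_be_t;

/* Assumed helper: convert a stored big-endian value to CPU byte order. */
static inline uint64_t be64_to_cpu_sketch(uint64_t v)
{
#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
	return __builtin_bswap64(v);	/* little-endian CPU: swap needed */
#else
	return v;			/* big-endian CPU: already native */
#endif
}

/* Swap first, then compare: the pattern that costs the extra instructions. */
static inline bool pte_none_swapped(const volatile pte_be_t *p)
{
	return be64_to_cpu_sketch(p->raw_be) == 0;
}

/* Compare the raw value: zero is zero in either byte order. */
static inline bool pte_none_raw(const volatile pte_be_t *p)
{
	return p->raw_be == 0;
}

/* Two entries in the same stored byte order are equal iff the raw bits match. */
static inline bool pte_same_raw(pte_be_t a, pte_be_t b)
{
	return a.raw_be == b.raw_be;
}
```

This only works for tests that are invariant under byte reversal (zero checks, equality). Anything that looks at a specific bit or field would need either the swap or a mask constant that has itself been swapped at compile time.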

