[PATCH 2/3] powerpc/e6500: hw tablewalk: optimize a bit for tcd lock acquiring codes
Scott Wood
scottwood at freescale.com
Fri Aug 14 04:44:43 AEST 2015
On Thu, 2015-08-13 at 19:51 +0800, Kevin Hao wrote:
> It makes no sense to put the instructions for calculating the lock
> value (cpu number + 1) and for clearing the eq bit of cr1 inside the
> lbarx/stbcx loop. And when the lock is held by the other thread, the
> current lock value can never equal the lock value used by the current
> cpu, so we can skip comparing these two lock values in the lbz/bne
> loop.
>
> Signed-off-by: Kevin Hao <haokexin at gmail.com>
> ---
> arch/powerpc/mm/tlb_low_64e.S | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/arch/powerpc/mm/tlb_low_64e.S b/arch/powerpc/mm/tlb_low_64e.S
> index 765b419883f2..e4185581c5a7 100644
> --- a/arch/powerpc/mm/tlb_low_64e.S
> +++ b/arch/powerpc/mm/tlb_low_64e.S
> @@ -308,11 +308,11 @@ BEGIN_FTR_SECTION /* CPU_FTR_SMT */
> *
> * MAS6:IND should be already set based on MAS4
> */
> -1: lbarx r15,0,r11
> lhz r10,PACAPACAINDEX(r13)
> - cmpdi r15,0
> - cmpdi cr1,r15,1 /* set cr1.eq = 0 for non-recursive */
> addi r10,r10,1
> + crclr cr1*4+eq /* set cr1.eq = 0 for non-recursive */
> +1: lbarx r15,0,r11
> + cmpdi r15,0
> bne 2f
You're optimizing the contended case at the expense of introducing stalls in
the uncontended case. Does it really matter if there are more instructions
in the loop? This change just means that you'll spin in the loop for more
iterations (if it even does that -- I think the cycles per loop iteration
might be the same before and after, due to load latency and pairing) while
waiting for the other thread to release the lock.
Do you have any benchmark results for this patch?
-Scott
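
For readers following the thread, here is a rough C11 sketch of the locking scheme under discussion. It is not the kernel code and says nothing about instruction scheduling, which is the actual point of the review. The names tcd_lock_t, tcd_lock_acquire and tcd_lock_release are hypothetical, and the handling of the recursive case is an assumption inferred from the "non-recursive" comment in the quoted diff.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the lock byte shared by the threads of a core.
 * 0 means free; otherwise it holds (cpu index + 1) of the owner. */
typedef _Atomic uint8_t tcd_lock_t;

/* Returns true if the lock was newly taken (caller must release it),
 * false if this cpu already held it (assumed recursive entry). */
static bool tcd_lock_acquire(tcd_lock_t *lock, unsigned cpu_index)
{
    /* Lock value computed once, outside the retry loop. */
    uint8_t me = (uint8_t)(cpu_index + 1);

    for (;;) {
        uint8_t seen = 0;

        /* Rough analogue of the lbarx/stbcx. pair: store our value only
         * if the byte is still 0. A weak CAS may fail spuriously. */
        if (atomic_compare_exchange_weak_explicit(lock, &seen, me,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return true;

        /* Assumption: a matching value means we re-entered while
         * already holding the lock, so there is nothing to wait for. */
        if (seen == me)
            return false;

        /* Rough analogue of the lbz/bne loop: spin on plain loads until
         * the other thread releases, then retry the atomic sequence. */
        while (atomic_load_explicit(lock, memory_order_relaxed) != 0)
            ;
    }
}

static void tcd_lock_release(tcd_lock_t *lock)
{
    atomic_store_explicit(lock, 0, memory_order_release);
}
```

In this shape the uncontended path is a single pass through the atomic sequence, and the spin on plain loads only runs when another thread holds the lock.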