[PATCH 2/3] powerpc/e6500: hw tablewalk: optimize a bit for tcd lock acquiring codes

Scott Wood scottwood at freescale.com
Tue Aug 18 07:08:14 AEST 2015


On Mon, 2015-08-17 at 19:16 +0800, Kevin Hao wrote:
> On Fri, Aug 14, 2015 at 09:44:28PM -0500, Scott Wood wrote:
> > I tried a couple different benchmarks and didn't find a significant
> > difference, relative to the variability of the results running on the
> > same kernel.  A patch that claims to "optimize a bit" as its main purpose
> > ought to show some results. :-)
> 
> I tried to compare the execution time of these two code sequences with the
> following test module:
> 
> #include <linux/module.h>
> #include <linux/kernel.h>
> #include <linux/printk.h>
> 
> static void test1(void)
> {
>       int i;
>       unsigned char lock, c;
>       unsigned short cpu, s;
> 
>       for (i = 0; i < 100000; i++) {
>               lock = 0;
>               cpu = 1;
> 
>               asm volatile (  
> "1:           lbarx   %0,0,%2\n\
>               lhz     %1,0(%3)\n\
>               cmpdi   %0,0\n\
>               cmpdi   cr1,%1,1\n\

This should be either "cmpdi cr1,%0,1" or crclr, not that it made much 
difference.  The test seemed to be rather sensitive to additional 
instructions inserted at the beginning of the asm statement (especially 
isync), so the initial instructions before the loop are probably pairing with 
something outside the asm.

That said, it looks like this patch at least doesn't make things worse, and 
does convert cmpdi to a more readable crclr, so I guess I'll apply it even 
though it doesn't show any measurable benefit when testing entire TLB misses 
(much less actual applications).

I suspect the point where I misunderstood the core manual was where it listed 
lbarx as having a repeat rate of 3 cycles.  I probably assumed that this was 
due to the presync, and thus that a subsequent unrelated load could execute 
partially in parallel, but it looks like the repeat rate specifically 
describes how long it takes until the execution unit can accept any other 
instruction.
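
To make the two readings of "repeat rate" concrete, here is a toy model, not 
a description of the real e6500 pipeline.  It assumes a single execution 
unit and counts only issue cycles; the struct, the function names, and the 
repeat rate of 1 for an ordinary load are all hypothetical and chosen purely 
for illustration.

```c
#include <stddef.h>

/* One instruction as seen by a single execution unit (toy model). */
struct insn {
	const char *name;
	int repeat;   /* cycles before the unit accepts another instruction */
	int is_lbarx; /* nonzero for lbarx, zero for an ordinary load */
};

/*
 * Reading A (the earlier assumption): an unrelated load that follows
 * an lbarx can issue one cycle later; every other pair waits for the
 * previous instruction's full repeat rate.
 */
static int last_issue_overlapped(const struct insn *seq, size_t n)
{
	int cycle = 0;

	for (size_t i = 1; i < n; i++) {
		int overlap = seq[i - 1].is_lbarx && !seq[i].is_lbarx;

		cycle += overlap ? 1 : seq[i - 1].repeat;
	}
	return cycle;
}

/*
 * Reading B: the unit is busy for the full repeat rate no matter
 * which instruction comes next.
 */
static int last_issue_blocking(const struct insn *seq, size_t n)
{
	int cycle = 0;

	for (size_t i = 1; i < n; i++)
		cycle += seq[i - 1].repeat;
	return cycle;
}
```

For an lbarx (repeat rate 3) followed by an lhz, as in the test module, the 
lhz would issue at cycle 1 under reading A and at cycle 3 under reading B.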

-Scott


