ppc44x - how do i optimize driver for tlb hits

Benjamin Herrenschmidt benh at kernel.crashing.org
Mon Oct 4 09:38:45 EST 2010


On Sun, 2010-10-03 at 14:13 -0500, Ayman El-Khashab wrote:
> On Sat, Sep 25, 2010 at 08:11:04AM +1000, Benjamin Herrenschmidt wrote:
> > On Fri, 2010-09-24 at 08:08 -0500, Ayman El-Khashab wrote:
> > > 
> > > I suppose another option is to use the kernel profiling option I 
> > > always see but have never used.  Is that a viable option to figure out
> > > what is happening here?  
> > 
> > With perf and stochastic sampling ? If you sample fast enough... but
> > you'll mostly point to your routine I suppose... though it might tell
> > you statistically where in your code, which -might- help.
> > 
> 
> Thanks, I didn't end up profiling it because we found the biggest culprit. 
> Basically we were mapping this memory in kernel space and as long as we
> did that ONLY everything was ok.  But then we would mmap the physical
> addresses into user space.  Using MAP_SHARED made it extremely slow. 
> Using MAP_PRIVATE made it very fast.  So it works, but why is MAP_SHARED
> that much slower?

I don't see any reason offhand why this would be the case. Can you
inspect the content of the TLB with either xmon or whatever HW debugger
you may have at hand and show me what difference you have between an
entry for your workload coming from MAP_SHARED vs. one coming from
MAP_PRIVATE ?
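
To confirm the gap is reproducible outside the full workload, a small
user-space timing harness along these lines might help. This is only a
sketch under assumptions: open_scratch() and time_mapping() are
hypothetical helper names, MAP_LEN is an arbitrary size, and the scratch
file merely stands in for whatever device node the driver exposes. On a
regular file the two flags may well time the same, so the interesting run
is the one against the driver's own file descriptor.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

#define MAP_LEN (4u * 1024u * 1024u)   /* assumed buffer size, adjust to taste */

/* Hypothetical stand-in for opening the driver's device node:
 * creates an unlinked temporary file of the requested length. */
static int open_scratch(size_t len)
{
    char path[] = "/tmp/mapcmpXXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                      /* file disappears once fd is closed */
    if (ftruncate(fd, (off_t)len) != 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Map len bytes of fd with the given flags (MAP_SHARED or MAP_PRIVATE),
 * read-modify-write every byte once, unmap, and return the elapsed time
 * in nanoseconds. Returns -1 if mmap fails. */
static long long time_mapping(int fd, size_t len, int flags)
{
    struct timespec t0, t1;
    void *base = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, fd, 0);
    if (base == MAP_FAILED)
        return -1;

    volatile uint8_t *p = base;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < len; i++)
        p[i] = (uint8_t)(p[i] + 1);    /* touch every byte of every page */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    munmap(base, len);
    return (long long)(t1.tv_sec - t0.tv_sec) * 1000000000LL
         + (long long)(t1.tv_nsec - t0.tv_nsec);
}
```

Calling time_mapping(fd, MAP_LEN, MAP_SHARED) and then
time_mapping(fd, MAP_LEN, MAP_PRIVATE) on the same descriptor gives two
numbers to compare. Keep in mind that writes through a MAP_PRIVATE
mapping are copy-on-write, so after the first store to a page the two
runs are no longer touching the same memory.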

> The other optimization was a change in the algorithm to take advantage
> of the L2 prefetching.  Since we were operating on many simultaneous
> streams it seems that the cache performance was not good.  

Cheers,
Ben.

> thanks
> ame



