ppc44x - how do i optimize driver for tlb hits

Ayman El-Khashab ayman at elkhashab.com
Mon Oct 4 06:13:05 EST 2010


On Sat, Sep 25, 2010 at 08:11:04AM +1000, Benjamin Herrenschmidt wrote:
> On Fri, 2010-09-24 at 08:08 -0500, Ayman El-Khashab wrote:
> > 
> > I suppose another option is to use the kernel profiling option I 
> > always see but have never used.  Is that a viable option to figure out
> > what is happening here?  
> 
> With perf and stochastic sampling ? If you sample fast enough... but
> you'll mostly point to your routine I suppose... though it might tell
> you statistically where in your code, which -might- help.
> 

Thanks.  I didn't end up profiling it because we found the biggest culprit.
We map this memory in kernel space, and as long as that was the ONLY
mapping, everything was OK.  The trouble started when we also mmap'ed the
physical addresses into user space: with MAP_SHARED, access was extremely
slow; with MAP_PRIVATE, it was very fast.  So it works now, but why is
MAP_SHARED that much slower?
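
For anyone who wants to reproduce the comparison, here is a minimal sketch of
a user-space harness that maps the same region with either flag and times one
read pass over it.  The helper names (map_region, time_read_pass) are made up
for illustration, and the path passed in is an assumption: it could be
/dev/mem or a driver's own device node, neither of which is specified above.
Note that POSIX defines MAP_PRIVATE as copy-on-write, so stores through a
private mapping are not carried through to the underlying object.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Map len bytes of `path` starting at `offset`, using share_flag
 * (MAP_SHARED or MAP_PRIVATE).  Returns MAP_FAILED on any error.
 * `path` is hypothetical: /dev/mem or a driver node, for example. */
void *map_region(const char *path, off_t offset, size_t len, int share_flag)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return MAP_FAILED;

    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, share_flag, fd, offset);
    close(fd);  /* the mapping remains valid after the descriptor is closed */
    return p;
}

/* Read every 32-bit word of the buffer once and return the elapsed
 * wall-clock time in seconds.  The sum is stored through sum_out so the
 * compiler cannot drop the loop. */
double time_read_pass(const volatile uint32_t *buf, size_t len, uint32_t *sum_out)
{
    struct timespec t0, t1;
    uint32_t sum = 0;

    assert(buf != NULL && sum_out != NULL);

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < len / sizeof(uint32_t); i++)
        sum += buf[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);

    *sum_out = sum;
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling map_region() once with MAP_SHARED and once with MAP_PRIVATE on the
same offset, then passing each pointer to time_read_pass(), gives two numbers
to compare.  It only measures the difference; it does not explain it.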

The other optimization was a change in the algorithm to take advantage
of the L2 prefetching.  Because we were operating on many simultaneous
streams, cache performance had been poor.
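
The actual algorithm change is not described here, so the following is only a
sketch of one common way to trade many simultaneous streams for a single
sequential one.  It assumes, purely for illustration, that the work is an XOR
reduction across several input buffers; the function names and BLOCK_WORDS
are hypothetical.  The first version reads one word from every stream per
output word, the second walks each stream in turn over a block so that only
one long sequential read is in flight at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_WORDS 1024  /* hypothetical block size; tune for the cache */

/* Interleaved: every output word touches all nstreams inputs, so the
 * memory system sees nstreams concurrent sequential streams. */
void xor_interleaved(uint32_t *out, const uint32_t *const *streams,
                     size_t nstreams, size_t nwords)
{
    assert(out != NULL && streams != NULL);

    for (size_t i = 0; i < nwords; i++) {
        uint32_t acc = 0;
        for (size_t s = 0; s < nstreams; s++)
            acc ^= streams[s][i];
        out[i] = acc;
    }
}

/* Blocked: within each block, walk one input stream at a time.  The
 * output block is revisited nstreams times and is small enough to be
 * expected to stay cached, while each input is read sequentially. */
void xor_blocked(uint32_t *out, const uint32_t *const *streams,
                 size_t nstreams, size_t nwords)
{
    assert(out != NULL && streams != NULL);

    for (size_t base = 0; base < nwords; base += BLOCK_WORDS) {
        size_t end = base + BLOCK_WORDS < nwords ? base + BLOCK_WORDS : nwords;

        for (size_t i = base; i < end; i++)
            out[i] = 0;

        for (size_t s = 0; s < nstreams; s++)
            for (size_t i = base; i < end; i++)
                out[i] ^= streams[s][i];
    }
}
```

Both functions produce the same output; only the order of memory accesses
differs.  Whether the blocked order is faster depends on the core and its
prefetcher, so it needs to be measured on the target.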

thanks
ame

