[Cbe-oss-dev] [PATCH 0/2] powerpc: new copy_4K_page()

Mark Nelson markn at au1.ibm.com
Fri Aug 22 14:32:59 EST 2008


On Thu, 14 Aug 2008 04:17:32 pm Mark Nelson wrote:
> Hi All,
> 
> What follows is an updated version of copy_4K_page that has been tuned
> for the Cell processor. With this new routine, the system time
> measured when compiling a 2.6.26 pseries_defconfig was
> reduced by ~10s:
> 
> mainline (2.6.27-rc1-00632-g2e1e921):
> 
> real    17m8.727s
> user    59m48.693s
> sys     3m56.089s
> 
> real    17m9.350s
> user    59m44.822s
> sys     3m56.666s
> 
> new routine:
> 
> real    17m7.311s
> user    59m51.339s
> sys     3m47.043s
> 
> real    17m7.863s
> user    59m49.028s
> sys     3m46.608s
> 
> This same routine was also found to improve performance on 970 CPUs,
> though by a much smaller amount:
> 
> mainline (2.6.27-rc1-00632-g2e1e921):
> 
> real    16m8.545s
> user    14m38.134s
> sys     1m55.156s
> 
> real    16m7.089s
> user    14m37.974s
> sys     1m55.010s
> 
> new routine:
> 
> real    16m11.641s
> user    14m37.251s
> sys     1m52.618s
> 
> real    16m6.139s
> user    14m38.282s
> sys     1m53.184s
> 
> 
> I also tested on Power{3..6} and found that Power3, Power5 and
> Power6 did better with this new routine when the dcbt and dcbz
> weren't used (in which case they achieved performance comparable to
> the existing kernel copy_4K_page routine). Power4, on the other hand,
> performed slightly better with the dcbt and dcbz included (still
> comparable to the current kernel copy_4K_page).
> 
> So, to get the best performance across the board, I created a
> new CPU feature that governs whether the dcbt and dcbz are used
> (and uncreatively named it CPU_FTR_CP_USE_DCBTZ). I added it to the
> CPU features of Cell, Power4 and 970.
> Unfortunately, I don't have access to a PA6T, but judging by the
> marketing material I could find, it looks like it has a strong enough
> hardware prefetcher that it probably wouldn't benefit from the dcbt
> and dcbz...
> 
> Okay, that's probably enough prattling along - you can all go and look
> at the code now.
> 
> All comments appreciated.
> 
> [I decided to post the whole copy routine rather than a diff between
> it and the current one because I found the diff quite unreadable. I'll post
> a real patchset after I've addressed any comments.]
> 
> Many thanks!
> 
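
As a quick check of the "~10s" figure, the sys times quoted above can be
averaged per configuration. This is only arithmetic on the numbers in this
post (nothing is re-measured), and the helper names are hypothetical.

```python
# Average the quoted "sys" times and report the reduction per CPU.
# All figures are copied from the build timings quoted above.

def to_seconds(t: str) -> float:
    """Convert a time(1)-style string such as '3m56.089s' to seconds."""
    minutes, seconds = t.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)

def mean(values):
    return sum(values) / len(values)

SYS_TIMES = {
    "Cell": {
        "mainline": ["3m56.089s", "3m56.666s"],
        "new": ["3m47.043s", "3m46.608s"],
    },
    "970": {
        "mainline": ["1m55.156s", "1m55.010s"],
        "new": ["1m52.618s", "1m53.184s"],
    },
}

def sys_reduction(cpu: str) -> float:
    """Mean mainline sys time minus mean new-routine sys time, in seconds."""
    runs = SYS_TIMES[cpu]
    old = mean([to_seconds(t) for t in runs["mainline"]])
    new = mean([to_seconds(t) for t in runs["new"]])
    return old - new

for cpu in SYS_TIMES:
    print(f"{cpu}: sys reduced by {sys_reduction(cpu):.2f} s")
```

This prints a reduction of about 9.55 s on Cell and 2.18 s on 970, in line
with the "~10s" and "much smaller amount" descriptions above.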

The actual patches for the new copy_4K_page() follow this.

Note: I changed the order of the patches so that the new CPU feature
bit is introduced in the first patch and then the new copy_4K_page
is introduced in the second patch.
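
The feature-bit arrangement described in the quoted text can be modelled
roughly as follows. This is a hedged Python sketch, not the kernel code: the
real copy_4K_page() is PowerPC assembly, the bit value given to
CPU_FTR_CP_USE_DCBTZ here is a placeholder, copy_4k_page_model() is a
hypothetical name, and the 128-byte line size is an assumption (it matches
Cell, 970 and Power4). The feature table only reflects which CPUs the post
says receive the bit.

```python
# Hypothetical model of gating dcbt/dcbz on a CPU feature bit.
CPU_FTR_CP_USE_DCBTZ = 1 << 0   # placeholder bit; the real value is defined in the kernel headers
PAGE_SIZE = 4096
CACHE_LINE = 128                # assumed L1 data cache line size

# Per the post, only Cell, Power4 and 970 get the new feature bit.
CPU_FEATURES = {
    "Cell": CPU_FTR_CP_USE_DCBTZ,
    "Power4": CPU_FTR_CP_USE_DCBTZ,
    "970": CPU_FTR_CP_USE_DCBTZ,
    "Power3": 0,
    "Power5": 0,
    "Power6": 0,
}

def copy_4k_page_model(dst: bytearray, src: bytes, features: int) -> list:
    """Copy one 4K page; return the cache-line ops a real routine would issue."""
    ops = []
    if features & CPU_FTR_CP_USE_DCBTZ:
        for off in range(0, PAGE_SIZE, CACHE_LINE):
            ops.append(("dcbt", off))   # touch the source line so it gets prefetched
            ops.append(("dcbz", off))   # zero the destination line without reading memory
            dst[off:off + CACHE_LINE] = bytes(CACHE_LINE)  # dcbz's visible effect
    dst[:PAGE_SIZE] = src[:PAGE_SIZE]   # the copy itself
    return ops

src = bytes(range(256)) * (PAGE_SIZE // 256)
for cpu, feats in CPU_FEATURES.items():
    dst = bytearray(PAGE_SIZE)
    ops = copy_4k_page_model(dst, src, feats)
    assert dst == src
    print(f"{cpu}: {len(ops)} cache-line ops")
```

With 128-byte lines a 4K page needs 32 dcbt/dcbz pairs (64 ops) on CPUs
with the bit set, and none elsewhere; the copied data is identical either
way, so the bit only changes how the caches are primed.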


