Benchmarking on POWER5 server...
Michael Neuling
mikey at neuling.org
Mon Feb 6 15:05:16 EST 2006
Guys,
We have a 16 thread (2 NUMA nodes, 2 chips per node, 2 cores per chip,
2 threads per core) POWER5 machine with 16 GB of memory. I've done
some benchmarking by compiling the Linux kernel and seeing what effect
some different options have on the compile times. All tests were run
with distcc off. This machine has reasonably good disk IO so other
large machines may get different results. Tests were performed with
ccontrol 0.8.4 with tonyb's semaphore fix and ccache 2.4. With each
config, I did a few runs and picked the two that seemed to represent
the most consistent times (in lay speak: highly unscientific).
Results are below (2 runs per config shown):

cpus = 24, warm ccache
        run 1         run 2
real    0m48.724s     0m46.442s
user    5m30.411s     5m30.315s
sys     1m14.379s     1m14.932s

cpus = 16, warm ccache
        run 1         run 2
real    0m48.594s     0m47.191s
user    5m25.503s     5m22.890s
sys     1m13.584s     1m13.033s

cpus = 8, warm ccache
        run 1         run 2
real    0m55.128s     0m57.276s
user    4m50.494s     4m50.390s
sys     1m8.749s      1m8.806s

cpus = 4, warm ccache
        run 1         run 2
real    1m19.286s     1m14.782s
user    4m33.085s     4m30.333s
sys     1m4.092s      1m3.434s

cpus = 16, no ccache
        run 1         run 2
real    1m46.624s     1m46.504s
user    21m11.198s    21m13.730s
sys     1m38.683s     1m38.648s

cpus = 16, cold ccache (rm -rf ~/.ccache)
        run 1         run 2
real    1m52.340s     1m52.960s
user    22m47.736s    22m50.005s
sys     2m0.261s      2m0.187s

cpus = 16, cold ccache (rm -rf ~/.ccache) with CCACHE_NOSTATS
        run 1         run 2
real    1m53.605s     1m54.222s
user    22m51.339s    22m41.690s
sys     2m0.929s      2m0.490s

Some brief analysis:
Setting cpus > 16 doesn't help.
- more processes than hardware threads
Setting cpus = 16 vs cpus = 8 helps a bit.
- should set up 1 process per thread rather than per core
Setting cpus < 8 slows you down a lot.
- fewer processes than the number of cores
So it's best to keep all the threads busy.
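
To put rough numbers on the scaling observations above, here is a small sketch that averages the two "real" times for each warm-ccache config and reports the speedup relative to cpus = 4. The times are copied from the results in this mail; the names (to_seconds, WARM_REAL, speedup_table) are just illustrative, and with only two hand-picked runs per config this is no more scientific than the measurements themselves.

```python
import re


def to_seconds(stamp):
    """Convert a time-style stamp such as '1m19.286s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time stamp: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


# Wall-clock ("real") times for the warm-ccache runs, keyed by the cpus setting.
WARM_REAL = {
    24: ("0m48.724s", "0m46.442s"),
    16: ("0m48.594s", "0m47.191s"),
    8: ("0m55.128s", "0m57.276s"),
    4: ("1m19.286s", "1m14.782s"),
}


def mean_real(cpus):
    """Mean wall-clock time in seconds over the runs shown for one config."""
    runs = [to_seconds(stamp) for stamp in WARM_REAL[cpus]]
    return sum(runs) / len(runs)


def speedup_table(baseline=4):
    """Speedup of each config relative to the baseline cpus setting."""
    base = mean_real(baseline)
    return {cpus: base / mean_real(cpus) for cpus in sorted(WARM_REAL)}


if __name__ == "__main__":
    for cpus, speedup in speedup_table().items():
        print(f"cpus={cpus:2d}  mean real={mean_real(cpus):6.2f}s  "
              f"speedup vs cpus=4: {speedup:.2f}x")
```

On these figures, going from cpus = 4 to cpus = 16 is roughly a 1.6x improvement in wall-clock time, while cpus = 24 lands within a fraction of a second of cpus = 16, which is inside the spread between the two runs.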
ccache helps a lot when the cache is warm.
ccache does cost something to maintain the cache when its cache is
cold. With so many processes present (ccontrol spawns 20 x cpus),
I thought this could be due to the ccache stats file locking
overhead. Hence, I did a run with CCACHE_NOSTATS set, but it didn't
improve the performance.
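
As a rough way to quantify the ccache cost, the sketch below compares each cpus = 16 config against the no-ccache run, and also estimates how many threads were kept busy on average as (user + sys) / real. It uses only the first run of each config from the results in this mail; the names (RUNS, summarise) are illustrative, and the "busy threads" figure is a back-of-the-envelope estimate, not something measured directly.

```python
def to_seconds(stamp):
    """Convert a time-style stamp such as '21m11.198s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


# (real, user, sys) for the first run of each cpus = 16 config.
RUNS = {
    "warm ccache": ("0m48.594s", "5m25.503s", "1m13.584s"),
    "no ccache": ("1m46.624s", "21m11.198s", "1m38.683s"),
    "cold ccache": ("1m52.340s", "22m47.736s", "2m0.261s"),
    "cold ccache, CCACHE_NOSTATS": ("1m53.605s", "22m51.339s", "2m0.929s"),
}


def summarise(name, baseline="no ccache"):
    """Return wall-clock ratio vs the baseline and average busy threads."""
    real, user, system = (to_seconds(stamp) for stamp in RUNS[name])
    base_real = to_seconds(RUNS[baseline][0])
    return {
        "real_vs_baseline": real / base_real,
        "busy_threads": (user + system) / real,
    }


if __name__ == "__main__":
    for name in RUNS:
        stats = summarise(name)
        print(f"{name:30s} real x{stats['real_vs_baseline']:.3f} of no-ccache, "
              f"~{stats['busy_threads']:.1f} threads busy")
```

On these figures a cold cache costs around 5% extra wall-clock time over not using ccache at all, and setting CCACHE_NOSTATS does not recover any of that, which matches the conclusion above that stats file locking is not the main cost.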