[Cbe-oss-dev] initial performance comparison
Luke Browning
lukebr at linux.vnet.ibm.com
Fri Nov 2 03:56:26 EST 2007
Here is a bit of performance analysis showing
that the new scheduler is roughly 10X more efficient
when the system is overcommitted. The new scheduler
is also better even when running just a single job.
regards, Luke
====================================
Single Job
====================================
Executing the following command:
time ./matrix_mul -i 200 -m 512 -s 16
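As a minimal sketch (only the single command above is from the original
run; the loop below is an assumption, with ./matrix_mul built in the
current directory), the three trials reported below can be reproduced
back to back with:

#!/bin/sh
# Run the single-job benchmark three times in a row,
# matching the three timing samples reported below.
for i in 1 2 3; do
    time ./matrix_mul -i 200 -m 512 -s 16
done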
======================================
New spufs scheduler had these numbers:
real 0m2.488s
user 0m0.281s
sys 0m0.154s
real 0m2.489s
user 0m0.279s
sys 0m0.156s
real 0m2.487s
user 0m0.280s
sys 0m0.154s
See the old scheduler numbers below. Note that the elapsed time above
is roughly half that of the old scheduler, user time is also
substantially lower, and even system time is about 30% less.
======================================
Old spufs scheduler:
real 0m4.063s
user 0m0.456s
sys 0m0.230s
real 0m4.567s
user 0m0.457s
sys 0m0.215s
real 0m3.926s
user 0m0.439s
sys 0m0.211s
The reduction in system time I attribute to streamlining
and rearranging the logic in the loop. I also eliminated
a global lock reference and some calls to check_signal().
======================================
Overcommitted examples
======================================
Running the following from a shell script:
time ./matrix_mul -i 200 -m 512 -s 16 &
time ./matrix_mul -i 200 -m 512 -s 16 &
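As a sketch only (the actual driver was just the two background commands
above; the N argument and the loop are assumptions for illustration), the
over-commit test can be parameterized for N concurrent jobs:

#!/bin/sh
# Launch N concurrent copies of the benchmark in the background,
# then wait for all of them to finish.  N=2 matches the two-job
# results below; N=3 matches the three-job results further down.
N=${1:-2}
i=0
while [ "$i" -lt "$N" ]; do
    time ./matrix_mul -i 200 -m 512 -s 16 &
    i=$((i + 1))
done
wait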
New spufs scheduler:
real 0m18.318s 0m18.425s (each column is a separate job)
user 0m0.190s 0m0.291s
sys 0m0.134s 0m0.203s
real 0m14.463s 0m14.603s
user 0m0.239s 0m0.410s
sys 0m0.140s 0m0.176s
real 0m15.465s 0m15.844s
user 0m0.232s 0m0.504s
sys 0m0.187s 0m0.190s
Note that system time increases only slightly in the new scheduler:
0.154-0.156s (new, single job) vs. 0.134-0.203s (new, two jobs).
Old spufs scheduler:
real 1m14.541s 1m14.542s
user 0m0.464s 0m0.460s
sys 0m11.086s 0m10.466s
real 1m12.540s 1m12.621s
user 0m0.446s 0m0.451s
sys 0m11.419s 0m10.112s
real 1m3.298s 1m3.478s
user 0m0.467s 0m0.465s
sys 0m8.107s 0m9.942s
Note the explosion in sys time: roughly 10 seconds per job! This is
probably caused by the extra context switching that I eliminated in the
new scheduler. The secondary impact of this context switching
is that user code must wait longer to run.
Parallel applications like matrix_mul suffer the most, as user
code must wait to synchronize SPE output.
The new scheduler scales much better. It is roughly 10
times as efficient (sys time comparison) with multiple jobs.
=======================================
Running three instances of the job.
new spufs scheduler:
real 0m33.682s 0m33.682s 0m33.683s
user 0m0.379s 0m0.438s 0m0.364s
sys 0m0.171s 0m0.140s 0m0.186s
real 0m22.858s 0m28.083s 0m28.101s
user 0m0.278s 0m0.496s 0m0.506s
sys 0m0.199s 0m0.297s 0m0.162s
real 0m30.015s 0m31.840s 0m32.277s
user 0m0.545s 0m0.283s 0m0.540s
sys 0m0.255s 0m0.258s 0m0.192s
Note that adding a third job did not significantly increase
system overhead. It went up from 0.134-0.203s to 0.140-0.297s,
which is not statistically significant, particularly when
you throw out the best and worst numbers. That yields
0.162-0.258s, showing that the new scheduler is very
deterministic and scales well.
old spufs scheduler:
real 1m14.541s 1m14.542s XXX (system hang)
user 0m0.464s 0m0.460s XXX
sys 0m11.086s 0m10.466s XXX
The three-job run never fully completed (the system hung), but 2 of the
3 jobs did finish, with essentially the same numbers as the two-job case.
The new spufs scheduler is again roughly 10X more efficient with 3 jobs.
regards, Luke