[Linuxppc-users] Discrepancies between Performance Simulator and Silicon
Nicolas Koenig
koenigni at student.ethz.ch
Sun Jun 9 09:06:32 AEST 2019
Hello everyone,
While trying to solve the riddle surrounding xsadddp's throughput, I
recently came across the POWER9 performance simulator, which is supposed
to be cycle-accurate. When trying it out, I noticed what appears to be
a discrepancy for the following code:
loop:
        .rept 16
        mtvsrd %vs1, %r3
        .endr
        bdnz loop
In the performance simulator, this loop sustains a stable 4 mtvsrd
instructions per cycle (excluding branches), while the actual silicon
can only sustain 3 mtvsrd instructions per cycle (again, excluding
branches). What might be the reason for this difference?
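For clarity, the per-cycle figures count only the 16 mtvsrd instructions
in each loop iteration and ignore the bdnz. A minimal Python sketch of
that arithmetic follows. The function name and the iteration and cycle
totals are hypothetical, chosen only so that the results match the 4 and
3 per cycle quoted above; they are not measured values.

```python
def mtvsrd_per_cycle(iterations, cycles, per_iteration=16):
    """Return mtvsrd instructions per cycle, ignoring the bdnz branch."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    # Each loop iteration issues `per_iteration` mtvsrd instructions
    # (the .rept 16 block); the closing bdnz is deliberately not counted.
    return iterations * per_iteration / cycles


# Hypothetical totals, back-computed from the quoted rates (not measurements).
print(mtvsrd_per_cycle(iterations=1000, cycles=4000))   # simulator-like case
print(mtvsrd_per_cycle(iterations=3000, cycles=16000))  # silicon-like case
```

Put differently, at 4 per cycle one 16-instruction iteration takes 4
cycles, while at 3 per cycle it takes 16/3, or roughly 5.33, cycles.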
Thanks in advance,
Nicolas
P.S.: It also seems that scrollpv can't disassemble the mtvsrd
instruction; it just shows ?????? and the instruction in hex (it is the
right instruction, though; I double-checked).