[Linuxppc-users] Discrepancies between Performance Simulator and Silicon

Nicolas Koenig koenigni at student.ethz.ch
Sun Jun 9 09:06:32 AEST 2019


Hello everyone,

While trying to solve the riddle surrounding xsadddp's throughput, I
recently came across the POWER9 performance simulator, which is supposed
to be cycle-accurate. When trying it out, I noticed what appears to be
a discrepancy for the following code:

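loop:
   # CTR is assumed to already hold the iteration count
   .rept 16
     mtvsrd %vs1, %r3    # GPR r3 -> VSR 1, 16 copies per iteration
   .endr
   bdnz loop             # decrement CTR, branch while CTR != 0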

In the performance simulator, this loop sustains a stable 4 mtvsrd
instructions per cycle (not counting the branches), while actual
silicon only sustains 3 mtvsrd instructions per cycle (again, not
counting the branches). What might be the reason for this difference?
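
For clarity, the per-cycle figures are computed as in the following
minimal Python sketch. The function name and the cycle counts are
hypothetical and only illustrate the arithmetic; they are not measured
values.

```python
def mtvsrd_per_cycle(iterations, cycles, per_iteration=16):
    """Sustained mtvsrd throughput, leaving the bdnz branches out of the count."""
    return iterations * per_iteration / cycles


# Hypothetical cycle counts, chosen only to illustrate the two rates.
print(mtvsrd_per_cycle(1000, 4000))   # 16000 mtvsrd in 4000 cycles
print(mtvsrd_per_cycle(3000, 16000))  # 48000 mtvsrd in 16000 cycles
```

Only the 16 mtvsrd per iteration enter the numerator, so the bdnz at the
end of each iteration does not inflate the rate.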

Thanks in advance
   Nicolas

P.S.: It also seems that scrollpv can't disassemble the mtvsrd
instruction; it just shows ?????? and the instruction word in hex (it
is the right instruction though, I double-checked).
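
A check of this kind can be sketched in Python as follows. The sketch
assumes the XX1-form layout of mtvsrd as I read it in Power ISA v3.0
(primary opcode 31, extended opcode 179, TX as the lowest bit); the
function name is hypothetical.

```python
def encode_mtvsrd(xt, ra):
    """Encode mtvsrd XT,RA, assuming XX1-form: opcode 31, extended opcode 179."""
    if not (0 <= xt < 64 and 0 <= ra < 32):
        raise ValueError("XT must be 0..63 and RA 0..31")
    t, tx = xt & 31, xt >> 5  # low five bits of XT, then the TX extension bit
    return (31 << 26) | (t << 21) | (ra << 16) | (179 << 1) | tx


# mtvsrd %vs1, %r3 from the loop above
print(f"{encode_mtvsrd(1, 3):08x}")
```

Note that a raw byte dump taken on a little-endian system shows the
bytes of this word in reversed order.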

