[Linuxppc-users] Discrepancies between Performance Simulator and Silicon

Pat Haugen pthaugen at linux.ibm.com
Tue Jun 11 03:26:39 AEST 2019


On 6/8/19 6:06 PM, Nicolas Koenig wrote:
> Hello everyone,
> 
> while trying to solve the riddle surrounding xsadddp's throughput, I recently came across the POWER9 performance simulator, which is supposed to be cycle-accurate. When I tried it, I noticed what appears to be a discrepancy for the following code:
> 
> loop:
>   .rept 16
>     mtvsrd %vs1, %r3
>   .endr
>   bdnz loop
> 
> When executing it in the performance simulator, it yields a stable 4 mtvsrd instructions per cycle (excluding branches), while the actual silicon can only sustain 3 mtvsrd instructions per cycle (again, excluding branches). What might be the reason for this difference?
> 
How did you determine that the hardware can only sustain 3? Is the loop at least quadword aligned, to eliminate any variability in fetch behavior between the simulator and the hardware?
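
A minimal sketch of forcing that alignment, assuming the code is assembled with GNU as (the .p2align line and the comments are additions to the loop quoted above; the rest is unchanged):

    .p2align 4          # pad so the next instruction starts on a 16-byte (quadword) boundary
loop:
  .rept 16
    mtvsrd %vs1, %r3    # 16 back-to-back copies of the instruction under test
  .endr
  bdnz loop             # decrement CTR and branch while it is nonzero

Using the same aligned binary for both the simulator run and the hardware run should remove fetch-group placement as a possible source of the difference.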

> Thanks in advance
>   Nicolas
> 
> P.S.: It also seems like scrollpv can't disassemble the mtvsrd instruction, it just shows ?????? and the instruction in hex (it is the right instruction though, I double checked).

Sounds like an old version of, or a missing flag for, whatever scrollpv uses for disassembly.

-Pat


