[Linuxppc-users] xsadddp throughput on Power9
Nicolas Koenig
koenigni at student.ethz.ch
Wed Mar 6 21:35:50 AEDT 2019
Hello world,
After asking this question on another mailing list, I was redirected to
this list. I hope someone on here will be able to help me :)
While running a few benchmarks, I noticed that the following code (with
SMT disabled) only manages about 2.25 xsadddp instr/clk (measured via
pmc6) instead of the expected 4:
loop:
.rept 12
xsadddp %vs2, %vs1, %vs1
.endr
bdnz loop
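To put the measured rate in per-iteration terms, here is a small arithmetic sketch. It uses only the figures quoted above (12 xsadddp per iteration, 4 expected per clock, about 2.25 measured); the variable and function names are mine and purely illustrative.

```python
# Figures from the post: 12 xsadddp per loop iteration, 4 expected per
# clock, about 2.25 measured per clock (via pmc6).
XSADDDP_PER_ITER = 12
EXPECTED_PER_CLK = 4.0
MEASURED_PER_CLK = 2.25


def cycles_per_iteration(instrs: int, per_clk: float) -> float:
    """Clocks needed for one loop iteration at a given instr/clk rate."""
    return instrs / per_clk


expected_cycles = cycles_per_iteration(XSADDDP_PER_ITER, EXPECTED_PER_CLK)
measured_cycles = cycles_per_iteration(XSADDDP_PER_ITER, MEASURED_PER_CLK)

# Clocks lost per iteration relative to the expected rate, and the
# fraction of the expected throughput actually achieved.
lost_cycles = measured_cycles - expected_cycles
achieved_fraction = MEASURED_PER_CLK / EXPECTED_PER_CLK

print(f"expected: {expected_cycles:.2f} clk/iter")
print(f"measured: {measured_cycles:.2f} clk/iter")
print(f"lost:     {lost_cycles:.2f} clk/iter")
print(f"achieved: {achieved_fraction:.1%} of expected")
```

In other words, each iteration takes roughly 5.3 clocks rather than the expected 3.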
From what I can gather, the bottleneck shouldn't be the history
buffers. Since there are no long latency operations, FIN->COMP shouldn't
take more than 12 cycles (the size of the secondary HB for FPSCR, the
smallest relevant one). The primary HB and the issue queue shouldn't
overflow either, since xsadddp takes 7 cycles from issue to finish and
they can accommodate 20 and 13 entries respectively, with each instruction
using only one of each. It doesn't stall on writeback ports either,
because there are only 4 results in any one clock and 4 writeback ports
(the decrement of the bdnz instruction is handled in the branch slice
without involving the writeback network).
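The occupancy argument above can be written out as a rough check: the number of instructions in flight is the issue rate times the issue-to-finish latency. This is only a sketch under stated assumptions. The 7-cycle latency, the 20 and 13 entry counts, and the 4 writeback ports are the figures quoted above; treating the entry counts as per-pipe, with four pipes each issuing one xsadddp per clock, is my assumption and may not match the hardware.

```python
# Figures quoted in the post.
ISSUE_TO_FINISH_CYCLES = 7   # xsadddp latency from issue to finish
PRIMARY_HB_ENTRIES = 20      # primary history buffer capacity
ISSUE_QUEUE_ENTRIES = 13     # issue queue capacity
WRITEBACK_PORTS = 4
TARGET_PER_CLK = 4           # expected xsadddp per clock

# ASSUMPTION (not stated in the post): the target rate is spread evenly
# over four pipes, and the capacities above apply to each pipe.
PIPES = 4


def in_flight(rate_per_clk: float, latency_cycles: int) -> float:
    """Average instructions in flight: issue rate times latency."""
    return rate_per_clk * latency_cycles


per_pipe_in_flight = in_flight(TARGET_PER_CLK / PIPES, ISSUE_TO_FINISH_CYCLES)
# Only relevant if the capacities were shared across all pipes instead.
total_in_flight = in_flight(TARGET_PER_CLK, ISSUE_TO_FINISH_CYCLES)

checks = {
    "primary HB": per_pipe_in_flight <= PRIMARY_HB_ENTRIES,
    "issue queue": per_pipe_in_flight <= ISSUE_QUEUE_ENTRIES,
    "writeback ports": TARGET_PER_CLK <= WRITEBACK_PORTS,
}

for name, ok in checks.items():
    print(f"{name}: {'ok' if ok else 'would overflow'}")
print(f"in flight per pipe: {per_pipe_in_flight:g}, total: {total_in_flight:g}")
```

Under the per-pipe assumption all three checks pass, which matches the reasoning above. If the entry counts were instead shared across all pipes, the total in-flight count would be the number to compare against them.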
Does anyone here have any idea where the bottleneck might be?
Thanks in advance
Nicolas