[Linuxppc-users] xsadddp throughput on Power9

Nicolas Koenig koenigni at student.ethz.ch
Wed Mar 6 21:35:50 AEDT 2019


Hello world,

After asking this question on another mailing list, I was redirected to 
this list. I hope someone on here will be able to help me :)

While running a few benchmarks, I noticed that the following code (with 
SMT disabled) only manages about 2.25 xsadddp instr/clk (measured via 
pmc6) instead of the expected 4:

loop:
     .rept 12
         xsadddp %vs2, %vs1, %vs1
     .endr
     bdnz loop
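The sketch below divides the 12 unrolled xsadddp by the expected and measured rates to express them per loop iteration. Like the post, it assumes that only xsadddp counts toward the rate and that the bdnz is not included. The helper name is illustrative only.

```python
# Convert xsadddp throughput into cycles per loop iteration.
# ASSUMPTION: the rate counts xsadddp only (the bdnz is not included),
# matching how the post reports "xsadddp instr/clk".

XSADDDP_PER_ITER = 12  # from `.rept 12` in the loop body


def cycles_per_iteration(xsadddp_per_clk: float) -> float:
    """Average cycles needed for one pass through the unrolled loop."""
    return XSADDDP_PER_ITER / xsadddp_per_clk


expected = cycles_per_iteration(4.0)    # 3 cycles at the expected rate
measured = cycles_per_iteration(2.25)   # about 5.33 cycles at the measured rate
print(f"expected {expected:g}, measured {measured:.2f} cycles/iteration")
```

At the expected rate, one iteration takes 3 cycles. At the measured rate it takes about 5.33 cycles, so roughly 2.3 cycles per iteration are unaccounted for.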

From what I can gather, the bottleneck shouldn't be the history
buffers. Since there are no long-latency operations, FIN->COMP shouldn't
take more than 12 cycles (the size of the secondary HB for FPSCR, the
smallest relevant one). The primary HB and the issue queue shouldn't
overflow either, since xsadddp takes 7 cycles from issue to finish and
they can accommodate 20 and 13 entries respectively, with each
instruction using only one entry in each. It doesn't stall on writeback
ports either, because there are only 4 results in any one clock and 4
writeback ports (the decrement of the bdnz instruction is handled in the
branch slice without involving the writeback network).
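The occupancy argument above can be sanity-checked with Little's law: the average number of entries in flight equals throughput times the number of cycles each entry is held. The sketch below assumes the target rate is spread evenly over `copies` independent instances of each structure (for example, one per execution slice). The post does not state this split. The capacities and latencies are the figures quoted above, and the function names are illustrative.

```python
# Little's law sanity check for the buffer-occupancy argument.
# ASSUMPTION (not from the post): the target throughput is split evenly
# across `copies` independent instances of each structure.

TARGET_IPC = 4.0        # expected xsadddp instr/clk
ISSUE_TO_FINISH = 7     # cycles from issue to finish, as quoted
CAPACITY = {"primary HB": 20, "issue queue": 13}  # entries, as quoted
SECONDARY_HB = 12       # entries in the secondary HB for FPSCR, as quoted


def entries_needed(throughput: float, residence: float, copies: int) -> float:
    """Average occupied entries per copy (Little's law: L = lambda * W)."""
    return throughput * residence / copies


def max_residence(throughput: float, capacity: int, copies: int) -> float:
    """Longest average residence (cycles) the given capacity can sustain."""
    return capacity * copies / throughput


for copies in (1, 4):
    need = entries_needed(TARGET_IPC, ISSUE_TO_FINISH, copies)
    for name, cap in CAPACITY.items():
        verdict = "fits" if need <= cap else "overflows"
        print(f"{copies} copies, {name}: {need:g} of {cap} entries -> {verdict}")
    budget = max_residence(TARGET_IPC, SECONDARY_HB, copies)
    print(f"{copies} copies, FIN->COMP budget: {budget:g} cycles")
```

With four copies, this reproduces the post's figures: about 7 entries in flight per copy, which is within both 13 and 20, and a 12-cycle FIN->COMP budget for a 12-entry buffer. With a single shared copy, 28 entries would be needed. The conclusion therefore depends on how these structures are partitioned.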

Does anyone here have an idea where the bottleneck might be?

Thanks in advance
     Nicolas

