[Linuxppc-users] xsadddp throughput on Power9

Bill Schmidt wschmidt at linux.ibm.com
Fri Mar 8 06:24:58 AEDT 2019


Hi Nicolas,


On 3/6/19 4:35 AM, Nicolas Koenig wrote:
> Hello world,
>
> After asking this question on another mailing list, I was redirected
> to this list. I hope someone on here will be able to help me :)
>
> While running a few benchmarks, I noticed that the following code
> (with SMT disabled) only manages about 2.25 xsadddp instr/clk
> (measured via pmc6) instead of the expected 4:
>
> loop:
>     .rept 12
>         xsadddp %vs2, %vs1, %vs1
>     .endr
>     bdnz loop
>
> From what I can gather, the bottleneck shouldn't be the history
> buffers. Since there are no long latency operations, FIN->COMP
> shouldn't take more than 12 cycles (the size of the secondary HB for
> FPSCR, the smallest relevant one). The primary HB and the issue queue
> shouldn't overflow either, since xsadddp takes 7 cycles from issue to
> finish and they can accommodate 20 and 13 entries respectively, with each
> instruction using only one entry in each. It doesn't stall on writeback
> ports either, because there are only 4 results in any one clock and 4
> writeback ports (the decrement of the bdnz instruction is handled in
> the branch slice without involving the writeback network).
>
> Has anyone here any idea where the bottleneck might be?

Donald Stence was kind enough to answer this question for me.  Here is his note,
which indicates this is actually performing better than you think!

Hi Bill,

    The P9 design combines the 64-bit execution units of two slices to process a single 128-bit op.

    Therefore it can issue at most two 128-bit ops per cycle; that is the theoretical maximum.

    Because the Dispatch rate is higher than the Issue rate of 2 xsadddp's per cycle, the Issue Queue
    slots become full within just a few cycles, which results in Dispatch holds (nothing gets Dispatched
    for a cycle because there are no available Issue slots to place more ops into).

    The branch overlaps with the xsadddp's, so it actually pushes the IPC above 2 ops/cycle.

    Thanks,

Donald Stence

IBM PSP - P10 Technical Lead
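
Donald's explanation can be checked with a toy cycle-by-cycle model. This is a minimal sketch under stated assumptions, not a model of the real POWER9 pipeline. The dispatch width of 4, the shared 26-entry issue queue and the helper name `simulate` are hypothetical. Only two things come from this thread: the limit of 2 issues per cycle, and the fact that bdnz is handled outside the VSX issue path.

```python
# Toy cycle-by-cycle model of the xsadddp loop. All parameters are
# hypothetical illustrations, not POWER9 specifications.


def simulate(cycles, dispatch_width=4, issue_width=2, queue_capacity=26,
             adds_per_iter=12):
    """Return (ipc, first_hold) for a loop of `adds_per_iter` xsadddp
    followed by one bdnz, where first_hold is the first cycle in which
    dispatch was held because the issue queue was full (None if never).

    Assumptions:
    - up to dispatch_width instructions dispatch per cycle, in order;
    - each xsadddp needs one entry in a shared issue queue of
      queue_capacity entries; bdnz goes to the branch unit and needs none;
    - at most issue_width xsadddp issue per cycle;
    - xsadddp count as executed when they issue, bdnz when it dispatches.
    """
    queue = 0          # xsadddp waiting in the issue queue
    pos = 0            # next instruction in the loop body (== adds_per_iter: bdnz)
    executed = 0
    first_hold = None

    for cycle in range(cycles):
        # Issue: drain up to issue_width ops dispatched in earlier cycles.
        n = min(issue_width, queue)
        queue -= n
        executed += n

        # Dispatch in program order; a full queue holds the rest of the cycle.
        for _ in range(dispatch_width):
            if pos == adds_per_iter:       # bdnz: no issue-queue entry needed
                executed += 1
                pos = 0
            elif queue < queue_capacity:   # xsadddp: take an issue-queue entry
                queue += 1
                pos += 1
            else:                          # queue full: dispatch hold
                if first_hold is None:
                    first_hold = cycle
                break

    return executed / cycles, first_hold


if __name__ == "__main__":
    ipc, hold = simulate(6000)
    print(f"IPC ~ {ipc:.3f}; first dispatch hold in cycle {hold}")
```

With the defaults, the queue fills and the first dispatch hold occurs within about 15 cycles. After that, the model retires 2 xsadddp per cycle plus one bdnz every 6 cycles, for about 13/6 ≈ 2.17 instructions per cycle. That is in the same range as the ~2.25 measured above. Calling `simulate(6000, issue_width=4)` removes the 2-per-cycle issue limit, and the model then becomes bound by the assumed dispatch width instead.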

Cheers,

Bill

>
> Thanks in advance
>     Nicolas
> _______________________________________________
> Linuxppc-users mailing list
> Linuxppc-users at lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-users
>
