[Linuxppc-users] xsadddp throughput on Power9

Bill Schmidt wschmidt at linux.ibm.com
Fri Mar 8 07:16:46 AEDT 2019


On 3/7/19 1:24 PM, Bill Schmidt wrote:
> Hi Nicolas,
>
> On 3/6/19 4:35 AM, Nicolas Koenig wrote:
>> Hello world,
>>
>> After asking this question on another mailing list, I was redirected
>> to this list. I hope someone on here will be able to help me :)
>>
>> While running a few benchmarks, I noticed that the following code
>> (with SMT disabled) only manages about 2.25 xsadddp instr/clk
>> (measured via pmc6) instead of the expected 4:
>>
>> loop:
>>     .rept 12
>>         xsadddp %vs2, %vs1, %vs1
>>     .endr
>>     bdnz loop
>>
>> From what I can gather, the bottleneck shouldn't be the history
>> buffers. Since there are no long latency operations, FIN->COMP
>> shouldn't take more than 12 cycles (the size of the secondary HB for
>> FPSCR, the smallest relevant one). The primary HB and the issue queue
>> shouldn't overflow either, since xsadddp takes 7 cycles from issue to
>> finish and they can accommodate 20 and 13 entries respectively with one
>> instruction only using one of each. It doesn't stall on writeback
>> ports either, because there are only 4 results in any one clock and 4
>> writeback ports (the decrement of the bdnz instruction is handled in
>> the branch slice without involving the writeback network).
>>
>> Has anyone here any idea where the bottleneck might be?
> Donald Stence was kind enough to answer this question for me.  Here is his note,
> which indicates this is actually performing better than you think!
>
> Hi Bill,
>     P9's design combines 64-bit execution units from two slices to process a single 128-bit op.
>     Therefore, it can issue at most two 128-bit ops per cycle (the theoretical maximum).

Hrm, it has been pointed out to me that this is xsadddp (scalar), not xvadddp (vector), so I don't think we have an answer yet.

Sorry,
Bill
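
As a rough sanity check on the numbers in the original report, the arithmetic can be sketched as below. This is only a back-of-the-envelope calculation: the function names are made up for illustration, the 4 instr/clk peak is the expectation stated in the report, and the in-flight figure simply applies Little's law (throughput times latency) to the quoted 7-cycle issue-to-finish latency. Nothing here is measured or taken from IBM documentation.

```python
from fractions import Fraction

XSADDDP_PER_ITER = 12   # .rept 12 in the loop body
BRANCHES_PER_ITER = 1   # the closing bdnz
ISSUE_TO_FINISH = 7     # cycles, as quoted in the report


def cycles_per_iteration(xsadddp_per_clk):
    """Cycles one loop iteration takes at a given xsadddp rate."""
    return Fraction(XSADDDP_PER_ITER) / Fraction(xsadddp_per_clk)


def total_ipc(xsadddp_per_clk):
    """Overall IPC when the bdnz is counted along with the adds."""
    insns = XSADDDP_PER_ITER + BRANCHES_PER_ITER
    return insns / cycles_per_iteration(xsadddp_per_clk)


def in_flight_needed(xsadddp_per_clk, latency=ISSUE_TO_FINISH):
    """Little's law: ops in flight = throughput * latency."""
    return Fraction(xsadddp_per_clk) * latency


measured = Fraction(9, 4)  # 2.25 xsadddp/clk, as read from pmc6
expected = Fraction(4)     # rate expected in the report

print(cycles_per_iteration(measured))  # cycles per loop iteration
print(total_ipc(measured))             # IPC including the branch
print(in_flight_needed(expected))      # adds in flight at the expected rate
print(in_flight_needed(measured))      # adds in flight at the measured rate
```

At the measured rate one iteration takes 16/3 (about 5.33) cycles and the overall IPC including the branch is 39/16 (about 2.44). Sustaining 4 xsadddp/clk at a 7-cycle latency would need 28 adds in flight in total, against roughly 16 at the measured rate.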

>  
>     Because the Dispatch rate is higher than the Issue rate of 2 xsadddp's per cycle, the Issue Queue
>     slots become full within just a few cycles, which results in Dispatch holds (nothing gets Dispatched
>     for a cycle because there are no available Issue slots to place more ops into).
>  
>     The branch overlaps and actually pushes the IPC up from just 2 ops/cycle.
>  
>     Thanks,
>  
> Donald Stence
> IBM PSP - P10 Technical Lead
>  
> Cheers,
> Bill
>>
>> Thanks in advance
>>     Nicolas
>> _______________________________________________
>> Linuxppc-users mailing list
>> Linuxppc-users at lists.ozlabs.org
>> https://lists.ozlabs.org/listinfo/linuxppc-users
>>
