[Linuxppc-users] xsadddp throughput on Power9

Segher Boessenkool segher at linux.ibm.com
Thu Mar 21 05:16:29 AEDT 2019


Hi Nicolas,

On Wed, Mar 20, 2019 at 02:54:12AM +0100, Nicolas König wrote:
> Thanks for digging this up; this is really interesting new information!
> 
> But it still doesn't quite solve the puzzle, because the throughput
> of just xsadddp instructions is 2.27 (the throughput of all
> instructions including branches is 2.44 for the case of 12 xsadddp)
> which is more than the 2.0 we would expect from vector instructions.
> Also, since the instruction isn't tuple restricted each superslice
> can dispatch 3 xsadddp instr/clk, and since each slice can at most
> accept 2 instructions from dispatch, both the primary and the
> supplementary dispatch port of each slice must be able to handle one
> xsadddp instruction/cycle and there can only ever be one slice
> taking care of one xsadddp instruction.

Sure, and we can issue 4 xsadddp in the same cycle, too.  There is a
bottleneck elsewhere for this test, and it seems to be register
renaming.  We'll get back to you when we know more.
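
A rough sanity check on the figures above, as a small Python sketch.
It assumes "the case of 12 xsadddp" means 12 xsadddp per loop
iteration; the rates (2.0, 2.27, 4) are the ones quoted in this
thread, and the function and variable names are only illustrative.

```python
# Back-of-the-envelope check; all rates are taken from the thread.
# Assumption: 12 xsadddp instructions per loop iteration.
XSADDDP_PER_ITER = 12


def cycles_per_iteration(instr_per_iter, instr_per_cycle):
    """Cycles one loop iteration needs at a given sustained rate."""
    return instr_per_iter / instr_per_cycle


# xsadddp per cycle: the expected vector rate, the measured rate,
# and the issue limit mentioned above.
rates = {
    "vector-rate expectation": 2.0,
    "measured": 2.27,
    "issue limit": 4.0,
}

cycles = {
    name: cycles_per_iteration(XSADDDP_PER_ITER, rate)
    for name, rate in rates.items()
}

for name, value in cycles.items():
    print(f"{name:>24}: {value:.2f} cycles/iteration")
```

Under that assumption the measured rate lands between the two bounds,
which would fit a limit somewhere other than issue.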


Segher


