[SLOF] [PATCH v2 4/6] fbuffer: Implement RFILL as an accelerated primitive
Thomas Huth
thuth at redhat.com
Tue Aug 4 06:51:14 AEST 2015
On 03/08/15 21:53, Segher Boessenkool wrote:
> On Mon, Aug 03, 2015 at 09:30:59PM +0200, Thomas Huth wrote:
>> + SET_CI; \
>> + while (size > 0) { \
>> + *d1++ = tmp; size -= sizeof(t); \
>> + } \
>> + CLR_CI; \
>
> If you haven't tested this on a real 970, could you check if the
> generated assembler is what you expect / want? I.e. no extra memory
> accesses (to stack or whatever) between the hid4 things. I expect it
> will "just work", but :-)
Uh, that was a good idea - this indeed revealed a bug: The compiler
optimized that code completely away, since the previous loop never
terminated:
int i = sizeof(t); \
while (i > 0) { \
tmp <<= 8; tmp |= pat & 0xff; \
} \
That was stupid ... the loop never decrements i. I've got to send a v3
with proper loop counter handling...
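For illustration only, a minimal sketch of what a terminating version of that pattern loop could look like. This is not the v3 patch: the function name replicate_byte32 and the fixed 32-bit type are assumptions made for the example, whereas the real code is a macro parameterized over the type t.

```c
#include <stdint.h>

/* Hypothetical helper (not part of SLOF): replicate the low byte of
 * pat into every byte of a 32-bit word. */
static uint32_t replicate_byte32(uint32_t pat)
{
	uint32_t tmp = 0;
	int i = sizeof(uint32_t);

	/* The counter is decremented on every pass, so the loop ends
	 * after sizeof(uint32_t) iterations. */
	while (i-- > 0) {
		tmp <<= 8;
		tmp |= pat & 0xff;
	}
	return tmp;
}
```

With a pattern byte of 0xab this yields 0xabababab, which is the value the fill loop would then store repeatedly.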
Anyway, after fixing the above problem, the disassembly looks like this:
e101f64: 2f a8 00 00 cmpdi cr7,r8,0
e101f68: 38 e7 ff fc addi r7,r7,-4
e101f6c: 41 9e 00 10 beq cr7,e101f7c <.engine+0x17dc>
e101f70: 35 08 ff fc addic. r8,r8,-4
e101f74: 95 27 00 04 stwu r9,4(r7)
e101f78: 40 82 ff f8 bne e101f70 <.engine+0x17d0>
So it really only touches the destination IO memory in the loop.
I also checked the disassembly of FAST_MRMOVE_TYPED and that looks ok, too.
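As a rough C-level picture of the store loop in the disassembly above, here is a sketch under some assumptions: the name fill32 is made up, the type is fixed to 32 bits, and the SET_CI / CLR_CI bracketing from the patch is left out because it is specific to the 970.

```c
#include <stdint.h>

/* Hypothetical stand-in for the fill loop (not the SLOF macro).
 * dst is volatile so the compiler must emit every store to the
 * destination; size is the number of bytes to fill. */
static void fill32(volatile uint32_t *dst, uint32_t val, long size)
{
	while (size > 0) {
		*dst++ = val;			/* the only memory access in the loop */
		size -= (long)sizeof(uint32_t);
	}
}
```

The loop keeps the value, the pointer and the remaining size in locals, which a compiler would normally hold in registers, matching the single stwu per iteration seen above.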
Thomas