[SLOF] [PATCH v2 4/6] fbuffer: Implement RFILL as an accelerated primitive

Tue Aug 4 06:51:14 AEST 2015

On 03/08/15 21:53, Segher Boessenkool wrote:
> On Mon, Aug 03, 2015 at 09:30:59PM +0200, Thomas Huth wrote:
>> +	SET_CI; \
>> +	while (size > 0) { \
>> +		*d1++ = tmp; size -= sizeof(t); \
>> +	} \
>> +	CLR_CI; \
> 
> If you haven't tested this on a real 970, could you check if the
> generated assembler is what you expect / want?  I.e. no extra memory
> accesses (to stack or whatever) between the hid4 things.  I expect it
> will "just work", but :-)

Uh, that was a good idea - this indeed revealed a bug: The compiler
optimized that code completely away, since the previous loop never
terminated:

	int i = sizeof(t); \
	while (i > 0) { \
		tmp <<= 8; tmp |= pat & 0xff; \
	} \

That was stupid (i is never decremented, so the loop cannot exit) ...
I've got to send a v3 with proper loop counter handling.
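
For illustration only, here is a minimal sketch of how a terminating
version of that pattern-replication loop could look. replicate_byte() is
a made-up standalone helper, not the macro from the patch, and the real
v3 may well handle the counter differently:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of SLOF): replicate the low byte of
 * pat into the low nbytes bytes of the result.
 */
static uint64_t replicate_byte(uint64_t pat, size_t nbytes)
{
	uint64_t tmp = 0;
	size_t i;

	assert(nbytes <= sizeof(tmp));

	/* The counter is decremented on every pass, so the loop exits. */
	for (i = nbytes; i > 0; i--) {
		tmp <<= 8;
		tmp |= pat & 0xff;
	}

	return tmp;
}
```

With this, replicate_byte(0xab, 4) gives 0xabababab, which is the kind
of value the fill loop would then store.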

Anyway, after fixing the above problem, the disassembly looks like this:

 e101f64:       2f a8 00 00     cmpdi   cr7,r8,0
 e101f68:       38 e7 ff fc     addi    r7,r7,-4
 e101f6c:       41 9e 00 10     beq     cr7,e101f7c <.engine+0x17dc>
 e101f70:       35 08 ff fc     addic.  r8,r8,-4
 e101f74:       95 27 00 04     stwu    r9,4(r7)
 e101f78:       40 82 ff f8     bne     e101f70 <.engine+0x17d0>

So it really only touches the destination IO memory in the loop.
I also checked the disassembly of FAST_MRMOVE_TYPED and that looks ok, too.
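
To make the mapping to the disassembly above easier to follow, here is a
rough plain C sketch of the 32-bit store loop. fill32() is a hypothetical
stand-in, the SET_CI / CLR_CI parts are left out, and the register
assignments in the comments are my reading of the listing (r7 =
destination pointer, r8 = remaining size, r9 = fill value):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for the 32-bit case of the fill loop.
 * size is in bytes and assumed to be a multiple of 4.
 */
static void fill32(volatile uint32_t *dst, uint32_t val, size_t size)
{
	assert(size % sizeof(uint32_t) == 0);

	while (size > 0) {			/* cmpdi / beq, then bne */
		*dst++ = val;			/* stwu r9,4(r7) */
		size -= sizeof(uint32_t);	/* addic. r8,r8,-4 */
	}
}
```

The only memory access in the loop body is the store through dst, which
matches the single stwu in the listing.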

 Thomas