[SLOF] [PATCH 3/4] fbuffer: Implement MRMOVE as an accelerated primitive

Segher Boessenkool segher at kernel.crashing.org
Sat Sep 12 09:03:47 AEST 2015


On Sat, Sep 12, 2015 at 12:41:08AM +0200, Thomas Huth wrote:
> I think I might have found something. "include" uses "call-c"
> to run "c_romfs_lookup" from llfw/romfs.S.
> Now have a look at the disassembly of call_c() in slof/ppc64.c :
> 
> 000000000e1005f0 <.call_c>:
>  e1005f0:       7c 08 02 a6     mflr    r0
>  e1005f4:       fb e1 ff f8     std     r31,-8(r1)  ; <<< !!!
>  e1005f8:       f8 01 00 10     std     r0,16(r1)
>  e1005fc:       7c c0 33 78     mr      r0,r6
>  e100600:       7f e8 02 a6     mflr    r31
>  e100604:       7c 09 03 a6     mtctr   r0
>  e100608:       4e 80 04 21     bctrl
>  e10060c:       7f e8 03 a6     mtlr    r31
>  e100610:       e8 01 00 10     ld      r0,16(r1)
>  e100614:       eb e1 ff f8     ld      r31,-8(r1)
>  e100618:       7c 08 03 a6     mtlr    r0
>  e10061c:       4e 80 00 20     blr
> 
> The code saves r31 to a negative stack offset, without decrementing r1,
> and then jumps to c_romfs_lookup. That assembler function then saves
> some more registers on the stack and thus destroys the save-area of
> r31.
> 
> Why is GCC doing this? It sounds weird that it does not decrement r1
> before going into the inline-assembler code...?

It is perfectly valid -- that is, as long as this is a leaf function,
one that doesn't call anything.  And GCC does not know any better,
because the only call here is hidden inside the inline asm :-(
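
For illustration (a contrived example of mine, not SLOF code): any
function whose only "call" is hidden inside inline asm looks like a leaf
to GCC, and if that asm clobbers a non-volatile register, GCC will save
it in the red zone just below r1 -- the area a leaf function may use
without allocating a stack frame:

void looks_like_a_leaf(void)
{
	/* Clobbering a call-saved register forces GCC to save and
	   restore r31 around the function.  Since GCC sees no call
	   here, it uses the red zone, i.e. "std r31,-8(r1)" with no
	   stack frame -- just as in the disassembly above.  Any real
	   call made from inside the asm then builds its own frame on
	   top of that area and destroys the saved value. */
	asm volatile("nop" ::: "r31");
}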

> I wonder whether the inline assembly in call_c needs to save the lr
> in r31 at all? lr is listed in the clobber list, so there should
> not be the need to do this manually here (and looking at the
> disassembly, GCC takes care of lr already).

That looks correct, yes.

> The following patch seems
> to fix this issue for me:
> 
> diff --git a/slof/ppc64.c b/slof/ppc64.c
> index 20d9270..8541e09 100644
> --- a/slof/ppc64.c
> +++ b/slof/ppc64.c
> @@ -52,11 +52,11 @@ call_c(cell arg0, cell arg1, cell arg2, cell entry)
>  	register unsigned long r5 asm("r5") = arg2.u;
>  	register unsigned long r6 = entry.u         ;
>  
> -	asm volatile("mflr 31 ; mtctr %4 ; bctrl ; mtlr 31"
> +	asm volatile(" mtctr %4 ; bctrl "
>  		     : "=r" (r3)
>  		     : "r" (r3), "r" (r4), "r" (r5), "r" (r6)
>  		     : "ctr", "r6", "r7", "r8", "r9", "r10", "r11",
> -		       "r12", "r13", "r31", "lr", "cc");
> +		       "r12", "r13", "lr", "cc");
>  
>  	return r3;
>  }

Very nasty code; it still doesn't look 100% correct (and it is mightily
confusing with that "r6" name, for a variable that is not actually bound
to the r6 register) :-)

(All of r3..r5 need to be marked as clobbered, i.e. make them "+r").
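
Something like this, maybe (a rough, untested sketch of that suggestion,
with the variable renamed so it is no longer mistaken for the real r6
register):

	register unsigned long r3 asm("r3") = arg0.u;
	register unsigned long r4 asm("r4") = arg1.u;
	register unsigned long r5 asm("r5") = arg2.u;
	register unsigned long addr = entry.u;

	asm volatile("mtctr %3 ; bctrl"
		     : "+r" (r3), "+r" (r4), "+r" (r5)
		     : "r" (addr)
		     : "ctr", "r6", "r7", "r8", "r9", "r10", "r11",
		       "r12", "r13", "lr", "cc");

	return r3;

A "memory" clobber would not hurt either, since the called function can
read and write memory behind GCC's back.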


Segher

