[SLOF] [PATCH 3/4] fbuffer: Implement MRMOVE as an accelerated primitive
Segher Boessenkool
segher at kernel.crashing.org
Sat Sep 12 09:03:47 AEST 2015
On Sat, Sep 12, 2015 at 12:41:08AM +0200, Thomas Huth wrote:
> I think I might have found something. "include" uses "call-c"
> to run "c_romfs_lookup" from llfw/romfs.S.
> Now have a look at the disassembly of call_c() in slof/ppc64.c :
>
> 000000000e1005f0 <.call_c>:
> e1005f0: 7c 08 02 a6 mflr r0
> e1005f4: fb e1 ff f8 std r31,-8(r1) ; <<< !!!
> e1005f8: f8 01 00 10 std r0,16(r1)
> e1005fc: 7c c0 33 78 mr r0,r6
> e100600: 7f e8 02 a6 mflr r31
> e100604: 7c 09 03 a6 mtctr r0
> e100608: 4e 80 04 21 bctrl
> e10060c: 7f e8 03 a6 mtlr r31
> e100610: e8 01 00 10 ld r0,16(r1)
> e100614: eb e1 ff f8 ld r31,-8(r1)
> e100618: 7c 08 03 a6 mtlr r0
> e10061c: 4e 80 00 20 blr
>
> The code saves r31 to a negative stack offset, without decrementing r1,
> and then jumps to c_romfs_lookup. That assembler function then saves
> some more registers on the stack and thus destroys the save-area of
> r31.
>
> Why is GCC doing this? It sounds weird that it does not decrement r1
> before going into the inline-assembler code...?
It is perfectly valid -- that is, if this is a leaf function, so it
doesn't call anything. And GCC does not know any better :-(
> I wonder whether the inline assembly in call_c needs to save the lr
> in r31 at all? lr is listed in the clobber list, so there should
> not be the need to do this manually here (and looking at the
> disassembly, GCC takes care of lr already).
That looks correct yes.
> The following patch seems
> to fix this issue for me:
>
> diff --git a/slof/ppc64.c b/slof/ppc64.c
> index 20d9270..8541e09 100644
> --- a/slof/ppc64.c
> +++ b/slof/ppc64.c
> @@ -52,11 +52,11 @@ call_c(cell arg0, cell arg1, cell arg2, cell entry)
> register unsigned long r5 asm("r5") = arg2.u;
> register unsigned long r6 = entry.u ;
>
> - asm volatile("mflr 31 ; mtctr %4 ; bctrl ; mtlr 31"
> + asm volatile(" mtctr %4 ; bctrl "
> : "=r" (r3)
> : "r" (r3), "r" (r4), "r" (r5), "r" (r6)
> : "ctr", "r6", "r7", "r8", "r9", "r10", "r11",
> - "r12", "r13", "r31", "lr", "cc");
> + "r12", "r13", "lr", "cc");
>
> return r3;
> }
Very nasty code, it still doesn't look 100% correct (and mightily confusing
with that "r6" name) :-)
(All of r3..r5 need to be marked as clobbered, i.e. make them "+r").
Segher
More information about the SLOF
mailing list