[SLOF] [PATCH 3/4] fbuffer: Implement MRMOVE as an accelerated primitive
Thomas Huth
thuth at redhat.com
Mon Sep 14 20:10:51 AEST 2015
On 12/09/15 01:03, Segher Boessenkool wrote:
> On Sat, Sep 12, 2015 at 12:41:08AM +0200, Thomas Huth wrote:
>> I think I might have found something. "include" uses "call-c"
>> to run "c_romfs_lookup" from llfw/romfs.S.
>> Now have a look at the disassembly of call_c() in slof/ppc64.c :
>>
>> 000000000e1005f0 <.call_c>:
>> e1005f0: 7c 08 02 a6 mflr r0
>> e1005f4: fb e1 ff f8 std r31,-8(r1) ; <<< !!!
>> e1005f8: f8 01 00 10 std r0,16(r1)
>> e1005fc: 7c c0 33 78 mr r0,r6
>> e100600: 7f e8 02 a6 mflr r31
>> e100604: 7c 09 03 a6 mtctr r0
>> e100608: 4e 80 04 21 bctrl
>> e10060c: 7f e8 03 a6 mtlr r31
>> e100610: e8 01 00 10 ld r0,16(r1)
>> e100614: eb e1 ff f8 ld r31,-8(r1)
>> e100618: 7c 08 03 a6 mtlr r0
>> e10061c: 4e 80 00 20 blr
>>
>> The code saves r31 to a negative stack offset, without decrementing r1,
>> and then jumps to c_romfs_lookup. That assembler function then saves
>> some more registers on the stack and thus destroys the save-area of
>> r31.
>>
>> Why is GCC doing this? It sounds weird that it does not decrement r1
>> before going into the inline-assembler code...?
>
> It is perfectly valid -- that is, if this is a leaf function, so it
> doesn't call anything. And GCC does not know any better :-(
Ah, I was not aware of this paragraph in the ABI:
"The 288 bytes below the stack pointer is available as volatile storage
which is not preserved across function calls. ..."
(http://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.html#STACK)
... that sounds counter-intuitive at first glance, but OK, it makes sense
from a performance point of view.
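
To make the clobbering concrete, here is a small C model of the sequence
discussed above. It is only a sketch: the byte array standing in for the
stack, the helper names (callee_saves, call_with_frame), the number of
registers the callee stores and the 128-byte frame size are all made up
for illustration and are not taken from SLOF or romfs.S.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_SIZE 1024

/* Hypothetical stand-in for the machine stack; offsets play the role of r1. */
static uint8_t stack[STACK_SIZE];

/*
 * Model of a callee that stores nregs 8-byte registers directly below
 * the stack pointer it was entered with (assumed behaviour, loosely
 * following the description of c_romfs_lookup in this thread).
 */
static void callee_saves(size_t sp, int nregs)
{
    for (int i = 1; i <= nregs; i++) {
        uint64_t value = 0xdeadbeef00000000ULL + (uint64_t)i;
        memcpy(&stack[sp - 8 * (size_t)i], &value, sizeof(value));
    }
}

/*
 * Model of the caller: save r31 at -8(r1), optionally allocate a frame
 * of frame_bytes, "call" the callee, then reload r31 from -8(r1).
 * Returns 1 if the reloaded value is intact, 0 if it was clobbered.
 */
static int call_with_frame(size_t frame_bytes)
{
    size_t sp = 512;                          /* arbitrary initial r1 */
    uint64_t r31 = 0x1122334455667788ULL;
    uint64_t restored;

    memcpy(&stack[sp - 8], &r31, sizeof(r31));          /* std r31,-8(r1) */
    sp -= frame_bytes;                        /* frame allocation, if any */
    callee_saves(sp, 4);                      /* bctrl into the callee    */
    sp += frame_bytes;                        /* frame release            */
    memcpy(&restored, &stack[sp - 8], sizeof(restored)); /* ld r31,-8(r1) */

    return restored == r31;
}
```

Under these assumptions, call_with_frame(0) returns 0, which is the
situation in the call_c() disassembly: the callee's first store lands on
the r31 save slot. call_with_frame(128) returns 1, because the callee's
stores then land below the caller's frame and leave the slot alone.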
Thomas