[SLOF] [PATCH 3/4] fbuffer: Implement MRMOVE as an accelerated primitive

Thomas Huth thuth at redhat.com
Mon Sep 14 20:10:51 AEST 2015


On 12/09/15 01:03, Segher Boessenkool wrote:
> On Sat, Sep 12, 2015 at 12:41:08AM +0200, Thomas Huth wrote:
>> I think I might have found something. "include" uses "call-c"
>> to run "c_romfs_lookup" from llfw/romfs.S.
>> Now have a look at the disassembly of call_c() in slof/ppc64.c :
>>
>> 000000000e1005f0 <.call_c>:
>>  e1005f0:       7c 08 02 a6     mflr    r0
>>  e1005f4:       fb e1 ff f8     std     r31,-8(r1)  ; <<< !!!
>>  e1005f8:       f8 01 00 10     std     r0,16(r1)
>>  e1005fc:       7c c0 33 78     mr      r0,r6
>>  e100600:       7f e8 02 a6     mflr    r31
>>  e100604:       7c 09 03 a6     mtctr   r0
>>  e100608:       4e 80 04 21     bctrl
>>  e10060c:       7f e8 03 a6     mtlr    r31
>>  e100610:       e8 01 00 10     ld      r0,16(r1)
>>  e100614:       eb e1 ff f8     ld      r31,-8(r1)
>>  e100618:       7c 08 03 a6     mtlr    r0
>>  e10061c:       4e 80 00 20     blr
>>
>> The code saves r31 to a negative stack offset, without decrementing r1,
>> and then jumps to c_romfs_lookup. That assembler function then saves
>> some more registers on the stack and thus destroys the save-area of
>> r31.
>>
>> Why is GCC doing this? It sounds weird that it does not decrement r1
>> before going into the inline-assembler code...?
> 
> It is perfectly valid -- that is, if this is a leaf function, so it
> doesn't call anything.  And GCC does not know any better :-(

Ah, I was not aware of this paragraph in the ABI:

"The 288 bytes below the stack pointer is available as volatile storage
which is not preserved across function calls. ..."

(http://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.html#STACK)

... sounds non-intuitive at first glance, but OK, it makes sense from
a performance point of view.
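
To make the failure mode concrete, here is a rough toy model in portable C.
It is only a sketch, not SLOF code: a byte array stands in for the stack and
an index stands in for r1. The names toy_stack, toy_callee and toy_call_c
are made up for illustration, and I am assuming the assembler callee also
stores at negative offsets from the stack pointer. Without lowering the
stack pointer past the 288 bytes first, the callee overwrites the slot that
holds the saved r31; with it, the slot survives.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define RED_ZONE   288   /* bytes below r1, per the ABI paragraph quoted above */
#define STACK_SIZE 1024  /* arbitrary size for this toy model */

/* Toy model: mem stands in for the stack, sp for r1 (grows downwards). */
struct toy_stack {
    uint8_t mem[STACK_SIZE];
    size_t sp;
};

static void store64(struct toy_stack *s, size_t addr, uint64_t v)
{
    assert(addr + sizeof(v) <= STACK_SIZE);
    memcpy(&s->mem[addr], &v, sizeof(v));
}

static uint64_t load64(const struct toy_stack *s, size_t addr)
{
    uint64_t v;

    assert(addr + sizeof(v) <= STACK_SIZE);
    memcpy(&v, &s->mem[addr], sizeof(v));
    return v;
}

/* Hypothetical stand-in for the assembler callee: it saves two
 * "registers" just below sp, without moving sp (an assumption). */
static void toy_callee(struct toy_stack *s)
{
    store64(s, s->sp - 8, 0xdeadbeefULL);
    store64(s, s->sp - 16, 0xdeadbeefULL);
}

/* Mimics the shape of call_c(): save "r31" at -8(r1), call, reload.
 * If skip_red_zone is non-zero, sp is lowered by RED_ZONE around the
 * call so that the callee cannot touch the saved value. */
static uint64_t toy_call_c(struct toy_stack *s, uint64_t r31, int skip_red_zone)
{
    store64(s, s->sp - 8, r31);      /* std r31,-8(r1) */
    if (skip_red_zone)
        s->sp -= RED_ZONE;
    toy_callee(s);                   /* bctrl */
    if (skip_red_zone)
        s->sp += RED_ZONE;
    return load64(s, s->sp - 8);     /* ld r31,-8(r1) */
}
```

In this model, toy_call_c() with skip_red_zone set to 0 hands back the
callee's value instead of the saved one, which is the same kind of
corruption as in the disassembly above.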

 Thomas


