[SLOF] [PATCH 3/4] fbuffer: Implement MRMOVE as an accelerated primitive
thuth at redhat.com
Sat Sep 12 08:41:08 AEST 2015
On 09/09/15 13:05, Nikunj A Dadhania wrote:
> Thomas Huth <thuth at redhat.com> writes:
>> On 09/09/15 08:45, Nikunj A Dadhania wrote:
>>> TCG seems to be broken with this new series, git bisect pointed to:
>>> 59a135e fbuffer: Implement MRMOVE as an accelerated primitive
>>> I haven't looked into detail of why it's failing though.
>> It seems to work fine for me here (SLOF master and QEMU master
>> branch). Which version of QEMU did you use?
> QEMU: fc04a73 Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20150908' into staging
> SLOF: 811277a version: update to 20150813
>> Which command line parameters?
> ./ppc64-softmmu/qemu-system-ppc64 -machine pseries -m 2048 -serial stdio
> VNC server running on `127.0.0.1:5900'
> SLOF **********************************************************************
> QEMU Starting
> Build Date = Sep 9 2015 16:30:15
> FW Version = git-811277ac91f674a9
> Press "s" to enter Open Firmware.
> Cannot open file : fbuffer.fs
> Populating /vdevice methods
> Cannot open file : vio-hvterm.fs
> Cannot open file : rtas-nvram.fs
> Cannot open file : vio-veth.fs
> Cannot open file : vio-vscsi.fs
> Cannot open file : pci-phb.fs
I think I might have found something. "include" uses "call-c"
to run "c_romfs_lookup" from llfw/romfs.S.
Now have a look at the disassembly of call_c() in slof/ppc64.c:
e1005f0: 7c 08 02 a6 mflr r0
e1005f4: fb e1 ff f8 std r31,-8(r1) ; <<< !!!
e1005f8: f8 01 00 10 std r0,16(r1)
e1005fc: 7c c0 33 78 mr r0,r6
e100600: 7f e8 02 a6 mflr r31
e100604: 7c 09 03 a6 mtctr r0
e100608: 4e 80 04 21 bctrl
e10060c: 7f e8 03 a6 mtlr r31
e100610: e8 01 00 10 ld r0,16(r1)
e100614: eb e1 ff f8 ld r31,-8(r1)
e100618: 7c 08 03 a6 mtlr r0
e10061c: 4e 80 00 20 blr
The code saves r31 to a negative stack offset without decrementing r1,
and then jumps to c_romfs_lookup. That assembler function then saves
some more registers on the stack and thus destroys the save area of r31.
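To picture the problem, here is a rough sketch of the stack at the moment of the bctrl (reconstructed from the disassembly above; frame sizes are illustrative guesses, not measured values):

```
   higher addresses
   ...               caller's stack frame
   r1+16:            LR save slot  ("std r0,16(r1)" -- this one is fine,
                     it lives in the caller's frame as the ABI intends)
   r1        ->      stack pointer, NOT decremented by call_c
   r1-8:             r31 stored here ("std r31,-8(r1)")
   ...
   lower addresses:  c_romfs_lookup allocates its own frame below r1,
                     so its stores overlap and clobber the r1-8 slot
```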
Why is GCC doing this? It seems weird that it does not decrement r1
before going into the inline-assembler code... My guess is that GCC
cannot see the bctrl hidden inside the asm statement, so it treats
call_c() as a leaf function and assumes the area just below the stack
pointer is safe to use, which it only is when no other function gets
called.
I wonder whether the inline assembly in call_c needs to save the lr
in r31 at all: lr is listed in the clobber list, so there should be
no need to do this manually here (and looking at the disassembly,
GCC takes care of lr already). The following patch seems to fix
the issue for me:
diff --git a/slof/ppc64.c b/slof/ppc64.c
index 20d9270..8541e09 100644
@@ -52,11 +52,11 @@ call_c(cell arg0, cell arg1, cell arg2, cell entry)
register unsigned long r5 asm("r5") = arg2.u;
register unsigned long r6 = entry.u ;
- asm volatile("mflr 31 ; mtctr %4 ; bctrl ; mtlr 31"
+ asm volatile(" mtctr %4 ; bctrl "
: "=r" (r3)
: "r" (r3), "r" (r4), "r" (r5), "r" (r6)
: "ctr", "r6", "r7", "r8", "r9", "r10", "r11",
- "r12", "r13", "r31", "lr", "cc");
+ "r12", "r13", "lr", "cc");
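For reference, with the hunk applied the whole function would read roughly as follows. This is a reconstruction from the context lines above, not a verbatim copy of slof/ppc64.c; in particular the return statement and the exact conversion from r3 back to the SLOF "cell" type are assumptions:

```
static cell
call_c(cell arg0, cell arg1, cell arg2, cell entry)
{
	register unsigned long r3 asm("r3") = arg0.u;
	register unsigned long r4 asm("r4") = arg1.u;
	register unsigned long r5 asm("r5") = arg2.u;
	register unsigned long r6 = entry.u;

	/* bctrl is a real function call, but GCC cannot see that.
	 * Listing "lr" in the clobbers makes GCC save and restore the
	 * link register itself, so the manual mflr/mtlr dance through
	 * r31 (which was stored below r1 without adjusting the stack
	 * pointer) is no longer needed. */
	asm volatile(" mtctr %4 ; bctrl "
		     : "=r" (r3)
		     : "r" (r3), "r" (r4), "r" (r5), "r" (r6)
		     : "ctr", "r6", "r7", "r8", "r9", "r10", "r11",
		       "r12", "r13", "lr", "cc");

	return (cell)r3;	/* assumed: cell converts from unsigned long */
}
```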