[SLOF] [PATCH 3/4] fbuffer: Implement MRMOVE as an accelerated primitive
thuth at redhat.com
Sat Sep 12 08:41:08 AEST 2015
On 09/09/15 13:05, Nikunj A Dadhania wrote:
> Thomas Huth <thuth at redhat.com> writes:
>> On 09/09/15 08:45, Nikunj A Dadhania wrote:
>>> TCG seems to be broken with this new series, git bisect pointed to:
>>> 59a135e fbuffer: Implement MRMOVE as an accelerated primitive
>>> I haven't looked into detail of why it's failing though.
>> It seems to work fine for me here (SLOF master and QEMU master
>> branch). Which version of QEMU did you use?
> QEMU: fc04a73 Merge remote-tracking branch 'remotes/pmaydell/tags/pull-target-arm-20150908' into staging
> SLOF: 811277a version: update to 20150813
>> Which command line parameters?
> ./ppc64-softmmu/qemu-system-ppc64 -machine pseries -m 2048 -serial stdio
> VNC server running on `127.0.0.1:5900'
> SLOF **********************************************************************
> QEMU Starting
> Build Date = Sep 9 2015 16:30:15
> FW Version = git-811277ac91f674a9
> Press "s" to enter Open Firmware.
> Cannot open file : fbuffer.fs
> Populating /vdevice methods
> Cannot open file : vio-hvterm.fs
> Cannot open file : rtas-nvram.fs
> Cannot open file : vio-veth.fs
> Cannot open file : vio-vscsi.fs
> Cannot open file : pci-phb.fs
I think I might have found something. "include" uses "call-c"
to run "c_romfs_lookup" from llfw/romfs.S.
Now have a look at the disassembly of call_c() in slof/ppc64.c:
e1005f0: 7c 08 02 a6 mflr r0
e1005f4: fb e1 ff f8 std r31,-8(r1) ; <<< !!!
e1005f8: f8 01 00 10 std r0,16(r1)
e1005fc: 7c c0 33 78 mr r0,r6
e100600: 7f e8 02 a6 mflr r31
e100604: 7c 09 03 a6 mtctr r0
e100608: 4e 80 04 21 bctrl
e10060c: 7f e8 03 a6 mtlr r31
e100610: e8 01 00 10 ld r0,16(r1)
e100614: eb e1 ff f8 ld r31,-8(r1)
e100618: 7c 08 03 a6 mtlr r0
e10061c: 4e 80 00 20 blr
The code saves r31 to a negative stack offset without decrementing r1,
and then jumps to c_romfs_lookup. That assembler function then saves
some more registers on the stack and thus destroys the save area of r31.
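To picture the problem, here is a rough sketch of the stack at the moment of the bctrl (reconstructed from the disassembly above; frame sizes are illustrative guesses, not measured values):

```
   higher addresses
   ...               caller's stack frame
   r1+16:            LR save slot  ("std r0,16(r1)" -- this one is fine,
                     it lives in the caller's frame as the ABI intends)
   r1        ->      stack pointer, NOT decremented by call_c
   r1-8:             r31 stored here ("std r31,-8(r1)")
   ...
   lower addresses:  c_romfs_lookup allocates its own frame below r1,
                     so its stores overlap and clobber the r1-8 slot
```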
Why is GCC doing this? It seems weird that it does not decrement r1
before going into the inline-assembler code... My guess is that GCC
cannot see the bctrl hidden inside the asm statement, so it treats
call_c() as a leaf function and assumes the area just below the stack
pointer is safe to use, which it only is when no other function gets
called.
I wonder whether the inline assembly in call_c needs to save the lr
in r31 at all: lr is listed in the clobber list, so there should be
no need to do this manually here (and looking at the disassembly,
GCC takes care of lr already). The following patch seems to fix
the issue for me:
diff --git a/slof/ppc64.c b/slof/ppc64.c
index 20d9270..8541e09 100644
@@ -52,11 +52,11 @@ call_c(cell arg0, cell arg1, cell arg2, cell entry)
register unsigned long r5 asm("r5") = arg2.u;
register unsigned long r6 = entry.u ;
- asm volatile("mflr 31 ; mtctr %4 ; bctrl ; mtlr 31"
+ asm volatile(" mtctr %4 ; bctrl "
: "=r" (r3)
: "r" (r3), "r" (r4), "r" (r5), "r" (r6)
: "ctr", "r6", "r7", "r8", "r9", "r10", "r11",
- "r12", "r13", "r31", "lr", "cc");
+ "r12", "r13", "lr", "cc");
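For reference, with the hunk applied the whole function would read roughly as follows. This is a reconstruction from the context lines above, not a verbatim copy of slof/ppc64.c; in particular the return statement and the exact conversion from r3 back to the SLOF "cell" type are assumptions:

```
static cell
call_c(cell arg0, cell arg1, cell arg2, cell entry)
{
	register unsigned long r3 asm("r3") = arg0.u;
	register unsigned long r4 asm("r4") = arg1.u;
	register unsigned long r5 asm("r5") = arg2.u;
	register unsigned long r6 = entry.u;

	/* bctrl is a real function call, but GCC cannot see that.
	 * Listing "lr" in the clobbers makes GCC save and restore the
	 * link register itself, so the manual mflr/mtlr dance through
	 * r31 (which was stored below r1 without adjusting the stack
	 * pointer) is no longer needed. */
	asm volatile(" mtctr %4 ; bctrl "
		     : "=r" (r3)
		     : "r" (r3), "r" (r4), "r" (r5), "r" (r6)
		     : "ctr", "r6", "r7", "r8", "r9", "r10", "r11",
		       "r12", "r13", "lr", "cc");

	return (cell)r3;	/* assumed: cell converts from unsigned long */
}
```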