[Skiboot] [RFC PATCH] Use two VSX registers to do initial copy of skiboot
Stewart Smith
stewart at linux.vnet.ibm.com
Thu May 21 18:40:35 AEST 2015
Note my awful fiddling of MSR so that I can use VSX registers :)
In the hello_world test running in mambo, we get the following reduction
in instruction/cycle count:
Before:
20284943: ** finished running 20284942 instructions **
Single VSX register:
19687022: ** finished running 19641315 instructions **
Two VSX registers:
19621488: ** finished running 19575781 instructions **
65534 fewer cycles and instructions than with a single VSX register
709161 fewer instructions (663455 fewer cycles) than the base implementation
Three VSX registers (worse than one or two, still better than before):
19883634: ** finished running 19837927 instructions **
Signed-off-by: Stewart Smith <stewart at linux.vnet.ibm.com>
---
asm/head.S | 24 +++++++++++++++++++-----
1 file changed, 19 insertions(+), 5 deletions(-)
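On the MSR fiddling: a rough C model of how the enable bits get ORed in, since `oris` takes the @h half of a constant and `ori` the @l half. A bit below 16, such as 1<<13, has an @h half of zero, so it needs `ori`. The helper names (`at_h`, `at_l`, `do_oris`, `do_ori`, `enable_vsx`) are made up for illustration and are not skiboot code.

```c
#include <assert.h>
#include <stdint.h>

/* MSR bit masks, counting from the least significant bit (names assumed). */
#define MSR_FP  (UINT64_C(1) << 13)
#define MSR_VSX (UINT64_C(1) << 23)
#define MSR_VEC (UINT64_C(1) << 25)

/* Models of the assembler's @h and @l operators on a constant. */
static uint16_t at_h(uint64_t v) { return (uint16_t)(v >> 16); }
static uint16_t at_l(uint64_t v) { return (uint16_t)v; }

/* Models of the immediate forms: oris shifts its immediate left by 16. */
static uint64_t do_oris(uint64_t r, uint16_t imm) { return r | ((uint64_t)imm << 16); }
static uint64_t do_ori(uint64_t r, uint16_t imm) { return r | imm; }

/* Build the MSR value with FP, VSX and VEC enabled on top of `msr`. */
static uint64_t enable_vsx(uint64_t msr)
{
    msr = do_ori(msr, at_l(MSR_FP));    /* at_h(MSR_FP) would be 0 */
    msr = do_oris(msr, at_h(MSR_VSX));
    msr = do_oris(msr, at_h(MSR_VEC));
    return msr;
}
```

Starting from zero, `enable_vsx(0)` yields 0x02802000, i.e. all three bits set.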
diff --git a/asm/head.S b/asm/head.S
index fd6e3fb..9b3d7bb 100644
--- a/asm/head.S
+++ b/asm/head.S
@@ -314,16 +314,30 @@ boot_entry:
cmpd %r29,%r30
beq 2f
LOAD_IMM32(%r3, _sbss - __head)
- srdi %r3,%r3,3
+ srdi %r3,%r3,5
mtctr %r3
+ mfmsr %r24
+ mfmsr %r4
+ ori %r4,%r4, (1<<13)@l /* MSR[FP]: bit 13 is in the low half, (1<<13)@h is 0 */
+ oris %r4,%r4, (1<<23)@h /* MSR[VSX] */
+ oris %r4,%r4, (1<<25)@h /* MSR[VEC] */
+ mtmsr %r4
mr %r4,%r30
mr %r15,%r30
mr %r30,%r29
-1: ld %r0,0(%r4)
- std %r0,0(%r29)
- addi %r29,%r29,8
- addi %r4,%r4,8
+ addi %r5,%r4,16
+ addi %r6,%r29,16
+
+1: lxvd2x %vs1,%r0,%r4
+ lxvd2x %vs2,%r0,%r5
+ stxvd2x %vs1,%r0,%r29
+ stxvd2x %vs2,%r0,%r6
+ addi %r29,%r29,32 /* two 16-byte moves per iteration, so step by 32 */
+ addi %r4,%r4,32
+ addi %r5,%r5,32
+ addi %r6,%r6,32
bdnz 1b
+ mtmsr %r24
sync
icbi 0,%r29
sync
--
1.7.10.4
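For anyone reasoning about the loop without a simulator to hand, here is a rough C model of its shape: two 16-byte moves per iteration, with the iteration count taken as size >> 5. It is only a sketch; `copy_two_vsx` and its `stride` parameter are hypothetical and exist to show how the pointer step relates to the number of bytes actually covered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Model of the relocation loop. Each iteration moves 16 bytes at `off`
 * and 16 bytes at `off + 16`, then advances by `stride`.
 * Returns the number of leading bytes of dst that have been written.
 * Assumes 0 < stride <= 32 so the accesses stay inside `len`.
 */
static size_t copy_two_vsx(uint8_t *dst, const uint8_t *src,
                           size_t len, size_t stride)
{
    size_t iters = len >> 5;    /* mirrors srdi %r3,%r3,5 */
    size_t off = 0;

    assert(stride > 0 && stride <= 32);
    for (size_t i = 0; i < iters; i++) {
        memcpy(dst + off, src + off, 16);            /* via %vs1 */
        memcpy(dst + off + 16, src + off + 16, 16);  /* via %vs2 */
        off += stride;
    }
    return iters ? off - stride + 32 : 0;
}
```

With a 256-byte image, a stride of 32 covers all 256 bytes, while a stride of 16 covers only 144. Any tail beyond a multiple of 32 bytes is not copied by this loop shape, so the image size is assumed to be padded accordingly.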