[SLOF] [PATCH v2] slof/engine.in: refine +COMP and -COMP by not using COMPILE
Kautuk Consul
kconsul at linux.vnet.ibm.com
Thu Feb 22 22:46:17 AEDT 2024
Hi Thomas,
>
> Hi Kautuk,
>
> could you maybe do some performance checks to see whether this makes a
> difference (e.g. by running the command in a tight loop many times)?
> You can use "tb@" to get the current value of the timebase counter, so
> reading that before and after the loop should provide you with a way of
> measuring the required time.
>
> Thomas
>
This patch improves the compilation time of the
IF/AHEAD/THEN/CASE/ENDCASE/BEGIN/AGAIN/UNTIL/DO/?DO/LOOP/+LOOP Forth words
when they are used outside of any Forth procedure.
It does so in the same way for all of these words, because all of them
simply rely on the +COMP and -COMP words.
To measure the effect, I created a patch on top of this one that
reintroduces the older implementation of IF and THEN under the names
IF2 and THEN2, as follows:
col(+COMP-BEFORE STATE @ 1 STATE +! 0BRANCH(1) EXIT HERE THERE ! COMP-BUFFER DOTO HERE COMPILE DOCOL)
col(-COMP-BEFORE -1 STATE +! STATE @ 0BRANCH(1) EXIT COMPILE SEMICOLON THERE @ DOTO HERE COMP-BUFFER EXECUTE)
imm(IF2 +COMP-BEFORE DOTICK DO0BRANCH COMPILE, HERE 0 COMPILE,)
imm(THEN2 ?COMP RESOLVE-ORIG -COMP-BEFORE)
IF2 and THEN2 use +COMP-BEFORE and -COMP-BEFORE, which keep the behaviour
of +COMP and -COMP as it was before I applied my
"[PATCH v2] slof/engine.in: refine +COMP and -COMP by not using COMPILE"
patch.
Now that I have both implementations available, I used the timebase to
measure how long a series of IF-THEN and IF2-THEN2 invocations takes,
reading it before and after each series. I made the following changes
to ./board-qemu/slof/OF.fs:
diff --git a/board-qemu/slof/OF.fs b/board-qemu/slof/OF.fs
index df33c80..56805fc 100644
--- a/board-qemu/slof/OF.fs
+++ b/board-qemu/slof/OF.fs
@@ -22,6 +22,7 @@ hex
#include "base.fs"
+
\ Set default load-base to 0x4000
4000 to default-load-base
@@ -329,6 +330,151 @@ check-boot-from-ram
8ff cp
+." BEFORE-PATCH: BEFORE TB is: " tb@ .
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+cr ." BEFORE-PATCH: AFTER TB is: " tb@ . cr
+
+." AFTER-PATCH: BEFORE TB is: " tb@ .
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+cr ." AFTER-PATCH: AFTER TB is: " tb@ . cr
+
+." AFTER-PATCH: BEFORE TB is: " tb@ .
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+cr ." AFTER-PATCH: AFTER TB is: " tb@ . cr
With the above changes in slof/engine.in and board-qemu/slof/OF.fs I
compiled SLOF and got the following output on running a guest:
[root at r223l performance_work]# virsh start vm4 --console
Domain 'vm4' started
Connected to domain 'vm4'
Escape character is ^] (Ctrl + ])
Populating /vdevice methods
Populating /vdevice/vty at 30000000
Populating /vdevice/nvram at 71000000
Populating /pci at 800000020000000
00 0800 (D) : 1b36 000d serial bus [ usb-xhci ]
00 1000 (D) : 1af4 1003 virtio [ serial ]
00 1800 (D) : 1af4 1001 virtio [ block ]
00 2000 (D) : 1af4 1002 legacy-device*
00 2800 (D) : 1234 1111 qemu vga
No NVRAM common partition, re-initializing...
Installing QEMU fb
Scanning USB
XHCI: Initializing
USB Keyboard
No console specified using screen & keyboard
BEFORE-PATCH: BEFORE TB is: 9de978a1
BEFORE-PATCH: AFTER TB is: 9e78efba
AFTER-PATCH: BEFORE TB is: 9ebb67aa
AFTER-PATCH: AFTER TB is: 9f2247cc
AFTER-PATCH: BEFORE TB is: 9f64b9fd
AFTER-PATCH: AFTER TB is: 9fc33e6c
Welcome to Open Firmware
Copyright (c) 2004, 2017 IBM Corporation All rights reserved.
This program and the accompanying materials are made available
under the terms of the BSD License available at
http://www.opensource.org/licenses/bsd-license.php
Trying to load: from: /pci at 800000020000000/scsi at 3 ... Successfully loaded
[root at r223l performance_work]# echo $((0x9e78efba-0x9de978a1))
9402137
[root at r223l performance_work]# echo $((0x9f2247cc-0x9ebb67aa))
6742050
[root at r223l performance_work]# echo $((0x9fc33e6c-0x9f64b9fd))
6194287
[root at r223l performance_work]# echo "scale=4;(9402137-6742050)/512" | bc
5195.4824
[root at r223l performance_work]# echo "scale=4;(9402137-6194287)/512" | bc
6265.3320
[root at r223l performance_work]#
Going by the BEFORE-PATCH and AFTER-PATCH timebase values above, there is
a very noticeable improvement, and it is consistent across multiple runs:
about 5195 and 6265 microseconds saved for the two AFTER-PATCH blocks.
(My POWER9 bare-metal machine has a 512 MHz timebase-frequency, so that's
why I am dividing the tick difference by 512 to get microseconds.)
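The arithmetic above can be reproduced in one small shell script. This is only a sketch for cross-checking: the timebase values are copied from the boot log, the 512 ticks per microsecond follows from the stated 512 MHz timebase, and the count of 45 IF-THEN pairs per block is an assumption obtained by counting the added lines in the diff. Integer division is used, so fractions of a microsecond are dropped.

```shell
#!/bin/sh
# Timebase readings copied from the boot log above.
OLD_START=0x9de978a1;  OLD_END=0x9e78efba    # IF2/THEN2 block (old +COMP/-COMP)
NEW1_START=0x9ebb67aa; NEW1_END=0x9f2247cc   # first IF/THEN block (patched)
NEW2_START=0x9f64b9fd; NEW2_END=0x9fc33e6c   # second IF/THEN block (patched)

TICKS_PER_US=512   # 512 MHz timebase, as stated above
PAIRS=45           # assumption: pairs per block, counted from the diff

# Elapsed timebase ticks for each block.
old=$(( $OLD_END - $OLD_START ))
new1=$(( $NEW1_END - $NEW1_START ))
new2=$(( $NEW2_END - $NEW2_START ))

# Ticks saved by the patch, relative to the old implementation.
saved1=$(( old - new1 ))
saved2=$(( old - new2 ))

# Convert to whole microseconds (integer division truncates).
saved1_us=$(( saved1 / TICKS_PER_US ))
saved2_us=$(( saved2 / TICKS_PER_US ))

echo "old: $old ticks, new: $new1 and $new2 ticks"
echo "saved: $saved1_us us and $saved2_us us in total"
echo "saved per pair: $(( saved1_us / PAIRS )) us and $(( saved2_us / PAIRS )) us"
```

The per-pair figure is only as good as the pair count, so it should be read as an estimate rather than a measured value.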
Note: The above figures include the execution time of IF-THEN and
IF2-THEN2 after compilation. But since the compiled IF-THEN and IF2-THEN2
code should take the same time to execute, that part cancels out in the
subtraction in the above 2 bc commands.
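The cancellation can be illustrated with a short shell sketch. The two totals are the measured tick counts from the log; E is a purely hypothetical execution cost shared by both variants, since the real value was not measured separately. The absolute saving stays the same for any E, while a relative saving (a percentage) would depend on E and so cannot be derived from these numbers alone.

```shell
#!/bin/sh
# Measured totals in timebase ticks, taken from the log above.
OLD_TOTAL=9402137   # IF2/THEN2 block: compilation + execution
NEW_TOTAL=6742050   # first IF/THEN block: compilation + execution

# saving_ticks E: subtract a hypothetical shared execution cost E from
# both totals, then print the difference of the compile-only parts.
saving_ticks() {
    old_compile=$(( OLD_TOTAL - $1 ))
    new_compile=$(( NEW_TOTAL - $1 ))
    echo $(( old_compile - new_compile ))
}

# relative_saving E: the same saving as a whole percentage of the old
# compile-only time; unlike the difference, this changes with E.
relative_saving() {
    old_compile=$(( OLD_TOTAL - $1 ))
    echo $(( $(saving_ticks "$1") * 100 / old_compile ))
}

# E values below are made up for illustration only.
for E in 0 100000 1000000; do
    echo "E=$E: saved $(saving_ticks $E) ticks, $(relative_saving $E)% of old compile time"
done
```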