[SLOF] [PATCH v2] slof/engine.in: refine +COMP and -COMP by not using COMPILE

Kautuk Consul kconsul at linux.vnet.ibm.com
Thu Feb 22 22:46:17 AEDT 2024


Hi Thomas,

> 
>  Hi Kautuk,
> 
> could you maybe do some performance checks to see whether this make a
> difference (e.g. by running the command in a tight loop many times)?
> You can use "tb@" to get the current value of the timebase counter, so
> reading that before and after the loop should provide you with a way of
> measuring the required time.
> 
>  Thomas
> 
This patch improves the compilation time of the
IF/AHEAD/THEN/CASE/ENDCASE/BEGIN/AGAIN/UNTIL/DO/?DO/LOOP/+LOOP Forth words
when they are used outside of any Forth procedure.
It does so in the same way for all of these words, because
all of them simply use the +COMP and -COMP words.

On top of this patch I created a second patch that reintroduces the older
implementations of IF and THEN under the names IF2 and THEN2, as
follows:
col(+COMP-BEFORE STATE @ 1 STATE +! 0BRANCH(1) EXIT HERE THERE ! COMP-BUFFER DOTO HERE COMPILE DOCOL)
col(-COMP-BEFORE -1 STATE +! STATE @ 0BRANCH(1) EXIT COMPILE SEMICOLON THERE @ DOTO HERE COMP-BUFFER EXECUTE)
imm(IF2 +COMP-BEFORE DOTICK DO0BRANCH COMPILE, HERE 0 COMPILE,)
imm(THEN2 ?COMP RESOLVE-ORIG -COMP-BEFORE)

IF2 and THEN2 use +COMP-BEFORE and -COMP-BEFORE, which keep the behaviour
that +COMP and -COMP had before I applied my
"[PATCH v2] slof/engine.in: refine +COMP and -COMP by not using COMPILE"
patch.

Now that I have both implementations, I used the timebase to measure
the difference between the readings taken before and after a series of
IF-THEN and IF2-THEN2 invocations. I made the following changes
to ./board-qemu/slof/OF.fs:
diff --git a/board-qemu/slof/OF.fs b/board-qemu/slof/OF.fs
index df33c80..56805fc 100644
--- a/board-qemu/slof/OF.fs
+++ b/board-qemu/slof/OF.fs
@@ -22,6 +22,7 @@ hex
 
 #include "base.fs"
 
+
 \ Set default load-base to 0x4000
 4000 to default-load-base
 
@@ -329,6 +330,151 @@ check-boot-from-ram
 
 8ff cp
 
+." BEFORE-PATCH: BEFORE TB is: " tb@ .
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+1 IF2 0 drop THEN2
+cr ." BEFORE-PATCH: AFTER TB is: " tb@ . cr
+
+." AFTER-PATCH: BEFORE TB is: " tb@ .
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+cr ." AFTER-PATCH: AFTER TB is: " tb@ . cr
+
+." AFTER-PATCH: BEFORE TB is: " tb@ .
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+1 IF 0 drop THEN
+cr ." AFTER-PATCH: AFTER TB is: " tb@ . cr

With the above changes in slof/engine.in and board-qemu/slof/OF.fs I
compiled SLOF and got the following output when running a guest:
[root at r223l performance_work]# virsh start vm4 --console                
Domain 'vm4' started
Connected to domain 'vm4'
Escape character is ^] (Ctrl + ])
Populating /vdevice methods
Populating /vdevice/vty at 30000000
Populating /vdevice/nvram at 71000000
Populating /pci at 800000020000000
                     00 0800 (D) : 1b36 000d    serial bus [ usb-xhci ]
                     00 1000 (D) : 1af4 1003    virtio [ serial ]
                     00 1800 (D) : 1af4 1001    virtio [ block ]
                     00 2000 (D) : 1af4 1002    legacy-device*
                     00 2800 (D) : 1234 1111    qemu vga
No NVRAM common partition, re-initializing...
Installing QEMU fb



Scanning USB
  XHCI: Initializing
    USB Keyboard
No console specified using screen & keyboard
BEFORE-PATCH: BEFORE TB is: 9de978a1
BEFORE-PATCH: AFTER TB is: 9e78efba
AFTER-PATCH: BEFORE TB is: 9ebb67aa
AFTER-PATCH: AFTER TB is: 9f2247cc
AFTER-PATCH: BEFORE TB is: 9f64b9fd
AFTER-PATCH: AFTER TB is: 9fc33e6c

  Welcome to Open Firmware

  Copyright (c) 2004, 2017 IBM Corporation All rights reserved.
  This program and the accompanying materials are made available
  under the terms of the BSD License available at
  http://www.opensource.org/licenses/bsd-license.php


Trying to load:  from: /pci at 800000020000000/scsi at 3 ...   Successfully loaded

[root at r223l performance_work]# echo $((0x9e78efba-0x9de978a1))
9402137
[root at r223l performance_work]# echo $((0x9f2247cc-0x9ebb67aa))
6742050
[root at r223l performance_work]# echo $((0x9fc33e6c-0x9f64b9fd))
6194287
[root at r223l performance_work]# echo "scale=4;(9402137-6742050)/512" | bc
5195.4824
[root at r223l performance_work]# echo "scale=4;(9402137-6194287)/512" | bc
6265.3320
[root at r223l performance_work]#

Based on the BEFORE-PATCH and AFTER-PATCH timebase readings above, I see
a very noticeable improvement that is consistent across multiple runs:
roughly 5195 and 6265 microseconds saved over the 45 IF-THEN lines in the
two runs shown. (My POWER9 bare-metal machine has a 512 MHz
timebase-frequency, which is why I am dividing by 512.)
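
For anyone who wants to redo the arithmetic, here is a minimal shell
sketch of the same calculation. The timebase values are copied from the
boot log above; the variable names are only illustrative. It assumes 512
ticks per microsecond (from the 512 MHz figure) and 45 IF-THEN lines per
measured block (counted from the OF.fs diff). The per-pair number
additionally assumes the saving is spread evenly over all 45 lines, which
I have not verified.

```shell
#!/bin/sh
# Timebase readings copied from the boot log (hex).
before_start=0x9de978a1; before_end=0x9e78efba   # IF2/THEN2, old +COMP/-COMP
after1_start=0x9ebb67aa; after1_end=0x9f2247cc   # IF/THEN, first run
after2_start=0x9f64b9fd; after2_end=0x9fc33e6c   # IF/THEN, second run

# Assumption: 512 MHz timebase, i.e. 512 ticks per microsecond.
ticks_per_us=512
# Each measured block in the OF.fs diff has 45 IF/THEN lines.
pairs=45

# Elapsed ticks for each measured block.
old=$(( $before_end - $before_start ))
new1=$(( $after1_end - $after1_start ))
new2=$(( $after2_end - $after2_start ))

# Saving of each patched run relative to the old implementation
# (integer division, so fractions of a microsecond are dropped).
for new in "$new1" "$new2"; do
    saved=$(( old - new ))
    total_us=$(( saved / ticks_per_us ))
    per_pair_us=$(( total_us / pairs ))
    echo "saved $saved ticks = $total_us us total, about $per_pair_us us per IF/THEN"
done
```

With the readings above this reports 5195 us and 6265 us saved in total,
which is about 115 us and 139 us per IF-THEN line.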

Note: The above figures include the execution time of IF-THEN and
IF2-THEN2 after compilation. But since the execution time of the compiled
code should be the same for both, it cancels out in the subtraction in
the two bc commands above.