[Skiboot] skiboot v6.3-rc3 released
Stewart Smith
stewart at linux.ibm.com
Thu May 2 18:31:40 AEST 2019
skiboot-6.3-rc3
***************
skiboot v6.3-rc3 was released on Thursday May 2nd 2019. It is the
third release candidate of skiboot 6.3, which will become the new
stable release of skiboot following the 6.2 release, first released
December 14th 2018.
Skiboot 6.3 will mark the basis for op-build v2.3. I expect to tag the
final skiboot 6.3 in the next week (I also predicted this last time,
so take my predictions with a large amount of sodium).
skiboot v6.3-rc3 contains all bug fixes as of skiboot-6.0.19 and
skiboot-6.2.3 (the currently maintained stable releases).
For details on how the skiboot stable releases work, see Skiboot
stable tree rules and releases.
Over skiboot-6.3-rc2, we have the following changes:
* Expose PNOR Flash partitions to host MTD driver via devicetree
This makes it possible for the host to directly address each
partition without requiring each application to directly parse the
FFS headers. This has been in use for some time already to allow
BOOTKERNFW partition updates from the host.
All partitions except BOOTKERNFW are marked readonly.
The BOOTKERNFW partition is currently used exclusively by the
Talos II platform.
* Write boot progress to LPC port 80h
This is an adaptation of what we currently do for op_display() on
FSP machines, inventing an encoding for what we can write into the
single byte at LPC port 80h.
Port 80h is often used on x86 systems to indicate boot
progress/status and dates back a decent amount of time. Since a byte
isn’t exactly very expressive for everything that can go on (and
wrong) during boot, it’s all about compromise.
Some systems (such as Zaius/Barreleye G2) have a physical dual
7-segment display that displays these codes. So far, this has only
been driven by hostboot (see hostboot commit 90ec2e65314c).
* Write boot progress to LPC ports 81 and 82
There’s a thought to write more extensive boot progress codes to LPC
ports 81 and 82 to supplement/replace any reliance on port 80.
We want to still emit port 80 for platforms like Zaius and Barreleye
that have the physical display. Ports 81 and 82 can be monitored by
a BMC though.
* Copy and convert Romulus descriptors to Talos
Talos II has some hardware differences from Romulus, therefore we
cannot guarantee Talos II == Romulus in skiboot. Copy and slightly
modify the Romulus files for Talos II.
* npu2: Disable Probe-to-Invalid-Return-Modified-or-Owned snarfing
by default
V100 GPUs are known to violate the NVLink2 protocol in some cases
(one is when memory was accessed by the CPU and then by the GPU using
so-called block linear mapping) and issue double probes to the NPU,
which can cope with this problem only if CONFIG_ENABLE_SNARF_CPM
(“disable/enable Probe.I.MO snarfing a cp_m”) is not set in the
CQ_SM Misc Config register #0. If the bit is set (which is the case
today), the NPU issues a machine check stop.
The snarfing feature is designed to detect 2 probes in flight and
combine them into one.
This adds a new “opal-npu2-snarf-cpm” nvram variable which controls
CONFIG_ENABLE_SNARF_CPM for all NVLinks to prevent the machine check
stop from happening.
This disables snarfing by default as otherwise a broken GPU driver
can crash the entire box even when a GPU is passed through to a
guest. The variable also provides a dial for regression tests (which
might be useful on bare metal). To enable snarfing, the user needs
to run:
sudo nvram -p ibm,skiboot --update-config opal-npu2-snarf-cpm=enable
and reboot the host system.
* hw/npu2: Show name of opencapi error interrupts
* core/pci: Use PHB io-base-location by default for PHB slots
On witherspoon only the GPU slots and the three pluggable PCI slots
(SLOT0, 1, 2) have platform defined slot names. For builtin devices
such as the SATA controller or the PLX switch that fans out to the
GPU slots we have no location codes which some people consider an
issue.
This patch addresses the problem by making the
ibm,slot-location-code for the root port device default to the
ibm,io-base-location-code, which is typically the location code for
the system itself.
e.g.
pciex at 600c3c0100000/ibm,loc-code
"UOPWR.0000000-Node0-Proc0"
pciex at 600c3c0100000/pci at 0/ibm,loc-code
"UOPWR.0000000-Node0-Proc0"
pciex at 600c3c0100000/pci at 0/usb-xhci at 0/ibm,loc-code
"UOPWR.0000000-Node0"
The PHB node and the root complex nodes have a loc code of the
processor they are attached to, while the usb-xhci device under the
root port has a location code of the system itself.
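The fallback described above can be sketched roughly as follows. This
is only a toy model in Python, not skiboot code: the property names
come from the text, while the dict layout, the helper name
slot_location_code() and the "SLOT0" value are assumptions made for
illustration.

```python
# Toy model of the location-code fallback. The dicts stand in for
# device tree nodes; slot_location_code() is a hypothetical helper.

def slot_location_code(slot, phb):
    """Return the platform-defined slot name if one exists, otherwise
    fall back to the PHB's ibm,io-base-location-code."""
    platform_name = slot.get("ibm,slot-location-code")
    if platform_name:
        return platform_name
    return phb["ibm,io-base-location-code"]


# Assumed example values: the base location code matches the example
# output above, "SLOT0" is a placeholder for a platform-defined name.
phb = {"ibm,io-base-location-code": "UOPWR.0000000-Node0"}
builtin_root_port = {}                                   # no platform name
pluggable_slot = {"ibm,slot-location-code": "SLOT0"}

builtin_code = slot_location_code(builtin_root_port, phb)
pluggable_code = slot_location_code(pluggable_slot, phb)
```

Under these assumptions the builtin root port ends up with the system
location code, while a slot with a platform-defined name keeps it.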
* hw/phb4: Read ibm,loc-code from PBCQ node
On P9 the PBCQs are subdivided by stacks which implement the PCI
Express logic. When phb4 was forked from phb3 most of the properties
that were in the pbcq node moved into the stack node, but
ibm,loc-code was not one of them. This patch fixes the phb4 init
sequence to
read the base location code from the PBCQ node (parent of the stack
node) rather than the stack node itself.
* hw/xscom: add missing P9P chip name
* asm/head: balance branches to avoid link stack predictor
mispredicts
The Linux wrapper for OPAL call and return is arranged like this:
__opal_call:
mflr r0
std r0,PPC_STK_LROFF(r1)
LOAD_REG_ADDR(r11, opal_return)
mtlr r11
hrfid -> OPAL
opal_return:
ld r0,PPC_STK_LROFF(r1)
mtlr r0
blr
When skiboot returns to Linux, it branches to LR (i.e., opal_return)
with a blr. This unbalances the link stack predictor and will cause
mispredicts back up the return stack.
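To illustrate why an unpaired blr hurts, here is a toy Python model of
a link stack predictor in which bl pushes a return address and blr
pops one as its prediction. The LinkStack class, the caller names and
the "balanced" variant are assumptions for illustration only; real
hardware is more involved, and this does not show how skiboot itself
balances the branches.

```python
class LinkStack:
    """Toy return-address predictor: bl pushes, blr pops and predicts."""

    def __init__(self):
        self.stack = []
        self.mispredicts = 0

    def bl(self, return_addr):
        self.stack.append(return_addr)

    def blr(self, actual_target):
        predicted = self.stack.pop() if self.stack else None
        if predicted != actual_target:
            self.mispredicts += 1


def opal_call(depth, balanced):
    """Model `depth` nested Linux callers followed by one OPAL call.

    Returns the number of mispredicted returns while unwinding.
    """
    ls = LinkStack()
    callers = [f"caller{i}_ret" for i in range(depth)]
    for ret in callers:
        ls.bl(ret)              # Linux call chain, ending in bl __opal_call
    # hrfid into OPAL is not a bl, so nothing is pushed for opal_return.
    if balanced:
        ls.bl("opal_return")    # hypothetical: pair the return with a push
    ls.blr("opal_return")       # skiboot returns to LR with blr
    for ret in reversed(callers):
        ls.blr(ret)             # Linux unwinds its own call chain
    return ls.mispredicts


unbalanced = opal_call(depth=3, balanced=False)
balanced = opal_call(depth=3, balanced=True)
```

In this model the unpaired blr consumes the entry that belonged to the
caller of __opal_call, so every later return up the chain is predicted
from the wrong slot; with a matching push the stack stays aligned.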
* external/mambo: also invoke readline for the non-autorun case
* asm/head.S: set POWER9 radix HID bit at entry
When running in virtual memory mode, the radix MMU hid bit should
not be changed, so set this in the initial boot SPR setup.
As a side effect, fast reboot also has HID0:RADIX bit set by the
shared spr init, so no need for an explicit call.
* opal-prd: Fix memory leak in is-fsp-system check
* opal-prd: Check malloc return value
* hw/phb4: Squash the IO bridge window
The PCI-PCI bridge spec says that bridges that do not implement an
IO window should hardcode the IO base and limit registers to zero.
Unfortunately, these registers only define the upper bits of the IO
window and the low bits are assumed to be 0 for the base and 1 for
the limit address. As a result, setting both to zero can be
misinterpreted as a 4K IO window.
This patch fixes the problem the same way PHB3 does. It sets the IO
base and limit values to 0xf000 and 0x1000 respectively which most
software interprets as a disabled window.
lspci before patch:
0000:00:00.0 PCI bridge: IBM Device 04c1 (prog-if 00 [Normal decode])
I/O behind bridge: 00000000-00000fff
lspci after patch:
0000:00:00.0 PCI bridge: IBM Device 04c1 (prog-if 00 [Normal decode])
I/O behind bridge: None
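The decoding behind the two lspci outputs can be sketched as below.
This is a minimal Python model assuming the standard 16-bit IO decode,
where bits 7:4 of each 8-bit register hold address bits 15:12. The
register byte values 0xf0 and 0x10 are my reading of the 0xf000 and
0x1000 addresses mentioned above, and io_window() is a hypothetical
helper, not a skiboot or pciutils function.

```python
def io_window(io_base_reg, io_limit_reg):
    """Decode 8-bit I/O Base and I/O Limit bridge registers.

    Bits 7:4 of each register supply address bits 15:12. The low 12
    bits are implied: 0x000 for the base and 0xfff for the limit.
    Returns (base, limit), or None when base > limit, which software
    conventionally treats as a disabled window.
    """
    base = (io_base_reg & 0xF0) << 8
    limit = ((io_limit_reg & 0xF0) << 8) | 0xFFF
    if base > limit:
        return None
    return (base, limit)


# Both registers zero: decodes as a 4K window, 0x0000-0x0fff.
zeroed = io_window(0x00, 0x00)

# Base 0xf000, limit 0x1000 (0x1fff once the implied low bits are
# added): base is above limit, so the window reads as disabled.
squashed = io_window(0xF0, 0x10)
```

This is why all-zero registers show up as 00000000-00000fff, while the
base-above-limit encoding is reported as no window at all.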
* build: link with --orphan-handling=warn
The linker can warn when the linker script does not explicitly place
all sections. These orphan sections are placed according to
heuristics, which may not always be desirable. Enable this warning.
* build: -fno-asynchronous-unwind-tables
skiboot does not use unwind tables; this option saves about 100kB,
mostly from .text.
* hw/xscom: Enable sw xstop by default on p9
This was disabled at some point during bringup to make life easier
for the lab folks trying to debug NVLink issues. This hack really
should have never made it out into the wild though, so we now have
the following situation occurring in the field:
1. A bad happens
2. The host kernel receives an unrecoverable HMI and calls into
OPAL to request a platform reboot.
3. OPAL rejects the reboot attempt and returns to the kernel with
OPAL_PARAMETER.
4. Kernel panics and attempts to kexec into a kdump kernel.
A side effect of the HMI seems to be CPUs becoming stuck which
results in the initialisation of the kdump kernel taking an extremely
long time (6+ hours). It’s also been observed that after performing
a dump the kdump kernel then crashes itself because OPAL has ended
up in a bad state as a side effect of the HMI.
All up, it’s not very good so re-enable the software checkstop by
default. If people still want to turn it off they can using the
nvram override.
* opal/hmi: Initialize the hmi event with old value of TFMR.
Do this before we fix TFAC errors. Otherwise the event at host
console shows no thread error reported in TFMR register.
Without this patch the console event shows TFMR with no thread error:
(DEC parity error TFMR[59] injection)
[ 53.737572] Severe Hypervisor Maintenance interrupt [Recovered]
[ 53.737596] Error detail: Timer facility experienced an error
[ 53.737611] HMER: 0840000000000000
[ 53.737621] TFMR: 3212000870e04000
After this patch it shows the old TFMR value on the host console:
[ 2302.267271] Severe Hypervisor Maintenance interrupt [Recovered]
[ 2302.267305] Error detail: Timer facility experienced an error
[ 2302.267320] HMER: 0840000000000000
[ 2302.267330] TFMR: 3212000870e14010
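To see what changed between the two TFMR values, the Python sketch
below compares them using IBM bit numbering (bit 0 is the most
significant bit of the 64-bit register). Only TFMR[59] as the DEC
parity error comes from the text above; the helper names and the
TFMR_DEC_PARITY_ERR constant name are assumptions for illustration.

```python
def ppc_bit(n):
    """Mask for bit n of a 64-bit register in IBM numbering (0 = MSB)."""
    return 1 << (63 - n)


def set_bits(value):
    """List the IBM bit numbers that are set in a 64-bit value."""
    return [n for n in range(64) if value & ppc_bit(n)]


# Bit position taken from the text; the constant name is hypothetical.
TFMR_DEC_PARITY_ERR = ppc_bit(59)

without_patch = 0x3212000870E04000
with_patch = 0x3212000870E14010

# Bits that differ between the two reported TFMR values.
changed = set_bits(without_patch ^ with_patch)

error_hidden = not (without_patch & TFMR_DEC_PARITY_ERR)
error_shown = bool(with_patch & TFMR_DEC_PARITY_ERR)
```

The injected DEC parity error bit is clear in the first value and set
in the second, which matches the behaviour described above.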
--
Stewart Smith
OPAL Architect, IBM.