[Skiboot] [PATCH] doc: Add (most) nvram debugging options
Stewart Smith
stewart at linux.ibm.com
Tue Apr 30 14:39:32 AEST 2019
Signed-off-by: Stewart Smith <stewart at linux.ibm.com>
---
doc/console-log.rst | 17 ++++
doc/device-tree/ibm,opal/power-mgt.rst | 2 +
doc/index.rst | 1 +
doc/opal-api/opal-cec-reboot-6-116.rst | 11 +++
doc/pci.rst | 119 ++++++++++++++++++++++++-
doc/power-management.rst | 17 ++++
6 files changed, 166 insertions(+), 1 deletion(-)
create mode 100644 doc/power-management.rst
diff --git a/doc/console-log.rst b/doc/console-log.rst
index ca9ec3ff04ad..c758e9a57482 100644
--- a/doc/console-log.rst
+++ b/doc/console-log.rst
@@ -74,3 +74,20 @@ still only PR_NOTICE through drivers.
People who write something like 0x1f will get a very quiet boot indeed.
+Debugging
+---------
+
+You can change the log level of what goes to the in-memory buffer and what
+goes to the driver (i.e. serial port / IPMI Serial over LAN) at boot time
+by setting NVRAM variables: ::
+
+ nvram -p ibm,skiboot --update-config log-level-driver=7
+ nvram -p ibm,skiboot --update-config log-level-memory=7
+
+You can also use the named levels emerg, alert, crit, err, warning,
+notice, printf, info, debug, trace or insane, e.g. ::
+
+ nvram -p ibm,skiboot --update-config log-level-driver=insane
+
+
+You can also change the log levels at runtime by writing to the debug_descriptor.
diff --git a/doc/device-tree/ibm,opal/power-mgt.rst b/doc/device-tree/ibm,opal/power-mgt.rst
index b326a24b8700..8d9439d7db16 100644
--- a/doc/device-tree/ibm,opal/power-mgt.rst
+++ b/doc/device-tree/ibm,opal/power-mgt.rst
@@ -1,3 +1,5 @@
+.. _power-mgt-devtree:
+
ibm,opal/power-mgt device tree entries
======================================
diff --git a/doc/index.rst b/doc/index.rst
index b7a868c96e85..79a5accf2434 100644
--- a/doc/index.rst
+++ b/doc/index.rst
@@ -46,6 +46,7 @@ Developer Guide and Internals
xscom-node-bindings
xive
imc
+ power-management
OPAL ABI
diff --git a/doc/opal-api/opal-cec-reboot-6-116.rst b/doc/opal-api/opal-cec-reboot-6-116.rst
index 516d4fc01f9e..e9e53ce24a95 100644
--- a/doc/opal-api/opal-cec-reboot-6-116.rst
+++ b/doc/opal-api/opal-cec-reboot-6-116.rst
@@ -66,3 +66,14 @@ OPAL_REBOOT_FULL_IPL = 2
Unsupported Reboot type
For unsupported reboot type, this function will return with
OPAL_UNSUPPORTED and no reboot will be triggered.
+
+Debugging
+^^^^^^^^^
+
+This is **not** ABI and may change or be removed at any time.
+
+You can control whether the software checkstop trigger is used with an NVRAM
+variable: ::
+
+ nvram -p ibm,skiboot --update-config opal-sw-xstop=enable
+ nvram -p ibm,skiboot --update-config opal-sw-xstop=disable
diff --git a/doc/pci.rst b/doc/pci.rst
index f72fc1480b53..d18d35d8f301 100644
--- a/doc/pci.rst
+++ b/doc/pci.rst
@@ -1,7 +1,124 @@
PCI
===
-**WARNING**: This documentation **urgently needs updating** and is *woefully* incomplete.
+Debugging
+---------
+
+There are several NVRAM options for enabling extra debug functionality
+to help debug PCI issues. These are not ABI and may be changed or removed at
+**any** time.
+
+Verbose EEH
+^^^^^^^^^^^
+
+::
+
+ nvram -p ibm,skiboot --update-config pci-eeh-verbose=true
+
+Disable EEH MMIO
+^^^^^^^^^^^^^^^^
+::
+
+ nvram -p ibm,skiboot --update-config pci-eeh-mmio=disabled
+
+
+Check for RX errors after link training
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Some PHB4 PHYs can get stuck in a bad state where they are constantly
+retraining the link. This happens transparently to skiboot and Linux
+but causes PCIe to be slow. Resetting the PHB4 clears the
+problem.
+
+We can detect this case by looking at the RX error count when we
+check for link stability. The code that checks whether the link trained
+optimally also checks for RX errors; if errors are occurring, we
+retrain the link irrespective of the chip revision or card.
+
+Normally when this problem occurs, the RX error count is maxed out at
+255. When there is no problem, the count is 0. We chose 8 as the maximum
+RX error count to leave some margin for a few errors. The error threshold
+at which we retrain the link can be changed with an NVRAM
+variable: ::
+
+ nvram -p ibm,skiboot --update-config phb-rx-err-max=8
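As an illustration only, the threshold check described above might look like the following C sketch. The function names `rx_err_max_from_nvram()` and `link_needs_retrain_rx()` are hypothetical and are not skiboot's actual code. The default of 8, the counter saturating at 255 and the `phb-rx-err-max` knob come from the text. Retraining only when the count is strictly greater than the threshold is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Default from the text: allow a few RX errors before retraining. */
#define RX_ERR_MAX_DEFAULT 8

/*
 * Hypothetical helper: turn the phb-rx-err-max NVRAM string into a
 * threshold, falling back to the default if it is missing, malformed,
 * or larger than the 8-bit counter can ever report.
 */
static unsigned int rx_err_max_from_nvram(const char *val)
{
	char *end;
	unsigned long v;

	if (!val || !*val)
		return RX_ERR_MAX_DEFAULT;
	v = strtoul(val, &end, 0);
	if (*end != '\0' || v > 255)
		return RX_ERR_MAX_DEFAULT;
	return (unsigned int)v;
}

/*
 * Hypothetical check: a healthy link reports 0 errors, while a PHY stuck
 * retraining saturates the counter at 255. Assumption: retrain only when
 * the count is strictly greater than the threshold.
 */
static bool link_needs_retrain_rx(uint8_t rx_err_count, unsigned int rx_err_max)
{
	return rx_err_count > rx_err_max;
}
```

With the default threshold, a stuck PHY reporting 255 triggers a retrain, while a clean link (0) or one with a handful of errors does not.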
+
+Retrain link if degraded
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+On P9 Scale Out (Nimbus) DD2.0 and Scale In (Cumulus) DD1.0 (and
+below), the PCIe PHY can lock up, causing training issues. This can cause
+a degradation in speed or width in ~5% of training cases (depending on
+the card). This is fixed in later chip revisions. This issue can also
+cause PCIe links to not train at all, but this case is already
+handled.
+
+There is code in skiboot that checks if the PCIe link has trained optimally
+and if not, does a full PHB reset (to fix the PHY lockup) and retrain.
+
+One complication is that some devices are known to train degraded unless
+device-specific configuration is performed. Because of this, we only
+retrain when the device is in a whitelist. All devices in the current
+whitelist have been tested on a P9DSU/Boston, ZZ and Witherspoon.
+
+We always gather information on the link and print it in the logs even
+if the card is not in the whitelist.
+
+For testing purposes, there's an NVRAM option to retry all PCIe cards on all
+P9 chips when a degraded link is detected. The option is
+``pci-retry-all=true``, which can be set using: ::
+
+ nvram -p ibm,skiboot --update-config pci-retry-all=true
+
+This option may increase the boot time if used on a badly behaving
+card.
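The retrain decision described above can be summarised in a small C sketch. This is a hypothetical illustration, not skiboot's implementation: `struct link_state`, `link_is_degraded()` and `should_retrain()` are invented names. The sketch assumes "degraded" means a trained speed or width below what both ends of the link support.

```c
#include <stdbool.h>

/* Hypothetical summary of a trained link versus what both ends support. */
struct link_state {
	int speed;     /* trained PCIe generation, e.g. 3 for Gen3 */
	int width;     /* trained lane count, e.g. 8 for x8 */
	int max_speed; /* best generation both ends support */
	int max_width; /* best width both ends support */
};

/* Degraded: trained slower or narrower than both ends could manage. */
static bool link_is_degraded(const struct link_state *l)
{
	return l->speed < l->max_speed || l->width < l->max_width;
}

/*
 * Decide whether to do a full PHB reset and retrain. Normally this only
 * happens on affected chip revisions and for whitelisted devices;
 * pci-retry-all=true overrides both checks.
 */
static bool should_retrain(const struct link_state *l, bool affected_chip,
			   bool dev_whitelisted, bool retry_all)
{
	if (!link_is_degraded(l))
		return false;
	if (retry_all)
		return true;
	return affected_chip && dev_whitelisted;
}
```

For example, a device that could run at x16 but trains at x8 on an affected chip is only retrained if it is whitelisted, unless ``pci-retry-all=true`` is set.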
+
+Maximum link speed
+^^^^^^^^^^^^^^^^^^
+
+This was useful during bring-up on P9 DD1.
+
+::
+
+ nvram -p ibm,skiboot --update-config pcie-max-link-speed=4
+
+
+Ric Mata Mode
+^^^^^^^^^^^^^
+
+This mode (for PHB4) traces the link training process closely. It activates
+as soon as PERST is deasserted and produces human-readable output of
+the process.
+
+It also adds PCIe Link Training and Status State Machine (LTSSM) tracing
+and details of link speed and width.
+
+The output looks something like this: ::
+
+ [ 1.096995141,3] PHB#0000[0:0]: TRACE:0x0000001101000000 0ms GEN1:x16:detect
+ [ 1.102849137,3] PHB#0000[0:0]: TRACE:0x0000102101000000 11ms presence GEN1:x16:polling
+ [ 1.104341838,3] PHB#0000[0:0]: TRACE:0x0000182101000000 14ms training GEN1:x16:polling
+ [ 1.104357444,3] PHB#0000[0:0]: TRACE:0x00001c5101000000 14ms training GEN1:x16:recovery
+ [ 1.104580394,3] PHB#0000[0:0]: TRACE:0x00001c5103000000 14ms training GEN3:x16:recovery
+ [ 1.123259359,3] PHB#0000[0:0]: TRACE:0x00001c5104000000 51ms training GEN4:x16:recovery
+ [ 1.141737656,3] PHB#0000[0:0]: TRACE:0x0000144104000000 87ms presence GEN4:x16:L0
+ [ 1.141752318,3] PHB#0000[0:0]: TRACE:0x0000154904000000 87ms trained GEN4:x16:L0
+ [ 1.141757964,3] PHB#0000[0:0]: TRACE: Link trained.
+ [ 1.096834019,3] PHB#0001[0:1]: TRACE:0x0000001101000000 0ms GEN1:x16:detect
+ [ 1.105578525,3] PHB#0001[0:1]: TRACE:0x0000102101000000 17ms presence GEN1:x16:polling
+ [ 1.112763075,3] PHB#0001[0:1]: TRACE:0x0000183101000000 31ms training GEN1:x16:config
+ [ 1.112778956,3] PHB#0001[0:1]: TRACE:0x00001c5081000000 31ms training GEN1:x08:recovery
+ [ 1.113002083,3] PHB#0001[0:1]: TRACE:0x00001c5083000000 31ms training GEN3:x08:recovery
+ [ 1.114833873,3] PHB#0001[0:1]: TRACE:0x0000144083000000 35ms presence GEN3:x08:L0
+ [ 1.114848832,3] PHB#0001[0:1]: TRACE:0x0000154883000000 35ms trained GEN3:x08:L0
+ [ 1.114854650,3] PHB#0001[0:1]: TRACE: Link trained.
+
+Enabled via NVRAM: ::
+
+ nvram -p ibm,skiboot --update-config pci-tracing=true
+
+Named after the person the output of this mode is typically sent to.
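When reading this output, the trailing `GENn:xNN:state` field is usually the interesting part. For instance, PHB#0001 above drops from x16 to x08 during training and ends up trained at GEN3:x08. A minimal C sketch for extracting that field from a captured log line follows. `struct trace_step` and `parse_trace_line()` are hypothetical names. The code assumes the field layout matches the sample above, which, like the rest of this debug output, is not a stable format.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical decoded form of the "GENn:xNN:state" field. */
struct trace_step {
	int gen;        /* PCIe generation, e.g. 4 */
	int width;      /* lane count, e.g. 16 (x08 parses as 8) */
	char ltssm[16]; /* LTSSM state name, e.g. "L0" or "recovery" */
};

/*
 * Find the "GEN" field in a trace line and split it into generation,
 * width and LTSSM state. Lines without it (such as "TRACE: Link
 * trained.") return false.
 */
static bool parse_trace_line(const char *line, struct trace_step *out)
{
	const char *p = strstr(line, "GEN");

	if (!p)
		return false;
	/* %d parses "08" as decimal 8; %15s leaves room for the NUL. */
	return sscanf(p, "GEN%d:x%d:%15s", &out->gen, &out->width,
		      out->ltssm) == 3;
}
```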
+
+
+**WARNING**: The documentation below **urgently needs updating** and is *woefully* incomplete.
IODA PE Setup Sequences
-----------------------
diff --git a/doc/power-management.rst b/doc/power-management.rst
new file mode 100644
index 000000000000..76491a71464d
--- /dev/null
+++ b/doc/power-management.rst
@@ -0,0 +1,17 @@
+Power Management
+================
+
+See :ref:`power-mgt-devtree` for device tree structure describing power management facilities.
+
+Debugging
+---------
+
+There are a few debug knobs that can be set via NVRAM. These are
+**not** ABI and may be changed or removed at *any* time.
+
+Disabling specific stop states
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+On boot, specific stop states can be disabled by setting a mask. For example,
+to disable all but stop states 0, 1 and 2, use ~0xE0000000 (0x1FFFFFFF): ::
+
+ nvram -p ibm,skiboot --update-config opal-stop-state-disable-mask=0x1FFFFFFF
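Reading the example above, the mask appears to use MSB-first numbering, where stop 0 is bit 0x80000000 and a set bit disables that state. This reading follows from the fact that keeping stop 0, 1 and 2 gives ~0xE0000000 = 0x1FFFFFFF. Under that assumption, the arithmetic can be sketched in C. The helper names below are hypothetical.

```c
#include <stdint.h>

/* Assumed MSB-first numbering: stop 0 is bit 0x80000000, stop 1 is 0x40000000, ... */
static uint32_t stop_state_bit(unsigned int n)
{
	return UINT32_C(0x80000000) >> n;
}

/*
 * Build a disable mask that leaves only stop states 0..keep_upto enabled:
 * set a bit for every state to keep, then invert so every other state
 * is disabled.
 */
static uint32_t stop_disable_mask_keep_upto(unsigned int keep_upto)
{
	uint32_t keep = 0;

	for (unsigned int n = 0; n <= keep_upto && n < 32; n++)
		keep |= stop_state_bit(n);
	return ~keep;
}
```

For example, `stop_disable_mask_keep_upto(2)` builds a keep mask of 0xE0000000 and returns 0x1FFFFFFF, the value used in the command above.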
--
2.20.1