[Skiboot] [PATCH v8 00/24] MPIPL support
Vasant Hegde
hegdevasant at linux.vnet.ibm.com
Mon Jun 17 03:10:00 AEST 2019
Memory Preserving Initial Program Load (MPIPL) is a Power feature where
the contents of memory are preserved while the system reboots after a
failure. This is accomplished by the firmware/OS publishing ranges of
memory to be preserved across boots.
In the OPAL context, OPAL and host Linux communicate the memory ranges
to be preserved via source descriptor tables in the HDAT. OPAL and Linux
can update these tables during runtime. OPAL sends its relocated base
address to the SBE. When OPAL or Linux crashes, the SBE learns of the
event via a special interrupt, which causes it to trigger the MPIPL.
The SBE then collects architected register data and loads Hostboot. Hostboot
then re-IPLs the machine, taking care to copy over contents of the source
descriptor tables to alternate memory locations and publishes this
information in the destination descriptor tables. The success/failure
of the copy is indicated by a results table. Hostboot also copies
architected register states to OPAL passed memory.
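
To make the roles of the three tables concrete, here is a minimal sketch of
the copy step. It is a model only: the struct layouts and the function name
`preserve_ranges` are hypothetical, the real MDST/MDDT/MDRT entry formats are
defined by HDAT, and the rule used here for a failed copy (no result entry is
written) is an assumption made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layouts for illustration; not the HDAT entry formats. */
struct src_entry    { const uint8_t *addr; size_t size; }; /* what to preserve */
struct dest_entry   { uint8_t *addr; size_t size; };       /* where to copy it */
struct result_entry { const uint8_t *src; uint8_t *dest; size_t size; };

/*
 * Model of the copy done during the re-IPL: pair each source range with
 * its destination range and write one result entry per range actually
 * copied. Returns the number of result entries written.
 */
static size_t preserve_ranges(const struct src_entry *src,
                              const struct dest_entry *dest, size_t n,
                              struct result_entry *res)
{
    size_t done = 0;

    assert(src && dest && res);
    for (size_t i = 0; i < n; i++) {
        /* Assumption: a range that does not fit is skipped, so the
         * absence of a result entry means "not copied". */
        if (src[i].size > dest[i].size)
            continue;
        memcpy(dest[i].addr, src[i].addr, src[i].size);
        res[done].src = src[i].addr;
        res[done].dest = dest[i].addr;
        res[done].size = src[i].size;
        done++;
    }
    return done;
}
```

A consumer of the result table would then walk only the first `done` entries
and treat any registered range without a matching entry as missing.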
On an MPIPL boot, OPAL creates a new device tree property to indicate that
it is an MPIPL boot (/ibm,opal/dump/mpipl-boot). Linux makes an MPIPL API
call to get the metadata pointers. The kernel uses the metadata to create
vmcore and opalcore.
Flow:
- Hostboot relies on the MDST, MDDT and MDRT ntuples in HDAT for MPIPL.
- During boot/runtime, OPAL will update the MDST and MDDT tables.
- Kernel will create a metadata area which contains source and destination
  addresses, size, etc.
- Kernel will use the MPIPL API for registration
  - It will pass src, dest, size to OPAL
  - Pass metadata tag to OPAL. OPAL will preserve this tag across
    MPIPL and pass it back to kernel during MPIPL boot.
- Kernel -OR- OPAL will request MPIPL.
  - On FSP systems OPAL will trigger the attn instruction
  - On BMC systems OPAL will trigger the SBE S0 interrupt
- SBE quiesces the system and collects the CPU register state of running
  threads.
- SBE -> hostboot -> memory preserved + CPU data copied to OPAL reserved
  memory -> load OPAL
- OPAL validates the DUMP result table and adds the `mpipl-boot` device tree property
- Kernel detects that it is an MPIPL boot.
- Kernel will use the MPIPL query tag API to retrieve the metadata tags.
- Kernel will create `vmcore` and `opalcore`
- Use the existing crash tool to debug `vmcore` and gdb to debug `opalcore`
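
The registration and tag round trip in the flow above can be sketched as a
small model. Everything here is hypothetical: the `model_*` names and the
`struct model_opal` state are invented for illustration and are not the
signatures of the actual OPAL calls, which are described in the
documentation patch of this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_RANGES 8

/* One kernel-registered range: copy `size` bytes from `src` to `dest`. */
struct model_range { uint64_t src, dest, size; };

/* Toy stand-in for the state OPAL keeps on behalf of the kernel. */
struct model_opal {
    struct model_range ranges[MODEL_MAX_RANGES];
    size_t nr_ranges;
    uint64_t kernel_tag;  /* opaque to OPAL, preserved across MPIPL */
    bool tag_valid;
    bool mpipl_boot;      /* models the mpipl-boot device tree property */
};

/* Registration: the kernel passes src, dest and size. */
static int model_register_range(struct model_opal *o, uint64_t src,
                                uint64_t dest, uint64_t size)
{
    assert(o);
    if (size == 0 || o->nr_ranges == MODEL_MAX_RANGES)
        return -1;
    o->ranges[o->nr_ranges++] = (struct model_range){ src, dest, size };
    return 0;
}

/* Registration: the kernel hands over its metadata tag. */
static void model_set_tag(struct model_opal *o, uint64_t tag)
{
    assert(o);
    o->kernel_tag = tag;
    o->tag_valid = true;
}

/* The MPIPL itself: the tag survives and the next boot is marked. */
static void model_mpipl(struct model_opal *o)
{
    assert(o);
    o->mpipl_boot = true;
}

/* Query after reboot: only succeeds on an MPIPL boot with a saved tag. */
static int model_query_tag(const struct model_opal *o, uint64_t *tag)
{
    assert(o && tag);
    if (!o->mpipl_boot || !o->tag_valid)
        return -1;
    *tag = o->kernel_tag;
    return 0;
}
```

In this model the kernel would call `model_register_range()` and
`model_set_tag()` on a normal boot, and `model_query_tag()` only after it has
seen the MPIPL boot indication. The tag is treated as opaque by OPAL, which
is what lets the kernel find its own metadata area again after the reboot.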
Dependency:
- We need Linux kernel changes to generate `opalcore`.
Hari will post Linux side patches.
Impact on kernel:
The upstream kernel has the `fadump` (Firmware Assisted Dump) feature on PowerVM
LPARs. It works on top of kdump and uses the same vmcore format. From the kernel's
point of view, this extends the fadump feature to OPAL based systems.
User space:
We are reusing existing kernel/user space infrastructure. Hence this
feature is transparent to end user. User can use existing crash tool
to debug `vmcore` and gdb to debug `opalcore`.
CPU register data collection:
Before initiating the crash, the kernel saves the register content of the
running thread. Control then goes to the SBE, which quiesces the system and
collects CPU register content for all applicable threads. The kernel uses
this data to create vmcore.
We had an offline discussion with Nick. One of his suggestions was to use
the kernel SRESET IPI to collect secondary CPU register data. Technically
it is possible to use SRESET, but that is still not completely
watertight. We can switch to that down the line when SRESET works
reliably and we find a way to collect secondary CPU data for the OPAL
dump.
Testing:
The Hostboot and SBE side changes are merged and available in upstream
op-build. We can use upstream op-build to build a PNOR with MPIPL
support.
TODO:
- Capture OPAL crashing CPU information
Current patchset relies on SBE to capture OPAL crashing CPU
information. We may miss some of the important register
information. In future we will enhance OPAL to collect crashing
CPU details.
- Capture VSX registers (As suggested by Mikey)
Testing:
- We have tested this patchset with upstream op-build and Hari's
kernel patches. It's working fine. We are able to get vmcore and
opalcore.
Changes in v8:
- v7 got good review from various folks including Nick, Oliver,
Mikey, etc. Thanks for all the reviews.
- As suggested by Nick I have deferred early OPAL crash.
- Completely reworked OPAL - Kernel interface
- Removed OPAL_FADUMP_MANAGE API
- Added new API for MPIPL Update (OPAL_MPIPL_UPDATE)
- Added new API to retrieve query tags (OPAL_MPIPL_QUERY_TAG)
- Added support to create OPAL metadata area and send metadata
pointer to kernel.
- Added explicit assert in few places - suggested by Mikey
- Removed few redundant checks - suggested by Oliver
Changes in v7:
- Rebased on top of current master
- Fixed hiomap test cases
- Few minor fixes/logging improvements
Changes in v6:
- Added support to get architected register data
- Added support for HIOMAP reset
- Added new patch to export OPAL boot entry address in device tree
(Needed for OPAL core)
- Added support to save/export crashing CPU details
- Few other minor fixes
Changes in v5:
- As Stewart suggested moved "dump" device tree node under /ibm,opal
- Updated OPAL API number to 170
- Added check before triggering MPIPL.
If MPIPL supported then it will trigger MPIPL, else it will call
normal reboot.
Changes in v4:
- Make sure crashing CPU will not go to stop state
- Send stash MPIPL chip op to all SBEs
- Minor prlog improvements
Changes in v3:
- Added documentation for new OPAL API and device tree binding
- Fixed MPIPL trigger path
I have hooked the MPIPL trigger path to the assert path. Now it will be
triggered on witherspoon only. We haven't tested on other BMC
platforms. Once we test we will enable it on other BMC systems.
- Added MBOX reset support before triggering MPIPL
- Added support to detect `MPIPL support` system params
Changes in v2:
- Added support to get architected registers
- SBE guys changed MPIPL trigger interrupt bit. Now it's S0 on both
master and slave chip SBE
- Fixed few other minor issues
Vasant Hegde (24):
OPAL: Add OPAL boot entry address to device tree
FSP/MDST: Rename fsp-mdst-table.c -> fsp-sysdump.c
hdata: Fix MDST structure
hdata: Define various DUMP related structures
mem-map: Setup memory for MDDT table
mem-map: Setup memory for MDRT table
hdata: Update spirah structure
hdata: Adjust various structure offset after relocation
hdata: Create /ibm,opal/dump device tree node
MPIPL: Register for OPAL dump
MPIPL: Define OPAL metadata area
MPIPL: Add OPAL API to register for dump
SBE: Send OPAL relocated base address to SBE
MPIPL: Add support to trigger MPIPL on BMC system
MPIPL: Save crashing PIR
HIOMAP: Reset bmc mbox in MPIPL path
platform: Introduce new reboot type
hdata: Add "mpipl-boot" property to "dump" node
MPIPL: Prepare OPAL data tag
MPIPL: Add OPAL API to query saved tags
MPIPL: Invalidate dump
MPIPL: Reserve memory to capture architected registers data
MPIPL: Prepare architected registers data tag
MPIPL: Add documentation
core/Makefile.inc | 2 +-
core/flash.c | 14 +-
core/init.c | 6 +-
core/opal-dump.c | 520 +++++++++++++++++++++++++++++
core/opal.c | 4 +-
core/platform.c | 8 +-
doc/device-tree/ibm,opal/dump.rst | 23 ++
doc/index.rst | 1 +
doc/mpipl.rst | 46 +++
doc/opal-api/index.rst | 4 +
doc/opal-api/opal-cec-reboot-6-116.rst | 9 +
doc/opal-api/opal-mpipl-173-174.rst | 119 +++++++
hdata/spira.c | 137 +++++++-
hdata/spira.h | 10 +-
hw/fsp/Makefile.inc | 2 +-
hw/fsp/{fsp-mdst-table.c => fsp-sysdump.c} | 16 +-
hw/ipmi/ipmi-attn.c | 16 +-
hw/sbe-p9.c | 121 +++++++
include/fsp-mdst-table.h | 48 ---
include/mem-map.h | 38 ++-
include/opal-api.h | 40 ++-
include/opal-dump.h | 139 ++++++++
include/sbe-p9.h | 18 +-
include/skiboot.h | 3 +-
libflash/blocklevel.h | 1 +
libflash/ipmi-hiomap.c | 32 +-
libflash/ipmi-hiomap.h | 2 +-
libflash/mbox-flash.c | 29 +-
libflash/mbox-flash.h | 2 +-
libflash/test/mbox-server.c | 1 +
libflash/test/test-ipmi-hiomap.c | 169 ++++++++++
skiboot.lds.S | 12 +-
32 files changed, 1503 insertions(+), 89 deletions(-)
create mode 100644 core/opal-dump.c
create mode 100644 doc/device-tree/ibm,opal/dump.rst
create mode 100644 doc/mpipl.rst
create mode 100644 doc/opal-api/opal-mpipl-173-174.rst
rename hw/fsp/{fsp-mdst-table.c => fsp-sysdump.c} (96%)
delete mode 100644 include/fsp-mdst-table.h
create mode 100644 include/opal-dump.h
--
2.14.3