[Skiboot] [PATCH v8 00/24] MPIPL support

Nicholas Piggin npiggin at gmail.com
Fri Jun 28 10:47:33 AEST 2019

Vasant Hegde's on June 17, 2019 3:10 am:
> Memory Preserving Initial Program Load (MPIPL) is a Power feature where
> the contents of memory are preserved while the system reboots after a
> failure. This is accomplished by the firmware/OS publishing ranges of
> memory to be preserved across boots.
> In the OPAL context, OPAL and host Linux communicate the memory ranges
> to be preserved via source descriptor tables in the HDAT. OPAL and Linux
> can update these tables during runtime. OPAL sends relocated OPAL base
> address to SBE. When OPAL or Linux crashes, SBE gets to know of the
> event via a special interrupt which causes it to trigger the MPIPL.
> SBE then collects architected register data and loads Hostboot. Hostboot
> then re-IPLs the machine taking care to copy over contents of the source
> descriptor tables to alternate memory locations and publishes this
> information in the destination descriptor tables. The success/failure
> of the copy is indicated by a results table. Hostboot also copies
> architected register states to OPAL passed memory.
> On an MPIPL boot, OPAL creates a new device tree property to indicate an
> MPIPL boot (/ibm,opal/dump/mpipl-boot). Linux makes MPIPL API call to
> get metadata pointers. Kernel uses metadata information to create
> vmcore and opalcore.
> Flow:
>   - Hostboot relies on MDST, MDDT, MDRT ntuple in HDAT for MPIPL.
>   - During boot/runtime, OPAL will update MDST and MDDT table.
>   - Kernel will create metadata area which contains source, destination
>     address, size etc.
>   - Kernel will use MPIPL API for registration
>     - It will pass src, dest, size to OPAL
>     - Pass metadata tag to OPAL. OPAL will preserve this tag across
>       MPIPL and pass it back to kernel during MPIPL boot.
>   - Kernel -OR- OPAL will request for MPIPL.
>      - On FSP system OPAL will trigger attn instruction
>      - On BMC system OPAL will trigger SBE S0 interrupt
>   - SBE quiesces the system and collects CPU register state of running
>     threads.
>   - SBE -> hostboot -> memory preserved + CPU data copied to OPAL reserved
>     memory -> load OPAL
>   - OPAL validates DUMP result table and adds `mpipl-boot` device tree property
>   - Kernel detects that this is an MPIPL boot.
>   - Kernel will use MPIPL query tag API to retrieve metadata tags.
>   - Kernel will create `vmcore` and `opalcore`
>   - Use existing crash tool to debug `vmcore` and gdb to debug `opalcore`
> Dependency:
>   - We need Linux kernel changes to generate opal core.
>     Hari will post Linux side patches.
> Impact on kernel:
>   Upstream kernel has `fadump` (Firmware Assisted Dump) feature on PowerVM
>   LPAR. This works on top of kdump and uses same vmcore format. From kernel
>   point of view, this is extending fadump feature for OPAL based system.
> User space:
>   We are reusing existing kernel/user space infrastructure. Hence this
>   feature is transparent to end user. User can use existing crash tool
>   to debug `vmcore` and gdb to debug `opalcore`.
> CPU register data collection:
>   Before initiating crash, kernel will save running thread register
>   content and initiates crash. Then control goes to SBE. SBE will quiesce
>   the system and collect CPU register content for all applicable threads.
>   Kernel will use these data to create vmcore.
>   We had an offline discussion with Nick. One of his suggestions was to use
>   kernel SRESET IPI to collect secondary CPU register data. Technically
>   it is possible to use SRESET, but that is still not completely
>   watertight. We can switch to that down the line when SRESET works
>   reliably and we find a way to collect secondary CPU data for OPAL
>   dump.

I would prefer a Linux initiated crash dump to follow the normal Linux
crash process which is SRESET. I think it's much better for Linux to
manage its own registers and the SRESET facility has to be #1 priority
to support reasonable crash handling (e.g., xmon, kdump, etc).

I would have completely nacked the SBE register collection as an
unnecessary over-complication of interfaces, except that it could
be used for a BMC initiated dump with Linux completely out of the
picture. That seems like an interesting feature. Although I would have
preferred to make the Linux/sreset approach work first, I won't quibble
about it.

> Testing:
>   Hostboot and SBE side of changes is merged and available in upstream
>   op-build. We can use upstream op-build to build PNOR with MPIPL
>   support.
>   - Capture OPAL crashing CPU information
>     Current patchset relies on SBE to capture OPAL crashing CPU
>     information. We may miss some of the important register
>     information. In future we will enhance OPAL to collect crashing
>     CPU details.

What do you mean by this? I thought you didn't have the OPAL crash
part of the series in here?

>   - Capture VSX registers (As suggested by Mikey)
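
For reference, the registration part of the quoted flow (the kernel passes
src, dest, size to OPAL, plus a metadata tag that OPAL hands back on the
MPIPL boot) could be modelled roughly as below. This is only a sketch:
every name in it (`mpipl_state`, `mpipl_register_range`,
`mpipl_register_tag`, `mpipl_query_tag`, `MAX_RANGES`) is hypothetical and
is not taken from the patches, and a plain in-memory struct stands in for
the MDST/MDDT tables and for whatever actually survives the reboot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limit; the real table sizes are not stated in this thread. */
#define MAX_RANGES 8

/* One memory range to preserve: copy `size` bytes from `src` to `dest`. */
struct preserve_range {
	uint64_t src;
	uint64_t dest;
	uint64_t size;
};

/* Stand-in for the state that firmware keeps across an MPIPL. */
struct mpipl_state {
	struct preserve_range ranges[MAX_RANGES];
	size_t nr_ranges;
	uint64_t kernel_tag;	/* opaque metadata pointer from the kernel */
	int tag_valid;
};

/* Record one range; returns 0 on success, -1 if empty or the table is full. */
static int mpipl_register_range(struct mpipl_state *s, uint64_t src,
				uint64_t dest, uint64_t size)
{
	assert(s != NULL);
	if (size == 0 || s->nr_ranges >= MAX_RANGES)
		return -1;
	s->ranges[s->nr_ranges].src = src;
	s->ranges[s->nr_ranges].dest = dest;
	s->ranges[s->nr_ranges].size = size;
	s->nr_ranges++;
	return 0;
}

/* Store the kernel's metadata tag so it can be returned after the MPIPL. */
static void mpipl_register_tag(struct mpipl_state *s, uint64_t tag)
{
	assert(s != NULL);
	s->kernel_tag = tag;
	s->tag_valid = 1;
}

/* On the MPIPL boot: return the stored tag, or -1 if none was registered. */
static int mpipl_query_tag(const struct mpipl_state *s, uint64_t *tag)
{
	assert(s != NULL && tag != NULL);
	if (!s->tag_valid)
		return -1;
	*tag = s->kernel_tag;
	return 0;
}
```

In this model the kernel would call `mpipl_register_range()` once per
region and `mpipl_register_tag()` once with its metadata address, then call
`mpipl_query_tag()` on the next boot to find that metadata again.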

