[Skiboot] [PATCH v8 00/24] MPIPL support
Nicholas Piggin
npiggin at gmail.com
Fri Jun 28 10:47:33 AEST 2019
Vasant Hegde's message of June 17, 2019 3:10 am:
> Memory Preserving Initial Program Load (MPIPL) is a Power feature where
> the contents of memory are preserved while the system reboots after a
> failure. This is accomplished by the firmware/OS publishing ranges of
> memory to be preserved across boots.
>
> In the OPAL context, OPAL and host Linux communicate the memory ranges
> to be preserved via source descriptor tables in the HDAT. OPAL and Linux
> can update these tables during runtime. OPAL sends its relocated base
> address to the SBE. When OPAL or Linux crashes, the SBE learns of the
> event via a special interrupt, which causes it to trigger the MPIPL.
>
> The SBE then collects architected register data and loads Hostboot.
> Hostboot then re-IPLs the machine, taking care to copy the memory
> ranges listed in the source descriptor tables to alternate memory
> locations, and publishes this information in the destination descriptor
> tables. The success/failure of each copy is indicated by a results
> table. Hostboot also copies architected register states to memory
> passed in by OPAL.
>
> On an MPIPL boot, OPAL creates a new device tree property
> (/ibm,opal/dump/mpipl-boot) to indicate that this is an MPIPL boot.
> Linux makes an MPIPL API call to get the metadata pointers, and the
> kernel uses the metadata to create vmcore and opalcore.
>
> Flow:
> - Hostboot relies on the MDST, MDDT and MDRT ntuples in HDAT for MPIPL.
> - During boot/runtime, OPAL will update the MDST and MDDT tables.
> - Kernel will create a metadata area which contains source and destination
> addresses, sizes, etc.
> - Kernel will use the MPIPL API for registration:
> - It will pass src, dest and size to OPAL.
> - Pass the metadata tag to OPAL. OPAL will preserve this tag across
> MPIPL and pass it back to the kernel during the MPIPL boot.
> - Kernel -OR- OPAL will request MPIPL.
> - On FSP systems, OPAL will trigger the attn instruction.
> - On BMC systems, OPAL will trigger the SBE S0 interrupt.
> - SBE quiesces the system and collects the CPU register state of running
> threads.
> - SBE -> hostboot -> memory preserved + CPU data copied to OPAL reserved
> memory -> load OPAL
> - OPAL validates DUMP result table and adds `mpipl-boot` device tree property
> - Kernel detects that this is an MPIPL boot.
> - Kernel will use the MPIPL query tag API to retrieve the metadata tags.
> - Kernel will create `vmcore` and `opalcore`.
> - Use existing crash tool to debug `vmcore` and gdb to debug `opalcore`
>
> Dependency:
> - We need Linux kernel changes to generate the opalcore.
> Hari will post the Linux side patches.
>
> Impact on kernel:
> The upstream kernel has the `fadump` (Firmware Assisted Dump) feature for
> PowerVM LPARs. It works on top of kdump and uses the same vmcore format.
> From the kernel's point of view, this extends the fadump feature to
> OPAL-based systems.
>
> User space:
> We are reusing the existing kernel/user space infrastructure, so this
> feature is transparent to the end user. Users can use the existing crash
> tool to debug `vmcore` and gdb to debug `opalcore`.
>
> CPU register data collection:
> Before initiating a crash, the kernel saves the running thread's register
> content. Control then passes to the SBE, which quiesces the system and
> collects CPU register content for all applicable threads. The kernel
> uses this data to create the vmcore.
>
> We had an offline discussion with Nick. One of his suggestions was to use
> the kernel SRESET IPI to collect secondary CPU register data. Technically
> it is possible to use SRESET, but it is not yet completely watertight.
> We can switch to it later, once SRESET works reliably and we find a way
> to collect secondary CPU data for the OPAL dump.

I would prefer a Linux-initiated crash dump to follow the normal Linux
crash process, which is SRESET. I think it's much better for Linux to
manage its own registers, and the SRESET facility has to be the #1
priority to support reasonable crash handling (e.g., xmon, kdump).

I would have completely nacked the SBE register collection as an
unnecessary overcomplication of the interfaces, except that it could be
used for a BMC-initiated dump with Linux completely out of the picture.
That seems like an interesting feature. Although I would have preferred
to make the Linux/SRESET approach work first, I won't quibble about it.

>
> Testing:
> The Hostboot and SBE side changes are merged and available in upstream
> op-build, which can be used to build a PNOR with MPIPL support.
>
> TODO:
> - Capture OPAL crashing CPU information
> The current patchset relies on the SBE to capture OPAL crashing CPU
> information, so we may miss some important register information.
> In the future we will enhance OPAL to collect the crashing CPU's details.
What do you mean by this? I thought you didn't have the OPAL crash
part of the series in here?
> - Capture VSX registers (As suggested by Mikey)
>
>
Thanks,
Nick