[Skiboot] [PATCH v8 20/24] MPIPL: Add OPAL API to query saved tags
Michael Neuling
mikey at neuling.org
Wed Jul 10 13:52:14 AEST 2019
On Tue, 2019-07-09 at 20:53 +0530, Vasant Hegde wrote:
> On 07/09/2019 03:33 PM, Oliver O'Halloran wrote:
> > On Sun, 2019-06-16 at 22:40 +0530, Vasant Hegde wrote:
> > > Pre-MPIPL kernel saves various information required to create vmcore in
> > > metadata area and passes metadata area pointer to OPAL. OPAL will preserve
> > > this pointer across MPIPL. Post MPIPL kernel will request for saved tags
> > > via this API. Kernel also needs below tags:
> > > - Saved CPU registers data to access CPU registers
> > > - OPAL metadata area to create opalcore
> > >
> > > Format:
> > > opal_mpipl_query_tag(uint32_t idx, uint64_t *tag)
> > >
> > > idx : tag index (0..n)
> > > tag : OPAL will pass saved tag
> > >
> > > Kernel will make this call with increased `index` until OPAL returns
> > > OPAL_EMPTY.
> > >
> > > Return values:
> > > OPAL_SUCCESS : Operation success
> > > OPAL_PARAMETER : Invalid parameter
> > > OPAL_EMPTY : OPAL completed sending all tags to kernel
> >
> > After spending a while picking through the linux patches I'll say that the
> > revised API is mostly fine, BUT:
> >
> > a) The documentation is *woefully* inadequate, and
>
> I have documented the API under `doc/opal-api/opal-mpipl-173-174.rst` with
> expected behavior under various scenarios (pretty much similar to how we
> documented other APIs).
>
>
> > b) We need to think a bit more about how tags are used.
>
> We (myself, Ananth, Nick, Hari, Mahesh) had a detailed *offline* discussion
> on tags. We did think about passing `tag` as part of the
> opal_mpipl_query_tag() API, but Nick didn't want that and wanted to keep the
> API simple. So now our API doesn't care about the `tag` type. It's just an
> interface to retrieve various tags. So we had to push the tag field into the
> structure.
>
>
> > My interpretation is that the kernel tag is there to allow the crashing
> > kernel to define its own dump metadata for the receiving kernel. As a
> > result firmware,
>
> Yes. The kernel will have its metadata area and the tag points to the
> metadata area (well, actually the tag can be anything the post-MPIPL kernel
> can use to retrieve its metadata area).
>
> > which includes the petitboot kernel, can't make assumptions about the
> > contents of the tagged data since it could be populated by a version of
> > Linux with a different metadata ABI or by another OS entirely.
>
> OPAL will not care about tag content. Its job is to preserve the tag across
> MPIPL and pass it back to the kernel.
>
>
> > So as far as I can tell the overall process is something like:
> >
> > 1. We boot an OS. This may or may not be based on a Linux kernel.
> > 2. OS configures MPIPL and sets the kernel tag to some structure it defines.
> > 3. OS triggers a MPIPL. We re-run through the SBE and hostboot. Most of
> > memory is left as-is. Some memory used by hostboot, the ranges in the MDDT,
> > and the skiboot region at 0x3000_0000 are overwritten.
> > 4. Skiboot parses the MDRT and sets a few DT flags to indicate an MPIPL
> > happened and populates the tags which are defined by OPAL.
> > 5. Petitkernel boots and detects that an MPIPL happened and tries to
> > preserve the crashed kernel's memory.
> > 6. We kexec into a capture kernel that knows how to parse the kernel tag's
> > metadata and generates a dump from that.
>
> Correct.
>
> >
> > This raises a few questions:
> >
> > 1) How much memory does firmware have to work with? This doesn't seem to be
>
> So post MPIPL,
> - hostboot needs to be loaded and executed. It's done at a predefined
> address and that memory is part of the reserved-memory range. So we don't
> need to worry about this memory.
> - OPAL memory
> By the time OPAL loads, hostboot has already preserved memory based on the
> MDST and MDDT tables. These tables contain OPAL runtime memory as well. So
> by the time we load OPAL it's already preserved. We are good to reload OPAL
> on the same memory.
> - Memory used by OPAL to load petitboot kernel
> These ranges are advertised to the first kernel via `fw-load-area` so that
> the kernel takes care of preserving this memory.
>
>
>
> > actually documented anywhere. The closest thing I can find to an answer is
> > that the Linux patches will preserve at least 0x0...0x3000_0000 since
> > that's what fadump.ops->get_bootmem_min() returns for powernv.
>
> Kernel will preserve kernel memory ranges. It won't care about the firmware
> memory area.
>
> > However, I think the minimum memory that firmware needs is something that
> > it should tell the OS via the DT rather than having the OS guess.
>
> OS won't guess anything about firmware memory usage. In fact it won't care
> about firmware memory except `fw-load-area`.
>
> > We already communicate some of that information via the fw-load-area DT
> > property, but that only contains the ranges where skiboot loads the kernel
> > and initrd (0x2000_0000...0x2fff_ffff).
>
> OS needs to know about these memory ranges *only*...as the first kernel may
> use this memory. If it uses it, it should take care of reserving destination
> memory to preserve this memory and pass it during MPIPL registration.
>
> > Odds are hostboot doesn't use *that* much memory, but the petitboot kernel
> > can
>
> As mentioned above, the kernel need not worry about hostboot memory
> usage...as it's part of the reserved-memory ranges.
Vasant,
I think you may have missed Oliver's point in the above.
The Petitboot kernel needs some amount of space to operate. In your current
patch series, the amount of space "firmware" is allowed to use is hard-wired by
the crashing kernel. Oliver's point is that this size should be defined
dynamically by firmware, not hard-wired into the crashing kernel.
If we ever change petitboot or some other component of the firmware so that it
needs to use more memory, then the crashing kernel won't have any way of
discovering that and it won't allocate enough space for this process to start.
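To make the distinction concrete, here is a rough sketch of the idea, not
code from the patch series. The struct fw_range type and the
fw_mem_to_preserve() helper are hypothetical names invented for illustration,
and how firmware would actually advertise its ranges (a DT property, say) is
left open. The only values taken from this thread are the 0x3000_0000 limit
and the existing kernel/initrd load range.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one memory range firmware says it will use after MPIPL. */
struct fw_range {
	uint64_t base;
	uint64_t size;
};

/* The limit discussed in this thread, baked into the crashing kernel. */
#define HARDWIRED_FW_LIMIT 0x30000000ULL

/*
 * Hypothetical helper: return the end of the highest range firmware
 * advertised, i.e. how much low memory the crashing kernel would have to
 * preserve. Only when firmware advertises nothing does it fall back to the
 * hard-wired limit.
 */
uint64_t fw_mem_to_preserve(const struct fw_range *ranges, size_t n)
{
	uint64_t top = 0;

	assert(ranges != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		uint64_t end = ranges[i].base + ranges[i].size;

		if (end > top)
			top = end;
	}

	return n ? top : HARDWIRED_FW_LIMIT;
}
```

With something along these lines, a firmware that later grows past
0x3000_0000 only has to advertise a larger range, and an older crashing
kernel would still reserve enough space.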
Mikey