[Skiboot] [PATCH v7 18/22] fadump: Add documentation

Tue Jun 4 18:57:18 AEST 2019

Vasant Hegde's on June 3, 2019 6:28 pm:
> On 05/31/2019 09:10 AM, Nicholas Piggin wrote:
>> Well it lets the kernel preserve a pointer to its crash metadata within
>> a preserved memory range. Although maybe that still has a chicken and egg
>> problem when you retrieve the data if the memory has been moved. I guess
>> rather than arbitrary tag, it could be a pointer offset that will get
>> updated to point to the destination location when you boot up.
> 
> We don't need such tags. Based on destination memory address, kernel can
> retrieve meta data.

How? You have a new crash kernel booted with some saved ranges from the
previous kernel. How do you decide where the fadump metadata is?

>>> 3. Use structure to pass entire data. Device tree will contain total
>>> result-table size.
>>>       Kernel will make API call to get data.
>> 
>> Does it need any device tree?
> 
> Yes. At least we should indicate that it's an MPIPL boot and the size of the result table.

MPIPL boot okay, but for retrieving the saved ranges? An OPAL call seems
nicer than parsing dt, and dynamic.

>>  How about OPAL calls to retrieve the
>> preserved ranges?
> 
> Yeah. We can consider OPAL API.
> 
>> 
>> So you retrieve your kernel metadata pointer that's in dest memory,
>> and that contains all your saved ranges and other metadata.
> 
> During MPIPL, hostboot takes care of moving data from source to destination
> memory. From hostboot's point of view it's just memory preserving (copying
> memory from source to destination). It doesn't differentiate between kernel
> memory and OPAL memory. Once it completes, it boots OPAL and passes the result
> table to OPAL. This result table contains both OPAL and kernel entries. We
> have to pass this information to the kernel. The kernel also needs a way to
> differentiate between OPAL and kernel entries.

Oh, the new kernel needs to distinguish that. Right, a structure
or convention shared between OPAL's crash dump facility and Linux's
would be needed then. It does not have to be part of the MPIPL API
though.

> So here is my new interface proposal. Does this look ok?
> 
> Registration :
>    opal_fadump_manage(u8 cmd, u64 src, u64 dest,  u64 size)
                        ^^

			enum, I think? No need for u8?

And opal_mpipl_update, perhaps? OPAL_MPIPL_ADD_RANGE, REMOVE_RANGE, 
REMOVE_ALL?

If you have also opal_mpipl_query, then you don't need dt or tables,
you just have a symmetric(ish) API to complement opal_fadump_manage.

opal_mpipl_query(uint32_t index, uint64_t *src, uint64_t *dest, uint64_t *size)

Start index from zero and count up until it returns an error.
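
As a rough sketch of the caller side of that loop: the mock below stands
in for the proposed call, and its return codes, the struct and the sample
ranges are all made up for illustration, not an existing skiboot interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical return codes, for this sketch only. */
#define MPIPL_OK	0
#define MPIPL_ENOENT	(-1)

struct mpipl_range {
	uint64_t src;
	uint64_t dest;
	uint64_t size;
};

/* Stand-in for firmware state: ranges preserved across the MPIPL. */
static const struct mpipl_range preserved[] = {
	{ 0x00100000, 0x20000000, 0x10000 },
	{ 0x00200000, 0x20010000, 0x04000 },
};

/* Mock of the proposed call: fills one entry, or errors past the end. */
static int opal_mpipl_query(uint32_t index, uint64_t *src,
			    uint64_t *dest, uint64_t *size)
{
	if (index >= sizeof(preserved) / sizeof(preserved[0]))
		return MPIPL_ENOENT;

	*src = preserved[index].src;
	*dest = preserved[index].dest;
	*size = preserved[index].size;
	return MPIPL_OK;
}

/* Caller: start at index zero and count up until the call errors. */
static size_t collect_ranges(struct mpipl_range *out, size_t max)
{
	uint64_t src, dest, size;
	uint32_t index = 0;
	size_t n = 0;

	assert(out != NULL);

	while (n < max &&
	       opal_mpipl_query(index, &src, &dest, &size) == MPIPL_OK) {
		out[n].src = src;
		out[n].dest = dest;
		out[n].size = size;
		n++;
		index++;
	}
	return n;
}
```

The caller never needs to know the table size up front, which is why no
device tree property would be required for it.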

>        cmd :  0x01 -> OPAL_FADUMP_REGISTER , kernel should pass all 3 parameters
>                   0x02 -> OPAL_FADUMP_UNREGISTER, kernel should pass src and dest parameters

Passing size might just help with error checking. Not much harm to
require it?
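
To illustrate the error checking I mean, here is a minimal sketch. The
table, its fixed capacity and the -1 error value are assumptions for the
example only; the point is just that a remove with a mismatched size gets
rejected rather than silently dropping an entry.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum opal_mpipl_ops {
	OPAL_MPIPL_ADD_RANGE,
	OPAL_MPIPL_REMOVE_RANGE,
	OPAL_MPIPL_REMOVE_ALL,
};

#define MAX_RANGES 8	/* arbitrary capacity for this sketch */

struct range {
	uint64_t src;
	uint64_t dest;
	uint64_t size;
};

static struct range table[MAX_RANGES];
static size_t nr_ranges;

/* Mock of the proposed call. Returns 0 on success, -1 on any error. */
static int opal_mpipl_update(enum opal_mpipl_ops op, uint64_t src,
			     uint64_t dest, uint64_t size)
{
	size_t i;

	assert(nr_ranges <= MAX_RANGES);

	switch (op) {
	case OPAL_MPIPL_ADD_RANGE:
		if (size == 0 || nr_ranges == MAX_RANGES)
			return -1;
		table[nr_ranges].src = src;
		table[nr_ranges].dest = dest;
		table[nr_ranges].size = size;
		nr_ranges++;
		return 0;
	case OPAL_MPIPL_REMOVE_RANGE:
		for (i = 0; i < nr_ranges; i++) {
			if (table[i].src != src || table[i].dest != dest)
				continue;
			/* The caller's idea of the size must match ours. */
			if (table[i].size != size)
				return -1;
			/* Order is not significant: fill the hole with the last entry. */
			table[i] = table[--nr_ranges];
			return 0;
		}
		return -1;
	case OPAL_MPIPL_REMOVE_ALL:
		nr_ranges = 0;
		return 0;
	}
	return -1;
}
```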

>                                if src = dest = 0, then UNREGISTER all kernel entries

Needed, or just add a new enum? This is the same as INVALIDATE isn't it?

>                   0x03 -> OPAL_FADUMP_INVALIDATE, No parameter is required.
> 
>      - OPAL will add new entry to MDST/MDDT table
>      - If system crashes before kernel adds all entries, we will have a partial kernel dump
>      - Kernel should handle partial dumps

This is only if the kernel requests an MPIPL boot, right? So the kernel 
is free to keep its own flag and set it only when all entries are added
here, and only MPIPL when the flag is set.
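
Something along these lines, as a sketch only. The registration call is
mocked with a failure injection knob, and every name here is hypothetical
rather than taken from the kernel or skiboot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct dump_range {
	uint64_t src;
	uint64_t dest;
	uint64_t size;
};

/* Mock registration: fails on call number mock_fail_at (-1 = never). */
static int mock_fail_at = -1;
static int mock_calls;

static int mock_register_range(const struct dump_range *r)
{
	(void)r;
	if (mock_calls++ == mock_fail_at)
		return -1;
	return 0;
}

/* Kernel-private flag: true only once every range is registered. */
static bool dump_armed;

static bool register_all(const struct dump_range *ranges, size_t count)
{
	size_t i;

	assert(ranges != NULL || count == 0);

	dump_armed = false;
	for (i = 0; i < count; i++) {
		if (mock_register_range(&ranges[i]) != 0)
			return false;	/* stay disarmed on partial setup */
	}
	dump_armed = true;
	return true;
}

/* Crash path: request an MPIPL boot only if setup completed. */
static bool should_request_mpipl(void)
{
	return dump_armed;
}
```

With this the crash dump kernel never has to deal with a partial set of
ranges, because a crash during registration takes the normal reboot path.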

>      - Kernel has to maintain its metadata. If, in future, multiple drivers want
>        different dumps, they have to go through the same interface (CONFIG_FADUMP).
> 
> Post MPIPL pass result table to kernel :
>      - Device tree indicates it's an MPIPL boot (/ibm,opal/dump/fadump)

s/fadump/mpipl-boot ?

>      - For OPAL dump, device tree contains crashing CPU PIR (/ibm,opal/dump/crashing-pir)

Should just be in an opal-crash metadata area. Put all crash metadata
into memory, and none into APIs or device tree.

>      - Device tree contains size (in bytes) of result table (/ibm,opal/dump/result-table-size)
>        data format : u8 type, u64 source, u64 destination, u64 size
>            --> type field is required to differentiate between OPAL and kernel entries.
>                In future, if we add new dump support, we will add a new type.
> 
>      - OPAL API : opal_fadump_result_table(u64 data, u64 size)
>          Kernel allocates memory for data and makes OPAL call
>          OPAL copies result table to kernel passed memory.
>          Kernel parses entry based on type field.

I still think you need a way to initially locate your root crash
metadata. Possibly that can fit within the above OPAL APIs with
some convention to identify it (e.g., if size==0 then source and
destination can be arbitrary and will not cause memory to be
copied).
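
One possible shape for that convention, purely as a sketch: a zero-size
entry whose source field carries a tag and whose destination field carries
the address of the root metadata. The tag value, the struct and the choice
of which field holds what are assumptions for illustration, not anything
agreed on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct mpipl_entry {
	uint64_t src;
	uint64_t dest;
	uint64_t size;
};

/* Hypothetical tag marking the kernel's root metadata entry. */
#define KERNEL_META_TAG 0x4b4d455441ULL

/*
 * Scan the saved entries for a zero-size entry carrying the tag.
 * size == 0 means nothing was copied, so src and dest are free to
 * carry the tag and the metadata address.
 */
static bool find_root_metadata(const struct mpipl_entry *entries, size_t count,
			       uint64_t tag, uint64_t *meta_addr)
{
	size_t i;

	assert(meta_addr != NULL);

	for (i = 0; i < count; i++) {
		if (entries[i].size == 0 && entries[i].src == tag) {
			*meta_addr = entries[i].dest;
			return true;
		}
	}
	return false;
}
```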

Thanks,
Nick