[Skiboot] [PATCH v7 18/22] fadump: Add documentation
Hari Bathini
hbathini at linux.ibm.com
Tue Jun 4 04:22:47 AEST 2019
On 03/06/19 1:58 PM, Vasant Hegde wrote:
> On 05/31/2019 09:10 AM, Nicholas Piggin wrote:
>> Vasant Hegde's on May 29, 2019 4:01 am:
>>> On 05/20/2019 07:59 AM, Nicholas Piggin wrote:
>>>> Vasant Hegde's on May 18, 2019 9:12 pm:
>>>>> On 05/16/2019 11:05 AM, Nicholas Piggin wrote:
>>>>>> Vasant Hegde's on May 14, 2019 9:23 pm:
>>>>>>> On 05/09/2019 10:28 AM, Nicholas Piggin wrote:
>>>>>>>> Vasant Hegde's on April 13, 2019 7:15 pm:
>>>>>>>>> diff --git a/doc/opal-api/opal-fadump-manage-173.rst b/doc/opal-api/opal-fadump-manage-173.rst
>>>>>>>>> new file mode 100644
>>>>>>>>> index 000000000..916167503
>>>>>>>>> --- /dev/null
>>>>>>>>> +++ b/doc/opal-api/opal-fadump-manage-173.rst
>>>>>>>>> @@ -0,0 +1,73 @@
>>>>>>>>> +.. _opal-api-fadump-manage:
>>>>>>>>> +
>
> .../...
>
>>>>>
>>>>> Type : Identifies memory content type (like OPAL, kernel, etc). During MPIPL
>>>>> registration
>>>>> we pass this data to HDAT.. Hostboot will just copy this back to Result table
>>>>> inside HDAT.
>>>>> During MPIPL boot, OPAL passes this information to kernel.. so that kernel can
>>>>> generate
>>>>> proper dumps.
>>>>
>>>> Right. But it's all metadata that "MPIPL" does not need to know. We want
>>>> a service that preserves memory over reboot. Then Linux can create its
>>>> own metadata to use that for fadump crashes, for example.
>>>
>>> OPAL/firmware is not using this metadata. It just passes it back.
>>
>> That's a good hint to just leave it out and let Linux define its own
>> metadata.
>>
>>> But I got your point. We can skip passing "type" information to OPAL.
>>>
>>>>
>>>>>>
>>>>>>> After reboot, we will result range from hostboot . We pass that to kernel via
>>>>>>> device tree.
>>>>>>>
>>>>>>>>
>>>>>>>> Why not just an API which can add a range, and delete a range, and
>>>>>>>> that's it? Range would just be physical start, end, plus an arbitrary
>>>>>>>> tag (which caller can use to retrieve metadata that is used to
>>>>>>>> decipher the dump).
>>>>>>>
>>>>>>> We want one to one mapping between source and destination.
>>>>>>
>>>>>> Ah yes, sure that too. So two calls, one which adds or removes
>>>>>> (source, dest, length) entries, and another which sets a tag.
>>>>>
>>>>> Sorry. I'm still not getting what we gain by multiple calls here.
>>>>
>>>> No ugly structure that's tied to some internal dump metadata.
>>>
>>> I understand the metadata part. But structure is not tied to anything else.
>>> The data format used to pass information from OPAL to hostboot is completely
>>> different.
>>> This structure is used to pass data between OPAL and kernel.
>>
>> It's just ugly. Start with an API which is as simple and minimal as
>> possible, and then explain why you need to complicate it.
>>
>> Performance and race windows are not good reasons AFAIKS. We can do
>
> This API is mostly used during system boot time. So we were not considering
> performance.
>
>
>> about 1-2 million OPAL calls per second last I checked, and the
>> race/atomicity issue is better solved in the kernel, where you can
>> set a flag after registration is done, and only issue the MPIPL reboot
>> if that flag is set.
>>
>>>>> - With structure we can pass all information in one call. So kernel can make
>>>>> single call for registration.
>>>>
>>>> We don't gain much there AFAIKS.
>>>>
>>>>> - Its controlled by version field (we realized need of version field during
>>>>> review and I will
>>>>> add that in v8) . So makes it easy to handle comparability issues.
>>>>
>>>> But you only have backward compatibility concerns because you're
>>>> exposing the structure and putting crash metadata into it in the first
>>>> place.
>>>
>>> Otherwise for any change in interface we will endup creating new API
>>> (like we did OPAL_DUMP_INFO2).
>>
>> If you take all the unnecessary stuff out of your API, and you have a
>> better chance you get it right and won't need to change.
>>
>> "Preserve these memory ranges across boot" is the be all and end all of
>> preserving memory ranges across boot :) But if we do need to change
>> things, a new OPAL call is not worse than versioning a structure.
>
> Both as bad anyway.
>
>>
>>> I don't think we gain much with multiple APIs (for registration, un-registration,
>>> invalidate, complete). Anyway let me come up with new API which passes
>>> src, dest, size.
>>
>> Thanks for considering it. Not sure if you need all those. One OPAL call
>> number that takes a range and flag to preserve/unpreserve, and another
>> to initiate the MPIPL. Possibly another to save a tag/pointer.
>>
>>>> No, just make it an arbitrary tag. Then the caller can use that as a
>>>> pointer and use that to find its own metadata.
>>>
>>> Sorry. I didn't get this. What you mean by "arbitrary tag"?
>>
>> Well it lets the kernel preserve a pointer to its crash metadata within
>> a preserved memory range. Although maybe that still has a chicken and egg
>> problem when you retrieve the data if the memory has been moved. I guess
>> rather than arbitrary tag, it could be a pointer offset that will get
>> updated to point to the destination location when you boot up.
>
> We don't need such tags. Based on destination memory address, kernel can
> retrieve meta data.
>
>>
>>> What I was telling is, post crash kernel needs below information so that it can
>>> create proper
>>> dump.
>>> - Crashing CPU --> In case of OPAL crash
>>> - source_addr, dest_addr, size, type
>>> Kernel uses type field to separate out the entry and to create dump.
>>>
>>> Now either I have to pass all these data via device tree (then I can skip type
>>> field)
>>> 1 . something like below:
>>> `opal-result-entry` -> src, dest, size for OPAL dump (and it can have
>>> multiple entries)
>>> `kernel-result-entry` -> src, dest, size for kernel DUMP (and it can have
>>> multiple entries)
>>>
>>> But then as Oliver pointed, this is not static data. Once kernel consumes
>>> it, data becomes
>>> invalid. So device tree may not be the right place.
>>>
>>> -OR-
>>>
>>> 2. Pass entry size via device tree (like kernel-result-table-size,
>>> opal-result-table-size).
>>> Kernel will use API to get the data. But here kernel has to pass the type
>>> information
>>> to get the data.
>>>
>>> -OR-
>>>
>>> 3. Use structure to pass entire data. Device tree will contain total
>>> result-table size.
>>> Kernel will make API call to get data.
>>
>> Does it need any device tree?
>
> Yes. At least we should indicate we its MPIPL boot and size of result table.
>
>
>> How about OPAL calls to retrieve the
>> preserved ranges?
>
> Yeah. We can consider OPAL API.
>
[...]
>>
>> So you retrieve you kernel metadata pointer that's in dest memory,
>> and that contains all your saved ranges and other metadata.
>
> During MPIPL, hostboot takes care of moving data from source to destination memory.
> Hostboot point of view its just memory preserving (copying memory from source to
> destination). It doesn't differentiate whether its kernel memory -OR- OPAL memory.
> Once it completes it boots OPAL and passes result table to OPAL. This result table contains both OPAL and kernel entries. We have to pass this information to kernel. Also kernel needs a way
> to differentiate between OPAL and kernel entries.
>
>
> So here is my new interface proposal. Does this looks ok?
>
> Registration :
> opal_fadump_manage(u8 cmd, u64 src, u64 dest, u64 size)
> cmd : 0x01 -> OPAL_FADUMP_REGISTER , kernel should pass all 3 parameters
> 0x02 -> OPAL_FADUMP_UNREGISTER, kernel should pass src and dest parameter
> if src = dest = 0, then UNREGISTER all kernel entries
> 0x03 -> OPAL_FADUMP_INVALIDATE, No parameter is required.
>
> - OPAL will add new entry to MDST/MDST table
> - If system crashes before kernel adds all entry, we will have partial kernel dump
> - Kernel should handle partial dumps
>
> - Kernel has to maintain its metadata. If in future, multiple driver wants different dumps
> they have to go through same interface (CONFIG_FADUMP).
>
The interface looks nice but is it not worth having a field that differentiates different callers
of opal_fadump/mpipl_manage() from kernel to avoid enforcing a single owner to this call in
kernel that has to keep a tab of all callers to this function with metadata..
More information about the Skiboot
mailing list