[PATCH v5 21/31] powernv/fadump: process architected register state data provided by firmware

Wed Sep 11 00:05:36 AEST 2019

Hari Bathini <hbathini at linux.ibm.com> writes:
> On 09/09/19 9:03 PM, Oliver O'Halloran wrote:
>> On Mon, Sep 9, 2019 at 11:23 PM Hari Bathini <hbathini at linux.ibm.com> wrote:
>>> On 04/09/19 5:50 PM, Michael Ellerman wrote:
>>>> Hari Bathini <hbathini at linux.ibm.com> writes:
>>> [...]
>>>
>>>>> +/*
>>>>> + * CPU state data is provided by f/w. Below are the definitions
>>>>> + * provided in HDAT spec. Refer to latest HDAT specification for
>>>>> + * any update to this format.
>>>>> + */
>>>>
>>>> How is this meant to work? If HDAT ever changes the format they will
>>>> break all existing kernels in the field.
>>>>
>>>>> +#define HDAT_FADUMP_CPU_DATA_VERSION                1
>>>
>>> Changes are not expected here. But this is just to cover for such scenario,
>>> if that ever happens.
>> 
>> The HDAT spec doesn't define the SPR numbers for NIA, MSR and the CR.
>> As far as I can tell the values you've assumed here are chip-specific,
>> non-architected SPR numbers that come from an array buried somewhere
>> in the SBE codebase. I don't believe you for a second when you say
>> that this will never change.
>
> At least, the understanding is that this numbers not change across processor
> generations. If something changes, it is supposed to be handled in SBE. Also,
> I am told this numbers would be listed in the HDAT Spec. Not sure if that
> happened yet though. Vasant, you have anything to add?

That doesn't help much because the HDAT spec is not public.

The point is with the code written the way it is, these values *must
not* change, or else all existing kernels will be broken, which is not
acceptable.

>>> Also, I think it is a bit far-fetched to error out if versions mismatch.
>>> Warning and proceeding sounds worthier because the changes are usually
>>> backward compatible, if and when there are any. Will update accordingly...
>> 
>> Literally the only reason I didn't drop the CPU DATA parts of the OPAL
>> MPIPL series was because I assumed the kernel would do the sensible
>> thing and reject or ignore the structure if it did not know how to
>> parse the data.
>
> I think, the changes if any, would have to be backward compatible for the sake
> of sanity.

People need to understand that this is an ABI between firmware and
in-the-field distribution kernels which are only updated at customer
discretion, or possibly never.

Any changes *must be* backward compatible.

Looking at the header struct:

+struct hdat_fadump_thread_hdr {
+	__be32  pir;
+	/* 0x00 - 0x0F - The corresponding stop state of the core */
+	u8      core_state;
+	u8      reserved[3];

You have those 3 reserved bytes, so a future revision could repurpose
one of those as a flag to indicate a new format. And/or the hdr could be
made bigger and new kernels could be taught to look for new things in
the space after the hdr but before the reg entries.

So I think there is a reasonable mechanism for extending the format in
future, but my point is people must understand that this is an ABI and
changes must be made accordingly.

> Even if they are not, we are better off exporting the /proc/vmcore
> with a warning and some crazy CPU register data (if parsing goes alright) than
> no dump at all? 

If it's just a case of reg entries that we don't recognise then yes I
think it would be OK to just skip them and continue exporting. But if
there's any deeper misunderstanding of the format then we should bail
out.

I notice now that you don't do anything in opal_fadump_set_regval_regnum()
if you are passed a register we don't understand, so that probably needs
fixing.

cheers