[PATCH v5 21/31] powernv/fadump: process architected register state data provided by firmware

Hari Bathini hbathini at linux.ibm.com
Wed Sep 11 02:10:44 AEST 2019



On 10/09/19 7:35 PM, Michael Ellerman wrote:
> Hari Bathini <hbathini at linux.ibm.com> writes:
>> On 09/09/19 9:03 PM, Oliver O'Halloran wrote:
>>> On Mon, Sep 9, 2019 at 11:23 PM Hari Bathini <hbathini at linux.ibm.com> wrote:
>>>> On 04/09/19 5:50 PM, Michael Ellerman wrote:
>>>>> Hari Bathini <hbathini at linux.ibm.com> writes:
>>>> [...]
>>>>
>>>>>> +/*
>>>>>> + * CPU state data is provided by f/w. Below are the definitions
>>>>>> + * provided in HDAT spec. Refer to latest HDAT specification for
>>>>>> + * any update to this format.
>>>>>> + */
>>>>>
>>>>> How is this meant to work? If HDAT ever changes the format they will
>>>>> break all existing kernels in the field.
>>>>>
>>>>>> +#define HDAT_FADUMP_CPU_DATA_VERSION                1
>>>>
>>>> Changes are not expected here. But this is just to cover for such scenario,
>>>> if that ever happens.
>>>
>>> The HDAT spec doesn't define the SPR numbers for NIA, MSR and the CR.
>>> As far as I can tell the values you've assumed here are chip-specific,
>>> non-architected SPR numbers that come from an array buried somewhere
>>> in the SBE codebase. I don't believe you for a second when you say
>>> that this will never change.
>>
>> At least, the understanding is that this numbers not change across processor
>> generations. If something changes, it is supposed to be handled in SBE. Also,
>> I am told this numbers would be listed in the HDAT Spec. Not sure if that
>> happened yet though. Vasant, you have anything to add?
> 
> That doesn't help much because the HDAT spec is not public.
> 
> The point is with the code written the way it is, these values *must
> not* change, or else all existing kernels will be broken, which is not
> acceptable.

Yeah. It is absurd to error out just by looking at version number...

> 
>>>> Also, I think it is a bit far-fetched to error out if versions mismatch.
>>>> Warning and proceeding sounds worthier because the changes are usually
>>>> backward compatible, if and when there are any. Will update accordingly...
>>>
>>> Literally the only reason I didn't drop the CPU DATA parts of the OPAL
>>> MPIPL series was because I assumed the kernel would do the sensible
>>> thing and reject or ignore the structure if it did not know how to
>>> parse the data.
>>
>> I think, the changes if any, would have to be backward compatible for the sake
>> of sanity.
> 
> People need to understand that this is an ABI between firmware and
> in-the-field distribution kernels which are only updated at customer
> discretion, or possibly never.
> 
> Any changes *must be* backward compatible.
> 
> Looking at the header struct:
> 
> +struct hdat_fadump_thread_hdr {
> +	__be32  pir;
> +	/* 0x00 - 0x0F - The corresponding stop state of the core */
> +	u8      core_state;
> +	u8      reserved[3];
> 
> You have those 3 reserved bytes, so a future revision could repurpose
> one of those as a flag to indicate a new format. And/or the hdr could be
> made bigger and new kernels could be taught to look for new things in
> the space after the hdr but before the reg entries.
> 
> So I think there is a reasonable mechanism for extending the format in
> future, but my point is people must understand that this is an ABI and
> changes must be made accordingly.

True. The folks who make the changes to this format should be aware that
breaking kernel ABI is not going to be pretty and I think they are :)

> 
>> Even if they are not, we are better off exporting the /proc/vmcore
>> with a warning and some crazy CPU register data (if parsing goes alright) than
>> no dump at all? 
> 
> If it's just a case of reg entries that we don't recognise then yes I
> think it would be OK to just skip them and continue exporting. But if
> there's any deeper misunderstanding of the format then we should bail
> out.

Sure. Will try and fix that by first trying to do a sanity check on the
fields that are needed for parsing the data and proceed with a warning if
nothing weird is detected and fallback to just appending crashing cpu as
done in patch 16/31, if anything weird is observed. That should hopefully
take care of all cases in the best possible way..

> 
> I notice now that you don't do anything in opal_fadump_set_regval_regnum()
> if you are passed a register we don't understand, so that probably needs
> fixing.

f/w provides about 100 odd registers in the CPU state data. Most of them
pt_regs doesn't care about. So, opal_fadump_set_regval_regnum is happy as
long as it find the registers listed in it. Unless, pt_regs changes, we
can stick to this and ignore the rest of them?

- Hari



More information about the Linuxppc-dev mailing list