[PATCH v5 1/3] powerpc: make fadump resilient with memory add/remove events

Michael Ellerman mpe at ellerman.id.au
Wed Nov 22 23:52:10 AEDT 2023


Sourabh Jain <sourabhjain at linux.ibm.com> writes:
> On 22/11/23 10:47, Michael Ellerman wrote:
>> Aneesh Kumar K V <aneesh.kumar at linux.ibm.com> writes:
>> ...
>>> I am not sure whether we need to add all the complexity to enable supporting different fadump kernel
>>> version. Is that even a possible use case with fadump? Can't we always assume that with fadump the
>>> crash kernel and fadump kernel will be same version?
>> How sure are we of that?
>>
>> Don't we go through grub when we boot into the 2nd kernel. And so
>> couldn't it choose to boot a different kernel, for whatever reason.
>>
>> I don't think we need to support different pt_reg / cpumask sizes, but
>> requiring the exact same kernel version is too strict I think.
> Agree.
>>
>> But maybe I'm wrong. Would be good to hear what distro folks think.
>
> How about checking fadump crash info header compatibility in the 
> following way?
>
> static bool is_fadump_header_compatible(struct fadump_crash_info_header 
> *fdh)
> {
>      if (fdh->magic_number == FADUMP_CRASH_INFO_MAGIC_OLD) {
>          pr_err("Old magic number, can't process the dump.");
>          return false;
>      }
>
>      if (fdh->magic_number != FADUMP_CRASH_INFO_MAGIC) {
>          pr_err("Fadump header is corrupted.");
>          return false;
>      }
>
>      /*
>       * If the kernel version of the first/crashed kernel and the 
> second/fadump
>       * kernel is not same, then only collect the dump if the size of all
>       * non-primitive type members of the fadump header is the same 
> across kernels.
>       */
>      if (strcmp(fdh->kernel_version, init_uts_ns.name.release)) {
 
I don't think the kernel version check is necessary?

>          if (fdh->pt_regs_sz != sizeof(struct pt_regs) || fdh->cpu_mask_sz != sizeof(struct cpumask)) {
>              pr_err("Fadump header size mismatch.\n")
>              return false;

Yeah I think that works.

>          } else
>              pr_warn("Kernel version mismatch; dump data is unreliable.\n");
>      }
>
>      return true;
> }
>
> And the new fadump crash info header will be: As suggested by Hari.
>
> /* fadump crash info structure */
> struct fadump_crash_info_header {
>      u64        magic_number;
> +  u32        version;
 
Do we need version? We're effectively using the magic_number as a
version already. Having an actual version would allow us to make
backward compatible changes in future, but it's not clear we'll need or
want to do that.

>      u32        crashing_cpu;
>      u64        elfcorehdr_addr;
> +  u64        elfcorehdr_size;
> +  u64        vmcoreinfo_raddr;
> +  u64        vmcoreinfo_size;
> +  u8          kernel_version[__NEW_UTS_LEN + 1];
> +  u32        pt_regs_sz;
>      struct pt_regs    regs;
> +  u32        cpu_mask_sz;
 
Typically you would put all the size fields before the variable sized
fields, because otherwise the later size fields may not be where you
expect them to be. But because we're bailing out if any of the sizes
don't match it doesn't actually matter.

>      struct cpumask    cpu_mask;
> };

The other issue is endian. I assume we're just declaring that the
1st/2nd kernel must be the same endian? We should still make that
explicit though.

To make it clearer that this struct is somewhat an ABI, I think we
should move the definition into arch/powerpc/include/uapi/asm/fadump.h

We don't expect userspace to actually use the header, but it will
hopefully remind everyone that the struct needs to be changed with care :)

cheers


More information about the Linuxppc-dev mailing list