[PATCH v7 1/3] powerpc: make fadump resilient with memory add/remove events
Sourabh Jain
sourabhjain at linux.ibm.com
Mon Feb 19 21:32:24 AEDT 2024
Hello Hari,
On 23/01/24 15:39, Hari Bathini wrote:
>
>
> On 11/01/24 7:39 pm, Sourabh Jain wrote:
>> Due to changes in memory resources caused by either memory hotplug or
>> online/offline events, the elfcorehdr, which describes the CPUs and
>> memory of the crashed kernel to the kernel that collects the dump (known
>> as second/fadump kernel), becomes outdated. Consequently, attempting
>> dump collection with an outdated elfcorehdr can lead to failed or
>> inaccurate dump collection.
>>
>> Memory hotplug or online/offline events are referred to as memory
>> add/remove events in the rest of the commit message.
>>
>> The current solution to address the aforementioned issue is as follows:
>> Monitor memory add/remove events in userspace using udev rules, and
>> re-register fadump whenever there are changes in memory resources. This
>> leads to the creation of a new elfcorehdr with updated system memory
>> information.
>>
>> There are several notable issues associated with re-registering fadump
>> for every memory add/remove event.
>>
>> 1. Bulk memory add/remove events with udev-based fadump re-registration
>> can lead to race conditions and, more importantly, it creates a wide
>> window during which fadump is inactive until all memory add/remove
>> events are settled.
>> 2. Re-registering fadump for every memory add/remove event is
>> inefficient.
>> 3. The memory for elfcorehdr is allocated based on the memblock regions
>> available during early boot and remains fixed thereafter. However, if
>> elfcorehdr is later recreated with additional memblock regions, its
>> size will increase, potentially leading to memory corruption.
>>
>> Address the aforementioned challenges by shifting the creation of
>> elfcorehdr from the first kernel (also referred to as the crashed kernel),
>> where it was created and frequently recreated for every memory
>> add/remove event, to the fadump kernel. As a result, the elfcorehdr only
>> needs to be created once, thus eliminating the necessity to re-register
>> fadump during memory add/remove events.
>>
>> At present, the first kernel prepares the fadump header and stores it in
>> the fadump reserved area. The fadump header contains start address of
>> the elfcorehd, crashing CPU details, etc. In the event of first kernel
>
> "elfcorehd" used instead of "elfcorehdr" at a couple of places..
Fixed it now. Thanks.
>
>> crash, the second/fadump kernel boots, accesses the fadump header prepared
>> by the first kernel, and does the following in a platform-specific function
>> [rtas|opal]_fadump_process:
>>
>> At present, the first kernel is responsible for preparing the fadump
>> header and storing it in the fadump reserved area. The fadump header
>> includes the start address of the elfcorehd, crashing CPU details, and
>> other relevant information. In the event of a crash in the first kernel,
>> the second/fadump boots and accesses the fadump header prepared by the
>> first kernel. It then performs the following steps in a
>> platform-specific function [rtas|opal]_fadump_process:
>>
>> 1. Sanity check for fadump header
>> 2. Update CPU notes in elfcorehdr
>> 3. Set the global variable elfcorehdr_addr to the address of the
>> fadump header's elfcorehdr, so that the vmcore module can process it
>> later on.
>>
>> Along with the above, update the setup_fadump()/fadump.c to create
>> elfcorehdr in second/fadump kernel.
>>
>> The section below outlines the information required to create the
>> elfcorehdr and the changes made to make it available to the fadump kernel
>> if it is not already.
>>
>> To create elfcorehdr, the following crashed kernel information is
>> required: CPU notes, vmcoreinfo, and memory ranges.
>>
>> At present, the CPU notes are already prepared in the fadump kernel, so
>> no changes are needed in that regard. The fadump kernel has access to
>> all crashed kernel memory regions, including boot memory regions that
>> are relocated by firmware to fadump reserved areas, so no changes for
>> that either. However, it is necessary to add new members to the fadump
>> header, i.e., the 'fadump_crash_info_header' structure, in order to pass
>> the crashed kernel's vmcoreinfo address and its size to fadump kernel.
>>
>> In addition to the vmcoreinfo address and size, a few other attributes
>> are also added to the fadump_crash_info_header structure.
>>
>> 1. version:
>> It stores the fadump header version, which is currently set to 1.
>> This provides flexibility to update the fadump crash info header in
>> the future without changing the magic number. For each change in the
>> fadump header, the version will be increased. This will help the
>> updated kernel determine how to handle kernel dumps from older
>> kernels. The magic number remains relevant for checking fadump header
>> corruption.
>>
>> 2. elfcorehdr_size:
>> Since elfcorehdr is now prepared in the fadump/second kernel and
>> it is not part of the reserved area, this attribute is needed to
>> track the memory allocated for elfcorehdr to do the deallocation
>> properly.
>>
>> 3. pt_regs_sz/cpu_mask_sz:
>> Store the sizes of the pt_regs and cpu_mask structures in the first
>> kernel. These attributes are used to avoid processing the dump if the
>> sizes of pt_regs and cpu_mask are not the same across the crashed and
>> fadump kernels.
>>
>> Note: if either the first/crashed kernel or the second/fadump kernel does
>> not have the changes introduced here, the kernel fails to collect the dump
>> and prints a relevant error message on the console.
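
For readers following the thread, the header checks described above can be
sketched in plain userspace C. This is only an illustration under stated
assumptions, not the code from the patch: the demo_* names, the simplified
field layout, the helper that packs the magic string, and the order of the
checks are all hypothetical. Only the two magic strings, the version value
of 1, and the idea of comparing pt_regs/cpu_mask sizes come from the commit
message and diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: pack up to 8 ASCII bytes into a 64-bit value. */
static uint64_t demo_str_to_u64(const char *s)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++)
        v = (v << 8) | (*s ? (uint8_t)*s++ : 0);
    return v;
}

#define DEMO_HEADER_VERSION 1

/* Simplified, hypothetical stand-in for the shared crash info header. */
struct demo_crash_info_header {
    uint64_t magic_number;
    uint32_t version;
    uint32_t crashing_cpu;
    uint64_t vmcoreinfo_raddr;
    uint64_t vmcoreinfo_size;
    uint32_t pt_regs_sz;
    uint32_t cpu_mask_sz;
};

enum demo_hdr_status {
    DEMO_HDR_OK,
    DEMO_HDR_OLD_KERNEL,    /* old magic: crashed kernel lacks the changes */
    DEMO_HDR_CORRUPT,       /* unknown magic */
    DEMO_HDR_BAD_VERSION,   /* header version this kernel cannot handle */
    DEMO_HDR_SIZE_MISMATCH, /* pt_regs or cpu_mask size differs */
};

/* Decide whether the second kernel should go on to process the dump. */
static enum demo_hdr_status
demo_check_header(const struct demo_crash_info_header *h,
                  uint32_t my_pt_regs_sz, uint32_t my_cpu_mask_sz)
{
    assert(h != NULL);

    if (h->magic_number == demo_str_to_u64("FADMPINF"))
        return DEMO_HDR_OLD_KERNEL;
    if (h->magic_number != demo_str_to_u64("FADMPSIG"))
        return DEMO_HDR_CORRUPT;
    if (h->version != DEMO_HEADER_VERSION)
        return DEMO_HDR_BAD_VERSION;
    if (h->pt_regs_sz != my_pt_regs_sz || h->cpu_mask_sz != my_cpu_mask_sz)
        return DEMO_HDR_SIZE_MISMATCH;
    return DEMO_HDR_OK;
}
```

In this sketch any status other than DEMO_HDR_OK would mean "report an
error and skip dump processing", which matches the note above about a
kernel without these changes on either side.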
>>
>> Signed-off-by: Sourabh Jain <sourabhjain at linux.ibm.com>
>> Cc: Aditya Gupta <adityag at linux.ibm.com>
>> Cc: Aneesh Kumar K.V <aneesh.kumar at kernel.org>
>> Cc: Hari Bathini <hbathini at linux.ibm.com>
>> Cc: Mahesh Salgaonkar <mahesh at linux.ibm.com>
>> Cc: Michael Ellerman <mpe at ellerman.id.au>
>> Cc: Naveen N Rao <naveen at kernel.org>
>> ---
>> arch/powerpc/include/asm/fadump-internal.h | 31 +-
>> arch/powerpc/kernel/fadump.c | 355 +++++++++++--------
>> arch/powerpc/platforms/powernv/opal-fadump.c | 18 +-
>> arch/powerpc/platforms/pseries/rtas-fadump.c | 23 +-
>> 4 files changed, 242 insertions(+), 185 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/fadump-internal.h b/arch/powerpc/include/asm/fadump-internal.h
>> index 27f9e11eda28..a632e9708610 100644
>> --- a/arch/powerpc/include/asm/fadump-internal.h
>> +++ b/arch/powerpc/include/asm/fadump-internal.h
>> @@ -42,13 +42,40 @@ static inline u64 fadump_str_to_u64(const char *str)
>> #define FADUMP_CPU_UNKNOWN (~((u32)0))
>> -#define FADUMP_CRASH_INFO_MAGIC fadump_str_to_u64("FADMPINF")
>> +/*
>> + * The introduction of new fields in the fadump crash info header has
>> + * led to a change in the magic key from `FADMPINF` to `FADMPSIG` for
>> + * identifying a kernel crash from an old kernel.
>> + *
>> + * To prevent the need for further changes to the magic number in the
>> + * event of future modifications to the fadump crash info header, a
>> + * version field has been introduced to track the fadump crash info
>> + * header version.
>> + *
>> + * Consider a few points before adding new members to the fadump crash info
>> + * header structure:
>> + *
>> + * - Append new members; avoid adding them in between.
>> + * - Non-primitive members should have a size member as well.
>> + * - For every change in the fadump header, increment the
>> + * fadump header version. This helps the updated kernel decide how to
>> + * handle kernel dumps from older kernels.
>> + */
>> +#define FADUMP_CRASH_INFO_MAGIC_OLD fadump_str_to_u64("FADMPINF")
>> +#define FADUMP_CRASH_INFO_MAGIC fadump_str_to_u64("FADMPSIG")
>> +#define FADUMP_HEADER_VERSION 1
>> /* fadump crash info structure */
>> struct fadump_crash_info_header {
>> u64 magic_number;
>> - u64 elfcorehdr_addr;
>> + u32 version;
>> u32 crashing_cpu;
>
>> + u64 elfcorehdr_addr;
>> + u64 elfcorehdr_size;
>
> fadump_crash_info_header structure is to share info across reboots.
> Now that elfcorehdr is prepared in second kernel and also dump capture
> of older kernel is not supported, get rid of elfcorehdr_addr &
> elfcorehdr_size from fadump_crash_info_header structure and put them
> in fw_dump structure instead..
Including elfcorehdr_addr and elfcorehdr_size in the fw_dump structure
removes the dependency on address translation from physical to virtual.
I have included the above suggestion in v8.
https://lore.kernel.org/all/20240217072004.148293-1-sourabhjain@linux.ibm.com/
Thanks for the suggestion.
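
To illustrate the idea discussed above (not the actual v8 code), here is a
minimal userspace sketch of keeping the elfcorehdr bookkeeping in a
kernel-local structure instead of the header shared across reboots. All
demo_* names are hypothetical, and the sizing rule (one ELF header, two
PT_NOTE program headers for CPU notes and vmcoreinfo, one PT_LOAD per
memory range) is an assumption made for the example. The 64-byte and
56-byte constants are the standard ELF64 header and program header sizes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_ELF64_EHDR_SIZE 64u /* size of an ELF64 file header */
#define DEMO_ELF64_PHDR_SIZE 56u /* size of an ELF64 program header */

/* Hypothetical stand-in for the per-boot fw_dump bookkeeping. */
struct demo_fw_dump {
    void *elfcorehdr_addr;    /* pointer usable directly, no translation */
    uint64_t elfcorehdr_size; /* remembered so the buffer can be released */
};

/* Assumed sizing: ELF header + 2 PT_NOTE + one PT_LOAD per memory range. */
static uint64_t demo_elfcorehdr_size(uint64_t nr_mem_ranges)
{
    return DEMO_ELF64_EHDR_SIZE + (nr_mem_ranges + 2) * DEMO_ELF64_PHDR_SIZE;
}

/* Allocate the elfcorehdr buffer once, in the second kernel. */
static int demo_alloc_elfcorehdr(struct demo_fw_dump *fw,
                                 uint64_t nr_mem_ranges)
{
    uint64_t size;
    void *buf;

    assert(fw != NULL);
    size = demo_elfcorehdr_size(nr_mem_ranges);
    buf = calloc(1, size);
    if (!buf)
        return -1;

    fw->elfcorehdr_addr = buf;
    fw->elfcorehdr_size = size;
    return 0;
}

/* Release the buffer using the recorded address and reset the fields. */
static void demo_free_elfcorehdr(struct demo_fw_dump *fw)
{
    assert(fw != NULL);
    free(fw->elfcorehdr_addr);
    fw->elfcorehdr_addr = NULL;
    fw->elfcorehdr_size = 0;
}
```

Because the structure lives only in the running kernel, nothing about the
elfcorehdr has to survive a reboot or be interpreted by a different kernel
build, which is the point of moving the two fields out of the shared header.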
- Sourabh