[PATCH v4 4/7] powerpc/fadump: Reservationless firmware assisted dump

Hari Bathini hbathini at linux.vnet.ibm.com
Mon Apr 23 22:53:35 AEST 2018



On Friday 20 April 2018 10:34 AM, Mahesh J Salgaonkar wrote:
> From: Mahesh Salgaonkar <mahesh at linux.vnet.ibm.com>
>
> One of the primary issues with Firmware Assisted Dump (fadump) on Power
> is that it needs a large amount of memory to be reserved. On large
> systems with TeraBytes of memory, this reservation can be quite
> significant.
>
> In some cases, fadump fails if the memory reserved is insufficient, or
> if the reserved memory was DLPAR hot-removed.
>
> In the normal case, post reboot, the preserved memory is filtered to
> extract only relevant areas of interest using the makedumpfile tool.
> While the tool provides flexibility to determine what needs to be part
> of the dump and what memory to filter out, all supported distributions
> default this to "Capture only kernel data and nothing else".
>
> We take advantage of this default and the Linux kernel's Contiguous
> Memory Allocator (CMA) to fundamentally change the memory reservation
> model for fadump.
>
> Instead of setting aside a significant chunk of memory nobody can use,
> this patch uses CMA instead, to reserve a significant chunk of memory
> that the kernel is prevented from using (due to MIGRATE_CMA), but
> applications are free to use it. With this fadump will still be able
> to capture all of the kernel memory and most of the user space memory
> except the user pages that were present in CMA region.
>
> Essentially, on a P9 LPAR with 2 cores, 8GB RAM and current upstream:
> [root at zzxx-yy10 ~]# free -m
>                total        used        free      shared  buff/cache   available
> Mem:           7557         193        6822          12         541        6725
> Swap:          4095           0        4095
>
> With this patch:
> [root at zzxx-yy10 ~]# free -m
>                total        used        free      shared  buff/cache   available
> Mem:           8133         194        7464          12         475        7338
> Swap:          4095           0        4095
>
> Changes made here are completely transparent to how fadump has
> traditionally worked.
>
> Thanks to Aneesh Kumar and Anshuman Khandual for helping us understand
> CMA and its usage.
>
> TODO:
> - Handle case where CMA reservation spans nodes.
>
> Signed-off-by: Ananth N Mavinakayanahalli <ananth at linux.vnet.ibm.com>
> Signed-off-by: Mahesh Salgaonkar <mahesh at linux.vnet.ibm.com>
> ---
>   arch/powerpc/kernel/fadump.c |  120 ++++++++++++++++++++++++++++++++++++------
>   1 file changed, 103 insertions(+), 17 deletions(-)
>
> diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
> index 16b3e8c5cae0..7f76924ab190 100644
> --- a/arch/powerpc/kernel/fadump.c
> +++ b/arch/powerpc/kernel/fadump.c
> @@ -34,6 +34,7 @@
>   #include <linux/crash_dump.h>
>   #include <linux/kobject.h>
>   #include <linux/sysfs.h>
> +#include <linux/cma.h>
>
>   #include <asm/debugfs.h>
>   #include <asm/page.h>
> @@ -45,11 +46,57 @@
>   static struct fw_dump fw_dump;
>   static struct fadump_mem_struct fdm;
>   static const struct fadump_mem_struct *fdm_active;
> +static struct cma *fadump_cma;
>
>   static DEFINE_MUTEX(fadump_mutex);
>   struct fad_crash_memory_ranges crash_memory_ranges[INIT_CRASHMEM_RANGES];
>   int crash_mem_ranges;
>
> +/*
> + * fadump_cma_reserve() - reserve area for fadump memory reservation
> + *
> + * This function reserves memory from early allocator. It should be
> + * called by arch specific code once the memblock allocator
> + * has been activated.
> + */
> +int __init fadump_cma_reserve(void)
> +{
> +	unsigned long long base, size;
> +	int rc;
> +
> +	if (!fw_dump.fadump_enabled)
> +		return 0;
> +
> +	base = fw_dump.reserve_dump_area_start;
> +	size = fw_dump.reserve_dump_area_size;

Mahesh, how about moving the sections around instead:

Old:
   1. cpu state data region
   2. hpte region
   3. real memory region

New:
   2. cpu state data region
   3. hpte region
   1. real memory region

and using only the boot memory size for the CMA reservation. The other
regions, crashinfo header & elfcorehdrs, can still use memblock_reserve.

This achieves two things. One, it ensures we don't waste memory on
alignment, as CMA uses hugepage(16MB)/maxorder as the default alignment
(we need to ensure the boot memory size is aligned to
hugepage(16MB)/maxorder though). Two, we don't have to move the metadata
around from end to start (patch 1/7).
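
A rough userspace sketch of the rounding I have in mind, not the actual
change to fadump_calculate_reserve_size(). The helper names and the two
SKETCH_* constants are made up for illustration; the real sizes depend
on page size and kernel config, and I am assuming CMA ends up using the
larger of the two.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values for illustration only; real ones come from kernel config. */
#define SKETCH_HUGEPAGE_SIZE	(16ULL << 20)	/* 16MB hugepage */
#define SKETCH_MAXORDER_SIZE	(16ULL << 20)	/* assumed largest buddy block */

/* Assumption: the CMA alignment is the larger of the two sizes. */
static uint64_t sketch_cma_alignment(void)
{
	return SKETCH_HUGEPAGE_SIZE > SKETCH_MAXORDER_SIZE ?
		SKETCH_HUGEPAGE_SIZE : SKETCH_MAXORDER_SIZE;
}

/* Round the boot memory size up to the next multiple of the alignment. */
static uint64_t sketch_align_boot_mem_size(uint64_t size)
{
	uint64_t align = sketch_cma_alignment();

	assert((align & (align - 1)) == 0);	/* must be a power of two */
	return (size + align - 1) & ~(align - 1);
}
```

With the size rounded up front, the CMA code has nothing left to pad.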

To differentiate the old and new section order, we can overload the
crash info magic (FADUMPINF -> FADUMPIV2), I guess. That differentiation
may be needed for re-registering after dump capture.
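
Something along these lines is what I mean by overloading the magic.
This is a standalone sketch: the enum, the helper names and the way the
tag is packed into 64 bits are made up for illustration and need not
match how the kernel encodes the magic today.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout identifiers, not existing kernel names. */
enum sketch_layout {
	SKETCH_LAYOUT_UNKNOWN = 0,
	SKETCH_LAYOUT_OLD,	/* magic FADUMPINF: old section order */
	SKETCH_LAYOUT_NEW,	/* magic FADUMPIV2: new section order */
};

/* Pack the first eight characters of a tag into a 64-bit value. */
static uint64_t sketch_magic(const char *tag)
{
	uint64_t v = 0;
	size_t i, n;

	assert(tag != NULL);
	n = strlen(tag);
	for (i = 0; i < 8 && i < n; i++)
		v = (v << 8) | (unsigned char)tag[i];
	return v;
}

/* Decide which section order a crash info header was written with. */
static enum sketch_layout sketch_layout_from_magic(uint64_t magic)
{
	if (magic == sketch_magic("FADUMPINF"))
		return SKETCH_LAYOUT_OLD;
	if (magic == sketch_magic("FADUMPIV2"))
		return SKETCH_LAYOUT_NEW;
	return SKETCH_LAYOUT_UNKNOWN;
}
```

The re-registration path could then pick the section order based on the
returned layout.
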

> +	pr_debug("Original reserve area base %ld, size %ld\n",
> +				(unsigned long)base >> 20,
> +				(unsigned long)size >> 20);
> +	if (!size)
> +		return 0;
> +
> +	rc = cma_declare_contiguous(base, size, 0, 0, 0, false,
> +						"fadump_cma", &fadump_cma);

Compilation fails when CONFIG_CMA is not set. Either a fallback for when
CONFIG_CMA is not set, or a dependency enforced for the FA_DUMP config
option, seems to be missing.
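
For the fallback route, the usual stub pattern would do. A standalone
sketch: CONFIG_CMA here is an ordinary preprocessor symbol standing in
for the Kconfig option, and sketch_cma_reserve() and
sketch_memblock_reserve() are stand-ins I made up, not the real
cma_declare_contiguous() and memblock_reserve().

```c
#include <assert.h>
#include <stdint.h>

enum sketch_method { SKETCH_NONE, SKETCH_CMA, SKETCH_MEMBLOCK };

#ifdef CONFIG_CMA
/* Stand-in for the CMA reservation; always succeeds in this sketch. */
static int sketch_cma_reserve(uint64_t base, uint64_t size)
{
	(void)base;
	(void)size;
	return 0;
}

static enum sketch_method sketch_fadump_reserve(uint64_t base, uint64_t size)
{
	assert(base + size >= base);	/* no address wraparound */
	if (!size)
		return SKETCH_NONE;
	return sketch_cma_reserve(base, size) ? SKETCH_NONE : SKETCH_CMA;
}
#else
/* Stand-in for the memblock reservation; always succeeds in this sketch. */
static int sketch_memblock_reserve(uint64_t base, uint64_t size)
{
	(void)base;
	(void)size;
	return 0;
}

/* Without CMA, keep the old behaviour and reserve through memblock. */
static enum sketch_method sketch_fadump_reserve(uint64_t base, uint64_t size)
{
	assert(base + size >= base);	/* no address wraparound */
	if (!size)
		return SKETCH_NONE;
	return sketch_memblock_reserve(base, size) ? SKETCH_NONE : SKETCH_MEMBLOCK;
}
#endif
```

Callers see one function either way, so fadump_reserve_mem() would not
need its own #ifdef.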

Also, considering we already deduce the base by looking for holes in the
fadump code, could we have a 'fixed' ('true' for the 6th parameter) CMA
region? Again, we have to ensure CMA alignment for the boot memory size
in fadump_calculate_reserve_size() to do all this seamlessly.
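
To illustrate, a hole search that only ever tries aligned bases would
give us a base we can hand over as 'fixed'. This is a userspace sketch
with made-up names: the reserved-range array stands in for the memblock
queries, and returning 0 for "no hole" is only acceptable because the
search is assumed to start above the boot memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sketch_range {
	uint64_t start;	/* inclusive */
	uint64_t end;	/* exclusive */
};

/* Does [base, base + size) overlap any of the n reserved ranges? */
static int sketch_overlaps(const struct sketch_range *r, size_t n,
			   uint64_t base, uint64_t size)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (base < r[i].end && r[i].start < base + size)
			return 1;
	return 0;
}

/*
 * Walk upwards from 'start' in steps of 'align' and return the first
 * aligned base whose [base, base + size) is free and ends at or below
 * 'limit'. Returns 0 when no such hole exists.
 */
static uint64_t sketch_find_aligned_hole(const struct sketch_range *reserved,
					 size_t n, uint64_t start,
					 uint64_t limit, uint64_t size,
					 uint64_t align)
{
	uint64_t base;

	assert(align && (align & (align - 1)) == 0);	/* power of two */
	if (size == 0 || size > limit)
		return 0;

	base = (start + align - 1) & ~(align - 1);
	for (; base <= limit - size; base += align)
		if (!sketch_overlaps(reserved, n, base, size))
			return base;
	return 0;
}
```

Stepping by the alignment rather than by the size is the only real
difference from a plain linear search.
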

> +	if (rc) {
> +		printk(KERN_ERR "fadump: Failed to reserve cma area for "
> +				"firmware-assisted dump, %d\n", rc);
> +		fw_dump.reserve_dump_area_size = 0;
> +		return 0;
> +	}
> +	/*
> +	 * So we now have cma area reserved for fadump. base may be different
> +	 * from what we requested.
> +	 */
> +	fw_dump.reserve_dump_area_start = cma_get_base(fadump_cma);
> +	fw_dump.reserve_dump_area_size = cma_get_size(fadump_cma);
> +	printk("Reserved %ldMB cma area at %ldMB for firmware-assisted dump "
> +			"(System RAM: %ldMB)\n",
> +			cma_get_size(fadump_cma) >> 20,
> +			(unsigned long)cma_get_base(fadump_cma) >> 20,
> +			(unsigned long)(memblock_phys_mem_size() >> 20));
> +	return 1;
> +}
> +
>   /* Scan the Firmware Assisted dump configuration details. */
>   int __init early_init_dt_scan_fw_dump(unsigned long node,
>   			const char *uname, int depth, void *data)
> @@ -496,8 +543,9 @@ int __init fadump_reserve_mem(void)
>   		pr_info("Number of kernel Dump sections: %d\n",
>   			be16_to_cpu(fdm_active->header.dump_num_sections));
>   		fw_dump.fadumphdr_addr = get_fadump_metadata_base(fdm_active);
> -		pr_debug("fadumphdr_addr = %p\n",
> -				(void *) fw_dump.fadumphdr_addr);
> +		pr_debug("fadumphdr_addr = %pa\n", &fw_dump.fadumphdr_addr);
> +		fw_dump.reserve_dump_area_start = base;
> +		fw_dump.reserve_dump_area_size = size;
>   	} else {
>   		size = get_fadump_area_size();
>
> @@ -514,21 +562,10 @@ int __init fadump_reserve_mem(void)
>   			    !memblock_is_region_reserved(base, size))
>   				break;
>   		}
> -		if ((base > (memory_boundary - size)) ||
> -		    memblock_reserve(base, size)) {
> -			pr_err("Failed to reserve memory\n");
> -			return 0;
> -		}
> -
> -		pr_info("Reserved %ldMB of memory at %ldMB for firmware-"
> -			"assisted dump (System RAM: %ldMB)\n",
> -			(unsigned long)(size >> 20),
> -			(unsigned long)(base >> 20),
> -			(unsigned long)(memblock_phys_mem_size() >> 20));
> +		fw_dump.reserve_dump_area_start = base;
> +		fw_dump.reserve_dump_area_size = size;
> +		return fadump_cma_reserve();
>   	}
> -
> -	fw_dump.reserve_dump_area_start = base;
> -	fw_dump.reserve_dump_area_size = size;
>   	return 1;
>   }
>
> @@ -1191,6 +1228,39 @@ static unsigned long init_fadump_header(unsigned long addr)
>   	return addr;
>   }
>
> +static unsigned long allocate_metadata_area(void)
> +{
> +	int nr_pages;
> +	unsigned long size;
> +	struct page *page = NULL;
> +
> +	/*
> +	 * Check if fadump cma region is activated.
> +	 * fadump_cma->count == 0 means cma activation has failed. This means
> +	 * that the fadump reserved memory now will not be visible/available
> +	 * for user applications to use. It will be as good as old fadump
> +	 * behaviour of blocking this memory chunk from production system
> +	 * use. CMA activation failure does not mean that fadump will not
> +	 * work. Will continue to setup fadump.
> +	 */
> +	if (!fadump_cma || !cma_get_size(fadump_cma)) {
> +		pr_warn("fadump cma region activation failed.\n");
> +		return 0;
> +	}
> +
> +	size = get_fadump_metadata_size();
> +	nr_pages = ALIGN(size, PAGE_SIZE) >> PAGE_SHIFT;
> +	pr_info("Fadump metadata size = %ld (nr_pages = %d)\n", size, nr_pages);
> +
> +	page = cma_alloc(fadump_cma, nr_pages, 0, GFP_KERNEL);
> +	if (page) {
> +		pr_debug("Allocated fadump metadata area at %ldMB (cma)\n",
> +				(unsigned long)page_to_phys(page) >> 20);
> +		return page_to_phys(page);
> +	}
> +	return 0;
> +}
> +

We shouldn't need this function with the change suggested above.

Thanks
Hari


