[PATCH v2 3/8] powerpc/mm: Separate ibm, dynamic-memory data from DT format

Nathan Fontenot nfont at linux.vnet.ibm.com
Tue Nov 14 01:51:59 AEDT 2017


On 11/12/2017 06:43 AM, Michael Ellerman wrote:
> Hi Nathan,
> 
> Nathan Fontenot <nfont at linux.vnet.ibm.com> writes:
>> diff --git a/arch/powerpc/kernel/prom.c b/arch/powerpc/kernel/prom.c
>> index f83056297441..917184c13890 100644
>> --- a/arch/powerpc/kernel/prom.c
>> +++ b/arch/powerpc/kernel/prom.c
>> @@ -454,92 +455,93 @@ static int __init early_init_dt_scan_chosen_ppc(unsigned long node,
> ...
>>  
>>  static int __init early_init_dt_scan_memory_ppc(unsigned long node,
>>  						const char *uname,
>>  						int depth, void *data)
>>  {
>> +	int rc;
>> +
>>  	if (depth == 1 &&
>> -	    strcmp(uname, "ibm,dynamic-reconfiguration-memory") == 0)
>> -		return early_init_dt_scan_drconf_memory(node);
>> +	    strcmp(uname, "ibm,dynamic-reconfiguration-memory") == 0) {
>> +		rc = init_drmem_lmbs(node);
>> +		if (!rc)
>> +			early_init_dt_scan_drmem_lmbs(node);
>> +
>> +		return rc;
>> +	}
>>  	
>>  	return early_init_dt_scan_memory(node, uname, depth, data);
>>  }
> 
> There's one bug in here which is that you return rc as returned by
> init_drmem_lmbs(). Returning non-zero from these scan routines
> terminates the scan, which means if anything goes wrong in
> init_drmem_lmbs() we may not call early_init_dt_scan_memory()
> in which case we won't have any memory at all.
> 

I didn't know this would stop scanning the device tree, thanks for letting me know.
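
For reference, a minimal userspace sketch of that scan contract (not kernel
code). It assumes, as described in the review, that the walker stops at the
first non-zero return from the callback. Every toy_* name and both
scan_memory_* handlers are hypothetical stand-ins, not functions from the patch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one flat device tree node. */
struct toy_node {
	const char *uname;
	int depth;
};

struct toy_state {
	int drmem_fail;        /* force the LMB init step to fail */
	int memory_nodes_seen; /* memory@ nodes the scan reached */
};

typedef int (*toy_scan_fn)(const struct toy_node *node, void *data);

/* Models the scan contract: stop at the first non-zero return. */
static int toy_scan(const struct toy_node *nodes, size_t n,
		    toy_scan_fn fn, void *data)
{
	int rc = 0;
	size_t i;

	for (i = 0; i < n && !rc; i++)
		rc = fn(&nodes[i], data);
	return rc;
}

static int toy_is_drconf(const struct toy_node *node)
{
	return node->depth == 1 &&
	       strcmp(node->uname, "ibm,dynamic-reconfiguration-memory") == 0;
}

static void toy_count_memory(const struct toy_node *node, struct toy_state *st)
{
	if (strncmp(node->uname, "memory@", 7) == 0)
		st->memory_nodes_seen++;
}

/* Propagates the init failure, which ends the whole scan. */
static int scan_memory_propagate(const struct toy_node *node, void *data)
{
	struct toy_state *st = data;

	if (toy_is_drconf(node))
		return st->drmem_fail ? -1 : 0;
	toy_count_memory(node, st);
	return 0;
}

/* Swallows the failure so later memory nodes are still visited. */
static int scan_memory_continue(const struct toy_node *node, void *data)
{
	struct toy_state *st = data;

	if (toy_is_drconf(node))
		return 0; /* a real handler would log the failure here */
	toy_count_memory(node, st);
	return 0;
}

/* Runs a scan where the drconf node comes first and its init fails. */
static int toy_memory_nodes_seen(toy_scan_fn fn)
{
	static const struct toy_node nodes[] = {
		{ "ibm,dynamic-reconfiguration-memory", 1 },
		{ "memory@0", 1 },
	};
	struct toy_state st = { 1, 0 };

	toy_scan(nodes, sizeof(nodes) / sizeof(nodes[0]), fn, &st);
	return st.memory_nodes_seen;
}
```

With the drconf node ordered first, the propagating handler never reaches
memory@0, while the handler that returns 0 does.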

> I say "may not" because it depends on the order of the nodes in the
> device tree whether you hit the memory nodes or the dynamic reconfig mem
> info first. And the order of the nodes in the device tree is arbitrary
> so we can't rely on that.
> 
> 
>> diff --git a/arch/powerpc/mm/drmem.c b/arch/powerpc/mm/drmem.c
>> new file mode 100644
>> index 000000000000..8ad7cf36b2c4
>> --- /dev/null
>> +++ b/arch/powerpc/mm/drmem.c
>> @@ -0,0 +1,84 @@
> ...
>> +
>> +int __init init_drmem_lmbs(unsigned long node)
>> +{
>> +	struct drmem_lmb *lmb;
>> +	const __be32 *prop;
>> +	int prop_sz;
>> +	u32 len;
>> +
>> +	prop = of_get_flat_dt_prop(node, "ibm,lmb-size", &len);
>> +	if (!prop || len < dt_root_size_cells * sizeof(__be32))
>> +		return -1;
>> +
>> +	drmem_info->lmb_size = dt_mem_next_cell(dt_root_size_cells, &prop);
>> +
>> +	prop = of_get_flat_dt_prop(node, "ibm,dynamic-memory", &len);
>> +	if (!prop || len < dt_root_size_cells * sizeof(__be32))
>> +		return -1;
>> +
>> +	drmem_info->n_lmbs = of_read_number(prop++, 1);
>> +	prop_sz = drmem_info->n_lmbs * sizeof(struct of_drconf_cell)
>> +		  + sizeof(__be32);
>> +	if (prop_sz < len)
>> +		return -1;
>> +
>> +	drmem_info->lmbs = alloc_bootmem(drmem_info->n_lmbs * sizeof(*lmb));
>> +	if (!drmem_info->lmbs)
>> +		return -1;
> 
> The bigger problem we have though is that you're trying to allocate
> memory, in order to find out what memory we have :)
> 
> I suspect it works in some cases because you hit the memory@0 node first
> in the device tree, and add that memory to memblock, which means
> init_drmem_lmbs() *can* allocate memory, and everything's good.
> 
> But if we hit init_drmem_lmbs() first, or there's not enough space in
> memory@0, then allocating memory in order to discover memory is not
> going to work.
> 
> I'm not sure what the best solution is. One option would be to
> statically allocate some space, so that we can discover some of the LMBs
> without doing an allocation. But we wouldn't be able to guarantee that
> we had enough space in that static allocation, so the code would need to
> handle doing that and then potentially finding more LMBs later using a
> dynamic alloc. So that could be a bit messy.
> 
> The other option would be for the early_init_dt_scan_drmem_lmbs() code
> to still work on the device tree directly, rather than using the
> drmem_info array. That would make for uglier code, but may be necessary.
> 

I have been thinking about my initial approach, and the more I look at it
the more I do not like trying to do the bootmem allocation. As you mention,
there is just too much that could go wrong with that.

I have started looking at a design where an interface similar to
walk_memory_range() is used for the prom and numa code so we do not have to rely
on making the allocation for the lmb array early in boot. The lmb array
could then be allocated in the late_initcall in drmem.c at which point
the general kernel allocator is available.
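
As a rough illustration only, a callback-style walker of this kind might look
like the userspace sketch below. It decodes each entry straight from the
property bytes, so no LMB array has to be allocated. It assumes the
ibm,dynamic-memory layout of a 32-bit count followed by 24-byte big-endian
entries (64-bit base address, then drc index, reserved, aa index and flags as
32-bit cells). All toy_* names are hypothetical and not from the patch set.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_ENTRY_BYTES 24u /* assumed size of one property entry */

struct toy_lmb {
	uint64_t base_addr;
	uint32_t drc_index;
	uint32_t aa_index;
	uint32_t flags;
};

struct toy_walk_result {
	int rc;
	unsigned int count;
	uint64_t last_base;
};

typedef int (*toy_lmb_fn)(const struct toy_lmb *lmb, void *data);

static uint32_t toy_read_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Calls fn once per LMB; stops early if fn returns non-zero. */
static int toy_walk_drmem(const uint8_t *prop, size_t len,
			  toy_lmb_fn fn, void *data)
{
	uint32_t n, i;

	if (!prop || len < 4)
		return -1;
	n = toy_read_be32(prop);
	prop += 4;
	len -= 4;
	if (len / TOY_ENTRY_BYTES < n) /* property too short for n entries */
		return -1;

	for (i = 0; i < n; i++, prop += TOY_ENTRY_BYTES) {
		struct toy_lmb lmb;
		int rc;

		lmb.base_addr = ((uint64_t)toy_read_be32(prop) << 32) |
				toy_read_be32(prop + 4);
		lmb.drc_index = toy_read_be32(prop + 8);
		/* bytes 12..15 are the reserved cell */
		lmb.aa_index = toy_read_be32(prop + 16);
		lmb.flags = toy_read_be32(prop + 20);

		rc = fn(&lmb, data);
		if (rc)
			return rc;
	}
	return 0;
}

/* Example callback: count LMBs and remember the last base address. */
static int toy_record_lmb(const struct toy_lmb *lmb, void *data)
{
	struct toy_walk_result *res = data;

	res->count++;
	res->last_base = lmb->base_addr;
	return 0;
}

/* Walks a made-up two-entry property, passing only len bytes of it. */
static struct toy_walk_result toy_demo_walk(size_t len)
{
	static const uint8_t prop[52] = {
		0x00, 0x00, 0x00, 0x02,                         /* n_lmbs = 2 */
		0x00, 0x00, 0x00, 0x00, 0x10, 0x00, 0x00, 0x00, /* base 0x10000000 */
		0x80, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, /* drc, reserved */
		0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x08, /* aa, flags */
		0x00, 0x00, 0x00, 0x00, 0x20, 0x00, 0x00, 0x00, /* base 0x20000000 */
		0x80, 0x00, 0x00, 0x02, 0x00, 0x00, 0x00, 0x00, /* drc, reserved */
		0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x08, /* aa, flags */
	};
	struct toy_walk_result res = { 0, 0, 0 };

	res.rc = toy_walk_drmem(prop, len, toy_record_lmb, &res);
	return res;
}
```

The length check runs before any callback, so a truncated property is
rejected without visiting a partial entry.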

I'm still working on getting this coded up and will send out a new patch set
once it's ready, unless anyone has objections to this approach.

-Nathan
