[RFC FIX PATCH v0] powerpc,numa: Fix memory_hotplug_max()

Bharata B Rao bharata at linux.vnet.ibm.com
Sat Apr 9 20:14:31 AEST 2016


On Fri, Apr 08, 2016 at 12:27:44AM -0500, Nathan Fontenot wrote:
> On 04/06/2016 04:44 AM, Bharata B Rao wrote:
> > memory_hotplug_max() uses hot_add_drconf_memory_max() to get maxmimum
> > addressable memory by referring to ibm,dyanamic-memory property. There
> > are three problems with the current approach:
> > 
> > 1 hot_add_drconf_memory_max() assumes that ibm,dynamic-memory includes
> >   all the LMBs of the guest, but that is not true for PowerKVM which
> >   populates only DR LMBs (LMBs that can be hotplugged/removed) in that
> >   property.
> > 2 hot_add_drconf_memory_max() multiplies lmb-size with lmb-count to arrive
> >   at the max possible address. Since ibm,dynamic-memory doesn't include
> >   RMA LMBs, the address thus obtained will be less than the actual max
> >   address. For example, if max possible memory size is 32G, with lmb-size
> >   of 256MB there can be 127 LMBs in ibm,dynamic-memory (1 LMB for RMA
> >   which won't be present here).  hot_add_drconf_memory_max() would then
> >   return the max addressable memory as 127 * 256MB = 31.75GB, the max
> >   address should have been 32G which is what ibm,lrdr-capacity shows.
> > 3 In PowerKVM, there can be a gap between the end of boot time RAM and
> >   beginning of hotplug RAM area. So just multiplying lmb-count with
> >   lmb-size will not provide the correct max possible address for PowerKVM.
> > 
> > This patch fixes 1 by using ibm,lrdr-capacity property to return the max
> > addressable memory whenever the property is present. Then it fixes 2 & 3
> > by fetching the address of the last LMB in ibm,dynamic-memory property.
> > 
> > NOTE: There are some unnecessary changes in the patch because of converting
> > spaces to tabs w/o which checkpatch.pl complains.
> > 
> > Signed-off-by: Bharata B Rao <bharata at linux.vnet.ibm.com>
> > ---
> >  arch/powerpc/mm/numa.c | 29 ++++++++++++++++++++++-------
> >  1 file changed, 22 insertions(+), 7 deletions(-)
> > 
> > diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> > index 669a15e..57d5877 100644
> > --- a/arch/powerpc/mm/numa.c
> > +++ b/arch/powerpc/mm/numa.c
> > @@ -1164,17 +1164,32 @@ int hot_add_scn_to_nid(unsigned long scn_addr)
> >  static u64 hot_add_drconf_memory_max(void)
> >  {
> >          struct device_node *memory = NULL;
> > -        unsigned int drconf_cell_cnt = 0;
> > -        u64 lmb_size = 0;
> > +	struct device_node *dn = NULL;
> > +	unsigned int drconf_cell_cnt = 0;
> > +	u64 lmb_size = 0;
> >  	const __be32 *dm = NULL;
> > +	const __be64 *lrdr = NULL;
> > +	struct of_drconf_cell drmem;
> > +
> > +	dn = of_find_node_by_path("/rtas");
> > +	if (dn) {
> > +		lrdr = of_get_property(dn, "ibm,lrdr-capacity", NULL);
> > +		of_node_put(dn);
> > +		if (lrdr)
> > +			return be64_to_cpup(lrdr);
> > +	}
> >  
> >          memory = of_find_node_by_path("/ibm,dynamic-reconfiguration-memory");
> >          if (memory) {
> > -                drconf_cell_cnt = of_get_drconf_memory(memory, &dm);
> > -                lmb_size = of_get_lmb_size(memory);
> > -                of_node_put(memory);
> > -        }
> > -        return lmb_size * drconf_cell_cnt;
> > +		drconf_cell_cnt = of_get_drconf_memory(memory, &dm);
> > +		lmb_size = of_get_lmb_size(memory);
> > +
> > +		/* Advance to the last cell, each cell has 6 32 bit integers */
> > +		dm += (drconf_cell_cnt - 1) * 6;
> 
> You could do this as follows to avoid hard-coding 6

Can't do that since dm is of type __be32 pointer.

> 		dm += (drconf_cell_cnt - 1) * sizeof(struct of_drconf_cell)
> 
> > +		read_drconf_cell(&drmem, &dm);
> > +		of_node_put(memory);
> > +	}
> > +	return drmem.base_addr + lmb_size;
> 
> I assume it is a safe assumption that there will only be 1 RMA LMB?

No, I am not assuming RMA to have 1 LMB here. I fetch the last LMB and
get the max possible address from it by adding the base address of the
last LMB with the lmb_size.

> 
> I do see that the PAPR defines a bit in the flags field for each LMB
> in ibm,dynamic-memory as 'reserved'. Is this something you could use
> to flag RMA LMBs and put them in the ibm,dynamic-memory property?
> 
> I'm just curious why these LMBs are not in this property.

Not sure about both the above observations.

Section B.6.6 of LoPAPR mentions "... called the RMA, that is represented
by the first value of the reg property of this first /memory node. Additional
storage regions may each be represented by their own /memory node that
includes dynamic reconfiguration (DR) properties or by an entry in
/ibm,dynamic-reconfiguration-memory nodes"

Section B.6.6.2 says "All memory which is not subject to dynamic
reconfiguration (such as the RMA) is represented in /memory node(s).

Currently PowerKVM puts RMA region and boot time memory (which is
currently non DR in case of PowerKVM) in to their own /memory nodes and
puts only DR LMBs (hot pluggable) RAM into
ibm,dynamic-reconfiguration-memory node.

Regards,
Bharata.



More information about the Linuxppc-dev mailing list