[PATCH 2/2] powerpc/hotplug: Ensure nodes initialized for hotplug
Michael Bringmann
mwb at linux.vnet.ibm.com
Thu Nov 16 05:28:19 AEDT 2017
Hello:
See below.
On 10/16/2017 07:54 AM, Michael Ellerman wrote:
> Michael Bringmann <mwb at linux.vnet.ibm.com> writes:
>
>> powerpc/hotplug: On systems like PowerPC which allow 'hot-add' of CPU,
>> it may occur that the new resources are to be inserted into nodes
>> that were not used for memory resources at bootup. Many different
>> configurations of PowerPC resources may need to be supported depending
>> upon the environment.
>
> Give me some detail please?!
The most important characteristics that I have observed are:
* Dedicated vs. shared resources.  Shared resources require additional
information, such as the results of the VPHN hcall, to map CPUs to nodes.
* Memoryless nodes at boot.  Nodes need to be marked 'possible' at boot
for operation with other code modules (see the sketch after this list).
Previously, the powerpc code limited the set of possible/online nodes to
those which had memory assigned at boot.  Subsequent add/remove of CPUs
or memory would only work with this subset of possible nodes.
* Memoryless nodes with CPUs at boot.  Due to the previous restriction on
nodes, nodes that had CPUs but no memory were being collapsed into other
nodes that did have memory at boot.  In practice this meant that the
node assignment presented by the running kernel differed from the affinity
and associativity attributes presented by the device tree or VPHN hcalls.
Nodes that might be known to the pHyp were not 'possible' in the running
kernel because they did not have memory at boot.
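
As a rough sketch (not the literal patch), keeping every firmware-known
node in the possible map at boot would look something like the following;
node_known_to_firmware() is a hypothetical stand-in for whatever
device-tree / associativity walk determines the node set:

    /* Sketch only: node_known_to_firmware() is a hypothetical helper. */
    static void __init setup_possible_nodes(void)
    {
            int nid;

            for (nid = 0; nid < MAX_NUMNODES; nid++)
                    if (node_known_to_firmware(nid))
                            node_set(nid, node_possible_map);
    }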
>
>> This patch fixes some problems encountered at
>
> What problems?
This patch set fixes a couple of problems.
* Nodes known to powerpc to be memoryless at boot, but which have CPUs,
are allowed to be 'possible' and 'online'.  Memory allocations for those
nodes are taken from another node that does have memory, until and unless
memory is hot-added to the node.
* Nodes which have no resources assigned at boot, but which may still be
referenced subsequently by affinity or associativity attributes, are kept
in the list of 'possible' nodes for powerpc.  Hot-add of memory or CPUs
to the system can then reference these nodes and bring them online instead
of redirecting the resources to the set of nodes known to have memory at
boot.
>
>> runtime with configurations that support memory-less nodes, but which
>> allow CPUs to be added at and after boot.
>
> How does it fix those problems?
This problem was fixed in two ways.  First, the code now checks whether
the node to which a CPU is mapped by 'numa_update_cpu_topology' /
'arch_update_cpu_topology' has been initialized and has memory available.
If either test fails, 'try_online_node()' is called to finish the node's
data structure initialization.  Second, only if we are still unable to
initialize the node at that point is the CPU's node assignment collapsed
into an existing node.  After initialization by 'try_online_node()', calls
to 'local_memory_node' no longer crash for these memoryless nodes.
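
(For context, as I read the mainline helpers -- not part of this patch --
the crash came from local_memory_node() walking a zonelist hanging off a
pgdat that was never allocated:

    /*
     * local_memory_node(nid)
     *   node_zonelist(nid, GFP_KERNEL)
     *     -> NODE_DATA(nid)->node_zonelists + ...
     *        NODE_DATA(nid) is NULL for a never-initialized node,
     *        so the dereference oopses.
     */

After try_online_node() the pgdat and zonelists exist, so the lookup is
safe.)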
>
>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
>> index b385cd0..e811dd1 100644
>> --- a/arch/powerpc/mm/numa.c
>> +++ b/arch/powerpc/mm/numa.c
>> @@ -1325,6 +1325,17 @@ static long vphn_get_associativity(unsigned long cpu,
>> return rc;
>> }
>>
>> +static int verify_node_preparation(int nid)
>> +{
>
> I would not expect a function called "verify" ...
>
>> + if ((NODE_DATA(nid) == NULL) ||
>> + (NODE_DATA(nid)->node_spanned_pages == 0)) {
>> + if (try_online_node(nid))
>
> .. to do something like online a node.
We have changed the function name to 'find_cpu_nid'.
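For reference, the renamed helper keeps the same logic as the hunk above;
only the rename is confirmed here, so treat this as a sketch of the next
revision:

    static int find_cpu_nid(int nid)
    {
            if ((NODE_DATA(nid) == NULL) ||
                (NODE_DATA(nid)->node_spanned_pages == 0)) {
                    if (try_online_node(nid))
                            return first_online_node;
            }

            return nid;
    }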
>
>> + return first_online_node;
>> + }
>> +
>> + return nid;
>> +}
>> +
>> /*
>> * Update the CPU maps and sysfs entries for a single CPU when its NUMA
>> * characteristics change. This function doesn't perform any locking and is
>> @@ -1433,9 +1444,11 @@ int numa_update_cpu_topology(bool cpus_locked)
>> /* Use associativity from first thread for all siblings */
>> vphn_get_associativity(cpu, associativity);
>> new_nid = associativity_to_nid(associativity);
>> - if (new_nid < 0 || !node_online(new_nid))
>> + if (new_nid < 0 || !node_possible(new_nid))
>> new_nid = first_online_node;
>>
>> + new_nid = verify_node_preparation(new_nid);
>
> You're being called part-way through CPU hotplug here, are we sure it's
> safe to go and do memory hotplug from there? What's the locking
> situation?
We are not doing memory hotplug.  We are initializing a node that may be
used by CPUs or memory, before a CPU hotplug operation can reference it in
an uninitialized state.  CPU hotplug operations are protected by a range
of APIs, including cpu_maps_update_begin/cpu_maps_update_done,
cpus_read_lock/cpus_read_unlock (and the write variants), device locks,
and more.  Memory hotplug operations, including try_online_node, are
protected by mem_hotplug_begin/mem_hotplug_done, device locks, and more.
In the case of CPUs being hot-added to a previously memoryless node, the
try_online_node operation occurs wholly within the CPU locks, with no
overlap.  Using HMC hot-add/hot-remove operations, I have been able to
add CPUs to, and remove CPUs from, any possible node without failures.
HMC operations involve a degree of self-serialization, though.
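
To make the nesting concrete, the ordering during a CPU hot-add is roughly
the following (a sketch; dlpar_cpu_add() stands in for the DLPAR entry
point, and the intermediate calls are elided):

    /*
     * dlpar_cpu_add()
     *   cpu_maps_update_begin()
     *     ...
     *     numa_update_cpu_topology()
     *       find_cpu_nid(new_nid)
     *         try_online_node(nid)
     *           mem_hotplug_begin()
     *           ... allocate pgdat, mark node online ...
     *           mem_hotplug_done()
     *   cpu_maps_update_done()
     */

The memory-hotplug lock is taken and released entirely inside the CPU
hotplug locks, so the two never interleave in the other order.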
>
> cheers
>
>
--
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line 363-5196
External: (512) 286-5196
Cell: (512) 466-0650
mwb at linux.vnet.ibm.com