[PATCH 2/2] powerpc/hotplug: Ensure nodes initialized for hotplug

Nathan Fontenot nfont at linux.vnet.ibm.com
Fri Nov 17 03:57:54 AEDT 2017



On 11/15/2017 12:28 PM, Michael Bringmann wrote:
> Hello:
>     See below.
> 
> On 10/16/2017 07:54 AM, Michael Ellerman wrote:
>> Michael Bringmann <mwb at linux.vnet.ibm.com> writes:
>>
>>> powerpc/hotplug: On systems like PowerPC that allow hot-adding CPUs,
>>> the new resources may need to be inserted into nodes that were not
>>> used for memory at boot.  Many different configurations of PowerPC
>>> resources may need to be supported, depending upon the environment.
>>
>> Give me some detail please?!
> 
> The most important characteristics that I have observed are:
> 
> * Dedicated vs. shared resources.  Shared resources require information
>   from sources such as the VPHN hcall to map CPUs to nodes.
> * memoryless nodes at boot.  Nodes need to be marked 'possible' at
>   boot so that other kernel code can operate on them (see the sketch
>   after this list).  Previously, the powerpc code limited the set of
>   possible/online nodes to those which had memory assigned at boot.
>   Subsequent add/remove of CPUs or memory would only work with this
>   subset of possible nodes.
> * memoryless nodes with CPUs at boot.  Due to the previous restriction,
>   nodes that had CPUs but no memory were collapsed into other nodes
>   that did have memory at boot.  In practice this meant that the node
>   assignment presented by the runtime kernel differed from the affinity
>   and associativity attributes presented by the device tree or VPHN
>   hcalls.  Nodes that might be known to pHyp were not 'possible' in the
>   runtime kernel because they did not have memory at boot.
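> 
> As a sketch of the second point (illustrative only, not the actual
> patch; 'dn' and 'nid' are locals introduced here), the intent is that
> any node the device tree can reference stays possible even if it owns
> no memory at boot:
> 
> 	struct device_node *dn;
> 	int nid;
> 
> 	for_each_node_by_type(dn, "cpu") {
> 		nid = of_node_to_nid(dn);
> 		if (nid >= 0)
> 			node_set(nid, node_possible_map);
> 	}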
> 
>>
>>> This patch fixes some problems encountered at
>>
>> What problems?
> 
> This patch set fixes a couple of problems.
> 
> * Nodes known to powerpc to be memoryless at boot, but to have CPUs in
>   them, are allowed to be 'possible' and 'online'.  Memory allocations
>   for those nodes are taken from another node that does have memory,
>   unless and until memory is hot-added to the node (illustrated below).
> * Nodes which have no resources assigned at boot, but which may still be
>   referenced subsequently by affinity or associativity attributes, are kept
>   in the list of 'possible' nodes for powerpc.  Hot-add of memory or CPUs
>   to the system can reference these nodes and bring them online instead of
>   redirecting the resources to the set of nodes known to have memory at boot.
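> 
> To illustrate the first point: once such a node is online, its
> zonelists point at the nearest node that does have memory, so an
> allocation like the following (sketch only; 'foo' is a made-up
> structure) succeeds instead of dereferencing a NULL NODE_DATA():
> 
> 	/* nid is online but currently memoryless; the allocator
> 	 * falls back through nid's zonelist to a node with memory. */
> 	struct foo *p = kzalloc_node(sizeof(*p), GFP_KERNEL, nid);
> 
> If memory is later hot-added to nid, subsequent allocations become
> node-local with no code change.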
> 
>>
>>> runtime with configurations that support memory-less nodes, but which
>>> allow CPUs to be added at and after boot.
>>
>> How does it fix those problems?
> 
> These problems were fixed in a couple of ways.  First, the code now
> checks whether the node to which a CPU is mapped by
> 'numa_update_cpu_topology' / 'arch_update_cpu_topology' has been
> initialized and has memory available.  If either test fails, a call is
> made to 'try_online_node()' to finish the data structure
> initialization.  Only if we are unable to initialize the node at this
> point is the CPU's node assignment collapsed into an existing node.
> After initialization by 'try_online_node()', calls to
> 'local_memory_node' no longer crash for these memoryless nodes.
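> 
> For example (sketch only), after 'try_online_node(nid)' has set up
> NODE_DATA(nid) and its zonelists, this lookup is safe even for a
> memoryless node:
> 
> 	/* Returns the nearest node that actually has memory;
> 	 * before the fix this could trip over an uninitialized
> 	 * node. */
> 	int mem_nid = local_memory_node(nid);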
> 
>>
>>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
>>> index b385cd0..e811dd1 100644
>>> --- a/arch/powerpc/mm/numa.c
>>> +++ b/arch/powerpc/mm/numa.c
>>> @@ -1325,6 +1325,17 @@ static long vphn_get_associativity(unsigned long cpu,
>>>  	return rc;
>>>  }
>>>  
>>> +static int verify_node_preparation(int nid)
>>> +{
>>
>> I would not expect a function called "verify" ...
>>
>>> +	if ((NODE_DATA(nid) == NULL) ||
>>> +	    (NODE_DATA(nid)->node_spanned_pages == 0)) {
>>> +		if (try_online_node(nid))
>>
>> .. to do something like online a node.
> 
> We have changed the function name to 'find_cpu_nid'.
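> 
> That is, the hunk above presumably becomes (a sketch, assuming the
> body is otherwise unchanged):
> 
> 	static int find_cpu_nid(int nid)
> 	{
> 		if ((NODE_DATA(nid) == NULL) ||
> 		    (NODE_DATA(nid)->node_spanned_pages == 0)) {
> 			if (try_online_node(nid))
> 				return first_online_node;
> 		}
> 
> 		return nid;
> 	}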

Ok, but I would still not expect 'find_cpu_nid' to online the node.

> 
>>
>>> +			return first_online_node;
>>> +	}
>>> +
>>> +	return nid;
>>> +}
>>> +
>>>  /*
>>>   * Update the CPU maps and sysfs entries for a single CPU when its NUMA
>>>   * characteristics change. This function doesn't perform any locking and is
>>> @@ -1433,9 +1444,11 @@ int numa_update_cpu_topology(bool cpus_locked)
>>>  		/* Use associativity from first thread for all siblings */
>>>  		vphn_get_associativity(cpu, associativity);
>>>  		new_nid = associativity_to_nid(associativity);
>>> -		if (new_nid < 0 || !node_online(new_nid))
>>> +		if (new_nid < 0 || !node_possible(new_nid))
>>>  			new_nid = first_online_node;
>>>  
>>> +		new_nid = verify_node_preparation(new_nid);
>>
>> You're being called part-way through CPU hotplug here, are we sure it's
>> safe to go and do memory hotplug from there? What's the locking
>> situation?
> 
> We are not doing memory hotplug.  We are initializing a node that may
> be used by CPUs or memory before it can be referenced as invalid by a
> CPU hotplug operation.  CPU hotplug operations are protected by a range
> of APIs, including cpu_maps_update_begin/cpu_maps_update_done,
> cpus_read_lock/cpus_read_unlock, cpus_write_lock/cpus_write_unlock,
> device locks, and more.  Memory hotplug operations, including
> try_online_node, are protected by mem_hotplug_begin/mem_hotplug_done,
> device locks, and more.  In the case of CPUs being hot-added to a
> previously memoryless node, the try_online_node operation occurs wholly
> within the CPU locks, with no overlap.  Using HMC hot-add/hot-remove
> operations, I have been able to add CPUs to and remove them from any
> possible node without failures.  HMC operations involve a degree of
> self-serialization, though.

For both memory and cpu DLPAR operations we hold the device hotplug lock.
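
To illustrate the nesting being described (a sketch, not actual kernel
code; try_online_node() takes mem_hotplug_begin()/mem_hotplug_done()
internally):

	/* Sketch: the CPU hotplug path already holds the CPU locks. */
	cpus_write_lock();
	/* ... CPU bring-up work ... */
	try_online_node(nid);	/* takes mem_hotplug_begin()/done()
				 * internally, nesting wholly inside
				 * the CPU locks */
	/* ... */
	cpus_write_unlock();

so the memory-hotplug locking occurs entirely within the CPU hotplug
critical section, with no overlap in the other direction.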

-Nathan
 
> 
>>
>> cheers
>>
>>
> 


