[PATCH V7 1/3] powerpc/nodes: Ensure enough nodes avail for operations
Michael Bringmann
mwb at linux.vnet.ibm.com
Tue Nov 28 07:02:23 AEDT 2017
See below.
On 11/20/2017 10:33 AM, Nathan Fontenot wrote:
>
>
> On 11/16/2017 11:24 AM, Michael Bringmann wrote:
>> On powerpc systems which allow 'hot-add' of CPU or memory resources,
>> it may occur that the new resources are to be inserted into nodes
>> that were not used for these resources at bootup. In the kernel,
>> any node that is used must be defined and initialized. These empty
>> nodes may occur when,
>>
>> * Dedicated vs. shared resources. Shared resources require
>> information such as the VPHN hcall for CPU assignment to nodes.
>> Associativity decisions made based on dedicated resource rules,
>> such as associativity properties in the device tree, may vary
>> from decisions made using the values returned by the VPHN hcall.
>> * memoryless nodes at boot. Nodes need to be defined as 'possible'
>> at boot for operation with other code modules. Previously, the
>> powerpc code would limit the set of possible nodes to those which
>> have memory assigned at boot, and were thus online. Subsequent
>> add/remove of CPUs or memory would only work with this subset of
>> possible nodes.
>> * memoryless nodes with CPUs at boot. Due to the previous restriction
>> on nodes, nodes that had CPUs but no memory were being collapsed
>> into other nodes that did have memory at boot. In practice this
>> meant that the node assignment presented by the runtime kernel
>> differed from the affinity and associativity attributes presented
>> by the device tree or VPHN hcalls. Nodes that might be known to
>> the pHyp were not 'possible' in the runtime kernel because they did
>> not have memory at boot.
>>
>> This patch ensures that sufficient nodes are defined to support
>> configuration requirements after boot, as well as at boot. This
>> patch set fixes a couple of problems.
>>
>> * Nodes known to powerpc to be memoryless at boot, but to have
>> CPUs in them are allowed to be 'possible' and 'online'. Memory
>> allocations for those nodes are taken from another node that does
>> have memory until and if memory is hot-added to the node.
>> * Nodes which have no resources assigned at boot, but which may still
>> be referenced subsequently by affinity or associativity attributes,
>> are kept in the list of 'possible' nodes for powerpc. Hot-add of
>> memory or CPUs to the system can reference these nodes and bring
>> them online instead of redirecting to one of the set of nodes that
>> were known to have memory at boot.
>>
>> This patch extracts the value of the lowest domain level (number of
>> allocable resources) from the device tree property
>> "ibm,max-associativity-domains" to use as the maximum number of nodes
>> to setup as possibly available in the system. This new setting will
>> override the instruction,
>>
>> nodes_and(node_possible_map, node_possible_map, node_online_map);
>>
>> presently seen in the function arch/powerpc/mm/numa.c:initmem_init().
>>
>> If the "ibm,max-associativity-domains" property is not present at boot,
>> no operation will be performed to define or enable additional nodes, or
>> enable the above 'nodes_and()'.
>>
>> Signed-off-by: Michael Bringmann <mwb at linux.vnet.ibm.com>
>> ---
>> Changes in V6:
>> -- Remove some node initialization/allocation from boot setup
>> to later in runtime to try to limit memory needs early on
>> -- Augment descriptive documentation for patch
>> ---
>> arch/powerpc/mm/numa.c | 40 +++++++++++++++++++++++++++++++++++++---
>> 1 file changed, 37 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
>> index eb604b3..334a1ff 100644
>> --- a/arch/powerpc/mm/numa.c
>> +++ b/arch/powerpc/mm/numa.c
>> @@ -892,6 +892,37 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
>> NODE_DATA(nid)->node_spanned_pages = spanned_pages;
>> }
>>
>> +static void __init find_possible_nodes(void)
>> +{
>> + struct device_node *rtas;
>> + u32 numnodes, i;
>> +
>> + if (min_common_depth <= 0)
>> + return;
>> +
>> + rtas = of_find_node_by_path("/rtas");
>> + if (!rtas)
>> + return;
>> +
>> + if (of_property_read_u32_index(rtas,
>> + "ibm,max-associativity-domains",
>> + min_common_depth, &numnodes))
>> + goto out;
>> +
>> + pr_info("numa: Nodes = %d (mcd = %d)\n", numnodes,
>> + min_common_depth);
>
> numa.c already has a pr_fmt define, no need to pre-pend "numa:" to the
> information message.
>
> -Nathan
Okay.
>
>> +
>> + for (i = 0; i < numnodes; i++) {
>> + if (!node_possible(i)) {
>> + setup_node_data(i, 0, 0);
>> + node_set(i, node_possible_map);
>> + }
>> + }
>> +
>> +out:
>> + of_node_put(rtas);
>> +}
>> +
>> void __init initmem_init(void)
>> {
>> int nid, cpu;
>> @@ -905,12 +936,15 @@ void __init initmem_init(void)
>> memblock_dump_all();
>>
>> /*
>> - * Reduce the possible NUMA nodes to the online NUMA nodes,
>> - * since we do not support node hotplug. This ensures that we
>> - * lower the maximum NUMA node ID to what is actually present.
>> + * Modify the set of possible NUMA nodes to reflect information
>> + * available about the set of online nodes, and the set of nodes
>> + * that we expect to make use of for this platform's affinity
>> + * calculations.
>> */
>> nodes_and(node_possible_map, node_possible_map, node_online_map);
>>
>> + find_possible_nodes();
>> +
>> for_each_online_node(nid) {
>> unsigned long start_pfn, end_pfn;
>>
>
>
--
Michael W. Bringmann
Linux Technology Center
IBM Corporation
Tie-Line 363-5196
External: (512) 286-5196
Cell: (512) 466-0650
mwb at linux.vnet.ibm.com
More information about the Linuxppc-dev
mailing list