[PATCH V8 1/3] powerpc/nodes: Ensure enough nodes avail for operations
Nathan Fontenot
nfont at linux.vnet.ibm.com
Tue Jan 9 06:13:34 AEDT 2018
On 11/28/2017 04:58 PM, Michael Bringmann wrote:
> On powerpc systems which allow 'hot-add' of CPU or memory resources,
> it may occur that the new resources are to be inserted into nodes
> that were not used for these resources at bootup. In the kernel,
> any node that is used must be defined and initialized. These empty
> nodes may occur when,
>
> * Dedicated vs. shared resources. Shared resources require
> information such as the VPHN hcall for CPU assignment to nodes.
> Associativity decisions made based on dedicated resource rules,
> such as associativity properties in the device tree, may vary
> from decisions made using the values returned by the VPHN hcall.
> * memoryless nodes at boot. Nodes need to be defined as 'possible'
> at boot for operation with other code modules. Previously, the
> powerpc code would limit the set of possible nodes to those which
> have memory assigned at boot, and were thus online. Subsequent
> add/remove of CPUs or memory would only work with this subset of
> possible nodes.
> * memoryless nodes with CPUs at boot. Due to the previous restriction
> on nodes, nodes that had CPUs but no memory were being collapsed
> into other nodes that did have memory at boot. In practice this
> meant that the node assignment presented by the runtime kernel
> differed from the affinity and associativity attributes presented
> by the device tree or VPHN hcalls. Nodes that might be known to
> the pHyp were not 'possible' in the runtime kernel because they did
> not have memory at boot.
>
> This patch ensures that sufficient nodes are defined to support
> configuration requirements after boot, as well as at boot. This
> patch set fixes a couple of problems.
>
> * Nodes known to powerpc to be memoryless at boot, but to have
> CPUs in them are allowed to be 'possible' and 'online'. Memory
> allocations for those nodes are taken from another node that does
> have memory until and if memory is hot-added to the node.
> * Nodes which have no resources assigned at boot, but which may still
> be referenced subsequently by affinity or associativity attributes,
> are kept in the list of 'possible' nodes for powerpc. Hot-add of
> memory or CPUs to the system can reference these nodes and bring
> them online instead of redirecting to one of the set of nodes that
> were known to have memory at boot.
>
> This patch extracts the value of the lowest domain level (number of
> allocable resources) from the device tree property
> "ibm,max-associativity-domains" to use as the maximum number of nodes
> to setup as possibly available in the system. This new setting will
> override the instruction,
>
> nodes_and(node_possible_map, node_possible_map, node_online_map);
>
> presently seen in the function arch/powerpc/mm/numa.c:initmem_init().
>
> If the "ibm,max-associativity-domains" property is not present at boot,
> no operation will be performed to define or enable additional nodes, or
> enable the above 'nodes_and()'.
>
> Signed-off-by: Michael Bringmann <mwb at linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont at linux.vnet.ibm.com>
> ---
> Changes in V8:
> -- Remove unneeded pr_info() statement
> ---
> arch/powerpc/mm/numa.c | 37 ++++++++++++++++++++++++++++++++++---
> 1 file changed, 34 insertions(+), 3 deletions(-)
>
> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
> index adb6364f..735e3fd 100644
> --- a/arch/powerpc/mm/numa.c
> +++ b/arch/powerpc/mm/numa.c
> @@ -892,6 +892,34 @@ static void __init setup_node_data(int nid, u64 start_pfn, u64 end_pfn)
> NODE_DATA(nid)->node_spanned_pages = spanned_pages;
> }
>
> +static void __init find_possible_nodes(void)
> +{
> + struct device_node *rtas;
> + u32 numnodes, i;
> +
> + if (min_common_depth <= 0)
> + return;
> +
> + rtas = of_find_node_by_path("/rtas");
> + if (!rtas)
> + return;
> +
> + if (of_property_read_u32_index(rtas,
> + "ibm,max-associativity-domains",
> + min_common_depth, &numnodes))
> + goto out;
> +
> + for (i = 0; i < numnodes; i++) {
> + if (!node_possible(i)) {
> + setup_node_data(i, 0, 0);
> + node_set(i, node_possible_map);
> + }
> + }
> +
> +out:
> + of_node_put(rtas);
> +}
> +
> void __init initmem_init(void)
> {
> int nid, cpu;
> @@ -905,12 +933,15 @@ void __init initmem_init(void)
> memblock_dump_all();
>
> /*
> - * Reduce the possible NUMA nodes to the online NUMA nodes,
> - * since we do not support node hotplug. This ensures that we
> - * lower the maximum NUMA node ID to what is actually present.
> + * Modify the set of possible NUMA nodes to reflect information
> + * available about the set of online nodes, and the set of nodes
> + * that we expect to make use of for this platform's affinity
> + * calculations.
> */
> nodes_and(node_possible_map, node_possible_map, node_online_map);
>
> + find_possible_nodes();
> +
> for_each_online_node(nid) {
> unsigned long start_pfn, end_pfn;
>
More information about the Linuxppc-dev
mailing list