[PATCH 2/2] powerpc/numa: Return the first online node if device tree mapping returns a not online node

Aneesh Kumar K.V aneesh.kumar at linux.ibm.com
Mon Jun 27 15:27:08 AEST 2022


Srikar Dronamraju <srikar at linux.vnet.ibm.com> writes:

> * Aneesh Kumar K.V <aneesh.kumar at linux.ibm.com> [2022-06-23 18:24:42]:
>
>> While building the cpu_to_node map make sure we always use the online node
>> to build the mapping table. In general this should not be an issue
>> because the kernel use similar lookup mechanism (vphn_get_nid()) to mark
>> nodes online (setup_node_data()). Hence NUMA nodes we find during
>> lookup in numa_setup_cpu() will always be found online.
>> 
>> To keep logic simpler/correct, make sure that if the hypervisor
>> or device tree returned a not online node, don't use that to build
>> the map table. Instead, use the first_online_node.
>
> Why should the returned nid be already online. Are we facing any problem
> with the current code?

There is no issue with the current upstream code. The problem was
detected in a distro kernel where we had partial backport. We didn't
initialize NUMA node details correctly. So only a subset of actual
NUMA nodes got marked online. This resulted in a crash at completely
independent part of the kernel because the kernel try to derefer NODE_DATA()
which was not initialized. 

>
> Since general idea is to keep cpu_to_node constant, By assigining it the
> first_online_node, we may be ending up assigning a wrong node, resulting a
> performance penalty later on. i.e CPU may actually belong to node 4 but we
> assigned it to node 1. Also the coregroup id is derived from associavity
> array. If there is a mismatch between the coregroup id and nid,
> scheduler will crib.

The changed code will never be executed unless something broke between
the NUMA node init (setup_node_data() and numa_setup_cpu()). The reason
the code change was made was to keep it consistent with the fact that we
only initialize NODE_DATA for online NUMA nodes and hence don't return
a possible NUMA node value in numa_setup_cpu(). 

>
>> 
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar at linux.ibm.com>
>> ---
>>  arch/powerpc/mm/numa.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
>> index 0801b2ce9b7d..f387b9eb9dc9 100644
>> --- a/arch/powerpc/mm/numa.c
>> +++ b/arch/powerpc/mm/numa.c
>> @@ -741,7 +741,7 @@ static int numa_setup_cpu(unsigned long lcpu)
>>  	of_node_put(cpu);
>> 
>>  out_present:
>> -	if (nid < 0 || !node_possible(nid))
>> +	if (nid < 0 || !node_online(nid))
>>  		nid = first_online_node;
>> 
>>  	/*
>> -- 
>> 2.36.1
>> 
>
> -- 
> Thanks and Regards
> Srikar Dronamraju


More information about the Linuxppc-dev mailing list