[PATCH 2/2] powerpc/numa: Return the first online node if device tree mapping returns a not online node
Aneesh Kumar K.V
aneesh.kumar at linux.ibm.com
Mon Jun 27 15:27:08 AEST 2022
Srikar Dronamraju <srikar at linux.vnet.ibm.com> writes:
> * Aneesh Kumar K.V <aneesh.kumar at linux.ibm.com> [2022-06-23 18:24:42]:
>
>> While building the cpu_to_node map make sure we always use the online node
>> to build the mapping table. In general this should not be an issue
>> because the kernel use similar lookup mechanism (vphn_get_nid()) to mark
>> nodes online (setup_node_data()). Hence NUMA nodes we find during
>> lookup in numa_setup_cpu() will always be found online.
>>
>> To keep logic simpler/correct, make sure that if the hypervisor
>> or device tree returned a not online node, don't use that to build
>> the map table. Instead, use the first_online_node.
>
> Why should the returned nid be already online. Are we facing any problem
> with the current code?
There is no issue with the current upstream code. The problem was
detected in a distro kernel where we had partial backport. We didn't
initialize NUMA node details correctly. So only a subset of actual
NUMA nodes got marked online. This resulted in a crash at completely
independent part of the kernel because the kernel try to derefer NODE_DATA()
which was not initialized.
>
> Since general idea is to keep cpu_to_node constant, By assigining it the
> first_online_node, we may be ending up assigning a wrong node, resulting a
> performance penalty later on. i.e CPU may actually belong to node 4 but we
> assigned it to node 1. Also the coregroup id is derived from associavity
> array. If there is a mismatch between the coregroup id and nid,
> scheduler will crib.
The changed code will never be executed unless something broke between
the NUMA node init (setup_node_data() and numa_setup_cpu()). The reason
the code change was made was to keep it consistent with the fact that we
only initialize NODE_DATA for online NUMA nodes and hence don't return
a possible NUMA node value in numa_setup_cpu().
>
>>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar at linux.ibm.com>
>> ---
>> arch/powerpc/mm/numa.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
>> index 0801b2ce9b7d..f387b9eb9dc9 100644
>> --- a/arch/powerpc/mm/numa.c
>> +++ b/arch/powerpc/mm/numa.c
>> @@ -741,7 +741,7 @@ static int numa_setup_cpu(unsigned long lcpu)
>> of_node_put(cpu);
>>
>> out_present:
>> - if (nid < 0 || !node_possible(nid))
>> + if (nid < 0 || !node_online(nid))
>> nid = first_online_node;
>>
>> /*
>> --
>> 2.36.1
>>
>
> --
> Thanks and Regards
> Srikar Dronamraju
More information about the Linuxppc-dev
mailing list