[RFC PATCH 1/2] powerpc/numa: Introduce logical numa id
Aneesh Kumar K.V
aneesh.kumar at linux.ibm.com
Mon Aug 10 00:12:51 AEST 2020
On 8/8/20 2:15 AM, Nathan Lynch wrote:
> "Aneesh Kumar K.V" <aneesh.kumar at linux.ibm.com> writes:
>> On 8/7/20 9:54 AM, Nathan Lynch wrote:
>>> "Aneesh Kumar K.V" <aneesh.kumar at linux.ibm.com> writes:
>>>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
>>>> index e437a9ac4956..6c659aada55b 100644
>>>> --- a/arch/powerpc/mm/numa.c
>>>> +++ b/arch/powerpc/mm/numa.c
>>>> @@ -221,25 +221,51 @@ static void initialize_distance_lookup_table(int nid,
>>>> }
>>>> }
>>>>
>>>> +static u32 nid_map[MAX_NUMNODES] = {[0 ... MAX_NUMNODES - 1] = NUMA_NO_NODE};
>>>
>>> It's odd to me to use MAX_NUMNODES for this array when it's going to be
>>> indexed not by Linux's logical node IDs but by the platform-provided
>>> domain number, which has no relation to MAX_NUMNODES.
>>
>>
>> I didn't want to dynamically allocate this. We could fetch
>> "ibm,max-associativity-domains" to find the size for that. The current
>> code does assume the firmware group id will not exceed MAX_NUMNODES,
>> hence the MAX_NUMNODES array size. I do agree that it is confusing.
>> Maybe we can do #define MAX_AFFINITY_DOMAIN MAX_NUMNODES?
>
> Well, consider:
>
> - ibm,max-associativity-domains can change at runtime with LPM. This
> doesn't happen in practice yet, but we should probably start thinking
> about how to support that.
> - The domain numbering isn't clearly specified to have any particular
> properties such as beginning at zero or a contiguous range.
>
> While the current code likely contains assumptions contrary to these
> points, a change such as this is an opportunity to think about whether
> those assumptions can be reduced or removed. In particular I think it
> would be good to gracefully degrade when the number of NUMA affinity
> domains can exceed MAX_NUMNODES. Using the platform-supplied domain
> numbers to directly index Linux data structures will make that
> impossible.
>
> So, maybe genradix or even xarray wouldn't actually be overengineering
> here.
>
One of the challenges with such a data structure is that we initialize
nid_map before the slab allocator is available. That forces a
memblock-based allocation, and we would effectively end up implementing
such a sparse data structure ourselves here.
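
If we did size it dynamically, the early-boot allocation would look
roughly like the sketch below (simplified; allocate_nid_map() and the
way the bound is passed in are hypothetical, not from the patch):

static u32 *nid_map __ro_after_init;

/*
 * Sketch only: size the map from the firmware-provided bound and
 * initialize every slot to NUMA_NO_NODE. memblock_alloc() works
 * this early, before the slab allocator is up.
 */
static void __init allocate_nid_map(unsigned int max_domains)
{
	unsigned int i;

	nid_map = memblock_alloc(max_domains * sizeof(*nid_map),
				 SMP_CACHE_BYTES);
	if (!nid_map)
		panic("%s: Failed to allocate nid_map\n", __func__);

	for (i = 0; i < max_domains; i++)
		nid_map[i] = NUMA_NO_NODE;
}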
As you mentioned above, since the hypervisor currently limits the
maximum affinity domain id to below "ibm,max-associativity-domains", we
are fine with the array-based nid_map we have here. This keeps the code
simpler.
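
For reference, that bound comes from the device tree. A minimal sketch
of fetching it (max_affinity_domains() is a hypothetical helper;
indexing the property by min_common_depth mirrors what
find_possible_nodes() does today):

static u32 __init max_affinity_domains(void)
{
	struct device_node *rtas;
	u32 max_domains = MAX_NUMNODES;

	rtas = of_find_node_by_path("/rtas");
	if (rtas) {
		/*
		 * The property holds one entry per associativity level;
		 * the entry at min_common_depth bounds the node ids.
		 */
		of_property_read_u32_index(rtas,
				"ibm,max-associativity-domains",
				min_common_depth, &max_domains);
		of_node_put(rtas);
	}

	return max_domains;
}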
This will also allow us to switch to a sparser data structure in the
future, as you requested, because the main change this series pushes is
the use of firmware_group_id_to_nid(). The details of the data structure
used to track that mapping are internal to that function.
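
To make that interface concrete, the mapping function has roughly this
shape (a simplified sketch of the idea, not the exact patch body):

/*
 * Sketch: hand out logical nids in first-seen order. The firmware
 * domain id is only ever used to index this private map, so the
 * backing store can change without touching any callers.
 */
static int firmware_group_id_to_nid(int firmware_gid)
{
	static int last_nid;

	if (firmware_gid == -1)
		return NUMA_NO_NODE;

	if (nid_map[firmware_gid] == NUMA_NO_NODE)
		nid_map[firmware_gid] = last_nid++;

	return nid_map[firmware_gid];
}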
-aneesh