[RFC PATCH 1/2] powerpc/numa: Introduce logical numa id

Aneesh Kumar K.V aneesh.kumar at linux.ibm.com
Mon Aug 10 04:40:56 AEST 2020


"Aneesh Kumar K.V" <aneesh.kumar at linux.ibm.com> writes:

> On 8/8/20 2:15 AM, Nathan Lynch wrote:
>> "Aneesh Kumar K.V" <aneesh.kumar at linux.ibm.com> writes:
>>> On 8/7/20 9:54 AM, Nathan Lynch wrote:
>>>> "Aneesh Kumar K.V" <aneesh.kumar at linux.ibm.com> writes:
>>>>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
>>>>> index e437a9ac4956..6c659aada55b 100644
>>>>> --- a/arch/powerpc/mm/numa.c
>>>>> +++ b/arch/powerpc/mm/numa.c
>>>>> @@ -221,25 +221,51 @@ static void initialize_distance_lookup_table(int nid,
>>>>>    	}
>>>>>    }
>>>>>    
>>>>> +static u32 nid_map[MAX_NUMNODES] = {[0 ... MAX_NUMNODES - 1] =  NUMA_NO_NODE};
>>>>
>>>> It's odd to me to use MAX_NUMNODES for this array when it's going to be
>>>> indexed not by Linux's logical node IDs but by the platform-provided
>>>> domain number, which has no relation to MAX_NUMNODES.
>>>
>>>
>>> I didn't want to dynamically allocate this. We could fetch
>>> "ibm,max-associativity-domains" to find the size for that. The current
>>> code does assume that the firmware group id will not exceed
>>> MAX_NUMNODES, hence the array size of MAX_NUMNODES. I agree that it is
>>> confusing. Maybe we can do #define MAX_AFFINITY_DOMAIN MAX_NUMNODES?
>> 
>> Well, consider:
>> 
>> - ibm,max-associativity-domains can change at runtime with LPM. This
>>    doesn't happen in practice yet, but we should probably start thinking
>>    about how to support that.
>> - The domain numbering isn't clearly specified to have any particular
>>    properties such as beginning at zero or a contiguous range.
>> 
>> While the current code likely contains assumptions contrary to these
>> points, a change such as this is an opportunity to think about whether
>> those assumptions can be reduced or removed. In particular I think it
>> would be good to gracefully degrade when the number of NUMA affinity
>> domains can exceed MAX_NUMNODES. Using the platform-supplied domain
>> numbers to directly index Linux data structures will make that
>> impossible.
>> 
>> So, maybe genradix or even xarray wouldn't actually be overengineering
>> here.
>> 
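
For concreteness, a lookup along those lines might look roughly like
this (a sketch only, with hypothetical names, using the xarray API from
<linux/xarray.h>; it assumes the map could be populated after slab
init):

static DEFINE_XARRAY(domain_to_nid_map);

static int xa_affinity_domain_to_nid(unsigned long domain_id)
{
	void *entry = xa_load(&domain_to_nid_map, domain_id);

	/* nids are stored as xarray value entries, not pointers */
	return entry ? xa_to_value(entry) : NUMA_NO_NODE;
}

static int xa_record_domain_nid(unsigned long domain_id, int nid)
{
	/* GFP_KERNEL allocation, so only safe once slab is up */
	return xa_err(xa_store(&domain_to_nid_map, domain_id,
			       xa_mk_value(nid), GFP_KERNEL));
}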
>
> One of the challenges with such a data structure is that we initialize
> the nid_map before slab is available. This means a memblock-based
> allocation, and we would end up implementing such a sparse data
> structure ourselves here.
>
> As you mentioned above, since the hypervisor currently limits the max
> affinity domain id to below ibm,max-associativity-domains, we are fine
> with the array-based nid_map we have here. This keeps the code simpler.
>
> This will also allow us to switch to a more sparse data structure in
> the future, as you requested here, because the main change pushed in
> this series is the use of firmware_group_id_to_nid(). The details of
> the data structure we use to keep track of that mapping are pretty much
> internal to that function.

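On the memblock point above: if we did eventually want to size the map
from "ibm,max-associativity-domains" rather than use a static array, an
early allocation might look roughly like this (hypothetical sketch;
alloc_domain_id_map() is a made-up name, and max_domains would be read
from the device tree by the caller):

#include <linux/memblock.h>

static int *domain_id_map;

static void __init alloc_domain_id_map(unsigned int max_domains)
{
	unsigned int i;

	/* memblock_alloc() works before slab is initialized */
	domain_id_map = memblock_alloc(max_domains * sizeof(*domain_id_map),
				       SMP_CACHE_BYTES);
	if (!domain_id_map)
		panic("%s: failed to allocate domain id map\n", __func__);

	for (i = 0; i < max_domains; i++)
		domain_id_map[i] = NUMA_NO_NODE;
}
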
How about this? It is no longer a direct index, but it does limit the
search to the number of NUMA nodes mapped so far on the system.

/* Indexed by logical node id; holds the firmware domain id for that node. */
static int domain_id_map[MAX_NUMNODES] = {[0 ... MAX_NUMNODES - 1] = -1 };

/*
 * Linear scan of the domains mapped so far; max_nid is the current
 * value of last_nid below.
 */
static int __affinity_domain_to_nid(int domain_id, int max_nid)
{
	int i;

	for (i = 0; i < max_nid; i++) {
		if (domain_id_map[i] == domain_id)
			return i;
	}
	return NUMA_NO_NODE;
}

int affinity_domain_to_nid(struct affinity_domain *domain)
{
	int nid, domain_id;
	static int last_nid = 0;
	static DEFINE_SPINLOCK(node_id_lock);

	domain_id = domain->id;
	/*
	 * For PowerNV we don't change the node id. This helps to avoid
	 * confusion w.r.t the expected node ids. On pseries, node numbers
	 * are virtualized. Hence use logical node ids for pseries.
	 */
	if (!firmware_has_feature(FW_FEATURE_LPAR))
		return domain_id;

	if (domain_id == -1 || last_nid == MAX_NUMNODES)
		return NUMA_NO_NODE;

	nid = __affinity_domain_to_nid(domain_id, last_nid);

	if (nid == NUMA_NO_NODE) {
		spin_lock(&node_id_lock);
		/*
		 * Recheck with the lock held; also re-test last_nid so
		 * that two racing callers cannot both take the final
		 * slot and push last_nid past MAX_NUMNODES.
		 */
		nid = __affinity_domain_to_nid(domain_id, last_nid);
		if (nid == NUMA_NO_NODE && last_nid < MAX_NUMNODES) {
			nid = last_nid++;
			domain_id_map[nid] = domain_id;
		}
		spin_unlock(&node_id_lock);
	}

	return nid;
}
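
For context, a caller would look roughly like the following (a sketch
loosely modeled on associativity_to_nid() in arch/powerpc/mm/numa.c;
min_common_depth is the existing static there, and the struct usage is
an assumption):

static int associativity_to_nid_sketch(const __be32 *associativity)
{
	struct affinity_domain domain;

	/* associativity[0] is the number of entries in the array */
	if (of_read_number(associativity, 1) < min_common_depth)
		return NUMA_NO_NODE;

	domain.id = of_read_number(&associativity[min_common_depth], 1);

	return affinity_domain_to_nid(&domain);
}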

