[RFC PATCH 8/8] powerpc/papr_scm: Use FORM2 associativity details
Aneesh Kumar K.V
aneesh.kumar at linux.ibm.com
Thu Jun 17 21:46:05 AEST 2021
On 6/17/21 4:41 PM, Aneesh Kumar K.V wrote:
> Daniel Henrique Barboza <danielhb413 at gmail.com> writes:
>
>> On 6/17/21 4:46 AM, David Gibson wrote:
>>> On Tue, Jun 15, 2021 at 12:35:17PM +0530, Aneesh Kumar K.V wrote:
>>>> David Gibson <david at gibson.dropbear.id.au> writes:
>>>>
>>>>> On Tue, Jun 15, 2021 at 11:27:50AM +0530, Aneesh Kumar K.V wrote:
>>>>>> David Gibson <david at gibson.dropbear.id.au> writes:
>>>>>>
>>>>>>> On Mon, Jun 14, 2021 at 10:10:03PM +0530, Aneesh Kumar K.V wrote:
>>>>>>>> FORM2 introduces a concept of secondary domain which is identical to the
>>>>>>>> concept of FORM1 primary domain. Use secondary domain as the numa node
>>>>>>>> when using persistent memory device. For DAX kmem use the logical domain
>>>>>>>> id introduced in FORM2. This new numa node
>>>>>>>>
>>>>>>>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar at linux.ibm.com>
>>>>>>>> ---
>>>>>>>>  arch/powerpc/mm/numa.c                    | 28 +++++++++++++++++++++++
>>>>>>>>  arch/powerpc/platforms/pseries/papr_scm.c | 26 +++++++++++++--------
>>>>>>>>  arch/powerpc/platforms/pseries/pseries.h  |  1 +
>>>>>>>>  3 files changed, 45 insertions(+), 10 deletions(-)
>>>>>>>>
>>>>>>>> diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
>>>>>>>> index 86cd2af014f7..b9ac6d02e944 100644
>>>>>>>> --- a/arch/powerpc/mm/numa.c
>>>>>>>> +++ b/arch/powerpc/mm/numa.c
>>>>>>>> @@ -265,6 +265,34 @@ static int associativity_to_nid(const __be32 *associativity)
>>>>>>>>  	return nid;
>>>>>>>>  }
>>>>>>>>
>>>>>>>> +int get_primary_and_secondary_domain(struct device_node *node, int *primary, int *secondary)
>>>>>>>> +{
>>>>>>>> +	int secondary_index;
>>>>>>>> +	const __be32 *associativity;
>>>>>>>> +
>>>>>>>> +	if (!numa_enabled) {
>>>>>>>> +		*primary = NUMA_NO_NODE;
>>>>>>>> +		*secondary = NUMA_NO_NODE;
>>>>>>>> +		return 0;
>>>>>>>> +	}
>>>>>>>> +
>>>>>>>> +	associativity = of_get_associativity(node);
>>>>>>>> +	if (!associativity)
>>>>>>>> +		return -ENODEV;
>>>>>>>> +
>>>>>>>> +	if (of_read_number(associativity, 1) >= primary_domain_index) {
>>>>>>>> +		*primary = of_read_number(&associativity[primary_domain_index], 1);
>>>>>>>> +		secondary_index = of_read_number(&distance_ref_points[1], 1);
>>>>>>>
>>>>>>> Secondary ID is always the second reference point, but primary depends
>>>>>>> on the length of resources? That seems very weird.
>>>>>>
>>>>>> primary_domain_index is distance_ref_points[0]. With Form2 we would find
>>>>>> both primary and secondary domain ID same for all resources other than
>>>>>> persistent memory device. The usage w.r.t. persistent memory is
>>>>>> explained in patch 7.
>>>>>
>>>>> Right, I misunderstood
>>>>>
>>>>>>
>>>>>> With Form2 the primary domainID and secondary domainID are used to identify the NUMA nodes
>>>>>> the kernel should use when using persistent memory devices.
>>>>>
>>>>> This seems kind of bogus. With Form1, the primary/secondary ID are a
>>>>> sort of hierarchy of distance (things with same primary ID are very
>>>>> close, things with same secondary are kinda-close, etc.). With Form2,
>>>>> it's referring to their effective node for different purposes.
>>>>>
>>>>> Using the same terms for different meanings seems unnecessarily
>>>>> confusing.
>>>>
>>>> They are essentially domainIDs. The interpretation of them is different
>>>> between Form1 and Form2. Hence I kept referring to them as primary and
>>>> secondary domainID. Any suggestion on what to name them with Form2?
>>>
>>> My point is that reusing associativity-reference-points for something
>>> with completely unrelated semantics seems like a very poor choice.
>>
>>
>> I agree that this reuse can be confusing. I could argue that there is
>> precedent for that in PAPR - FORM0 puts a different spin on the same
>> property as well - but there is no need to keep following existing PAPR
>> practices in new spec (and some might argue it's best not to).
>>
>> As far as QEMU goes, renaming this property to "numa-associativity-mode"
>> (just an example) is a quick change to do since we separated FORM1 and FORM2
>> code over there.
>>
>> Doing such a rename can also help with the issue of having to describe new
>> FORM2 semantics using "least significant boundary" or "primary domain" or
>> any FORM0|FORM1 related terminology.
>>
>
> It is not just a matter of changing the name; we will then have to explain
> the meaning of ibm,associativity-reference-points with FORM2, right?
>
> With FORM2 we want to represent the topology better:
>
> ------------------------------------------------------------------------------
> |                                                  domainID 20               |
> |   ---------------------------------------                                  |
> |  |                            NUMA node1 |                                 |
> |  |                                       |          --------------------   |
> |  |   ProcB -------> MEMC                 |         |        NUMA node40 |  |
> |  |     |                                 |         |                    |  |
> |  |     ----------------------------------|-------->|  PMEMD             |  |
> |  |                                       |          --------------------   |
> |  |                                       |                                 |
> |   ---------------------------------------                                  |
> ------------------------------------------------------------------------------
>
> ibm,associativity:
> { 20, 1, 40} -> PMEMD
> { 20, 1, 1} -> PROCB/MEMC
>
> is the suggested FORM2 representation.
>
>
We can simplify this as below too:
ibm,associativity:
{ 20, 1, 40} -> PMEMD
{ 20, 1 } -> PROCB/MEMC
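
To make the shorter form concrete, here is a rough user-space sketch of how
a lookup could treat a domain list that stops before the secondary position:
the secondary ID simply falls back to the primary one. This is not the code
from the patch. The names (sketch_get_domains, SKETCH_NO_NODE), the plain int
array without the leading length cell, and the reference positions 2 and 3
used in the self check are all assumptions made for illustration; which
position counts as primary and which as secondary is decided by
ibm,associativity-reference-points, which is not shown above.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the kernel's NUMA_NO_NODE (hypothetical name). */
#define SKETCH_NO_NODE (-1)

/*
 * Hypothetical helper: pick the primary and secondary domain IDs out of a
 * list of domain IDs. Indexes are 1-based positions into the list. If the
 * list is too short to hold the secondary position, the secondary ID falls
 * back to the primary ID.
 *
 * Returns 0 on success, -1 if the primary ID cannot be read.
 */
int sketch_get_domains(const int *domains, size_t len,
                       size_t primary_index, size_t secondary_index,
                       int *primary, int *secondary)
{
    *primary = SKETCH_NO_NODE;
    *secondary = SKETCH_NO_NODE;

    if (!domains || primary_index == 0 || len < primary_index)
        return -1;

    *primary = domains[primary_index - 1];

    if (secondary_index != 0 && len >= secondary_index)
        *secondary = domains[secondary_index - 1];
    else
        *secondary = *primary;  /* short list: reuse the primary ID */

    return 0;
}

/* Usage: the three lists from this mail, with assumed positions 2 and 3. */
void sketch_self_check(void)
{
    const int pmemd[] = { 20, 1, 40 };
    const int procb_full[] = { 20, 1, 1 };
    const int procb_short[] = { 20, 1 };
    int p, s;

    assert(sketch_get_domains(pmemd, 3, 2, 3, &p, &s) == 0);
    assert(p == 1 && s == 40);

    /* The full and the simplified PROCB/MEMC lists give the same answer. */
    assert(sketch_get_domains(procb_full, 3, 2, 3, &p, &s) == 0);
    assert(p == 1 && s == 1);
    assert(sketch_get_domains(procb_short, 2, 2, 3, &p, &s) == 0);
    assert(p == 1 && s == 1);
}
```

With that fallback, { 20, 1, 1 } and { 20, 1 } are indistinguishable to the
caller, so only resources like PMEMD need the extra entry.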
-aneesh