[PATCH v4 7/7] powerpc/pseries: Add support for FORM2 associativity

Aneesh Kumar K.V aneesh.kumar at linux.ibm.com
Thu Jun 24 20:55:48 AEST 2021


On 6/24/21 4:03 PM, Laurent Dufour wrote:
> Hi Aneesh,
> 
> A little bit of wordsmithing below...
> 
> On 17/06/2021 at 18:51, Aneesh Kumar K.V wrote:
>> PAPR interface currently supports two different ways of communicating resource
>> grouping details to the OS. These are referred to as Form 0 and Form 1
>> associativity grouping. Form 0 is the older format and is now considered
>> deprecated. This patch adds another resource grouping named FORM2.
>>
>> Signed-off-by: Daniel Henrique Barboza <danielhb413 at gmail.com>
>> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar at linux.ibm.com>
>> ---
>>   Documentation/powerpc/associativity.rst   | 135 ++++++++++++++++++++
>>   arch/powerpc/include/asm/firmware.h       |   3 +-
>>   arch/powerpc/include/asm/prom.h           |   1 +
>>   arch/powerpc/kernel/prom_init.c           |   3 +-
>>   arch/powerpc/mm/numa.c                    | 149 +++++++++++++++++++++-
>>   arch/powerpc/platforms/pseries/firmware.c |   1 +
>>   6 files changed, 286 insertions(+), 6 deletions(-)
>>   create mode 100644 Documentation/powerpc/associativity.rst
>>
>> diff --git a/Documentation/powerpc/associativity.rst b/Documentation/powerpc/associativity.rst
>> new file mode 100644
>> index 000000000000..93be604ac54d
>> --- /dev/null
>> +++ b/Documentation/powerpc/associativity.rst
>> @@ -0,0 +1,135 @@
>> +============================
>> +NUMA resource associativity
>> +=============================
>> +
>> +Associativity represents the groupings of the various platform resources into
>> +domains of substantially similar mean performance relative to resources outside
>> +of that domain. Resources subsets of a given domain that exhibit better
>> +performance relative to each other than relative to other resources subsets
>> +are represented as being members of a sub-grouping domain. This performance
>> +characteristic is presented in terms of NUMA node distance within the Linux kernel.
>> +From the platform view, these groups are also referred to as domains.
>> +
>> +PAPR interface currently supports different ways of communicating these resource
>> +grouping details to the OS. These are referred to as Form 0, Form 1 and Form2
>> +associativity grouping. Form 0 is the older format and is now considered deprecated.
>> +
>> +Hypervisor indicates the type/form of associativity used via 
>> "ibm,arcitecture-vec-5 property".
>                                                             architecture ^
> 

fixed

>> +Bit 0 of byte 5 in the "ibm,architecture-vec-5" property indicates usage of Form 0 or Form 1.
>> +A value of 1 indicates the usage of Form 1 associativity. For Form 2 associativity
>> +bit 2 of byte 5 in the "ibm,architecture-vec-5" property is used.
>> +
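To make the bit selection above concrete, here is a minimal sketch of how a reader of the property might pick the form. It assumes IBM-style bit numbering (bit 0 is the most significant bit of the byte), that "byte 5" is zero-based offset 5 into the raw property, and that Form 2 wins when both bits are set. The names assoc_form_from_vec5() and ibm_bit() are made up for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte of "ibm,architecture-vec-5" that carries the associativity bits. */
#define ASSOC_BYTE 5

enum assoc_form { ASSOC_FORM0, ASSOC_FORM1, ASSOC_FORM2 };

/*
 * Assumption: bits are numbered IBM-style, so bit 0 is the most
 * significant bit of the byte (mask 0x80) and bit 2 is mask 0x20.
 */
static uint8_t ibm_bit(unsigned int bit)
{
	assert(bit < 8);
	return (uint8_t)(0x80u >> bit);
}

/* vec5 points at the raw property bytes, len is the property length. */
static enum assoc_form assoc_form_from_vec5(const uint8_t *vec5, size_t len)
{
	if (len <= ASSOC_BYTE)
		return ASSOC_FORM0;	/* byte 5 absent, fall back to Form 0 */
	if (vec5[ASSOC_BYTE] & ibm_bit(2))
		return ASSOC_FORM2;	/* assumed to take priority over Form 1 */
	if (vec5[ASSOC_BYTE] & ibm_bit(0))
		return ASSOC_FORM1;
	return ASSOC_FORM0;
}
```

With this numbering, a byte 5 value of 0x80 selects Form 1 and 0x20 selects Form 2.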
>> +Form 0
>> +-----
>> +Form 0 associativity supports only two NUMA distance (LOCAL and REMOTE).
>> +
>> +Form 1
>> +-----
>> +With Form 1 a combination of ibm,associativity-reference-points and ibm,associativity
>> +device tree properties are used to determine the NUMA distance between resource groups/domains.
>> +
>> +The “ibm,associativity” property contains one or more lists of numbers (domainID)
>> +representing the resource’s platform grouping domains.
>> +
>> +The “ibm,associativity-reference-points” property contains one or more list of numbers
>> +(domainID index) that represents the 1 based ordinal in the associativity lists.
>> +The list of domainID index represnets increasing hierachy of resource 
>> grouping.
>                          represents ^
> 

fixed

>> +
>> +ex:
>> +{ primary domainID index, secondary domainID index, tertiary domainID index.. }
>> +
>> +Linux kernel uses the domainID at the primary domainID index as the NUMA node id.
>> +Linux kernel computes NUMA distance between two domains by recursively comparing
>> +if they belong to the same higher-level domains. For mismatch at every higher
>> +level of the resource group, the kernel doubles the NUMA distance between the
>> +comparing domains.
>> +
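The Form 1 doubling rule described above can be sketched roughly as follows. This is an illustration, not the code in numa.c: the function name form1_distance() and the array layout (one domainID per reference point, starting at the primary domainID index) are assumptions, and 10 is used as the distance of a node to itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FORM1_LOCAL_DISTANCE 10	/* distance of a node to itself */

/*
 * a and b hold, for each reference point, the domainID of one resource,
 * ordered from the primary domainID index outwards. Every level at which
 * the two resources sit in different domains doubles the distance; the
 * walk stops at the first level they share.
 */
static int form1_distance(const uint32_t *a, const uint32_t *b, size_t depth)
{
	int distance = FORM1_LOCAL_DISTANCE;
	size_t i;

	assert(a && b);
	for (i = 0; i < depth; i++) {
		if (a[i] == b[i])
			break;
		distance *= 2;
	}
	return distance;
}
```

With three reference points this yields only the values 10, 20, 40 and 80, which is the rigidity that Form 2 removes.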
>> +Form 2
>> +-------
>> +Form 2 associativity format adds separate device tree properties representing NUMA node distance
>> +thereby making the node distance computation flexible. Form 2 also allows flexible primary
>> +domain numbering. With numa distance computation now detached from the index value of
>> +"ibm,associativity" property, Form 2 allows a large number of primary domain ids at the
>> +same domainID index representing resource groups of different performance/latency characteristics.
>> +
>> +Hypervisor indicates the usage of FORM2 associativity using bit 2 of byte 5 in the
>> +"ibm,architecture-vec-5" property.
>> +
>> +"ibm,numa-lookup-index-table" property contains one or more list numbers representing
>> +the domainIDs present in the system. The offset of the domainID in this property is considered
>> +the domainID index.
>> +
>> +prop-encoded-array: The number N of the domainIDs encoded as with encode-int, followed by
>> +N domainID encoded as with encode-int
>> +
>> +For ex:
>> +ibm,numa-lookup-index-table =  {4, 0, 8, 250, 252}, domainID index for domainID 8 is 1.
>> +
>> +"ibm,numa-distance-table" property contains one or more list of numbers representing the NUMA
>> +distance between resource groups/domains present in the system.
>> +
>> +prop-encoded-array: The number N of the distance values encoded as with encode-int, followed by
>> +N distance values encoded as with encode-bytes. The max distance value we could encode is 255.
>> +
>> +For ex:
>> +ibm,numa-lookup-index-table =  {3, 0, 8, 40}
>> +ibm,numa-distance-table     =  {9, 10, 20, 80, 20, 10, 160, 80, 160, 10}
>> +
>> +  | 0    8   40
>> +--|------------
>> +  |
>> +0 | 10   20  80
>> +  |
>> +8 | 20   10  160
>> +  |
>> +40| 80   160  10
>> +
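As a rough sketch of how the two boot-time properties in the example combine into the matrix shown, assuming the distance bytes are laid out row by row (the example matrix is symmetric, so the text alone does not settle the order). The names fill_distance_table(), boot_dist and BOOT_MAX_NODES are hypothetical, and both arrays are passed with their leading count already stripped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BOOT_MAX_NODES 256

/* boot_dist[a][b] is the distance from domainID a to domainID b. */
static uint8_t boot_dist[BOOT_MAX_NODES][BOOT_MAX_NODES];

/*
 * ids:   "ibm,numa-lookup-index-table" with the leading count removed
 * n:     number of domainIDs
 * bytes: "ibm,numa-distance-table" with the leading count removed,
 *        assumed to be the n x n matrix in row-major order
 * Returns 0 on success, -1 if a domainID does not fit the table.
 */
static int fill_distance_table(const uint32_t *ids, size_t n,
			       const uint8_t *bytes)
{
	size_t i, j;

	assert(ids && bytes);
	for (i = 0; i < n; i++)
		if (ids[i] >= BOOT_MAX_NODES)
			return -1;

	for (i = 0; i < n; i++)
		for (j = 0; j < n; j++)
			boot_dist[ids[i]][ids[j]] = bytes[i * n + j];
	return 0;
}
```

Feeding it the ids {0, 8, 40} and the nine distance bytes from the example reproduces the 3 x 3 table above.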
>> +
>> +"ibm,associativity" property for resources in node 0, 8 and 40
>> +
>> +{ 3, 6, 7, 0 }
>> +{ 3, 6, 9, 8 }
>> +{ 3, 6, 7, 40}
>> +
>> +With "ibm,associativity-reference-points"  { 0x3 }
>> +
>> +Each resource (drcIndex) now also supports additional optional device tree properties.
>> +These properties are marked optional because the platform can choose not to export
>> +them and provide the system topology details using the earlier defined device tree
>> +properties alone. The optional device tree properties are used when adding new resources
>> +(DLPAR) and when the platform didn't provide the topology details of the domain which
>> +contains the newly added resource during boot.
>> +
>> +"ibm,numa-lookup-index" property contains a number representing the domainID index to be used
>> +when building the NUMA distance of the numa node to which this resource belongs. This can
>> +be looked at as the index at which this new domainID would have appeared in
>> +"ibm,numa-lookup-index-table" if the domain was present during boot. The domainID
>> +of the new resource can be obtained from the existing "ibm,associativity" property. This
>> +can be used to build distance information of a newly onlined NUMA node via DLPAR operation.
>> +The value is 1 based array index value.
>> +
>> +prop-encoded-array: An integer encoded as with encode-int specifying the domainID index
>> +
>> +"ibm,numa-distance" property contains one or more list of numbers presenting the NUMA distance
>> +from this resource domain to other resources.
>> +
>> +prop-encoded-array: The number N of the distance values encoded as with encode-int, followed by
>> +N distance values encoded as with encode-bytes. The max distance value we could encode is 255.
>> +
>> +For ex:
>> +ibm,associativity     = { 4, 5, 10, 50}
> 
> Is the first byte of the property (the length) missing, or an
> associativity number?
> 

That should be { 3, 5, 10, 50 }. Fixed.

>> +ibm,numa-lookup-index = { 4 }
>> +ibm,numa-distance   =  {8, 160, 255, 80, 10, 160, 255, 80, 10}
>> +
>> +resulting in a new topology as below.
>> +  | 0    8   40   50
>> +--|------------------
>> +  |
>> +0 | 10   20  80   160
>> +  |
>> +8 | 20   10  160  255
>> +  |
>> +40| 80   160  10  80
>> +  |
>> +50| 160  255  80  10
>> +
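For the DLPAR example above, a minimal sketch of how the per-resource properties extend the matrix. It follows the layout used in this version of the patch: the first half of "ibm,numa-distance" holds the distances from the existing nodes to the new node, the second half the distances from the new node to the others, so the count is assumed to be twice the 1 based lookup index. The names add_node_distance(), index_to_id and node_dist are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_NODES 256

static uint32_t index_to_id[NR_NODES];	/* domainID index -> domainID */
static uint8_t node_dist[NR_NODES][NR_NODES];

/*
 * id:    domainID of the new resource, taken from "ibm,associativity"
 * index: "ibm,numa-lookup-index", a 1 based domainID index
 * bytes: "ibm,numa-distance" with the leading count removed
 * count: the leading count, assumed to be 2 * index
 * Returns 0 on success, -1 if the inputs do not fit that layout.
 */
static int add_node_distance(uint32_t id, uint32_t index,
			     const uint8_t *bytes, size_t count)
{
	size_t half = count / 2, i;

	assert(bytes);
	if (id >= NR_NODES || index == 0 || index > NR_NODES ||
	    count != 2 * (size_t)index)
		return -1;

	/* The device tree index is 1 based, the table is 0 based. */
	index_to_id[index - 1] = id;

	for (i = 0; i < half; i++)
		if (index_to_id[i] >= NR_NODES)
			return -1;

	/* First half: distance from each known node to the new node. */
	for (i = 0; i < half; i++)
		node_dist[index_to_id[i]][id] = bytes[i];

	/* Second half: distance from the new node to each known node. */
	for (i = 0; i < half; i++)
		node_dist[id][index_to_id[i]] = bytes[half + i];

	return 0;
}
```

With index_to_id preloaded with 0, 8 and 40, calling it for domainID 50 at index 4 with the eight distance bytes from the example fills in the new row and column of the table above.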
>> diff --git a/arch/powerpc/include/asm/firmware.h b/arch/powerpc/include/asm/firmware.h
>> index 60b631161360..97a3bd9ffeb9 100644
>> --- a/arch/powerpc/include/asm/firmware.h
>> +++ b/arch/powerpc/include/asm/firmware.h
>

...

>> +    numa_distancep = of_get_property(node, "ibm,numa-distance", NULL);
>> +    if (!numa_distancep)
>> +        return;
>> +
>> +    numa_indexp = of_get_property(node, "ibm,numa-lookup-index", NULL);
>> +    if (!numa_indexp)
>> +        return;
>> +
>> +    numa_index = of_read_number(numa_indexp, 1);
>> +    /*
>> +     * update the numa_id_index_table. Device tree look at index table as
>> +     * 1 based array indexing.
>> +     */
>> +    numa_id_index_table[numa_index - 1] = nid;
>> +
>> +    max_numa_index = of_read_number((const __be32 *)numa_distancep, 1);
>> +    VM_WARN_ON(max_numa_index != 2 * numa_index);
> 
> Could you briefly explain, in a comment, what this VM_WARN_ON check
> means?
> 

Based on the other review feedback, this check is dropped. We now derive the
domain distance offset from the number of elements in "ibm,numa-distance".

>> +    /* Skip the size which is encoded int */
>> +    numa_distancep += sizeof(__be32);
>> +
>> +    /*
>> +     * First fill the distance information from other node to this node.
>> +     */
>> +    other_nid_index = 0;
>> +    for (i = 0; i < numa_index; i++) {
>> +        numa_distance = numa_distancep[i];
>> +        other_nid = numa_id_index_table[other_nid_index++];
>> +        numa_distance_table[other_nid][nid] = numa_distance;
>> +    }
>> +
>> +    other_nid_index = 0;
>> +    for (; i < max_numa_index; i++) {
>> +        numa_distance = numa_distancep[i];
>> +        other_nid = numa_id_index_table[other_nid_index++];
>> +        numa_distance_table[nid][other_nid] = numa_distance;
>> +    }
>> +}
>> +

Thanks for reviewing the patch.

-aneesh

