[Skiboot] [PATCH v2 09/15] platforms/astbmc/witherspoon: Rework NPU presence detection

Mon Feb 4 15:29:57 AEDT 2019

On 01/02/2019 17:55, Andrew Donnellan wrote:
> On 21/1/19 4:30 pm, Alexey Kardashevskiy wrote:
>>
>>
>> On 11/01/2019 12:09, Andrew Donnellan wrote:
>>> Rework NPU presence detection in preparation for supporting both
>>> NVLink and
>>> OpenCAPI devices operating simultaneously on the same NPU.
>>>
>>> If an OpenCAPI card is connected to GPU#0, and an NVLink GPU is
>>> connected
>>> to GPU#1,
>>
>> What is "NVLink GPU"? A peer NVLink of GPU#0 connected to GPU#1?
> 
> It's a GPU that is connected to the system via NVLink. The simplest
> possible meaning. Not a "peer" or anything.

Then what are GPU#0 and GPU#1? Are these connectors which can receive
opencapi or "nvlink gpu" cards? We do not call them "slots" whyyyy? :)

>> What is the supported hardware configuration? In other words, is there a
>> spec describing it such as a nice drawing from
>> Witherspoon_Design_Workbook_v1.7_19June2018.pdf, page 40 with the wiring
>> diagram?
> 
> Wait, you're asking for sensible documentation? That's not how any of
> this works :) The diagrams in the Witherspoon workbook are a good start.
> 
> On a Redbud system, any combination of GPUs and OpenCAPI cards in the 4
> connectors provided across 2 CPUs should work, however as documented
> below there are some combinations where the GPU will only receive 2
> links rather than 3 links. To understand this there are some diagrams
> you have to decipher in the P9 Fabric workbook as well.
> 
>>
>> The patch could be much simpler:
>>
>> diff --git a/platforms/astbmc/witherspoon.c
>> b/platforms/astbmc/witherspoon.c
>> index fe13899..b592a67 100644
>> --- a/platforms/astbmc/witherspoon.c
>> +++ b/platforms/astbmc/witherspoon.c
>> @@ -233,6 +233,7 @@ static void witherspoon_npu2_device_detect(struct
>> npu2 *npu)
>>          int rc;
>>
>>          bool gpu0_present, gpu1_present;
>> +       enum npu2_dev_type gpu0_type = NPU2_DEV_TYPE_UNKNOWN;
>>
>>          if (witherspoon_type != WITHERSPOON_TYPE_REDBUD) {
>>                  prlog(PR_DEBUG, "PLAT: Setting all NPU links to NVLink,
>> OpenCAPI only supported on Redbud\n");
>> @@ -305,12 +306,14 @@ static void witherspoon_npu2_device_detect(struct
>> npu2 *npu)
>>                          set_link_details(npu, 1, 3,
>> NPU2_DEV_TYPE_OPENCAPI);
>>                          /* We current don't support using the second
>> link */
>>                          set_link_details(npu, 0, 2,
>> NPU2_DEV_TYPE_UNKNOWN);
>> +                       gpu0_type = npu2_dev_type_opencapi;
>>                  } else {
>>                          prlog(PR_DEBUG, "PLAT: Chip %d GPU#0 is
>> NVLink\n",
>>                                chip->id);
>>                          set_link_details(npu, 0, 0,
>> NPU2_DEV_TYPE_NVLINK);
>>                          set_link_details(npu, 1, 1,
>> NPU2_DEV_TYPE_NVLINK);
>>                          set_link_details(npu, 2, 2,
>> NPU2_DEV_TYPE_NVLINK);
>> +                       gpu0_type = NPU2_DEV_TYPE_NVLINK;
>>                  }
>>          }
>>
>> @@ -324,7 +327,10 @@ static void witherspoon_npu2_device_detect(struct
>> npu2 *npu)
>>                  } else {
>>                          prlog(PR_DEBUG, "PLAT: Chip %d GPU#1 is
>> NVLink\n",
>>                                chip->id);
>> -                       set_link_details(npu, 3, 3,
>> NPU2_DEV_TYPE_NVLINK);
>> +                       if (gpu0_type == NPU2_DEV_TYPE_OPENCAPI)
>> +                               prlog(PR_WARNING, "PLAT: Chip %d GPU#1
>> will operate at reduced performance due to presence of OpenCAPI device.
>> For optimal performance, swap device locations\n", chip->id);
>> +                       else
>> +                               set_link_details(npu, 3, 3,
>> NPU2_DEV_TYPE_NVLINK);
>>                          set_link_details(npu, 4, 4,
>> NPU2_DEV_TYPE_NVLINK);
>>                          set_link_details(npu, 5, 5,
>> NPU2_DEV_TYPE_NVLINK);
>>                  }
>>
>>
>> Otherwise it is harder to spot the exact change the patch is doing which
>> is skipping one call to set_link_details().
>>
> 
> I see your point, though the change is very clearly mentioned in the
> commit and has a giant comment block around it.

And? Do you really expect people to read the comment, the commit log and
believe that there is no other change whatsoever so we can skip on
reading the code movements? Hm.

And I fail to see how new code layout is any better than old one but ok,
may be it is my poor taste.

>>> the GPU will only receive 2 links rather than the usual 3.
>>>
>>> The reason for this is that without the OpenCAPI card, the GPU would be
>>> use links 3-5, connected to NPU bricks 3-5, which needs both stacks 1
>>> and 2
>>> to be in NVLink mode.
>>>
>>> However, with an OpenCAPI card in the GPU#0 slot that uses links 0-1, we
>>> need to use NPU bricks 2-3, which means stack 1 must be set in OpenCAPI
>>> mode. As such, the GPU will be restricted to using links 4 and 5.
>>>
>>> Signed-off-by: Andrew Donnellan <andrew.donnellan at au1.ibm.com>
>>> Reviewed-by: Frederic Barrat <fbarrat at linux.ibm.com>
>>> ---
>>>   platforms/astbmc/witherspoon.c | 62
>>> ++++++++++++++++++++++++++---------
>>>   1 file changed, 47 insertions(+), 15 deletions(-)
>>>
>>> diff --git a/platforms/astbmc/witherspoon.c
>>> b/platforms/astbmc/witherspoon.c
>>> index fe138991696f..c41f0c5b1971 100644
>>> --- a/platforms/astbmc/witherspoon.c
>>> +++ b/platforms/astbmc/witherspoon.c
>>> @@ -233,6 +233,8 @@ static void witherspoon_npu2_device_detect(struct
>>> npu2 *npu)
>>>       int rc;
>>>         bool gpu0_present, gpu1_present;
>>> +    enum npu2_dev_type gpu0_type = NPU2_DEV_TYPE_UNKNOWN;
>>> +    enum npu2_dev_type gpu1_type = NPU2_DEV_TYPE_UNKNOWN;
>>>         if (witherspoon_type != WITHERSPOON_TYPE_REDBUD) {
>>>           prlog(PR_DEBUG, "PLAT: Setting all NPU links to NVLink,
>>> OpenCAPI only supported on Redbud\n");
>>> @@ -298,19 +300,11 @@ static void
>>> witherspoon_npu2_device_detect(struct npu2 *npu)
>>>           if (state & (1 << 0)) {
>>>               prlog(PR_DEBUG, "PLAT: Chip %d GPU#0 is OpenCAPI\n",
>>>                     chip->id);
>>> -            /*
>>> -             * On witherspoon, bricks 2 and 3 are connected to
>>> -             * the lanes matching links 0 and 1 in OpenCAPI mode.
>>> -             */
>>> -            set_link_details(npu, 1, 3, NPU2_DEV_TYPE_OPENCAPI);
>>> -            /* We current don't support using the second link */
>>> -            set_link_details(npu, 0, 2, NPU2_DEV_TYPE_UNKNOWN);
>>> +            gpu0_type = NPU2_DEV_TYPE_OPENCAPI;
>>>           } else {
>>>               prlog(PR_DEBUG, "PLAT: Chip %d GPU#0 is NVLink\n",
>>>                     chip->id);
>>> -            set_link_details(npu, 0, 0, NPU2_DEV_TYPE_NVLINK);
>>> -            set_link_details(npu, 1, 1, NPU2_DEV_TYPE_NVLINK);
>>> -            set_link_details(npu, 2, 2, NPU2_DEV_TYPE_NVLINK);
>>> +            gpu0_type = NPU2_DEV_TYPE_NVLINK;
>>>           }
>>>       }
>>>   @@ -318,16 +312,54 @@ static void
>>> witherspoon_npu2_device_detect(struct npu2 *npu)
>>>           if (state & (1 << 1)) {
>>>               prlog(PR_DEBUG, "PLAT: Chip %d GPU#1 is OpenCAPI\n",
>>>                     chip->id);
>>> -            set_link_details(npu, 4, 4, NPU2_DEV_TYPE_OPENCAPI);
>>> -            /* We current don't support using the second link */
>>> -            set_link_details(npu, 5, 5, NPU2_DEV_TYPE_UNKNOWN);
>>> +            gpu1_type = NPU2_DEV_TYPE_OPENCAPI;
>>>           } else {
>>>               prlog(PR_DEBUG, "PLAT: Chip %d GPU#1 is NVLink\n",
>>>                     chip->id);
>>> +            gpu1_type = NPU2_DEV_TYPE_NVLINK;
>>> +        }
>>> +    }
>>> +
>>> +    if (gpu0_type == NPU2_DEV_TYPE_OPENCAPI) {
>>> +        set_link_details(npu, 1, 3, NPU2_DEV_TYPE_OPENCAPI);
>>> +        /* We currently don't support using the second link */
>>> +        set_link_details(npu, 0, 2, NPU2_DEV_TYPE_UNKNOWN);
>>> +    }
>>> +
>>> +    if (gpu0_type == NPU2_DEV_TYPE_NVLINK) {
>>> +        set_link_details(npu, 0, 0, NPU2_DEV_TYPE_NVLINK);
>>> +        set_link_details(npu, 1, 1, NPU2_DEV_TYPE_NVLINK);
>>> +        set_link_details(npu, 2, 2, NPU2_DEV_TYPE_NVLINK);
>>> +    }
>>> +
>>> +    if (gpu1_type == NPU2_DEV_TYPE_OPENCAPI) {
>>> +        set_link_details(npu, 4, 4, NPU2_DEV_TYPE_OPENCAPI);
>>> +        /* We currently don't support using the second link */
>>> +        set_link_details(npu, 5, 5, NPU2_DEV_TYPE_UNKNOWN);
>>> +    }
>>> +
>>> +    /*
>>> +     * If an OpenCAPI card is connected to GPU#0, and an NVLink GPU is
>>> +     * connected to GPU#1, the GPU will only receive 2 links rather
>>> than the
>>> +     * usual 3.
>>> +     *
>>> +     * The reason for this is that without the OpenCAPI card, the
>>> GPU would
>>> +     * be use links 3-5, connected to NPU bricks 3-5, which needs both
>>> +     * stacks 1 and 2 to be in NVLink mode.
>>> +     *
>>> +     * However, with an OpenCAPI card in the GPU#0 slot that uses
>>> links 0-1,
>>> +     * we need to use NPU bricks 2-3, which means stack 1 must be
>>> set in
>>> +     * OpenCAPI mode. As such, the GPU will be restricted to using
>>> links 4
>>> +     * and 5.
>>
>>
>> Out of curiosity - what exactly does set the OpenCAPI mode for a stack?
>>
>> set_link_details(npu, 1, 3, NPU2_DEV_TYPE_OPENCAPI);?
> 
> That will set the type in struct npu2_dev::type, nothing else. The
> actual setup of the NPU to assign a particular stack to OpenCAPI or
> NVLink takes place in the main init flow, not the platform code, based
> off whatever device types have been assigned.

Right. Good. Thanks.

>>
>>
>>> +     */
>>> +    if (gpu1_type == NPU2_DEV_TYPE_NVLINK) {
>>> +        if (gpu0_type == NPU2_DEV_TYPE_OPENCAPI) {
>>> +            prlog(PR_WARNING, "PLAT: Chip %d GPU#1 will operate at
>>> reduced performance due to presence of OpenCAPI device. For optimal
>>> performance, swap device locations\n", chip->id);
>>> +        } else {
>>>               set_link_details(npu, 3, 3, NPU2_DEV_TYPE_NVLINK);
>>> -            set_link_details(npu, 4, 4, NPU2_DEV_TYPE_NVLINK);
>>> -            set_link_details(npu, 5, 5, NPU2_DEV_TYPE_NVLINK);
>>>           }
>>> +        set_link_details(npu, 4, 4, NPU2_DEV_TYPE_NVLINK);
>>> +        set_link_details(npu, 5, 5, NPU2_DEV_TYPE_NVLINK);
>>
>>
>>
>> btw after this rework you could add this, right?
>>
>> diff --git a/platforms/astbmc/witherspoon.c
>> b/platforms/astbmc/witherspoon.c
>> index fe13899..7ce525c 100644
>> --- a/platforms/astbmc/witherspoon.c
>> +++ b/platforms/astbmc/witherspoon.c
>> @@ -219,6 +219,7 @@ static void set_link_details(struct npu2 *npu,
>> uint32_t link_index,
>>                        link_index);
>>                  return;
>>          }
>> +       assert(dev->brick_index != NPU2_DEV_TYPE_UNKNOWN);
> 
> I don't think you meant this?
> 
> Assuming you meant assert(dev->type != NPU2_DEV_TYPE_UNKNOWN) instead -

Correct, my bad.

> as you can see above I do explicitly set some links to type unknown when
> we're not using them, which strictly speaking I don't have to do, but it
> keeps it very explicit.

I think what I really meant was:
assert(dev->type == NPU2_DEV_TYPE_UNKNOWN)
:)

I must have been thinking of "bug_on" hence "!=" instead of "==" and I
cut-n-pasted wrong "dev->brick_index". Sorry about that. The idea was to
alert about rewriting the type to ensure it is set in one place only.

> 
>>          dev->brick_index = brick_index;
>>          dev->type = type;
>>   }
>>
>>
>>
>>
>>
>>>       }
>>>         return;
>>>
>>
> 

-- 
Alexey