[Skiboot] [PATCH 4/4] hw/npu2-common: Allow mixed mode NPU setups

Andrew Donnellan andrew.donnellan at au1.ibm.com
Fri Oct 19 16:24:39 AEDT 2018


On 19/10/18 3:10 am, Frederic Barrat wrote:
> 
> 
> Le 18/10/2018 à 03:16, Andrew Donnellan a écrit :
>> Now that we have all the support in place for NPUs with both NVLink and
>> OpenCAPI devices, get rid of the error that aborts NPU init when a mixed
>> setup is detected.
>>
>> While we're there, rename setup_devices() to more accurately reflect what
>> it does, and move the calls to the NVLink/OpenCAPI setup code out of 
>> there
>> into the main probe function.
>>
>> Signed-off-by: Andrew Donnellan <andrew.donnellan at au1.ibm.com>
>> ---
>>   hw/npu2-common.c   | 23 ++++-------------------
>>   hw/npu2-opencapi.c |  5 -----
>>   2 files changed, 4 insertions(+), 24 deletions(-)
>>
>> diff --git a/hw/npu2-common.c b/hw/npu2-common.c
>> index 1b07d157b744..16533286ca79 100644
>> --- a/hw/npu2-common.c
>> +++ b/hw/npu2-common.c
>> @@ -361,26 +361,19 @@ failed:
>>       return NULL;
>>   }
>>
>> -static void setup_devices(struct npu2 *npu)
>> +static void add_link_type_properties(struct npu2 *npu)
>>   {
>> -    bool nvlink_detected = false, ocapi_detected = false;
>>       struct npu2_dev *dev;
>>
>> -    /*
>> -     * TODO: In future, we'll do brick configuration here to support 
>> mixed
>> -     * setups.
>> -     */
>>       for (int i = 0; i < npu->total_devices; i++) {
>>           dev = &npu->devices[i];
>>           switch (dev->type) {
>>           case NPU2_DEV_TYPE_NVLINK:
>> -            nvlink_detected = true;
>>               dt_add_property_strings(dev->dt_node,
>>                           "ibm,npu-link-type",
>>                           "nvlink");
>>               break;
>>           case NPU2_DEV_TYPE_OPENCAPI:
>> -            ocapi_detected = true;
>>               dt_add_property_strings(dev->dt_node,
>>                           "ibm,npu-link-type",
>>                           "opencapi");
>> @@ -393,16 +386,6 @@ static void setup_devices(struct npu2 *npu)
>>                           "unknown");
>>           }
>>       }
>> -
>> -    if (nvlink_detected && ocapi_detected) {
>> -        prlog(PR_ERR, "NPU: NVLink and OpenCAPI devices on same chip 
>> not supported, aborting NPU init\n");
>> -        return;
>> -    }
>> -
>> -    if (nvlink_detected)
>> -        npu2_nvlink_init_npu(npu);
>> -    else if (ocapi_detected)
>> -        npu2_opencapi_init_npu(npu);
>>   }
>>
>>   void probe_npu2(void)
>> @@ -438,7 +421,9 @@ void probe_npu2(void)
>>           if (!npu)
>>               continue;
>>           platform.npu2_device_detect(npu);
>> +        add_link_type_properties(npu);
>>           set_brick_config(npu);
>> -        setup_devices(npu);
>> +        npu2_nvlink_init_npu(npu);
>> +        npu2_opencapi_init_npu(npu);
> 
> We're always calling npu2_nvlink_init_npu() and 
> npu2_opencapi_init_npu(). We may consider exiting early from those 
> functions if there's no device of that type under the npu. Otherwise, 
> we're creating extra PHBs for nothing (at least npu2_nvlink_init_npu() 
> would).

Good catch, and we're also unnecessarily disabling fast reboot etc etc...

> 
> I think we're also allocating/setting up the IRQs twice. I can imagine 
> it's breaking things on the nvlink side of things (since we'll overwrite 
> the NPU interrupt vector address with the opencapi one).

Yep this should have been obvious... I'll need to handle that in common 
code.

> 
> Don't we also have some overlap on the global mmio bar assignments 
> (setup_global_mmio_bar() and assign_mmio_bars)? Though I think we end up 
> overwriting the same content.

I want to merge the MMIO BAR assignments... hmm, I really should stop 
putting it off and just do it in this series. It would make things much 
neater I think.

As you can probably guess, at the moment we rely on the OpenCAPI init 
being done second else you end up with all the NVLink BAR values instead.

Also it looks like for some reason, the OpenCAPI assignment of the PHY 
BARs is being done in reverse (stack 2 gets phys map index 0, stack 1 
gets index 1, it's the reverse for NVLink). Need to look into that...

-- 
Andrew Donnellan              OzLabs, ADL Canberra
andrew.donnellan at au1.ibm.com  IBM Australia Limited



More information about the Skiboot mailing list