[Very RFC 40/46] powernv/npu: Don't drop refcount when looking up GPU pci_devs

Frederic Barrat fbarrat at linux.ibm.com
Wed Nov 27 20:10:13 AEDT 2019



Le 27/11/2019 à 09:24, Greg Kurz a écrit :
> On Wed, 27 Nov 2019 18:09:40 +1100
> Alexey Kardashevskiy <aik at ozlabs.ru> wrote:
> 
>>
>>
>> On 20/11/2019 12:28, Oliver O'Halloran wrote:
>>> The comment here implies that we don't need to take a ref to the pci_dev
>>> because the ioda_pe will always have one. This implies that the current
>>> expection is that the pci_dev for an NPU device will *never* be torn
>>> down since the ioda_pe having a ref to the device will prevent the
>>> release function from being called.
>>>
>>> In other words, the desired behaviour here appears to be leaking a ref.
>>>
>>> Nice!
>>
>>
>> There is a history: https://patchwork.ozlabs.org/patch/1088078/
>>
>> We did not fix anything in particular then, we do not seem to be fixing
>> anything now (in other words - we cannot test it in a normal natural
>> way). I'd drop this one.
>>
> 
> Yeah, I didn't fix anything at the time. Just reverted to the ref
> count behavior we had before:
> 
> https://patchwork.ozlabs.org/patch/829172/
> 
> Frederic recently posted his take on the same topic from the OpenCAPI
> point of view:
> 
> http://patchwork.ozlabs.org/patch/1198947/
> 
> He seems to indicate the NPU devices as the real culprit because
> nobody ever cared for them to be removable. Fixing that seems be
> a chore nobody really wants to address obviously... :-\


I had taken a stab at not leaking a ref for the nvlink devices and do 
the proper thing regarding ref counting (i.e. fixing all the callers of 
get_pci_dev() to drop the reference when they were done). With that, I 
could see that the ref count of the nvlink devices could drop to 0 
(calling remove for the device in /sys) and that the devices could go away.

But then, I realized it's not necessarily desirable at this point. There 
are several comments in the code saying the npu devices (for nvlink) 
don't go away, there's no device release callback defined when it seems 
there should be, at least to handle releasing PEs.... All in all, it 
seems that some work would be needed. And if it hasn't been required by 
now...

   Fred


>>
>>
>>>
>>> Signed-off-by: Oliver O'Halloran <oohall at gmail.com>
>>> ---
>>>   arch/powerpc/platforms/powernv/npu-dma.c | 11 +++--------
>>>   1 file changed, 3 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/powernv/npu-dma.c b/arch/powerpc/platforms/powernv/npu-dma.c
>>> index 72d3749da02c..2eb6e6d45a98 100644
>>> --- a/arch/powerpc/platforms/powernv/npu-dma.c
>>> +++ b/arch/powerpc/platforms/powernv/npu-dma.c
>>> @@ -28,15 +28,10 @@ static struct pci_dev *get_pci_dev(struct device_node *dn)
>>>   			break;
>>>   
>>>   	/*
>>> -	 * pci_get_domain_bus_and_slot() increased the reference count of
>>> -	 * the PCI device, but callers don't need that actually as the PE
>>> -	 * already holds a reference to the device. Since callers aren't
>>> -	 * aware of the reference count change, call pci_dev_put() now to
>>> -	 * avoid leaks.
>>> +	 * NB: for_each_pci_dev() elevates the pci_dev refcount.
>>> +	 * Caller is responsible for dropping the ref when it's
>>> +	 * finished with it.
>>>   	 */
>>> -	if (pdev)
>>> -		pci_dev_put(pdev);
>>> -
>>>   	return pdev;
>>>   }
>>>   
>>>
>>
> 



More information about the Linuxppc-dev mailing list