[PATCH 01/11] powerpc/powernv/ioda: Fix ref count for devices with their own PE

Frederic Barrat fbarrat at linux.ibm.com
Wed Nov 13 04:38:21 AEDT 2019



Le 27/09/2019 à 08:54, Alexey Kardashevskiy a écrit :
> 
> 
> On 27/09/2019 03:15, Frederic Barrat wrote:
>>
>>
>> Le 26/09/2019 à 18:44, Andrew Donnellan a écrit :
>>> On 9/9/19 5:45 pm, Frederic Barrat wrote:
>>>> Taking a reference on the pci_dev structure was required with initial
>>>> commit 184cd4a3b962 ("powerpc/powernv: PCI support for p7IOC under
>>>> OPAL v2"), where we were storing the pci_dev in the pci_dn structure.
>>>>
>>>> However, the pci_dev was later removed from the pci_dn structure, but
>>>> the reference was kept. See commit 902bdc57451c ("powerpc/powernv/idoa:
>>>> Remove unnecessary pcidev from pci_dn").
>>>>
>>>> The pnv_ioda_pe structure life cycle is the same as the pci_dev
>>>> structure, the PE is freed when the device is released. So we don't
>>>> need a reference for the pci_dev stored in the PE, otherwise the
>>>> pci_dev will never be released. Which is not really a surprise as the
>>>> comment (removed here as no longer needed) was stating as much.
>>>>
>>>> Fixes: 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary pcidev from pci_dn")
>>>> Signed-off-by: Frederic Barrat <fbarrat at linux.ibm.com>
>>>> ---
>>>>    arch/powerpc/platforms/powernv/pci-ioda.c | 11 +----------
>>>>    1 file changed, 1 insertion(+), 10 deletions(-)
>>>>
>>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>> index d8080558d020..92767f006f20 100644
>>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>> @@ -1062,14 +1062,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
>>>>            return NULL;
>>>>        }
>>>> -    /* NOTE: We get only one ref to the pci_dev for the pdn, not for the
>>>> -     * pointer in the PE data structure, both should be destroyed at the
>>>> -     * same time. However, this needs to be looked at more closely again
>>>> -     * once we actually start removing things (Hotplug, SR-IOV, ...)
>>>> -     *
>>>> -     * At some point we want to remove the PDN completely anyways
>>>> -     */
>>>> -    pci_dev_get(dev);
>>>>        pdn->pe_number = pe->pe_number;
>>>>        pe->flags = PNV_IODA_PE_DEV;
>>>>        pe->pdev = dev;
>>>> @@ -1084,7 +1076,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
>>>>            pnv_ioda_free_pe(pe);
>>>>            pdn->pe_number = IODA_INVALID_PE;
>>>>            pe->pdev = NULL;
>>>> -        pci_dev_put(dev);
>>>>            return NULL;
>>>>        }
>>>> @@ -1228,7 +1219,7 @@ static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev)
>>>>                 */
>>>>                dev_info(&npu_pdev->dev,
>>>>                    "Associating to existing PE %x\n", pe_num);
>>>> -            pci_dev_get(npu_pdev);
>>>> +            pci_dev_get(npu_pdev); // still needed after 902bdc57451c ?
>>>
>>> Did you mean to leave that comment in?
>>
>> Yes, I assumed the series wouldn't get in on the first try and a nvlink-minded developer would weigh in :)
> 
> I am looking at it and I am still not so sure what exactly guarantees that lifetime(@dev) == lifetime(@pe). For example,
> sriov_disable() removes VFs first, and only then releases PEs so there is a window when we may have a stale pdev. Not sure.


Indeed, for SRIOV, PE life-cycle is handled differently. And hopefully 
correctly, but I don’t think this patch alters it.

I was discussing with Greg about it, as he had to look in that area in 
the past. My understanding is that this patch is ok as it follows the 
initial comment in pnv_ioda_setup_dev_PE() that we don’t need to get a 
reference for the PE, it makes sense and I could trace the origin of the 
pci_dev_get() and it seemed like it should have been removed in a 
previous patch (902bdc57451c).

However, one question is whether this patch breaks nvlink and if nvlink 
assumes the devices won’t go away because we explicitly take a reference 
forever. In npu_dma.c, there are 2 functions which allow to find the GPU 
associated to a npu device, and vice-versa. Both functions return a 
pointer to a struct pci_dev, but they don’t take a reference on the 
device being returned. So that seems dangerous. I’m probably missing 
something.

Alexey, Alistair: what, if anything, guarantees, that the npu or gpu 
devices stay valid. Is it because we simply don’t provide any means to 
get rid of them ? Otherwise, don’t we need the callers of 
pnv_pci_get_gpu_dev() and pnv_pci_get_npu_dev() to worry about reference 
counting ? I’ve started looking into it and the changes are scary, which 
explains Greg’s related commit 02c5f5394918b.

   Fred



> 
>>
>>    Fred
>>
>>
>>>>                npu_pdn = pci_get_pdn(npu_pdev);
>>>>                rid = npu_pdev->bus->number << 8 | npu_pdn->devfn;
>>>>                npu_pdn->pe_number = pe_num;
>>>>
>>>
>>
> 



More information about the Linuxppc-dev mailing list