[PATCH 01/11] powerpc/powernv/ioda: Fix ref count for devices with their own PE
Frederic Barrat
fbarrat at linux.ibm.com
Wed Nov 13 04:38:21 AEDT 2019
Le 27/09/2019 à 08:54, Alexey Kardashevskiy a écrit :
>
>
> On 27/09/2019 03:15, Frederic Barrat wrote:
>>
>>
>> Le 26/09/2019 à 18:44, Andrew Donnellan a écrit :
>>> On 9/9/19 5:45 pm, Frederic Barrat wrote:
>>>> Taking a reference on the pci_dev structure was required with initial
>>>> commit 184cd4a3b962 ("powerpc/powernv: PCI support for p7IOC under
>>>> OPAL v2"), where we were storing the pci_dev in the pci_dn structure.
>>>>
>>>> However, the pci_dev was later removed from the pci_dn structure, but
>>>> the reference was kept. See commit 902bdc57451c ("powerpc/powernv/idoa:
>>>> Remove unnecessary pcidev from pci_dn").
>>>>
>>>> The pnv_ioda_pe structure life cycle is the same as the pci_dev
>>>> structure, the PE is freed when the device is released. So we don't
>>>> need a reference for the pci_dev stored in the PE, otherwise the
>>>> pci_dev will never be released. Which is not really a surprise as the
>>>> comment (removed here as no longer needed) was stating as much.
>>>>
>>>> Fixes: 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary pcidev from pci_dn")
>>>> Signed-off-by: Frederic Barrat <fbarrat at linux.ibm.com>
>>>> ---
>>>> arch/powerpc/platforms/powernv/pci-ioda.c | 11 +----------
>>>> 1 file changed, 1 insertion(+), 10 deletions(-)
>>>>
>>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>> index d8080558d020..92767f006f20 100644
>>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>>> @@ -1062,14 +1062,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
>>>> return NULL;
>>>> }
>>>> - /* NOTE: We get only one ref to the pci_dev for the pdn, not for the
>>>> - * pointer in the PE data structure, both should be destroyed at the
>>>> - * same time. However, this needs to be looked at more closely again
>>>> - * once we actually start removing things (Hotplug, SR-IOV, ...)
>>>> - *
>>>> - * At some point we want to remove the PDN completely anyways
>>>> - */
>>>> - pci_dev_get(dev);
>>>> pdn->pe_number = pe->pe_number;
>>>> pe->flags = PNV_IODA_PE_DEV;
>>>> pe->pdev = dev;
>>>> @@ -1084,7 +1076,6 @@ static struct pnv_ioda_pe *pnv_ioda_setup_dev_PE(struct pci_dev *dev)
>>>> pnv_ioda_free_pe(pe);
>>>> pdn->pe_number = IODA_INVALID_PE;
>>>> pe->pdev = NULL;
>>>> - pci_dev_put(dev);
>>>> return NULL;
>>>> }
>>>> @@ -1228,7 +1219,7 @@ static struct pnv_ioda_pe *pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev)
>>>> */
>>>> dev_info(&npu_pdev->dev,
>>>> "Associating to existing PE %x\n", pe_num);
>>>> - pci_dev_get(npu_pdev);
>>>> + pci_dev_get(npu_pdev); // still needed after 902bdc57451c ?
>>>
>>> Did you mean to leave that comment in?
>>
>> Yes, I assumed the series wouldn't get in on the first try and a nvlink-minded developer would weigh in :)
>
> I am looking at it and I am still not so sure what exactly guarantees that lifetime(@dev) == lifetime(@pe). For example,
> sriov_disable() removes VFs first, and only then releases PEs so there is a window when we may have a stale pdev. Not sure.
Indeed, for SRIOV, PE life-cycle is handled differently. And hopefully
correctly, but I don’t think this patch alters it.
I was discussing with Greg about it, as he had to look in that area in
the past. My understanding is that this patch is ok as it follows the
initial comment in pnv_ioda_setup_dev_PE() that we don’t need to get a
reference for the PE, it makes sense and I could trace the origin of the
pci_dev_get() and it seemed like it should have been removed in a
previous patch (902bdc57451c).
However, one question is whether this patch breaks nvlink and if nvlink
assumes the devices won’t go away because we explicitly take a reference
forever. In npu_dma.c, there are 2 functions which allow to find the GPU
associated to a npu device, and vice-versa. Both functions return a
pointer to a struct pci_dev, but they don’t take a reference on the
device being returned. So that seems dangerous. I’m probably missing
something.
Alexey, Alistair: what, if anything, guarantees, that the npu or gpu
devices stay valid. Is it because we simply don’t provide any means to
get rid of them ? Otherwise, don’t we need the callers of
pnv_pci_get_gpu_dev() and pnv_pci_get_npu_dev() to worry about reference
counting ? I’ve started looking into it and the changes are scary, which
explains Greg’s related commit 02c5f5394918b.
Fred
>
>>
>> Fred
>>
>>
>>>> npu_pdn = pci_get_pdn(npu_pdev);
>>>> rid = npu_pdev->bus->number << 8 | npu_pdn->devfn;
>>>> npu_pdn->pe_number = pe_num;
>>>>
>>>
>>
>
More information about the Linuxppc-dev
mailing list