[PATCH 01/11] powerpc/powernv/ioda: Fix ref count for devices with their own PE

Alistair Popple alistair at popple.id.au
Mon Nov 18 12:04:53 AEDT 2019


On Wednesday, 13 November 2019 4:38:21 AM AEDT Frederic Barrat wrote:
> 
> Le 27/09/2019 à 08:54, Alexey Kardashevskiy a écrit :
> > 
> > 
> > On 27/09/2019 03:15, Frederic Barrat wrote:
> >>
> >>
> >> Le 26/09/2019 à 18:44, Andrew Donnellan a écrit :
> >>> On 9/9/19 5:45 pm, Frederic Barrat wrote:
> >>>> Taking a reference on the pci_dev structure was required with initial
> >>>> commit 184cd4a3b962 ("powerpc/powernv: PCI support for p7IOC under
> >>>> OPAL v2"), where we were storing the pci_dev in the pci_dn structure.
> >>>>
> >>>> However, the pci_dev was later removed from the pci_dn structure, but
> >>>> the reference was kept. See commit 902bdc57451c ("powerpc/powernv/idoa:
> >>>> Remove unnecessary pcidev from pci_dn").
> >>>>
> >>>> The pnv_ioda_pe structure life cycle is the same as the pci_dev
> >>>> structure, the PE is freed when the device is released. So we don't
> >>>> need a reference for the pci_dev stored in the PE, otherwise the
> >>>> pci_dev will never be released. Which is not really a surprise as the
> >>>> comment (removed here as no longer needed) was stating as much.
> >>>>
> >>>> Fixes: 902bdc57451c ("powerpc/powernv/idoa: Remove unnecessary pcidev 
from pci_dn")
> >>>> Signed-off-by: Frederic Barrat <fbarrat at linux.ibm.com>
> >>>> ---
> >>>>    arch/powerpc/platforms/powernv/pci-ioda.c | 11 +----------
> >>>>    1 file changed, 1 insertion(+), 10 deletions(-)
> >>>>
> >>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/
platforms/powernv/pci-ioda.c
> >>>> index d8080558d020..92767f006f20 100644
> >>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> >>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>>> @@ -1062,14 +1062,6 @@ static struct pnv_ioda_pe 
*pnv_ioda_setup_dev_PE(struct pci_dev *dev)
> >>>>            return NULL;
> >>>>        }
> >>>> -    /* NOTE: We get only one ref to the pci_dev for the pdn, not for 
the
> >>>> -     * pointer in the PE data structure, both should be destroyed at 
the
> >>>> -     * same time. However, this needs to be looked at more closely 
again
> >>>> -     * once we actually start removing things (Hotplug, SR-IOV, ...)
> >>>> -     *
> >>>> -     * At some point we want to remove the PDN completely anyways
> >>>> -     */
> >>>> -    pci_dev_get(dev);
> >>>>        pdn->pe_number = pe->pe_number;
> >>>>        pe->flags = PNV_IODA_PE_DEV;
> >>>>        pe->pdev = dev;
> >>>> @@ -1084,7 +1076,6 @@ static struct pnv_ioda_pe 
*pnv_ioda_setup_dev_PE(struct pci_dev *dev)
> >>>>            pnv_ioda_free_pe(pe);
> >>>>            pdn->pe_number = IODA_INVALID_PE;
> >>>>            pe->pdev = NULL;
> >>>> -        pci_dev_put(dev);
> >>>>            return NULL;
> >>>>        }
> >>>> @@ -1228,7 +1219,7 @@ static struct pnv_ioda_pe 
*pnv_ioda_setup_npu_PE(struct pci_dev *npu_pdev)
> >>>>                 */
> >>>>                dev_info(&npu_pdev->dev,
> >>>>                    "Associating to existing PE %x\n", pe_num);
> >>>> -            pci_dev_get(npu_pdev);
> >>>> +            pci_dev_get(npu_pdev); // still needed after 902bdc57451c 
?
> >>>
> >>> Did you mean to leave that comment in?
> >>
> >> Yes, I assumed the series wouldn't get in on the first try and a nvlink-
minded developer would weigh in :)
> > 
> > I am looking at it and I am still not so sure what exactly guarantees that 
lifetime(@dev) == lifetime(@pe). For example,
> > sriov_disable() removes VFs first, and only then releases PEs so there is a 
window when we may have a stale pdev. Not sure.
> 
> 
> Indeed, for SRIOV, PE life-cycle is handled differently. And hopefully 
> correctly, but I don’t think this patch alters it.
> 
> I was discussing with Greg about it, as he had to look in that area in 
> the past. My understanding is that this patch is ok as it follows the 
> initial comment in pnv_ioda_setup_dev_PE() that we don’t need to get a 
> reference for the PE, it makes sense and I could trace the origin of the 
> pci_dev_get() and it seemed like it should have been removed in a 
> previous patch (902bdc57451c).
> 
> However, one question is whether this patch breaks nvlink and if nvlink 
> assumes the devices won’t go away because we explicitly take a reference 
> forever. In npu_dma.c, there are 2 functions which allow to find the GPU 
> associated to a npu device, and vice-versa. Both functions return a 
> pointer to a struct pci_dev, but they don’t take a reference on the 
> device being returned. So that seems dangerous. I’m probably missing 
> something.
> 
> Alexey, Alistair: what, if anything, guarantees, that the npu or gpu 
> devices stay valid. Is it because we simply don’t provide any means to 
> get rid of them ? Otherwise, don’t we need the callers of 
> pnv_pci_get_gpu_dev() and pnv_pci_get_npu_dev() to worry about reference 
> counting ? I’ve started looking into it and the changes are scary, which 
> explains Greg’s related commit 02c5f5394918b.

To be honest the reference counting looks like it has evolved into something 
quite suspect and I don't think you're missing anything. In practice though we 
likely haven't hit any issues because the original callers didn't store 
references to the pdev which would make the window quite small (although the 
pass through work may have changed that). And as you say there simply wasn't 
any means to get rid of them anyway (EEH, hotplug, etc. has never been 
implemented or supported for GPUs and all sorts of terrible things happen if 
you try).

The best solution would likely be to review the reference counting and to 
teach callers of get_*_dev() to call pci_put_dev(), etc.

- Alistair

>    Fred
> 
> 
> 
> > 
> >>
> >>    Fred
> >>
> >>
> >>>>                npu_pdn = pci_get_pdn(npu_pdev);
> >>>>                rid = npu_pdev->bus->number << 8 | npu_pdn->devfn;
> >>>>                npu_pdn->pe_number = pe_num;
> >>>>
> >>>
> >>
> > 
> 
> 






More information about the Linuxppc-dev mailing list