[PATCH 1/3] powernv/iov: Ensure the pdn for VFs always contains a valid PE number

Oliver O'Halloran oohall at gmail.com
Tue Oct 1 11:33:01 AEST 2019


On Tue, Oct 1, 2019 at 3:09 AM Bjorn Helgaas <helgaas at kernel.org> wrote:
>
> On Mon, Sep 30, 2019 at 12:08:46PM +1000, Oliver O'Halloran wrote:
>
> This is all powerpc, so I assume Michael will handle this.  Just
> random things I noticed; ignore if they don't make sense:
>
> > On PowerNV we use the pcibios_sriov_enable() hook to do two things:
> >
> > 1. Create a pci_dn structure for each of the VFs, and
> > 2. Configure the PHB's internal BARs that map MMIO ranges to PEs
> >    so that each VF has it's own PE. Note that the PE also determines
>
> s/it's/its/
>
> >    the IOMMU table the HW uses for the device.
> >
> > Currently we do not set the pe_number field of the pci_dn immediately after
> > assigning the PE number for the VF that it represents. Instead, we do that
> > in a fixup (see pnv_pci_dma_dev_setup) which is run inside the
> > pcibios_add_device() hook which is run prior to adding the device to the
> > bus.
> >
> > On PowerNV we add the device to it's IOMMU group using a bus notifier and
>
> s/it's/its/
>
> > in order for this to work the PE number needs to be known when the bus
> > notifier is run. This works today since the PE number is set in the fixup
> > which runs before adding the device to the bus. However, if we want to move
> > the fixup to a later stage this will break.
> >
> > We can fix this by setting the pdn->pe_number inside of
> > pcibios_sriov_enable(). There's no good to avoid this since we already have
>
> s/no good/no good reason/ ?
>
> Not quite sure what "this" refers to ... "no good reason to avoid
> setting pdn->pe_number in pcibios_sriov_enable()"?  The double
> negative makes it a little hard to parse.

I agree it's a bit vague, I'll re-word it.

> > all the required information at that point, so... do that. Moving this
> > earlier does cause two problems:
> >
> > 1. We trip the WARN_ON() in the fixup code, and
> > 2. The EEH core will clear pdn->pe_number while recovering VFs.
> >
> > The only justification for either of these is a comment in eeh_rmv_device()
> > suggesting that pdn->pe_number *must* be set to IODA_INVALID_PE in order
> > for the VF to be scanned. However, this comment appears to have no basis in
> > reality so just delete it.
> >
> > Signed-off-by: Oliver O'Halloran <oohall at gmail.com>
> > ---
> > Can't get rid of the fixup entirely since we need it to set the
> > ioda_pe->pdev back-pointer. I'll look at killing that another time.
> > ---
> >  arch/powerpc/kernel/eeh_driver.c          |  6 ------
> >  arch/powerpc/platforms/powernv/pci-ioda.c | 19 +++++++++++++++----
> >  arch/powerpc/platforms/powernv/pci.c      |  4 ----
> >  3 files changed, 15 insertions(+), 14 deletions(-)
> >
> > diff --git a/arch/powerpc/kernel/eeh_driver.c b/arch/powerpc/kernel/eeh_driver.c
> > index d9279d0..7955fba 100644
> > --- a/arch/powerpc/kernel/eeh_driver.c
> > +++ b/arch/powerpc/kernel/eeh_driver.c
> > @@ -541,12 +541,6 @@ static void eeh_rmv_device(struct eeh_dev *edev, void *userdata)
> >
> >               pci_iov_remove_virtfn(edev->physfn, pdn->vf_index);
> >               edev->pdev = NULL;
> > -
> > -             /*
> > -              * We have to set the VF PE number to invalid one, which is
> > -              * required to plug the VF successfully.
> > -              */
> > -             pdn->pe_number = IODA_INVALID_PE;
> >  #endif
> >               if (rmv_data)
> >                       list_add(&edev->rmv_entry, &rmv_data->removed_vf_list);
> > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> > index 5e3172d..70508b3 100644
> > --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> > @@ -1558,6 +1558,10 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
> >
> >       /* Reserve PE for each VF */
> >       for (vf_index = 0; vf_index < num_vfs; vf_index++) {
> > +             int vf_devfn = pci_iov_virtfn_devfn(pdev, vf_index);
> > +             int vf_bus = pci_iov_virtfn_bus(pdev, vf_index);
> > +             struct pci_dn *vf_pdn;
> > +
> >               if (pdn->m64_single_mode)
> >                       pe_num = pdn->pe_num_map[vf_index];
> >               else
> > @@ -1570,13 +1574,11 @@ static void pnv_ioda_setup_vf_PE(struct pci_dev *pdev, u16 num_vfs)
> >               pe->pbus = NULL;
> >               pe->parent_dev = pdev;
> >               pe->mve_number = -1;
> > -             pe->rid = (pci_iov_virtfn_bus(pdev, vf_index) << 8) |
> > -                        pci_iov_virtfn_devfn(pdev, vf_index);
> > +             pe->rid = (vf_bus << 8) | vf_devfn;
> >
> >               pe_info(pe, "VF %04d:%02d:%02d.%d associated with PE#%x\n",
>
> Not related to *this* patch, but this looks like maybe it's supposed
> to match the pci_name(), e.g., "%04x:%02x:%02x.%d" from
> pci_setup_device()?  If so, the "%04d:%02d:%02d" here will be
> confusing since the decimal & hex won't always match.

That looks plain wrong. I'll send a separate patch to fix it.

> >                       hose->global_number, pdev->bus->number,
>
> Consider pci_domain_nr(bus) instead of hose->global_number?  It would
> be nice if you had the pci_dev * for each VF so you could just use
> pci_name(vf) instead of all this domain/bus/PCI_SLOT/FUNC.

Unfortunately, we don't have pci_devs for the VFs when
pcibios_sriov_enable() is called. On powernv (and pseries) we only
permit config accesses to a BDF when a pci_dn exists for that BDF
because the platform code assumes that one will exist. As a result we
can't scan the VFs until after pcibios_sriov_enable() is called since
that's where pci_dn's are created for the VFs. I'm working on removing
the use of pci_dn from powernv entirely though. Once that's done we
should revisit whether any of this infrastructure is necessary...

Oliver


More information about the Linuxppc-dev mailing list