[PATCH kernel 2/2] powerpc/powernv/ioda2: Delay PE disposal

David Gibson david at gibson.dropbear.id.au
Fri Apr 15 12:26:43 AEST 2016


On Fri, Apr 15, 2016 at 11:29:32AM +1000, Alexey Kardashevskiy wrote:
> On 04/14/2016 11:40 AM, David Gibson wrote:
> >On Fri, Apr 08, 2016 at 04:36:44PM +1000, Alexey Kardashevskiy wrote:
> >>When SRIOV is disabled, the existing code presumes there is no
> >>virtual function (VF) in use and destroys all associated PEs.
> >>However, it is possible for the user to disable SRIOV while a VF is
> >>still in use via VFIO. For example, unbinding a physical function (PF)
> >>while a guest is running with a VF passed through via VFIO will trigger
> >>the bug.
> >>
> >>This defines an IODA2-specific IOMMU group release() callback and
> >>moves all the disposal code from pnv_ioda_release_vf_PE() to this
> >>new callback, so the cleanup happens when the last user of an IOMMU
> >>group releases its reference.
> >>
> >>As pnv_pci_ioda2_release_dma_pe() was reduced to just calling
> >>iommu_group_put(), this merges pnv_pci_ioda2_release_dma_pe()
> >>into pnv_ioda_release_vf_PE().
> >>
> >>Signed-off-by: Alexey Kardashevskiy <aik at ozlabs.ru>
> >>---
> >>  arch/powerpc/platforms/powernv/pci-ioda.c | 33 +++++++++++++------------------
> >>  1 file changed, 14 insertions(+), 19 deletions(-)
> >>
> >>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>index ce9f2bf..8108c54 100644
> >>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
> >>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>@@ -1333,27 +1333,25 @@ static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable);
> >>  static void pnv_pci_ioda2_group_release(void *iommu_data)
> >>  {
> >>  	struct iommu_table_group *table_group = iommu_data;
> >>+	struct pnv_ioda_pe *pe = container_of(table_group,
> >>+			struct pnv_ioda_pe, table_group);
> >>+	struct pci_controller *hose = pci_bus_to_host(pe->parent_dev->bus);
> >>+	struct pnv_phb *phb = hose->private_data;
> >>+	struct iommu_table *tbl = pe->table_group.tables[0];
> >>+	int64_t rc;
> >>
> >>-	table_group->group = NULL;
> >>-}
> >>-
> >>-static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, struct pnv_ioda_pe *pe)
> >>-{
> >>-	struct iommu_table    *tbl;
> >>-	int64_t               rc;
> >>-
> >>-	tbl = pe->table_group.tables[0];
> >>  	rc = pnv_pci_ioda2_unset_window(&pe->table_group, 0);
> >
> >Is it safe to go manipulating the PE windows, etc. after SR-IOV is
> >disabled?
> 
> Manipulating windows in this case is just updating 8 bytes in the TVT. At
> this point the VF is expected to be destroyed, but the PE is expected to
> remain allocated (not freed), so pnv_ioda2_pick_m64_pe() (or
> pnv_ioda2_reserve_m64_pe()?) won't reuse it.

Ok.
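To illustrate why leaving the PE allocated keeps it from being reused, here is a minimal userspace sketch of a bitmap-based PE-number allocator. All names (sketch_phb, sketch_alloc_pe(), sketch_free_pe(), SKETCH_NUM_PES) are hypothetical, and the model is a simplification, not the powernv allocation code: a PE number stays marked in use until it is explicitly freed, so a PE whose disposal is deferred cannot be handed out again.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified PE-number allocator (not the powernv code). */
#define SKETCH_NUM_PES 64

struct sketch_phb {
	uint64_t pe_alloc; /* bit n set => PE n is allocated */
};

/* Return the lowest free PE number and mark it allocated, or -1 if none. */
static int sketch_alloc_pe(struct sketch_phb *phb)
{
	for (int n = 0; n < SKETCH_NUM_PES; n++) {
		uint64_t bit = UINT64_C(1) << n;

		if (!(phb->pe_alloc & bit)) {
			phb->pe_alloc |= bit;
			return n;
		}
	}
	return -1;
}

/* Release a PE number; until this runs, the number cannot be reused. */
static void sketch_free_pe(struct sketch_phb *phb, int n)
{
	assert(n >= 0 && n < SKETCH_NUM_PES);
	assert(phb->pe_alloc & (UINT64_C(1) << n)); /* catch double free */
	phb->pe_alloc &= ~(UINT64_C(1) << n);
}
```

With PE 0 still allocated (its disposal deferred), a third allocation returns PE 2 rather than 0; only after sketch_free_pe(&phb, 0) does PE 0 become available again.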

> >When SR-IOV is disabled, you need to immediately disable the VF (I'm
> >guessing that happens somewhere) and stop all access to the VF
> >"hardware".
> 
> drivers/pci/iov.c
> ===
> static void sriov_disable(struct pci_dev *dev)
> {
> ...
> for (i = 0; i < iov->num_VFs; i++)
>         pci_iov_remove_virtfn(dev, i, 0);
> ...
> pcibios_sriov_disable(dev);
> ===
> 
> pcibios_sriov_disable() is where pnv_pci_ioda2_release_dma_pe() is called from.
> 
> >Only the iommu group structure *has* to stick around
> >until the reference count drops to zero.  I think other structures and
> >hardware reconfiguration can be deferred or done immediately,
> >whichever is more convenient.
> 
> I deferred everything for convenience, as iommu_table_group is embedded
> in struct pnv_ioda_pe rather than referenced through a pointer.

Ok.
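The convenience argument follows from how container_of() works: the release callback only receives a pointer to the embedded table group and steps back to the enclosing PE, so the PE's memory must stay valid until the callback runs. The sketch below is a userspace model using a simplified container_of() built on offsetof() (the kernel macro also adds type checking). The structures sketch_pe and sketch_table_group, and the helper sketch_pe_from_group(), are hypothetical, trimmed stand-ins for struct pnv_ioda_pe and struct iommu_table_group.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's container_of(): subtract the
 * member's offset from the member pointer to reach the enclosing struct.
 * Unlike the kernel macro, this performs no type checking. */
#define sketch_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily trimmed stand-ins for the real structures. */
struct sketch_table_group {
	void *group;
};

struct sketch_pe {
	int pe_number;
	struct sketch_table_group table_group; /* embedded, not a pointer */
};

/* Like the release callback in the patch, this only gets the embedded
 * member (as an opaque pointer) and recovers the owning PE from it. */
static struct sketch_pe *sketch_pe_from_group(void *iommu_data)
{
	struct sketch_table_group *tg = iommu_data;

	assert(tg != NULL);
	return sketch_container_of(tg, struct sketch_pe, table_group);
}
```

Because the table group lives inside the PE, freeing the PE early would leave the group pointing into freed memory; deferring the whole PE teardown to the release callback avoids that without restructuring the data.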


With those queries answered,

Reviewed-by: David Gibson <david at gibson.dropbear.id.au>

> >>  	if (rc)
> >>  		pe_warn(pe, "OPAL error %ld release DMA window\n", rc);
> >>
> >>  	pnv_pci_ioda2_set_bypass(pe, false);
> >>-	if (pe->table_group.group) {
> >>-		iommu_group_put(pe->table_group.group);
> >>-		BUG_ON(pe->table_group.group);
> >>-	}
> >>+
> >>+	BUG_ON(!tbl);
> >>  	pnv_pci_ioda2_table_free_pages(tbl);
> >>-	iommu_free_table(tbl, of_node_full_name(dev->dev.of_node));
> >>+	iommu_free_table(tbl, of_node_full_name(pe->parent_dev->dev.of_node));
> >>+
> >>+	pnv_ioda_deconfigure_pe(phb, pe);
> >>+	pnv_ioda_free_pe(phb, pe->pe_number);
> >>  }
> >>
> >>  static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
> >>@@ -1376,16 +1374,13 @@ static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
> >>  		if (pe->parent_dev != pdev)
> >>  			continue;
> >>
> >>-		pnv_pci_ioda2_release_dma_pe(pdev, pe);
> >>-
> >>  		/* Remove from list */
> >>  		mutex_lock(&phb->ioda.pe_list_mutex);
> >>  		list_del(&pe->list);
> >>  		mutex_unlock(&phb->ioda.pe_list_mutex);
> >>
> >>-		pnv_ioda_deconfigure_pe(phb, pe);
> >>-
> >>-		pnv_ioda_free_pe(phb, pe->pe_number);
> >>+		if (pe->table_group.group)
> >>+			iommu_group_put(pe->table_group.group);
> >>  	}
> >>  }
> >>
> >
> 
> 
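The deferred-disposal scheme in the patch relies on reference counting: pnv_ioda_release_vf_PE() now only drops its reference with iommu_group_put(), and the teardown runs from the release callback once the last holder (for example VFIO, while a guest still uses the VF) lets go. A minimal single-threaded model of that pattern follows; sketch_group, sketch_group_get(), sketch_group_put(), sketch_pe_state and sketch_pe_release() are hypothetical names, and the real IOMMU group code is built on kobjects with proper locking, which this sketch omits.

```c
#include <assert.h>

/* Hypothetical minimal model of a reference-counted group with a
 * release() callback. Not the kernel implementation: no locking,
 * no kobject, single-threaded only. */
struct sketch_group {
	int refcount;
	void (*release)(void *data);
	void *data;
};

/* Take an extra reference; the group must already be live. */
static void sketch_group_get(struct sketch_group *g)
{
	assert(g->refcount > 0);
	g->refcount++;
}

/* Drop one reference; the final put invokes the release callback. */
static void sketch_group_put(struct sketch_group *g)
{
	assert(g->refcount > 0);
	if (--g->refcount == 0 && g->release)
		g->release(g->data);
}

/* Example owner whose teardown is deferred to the release callback. */
struct sketch_pe_state {
	int released; /* set once the release callback has run */
};

static void sketch_pe_release(void *data)
{
	struct sketch_pe_state *pe = data;

	/* Real code would unset DMA windows, free tables, free the PE. */
	pe->released = 1;
}
```

Dropping the platform reference while another holder remains leaves the PE untouched; only the final put invokes the callback.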

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson