[PATCH kernel 2/2] powerpc/powernv/ioda2: Delay PE disposal

Gavin Shan gwshan at linux.vnet.ibm.com
Thu Apr 21 10:21:07 AEST 2016


On Fri, Apr 08, 2016 at 04:36:44PM +1000, Alexey Kardashevskiy wrote:
>When SRIOV is disabled, the existing code presumes there is no
>virtual function (VF) in use and destroys all associated PEs.
>However it is possible to get into the situation when the user
>activated SRIOV disabling while a VF is still in use via VFIO.
>For example, unbinding a physical function (PF) while there is a guest
>running with a VF passed throuhgh via VFIO will trigger the bug.
>
>This defines an IODA2-specific IOMMU group release() callback.
>This moves all the disposal code from pnv_ioda_release_vf_PE() to this
>new callback so the cleanup happens when the last user of an IOMMU
>group released the reference.
>
>As pnv_pci_ioda2_release_dma_pe() was reduced to just calling
>iommu_group_put(), this merges pnv_pci_ioda2_release_dma_pe()
>into pnv_ioda_release_vf_PE().
>

Sorry, I don't understand how it works. When PF's driver disables
IOV capability, the VF cannnot work. The guest is unlikely to know
that and still continue accessing the VF's resources (e.g. config
space and MMIO registers). It would cause EEH errors.

>Signed-off-by: Alexey Kardashevskiy <aik at ozlabs.ru>
>---
> arch/powerpc/platforms/powernv/pci-ioda.c | 33 +++++++++++++------------------
> 1 file changed, 14 insertions(+), 19 deletions(-)
>
>diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>index ce9f2bf..8108c54 100644
>--- a/arch/powerpc/platforms/powernv/pci-ioda.c
>+++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>@@ -1333,27 +1333,25 @@ static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable);
> static void pnv_pci_ioda2_group_release(void *iommu_data)
> {
> 	struct iommu_table_group *table_group = iommu_data;
>+	struct pnv_ioda_pe *pe = container_of(table_group,
>+			struct pnv_ioda_pe, table_group);
>+	struct pci_controller *hose = pci_bus_to_host(pe->parent_dev->bus);

pe->parent_dev would be NULL for non-VF-PEs and it's protected by CONFIG_PCI_IOV
in pci.h.

>+	struct pnv_phb *phb = hose->private_data;
>+	struct iommu_table *tbl = pe->table_group.tables[0];
>+	int64_t rc;
>
>-	table_group->group = NULL;
>-}
>-
>-static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, struct pnv_ioda_pe *pe)
>-{
>-	struct iommu_table    *tbl;
>-	int64_t               rc;
>-
>-	tbl = pe->table_group.tables[0];
> 	rc = pnv_pci_ioda2_unset_window(&pe->table_group, 0);
> 	if (rc)
> 		pe_warn(pe, "OPAL error %ld release DMA window\n", rc);
>
> 	pnv_pci_ioda2_set_bypass(pe, false);
>-	if (pe->table_group.group) {
>-		iommu_group_put(pe->table_group.group);
>-		BUG_ON(pe->table_group.group);
>-	}
>+
>+	BUG_ON(!tbl);
> 	pnv_pci_ioda2_table_free_pages(tbl);
>-	iommu_free_table(tbl, of_node_full_name(dev->dev.of_node));
>+	iommu_free_table(tbl, of_node_full_name(pe->parent_dev->dev.of_node));
>+
>+	pnv_ioda_deconfigure_pe(phb, pe);
>+	pnv_ioda_free_pe(phb, pe->pe_number);
> }

It's not correct enough. One PE is comprised of DMA, MMIO, mapping info etc.
This function disposes all of them when DMA finishes its job. I don't figure
out a better way to represent all of them and their relationship. I guess it's
worthy to have something in long term though it's not trival work.

>
> static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
>@@ -1376,16 +1374,13 @@ static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
> 		if (pe->parent_dev != pdev)
> 			continue;
>
>-		pnv_pci_ioda2_release_dma_pe(pdev, pe);
>-
> 		/* Remove from list */
> 		mutex_lock(&phb->ioda.pe_list_mutex);
> 		list_del(&pe->list);
> 		mutex_unlock(&phb->ioda.pe_list_mutex);
>
>-		pnv_ioda_deconfigure_pe(phb, pe);
>-
>-		pnv_ioda_free_pe(phb, pe->pe_number);
>+		if (pe->table_group.group)
>+			iommu_group_put(pe->table_group.group);
> 	}
> }
>
>-- 
>2.5.0.rc3
>



More information about the Linuxppc-dev mailing list