[PATCH V16 00/20] Enable SRIOV on POWER8

Wei Yang weiyang at linux.vnet.ibm.com
Wed Mar 25 19:23:41 AEDT 2015


This patchset enables SRIOV on POWER8.

The general idea is to put each VF into its own PE and allocate the required
resources such as MMIO/DMA/MSI. The major difficulty comes from the MMIO
allocation and adjustment for the PF's IOV BAR.

On P8, we use M64BT to cover a PF's IOV BAR, which allows each individual VF
to sit in its own PE. This gives more flexibility, but at the same time it
imposes some restrictions on the PF's IOV BAR size and alignment.

To achieve this effect, we need to apply some hacks to the PCI device's resources:
1. Expand the IOV BAR properly.
   Done by pnv_pci_ioda_fixup_iov_resources().
2. Shift the IOV BAR properly.
   Done by pnv_pci_vf_resource_shift().
3. IOV BAR alignment is calculated by an arch-dependent function instead of
   from an individual VF BAR size.
   Done by pnv_pcibios_sriov_resource_alignment().
4. Take the IOV BAR alignment into consideration in the sizing and assigning.
   This is achieved by commit: "PCI: Consider additional PF's IOV BAR
   alignment in sizing and assigning"
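
The arithmetic behind steps 1-3 can be sketched in standalone C. This is only
an illustration, not code from the series: struct iov_bar and every helper
name below are hypothetical, and it assumes the M64 window is split into
total_pe equal segments, with segment N belonging to PE N.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a PF's IOV BAR; not the kernel's struct resource. */
struct iov_bar {
	uint64_t start;       /* address where VF0's BAR begins */
	uint64_t vf_bar_size; /* size of one VF BAR, a power of two */
	uint32_t total_pe;    /* number of PEs, assumed equal to M64 segments */
};

/* Step 1: reserve one segment per PE so each VF BAR can fill one segment. */
static uint64_t iov_bar_expanded_size(const struct iov_bar *bar)
{
	/* BAR sizes are powers of two; the sketch relies on that. */
	assert(bar->vf_bar_size && !(bar->vf_bar_size & (bar->vf_bar_size - 1)));
	return (uint64_t)bar->total_pe * bar->vf_bar_size;
}

/* Step 3: align to the whole expanded BAR, not to a single VF BAR. */
static uint64_t iov_bar_alignment(const struct iov_bar *bar)
{
	return iov_bar_expanded_size(bar);
}

/*
 * Step 2: move the BAR start so that VF0 lands in segment first_pe.
 * Returns 0 on success, -1 if the VFs would run past the last segment.
 */
static int iov_bar_shift(struct iov_bar *bar, uint32_t first_pe,
			 uint32_t num_vfs)
{
	if ((uint64_t)first_pe + num_vfs > bar->total_pe)
		return -1;
	bar->start += (uint64_t)first_pe * bar->vf_bar_size;
	return 0;
}

/* Segment (and so PE) that VF vf_id falls into, given the window base. */
static uint32_t iov_bar_vf_pe(const struct iov_bar *bar, uint64_t window_base,
			      uint32_t vf_id)
{
	uint64_t addr = bar->start + (uint64_t)vf_id * bar->vf_bar_size;

	return (uint32_t)((addr - window_base) / bar->vf_bar_size);
}
```

With total_pe = 256 and a 64KB VF BAR, for instance, the reserved size and
the alignment are both 16MB, and shifting by 8 PEs moves the BAR start up by
512KB so that VF0 lands in PE 8.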

Test Environment:
       The SRIOV devices tested are Emulex Lancer (10df:e220) and
       Mellanox ConnectX-3 (15b3:1003) on POWER8.

Example of passing a VF through to a guest via vfio:
	1. Unbind the original driver and bind the device to the vfio-pci driver
	   echo 0000:06:0d.0 > /sys/bus/pci/devices/0000:06:0d.0/driver/unbind
	   echo  1102 0002 > /sys/bus/pci/drivers/vfio-pci/new_id
	   Note: this should be done for each device in the same iommu_group
	2. Start qemu and pass device through vfio
	   /home/ywywyang/git/qemu-impreza/ppc64-softmmu/qemu-system-ppc64 \
		   -M pseries -m 2048 -enable-kvm -nographic \
		   -drive file=/home/ywywyang/kvm/fc19.img \
		   -monitor telnet:localhost:5435,server,nowait -boot cd \
		   -device "spapr-pci-vfio-host-bridge,id=CXGB3,iommu=26,index=6"

Verify that it is the VF that responds:
	1. ping from a machine in the same subnet (the same broadcast domain)
	2. run arp -n on this machine
	   9.115.251.20             ether   00:00:c9:df:ed:bf   C eth0
	3. ifconfig in the guest
	   # ifconfig eth1
	   eth1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
	        inet 9.115.251.20  netmask 255.255.255.0  broadcast 9.115.251.255
		inet6 fe80::200:c9ff:fedf:edbf  prefixlen 64  scopeid 0x20<link>
	        ether 00:00:c9:df:ed:bf  txqueuelen 1000 (Ethernet)
	        RX packets 175  bytes 13278 (12.9 KiB)
	        RX errors 0  dropped 0  overruns 0  frame 0
		TX packets 58  bytes 9276 (9.0 KiB)
	        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
	4. Both show the same MAC address

	Note: make sure you shut down the other network interfaces in the guest.

---
v16:
   * rebased on Ben's next-eeh
   * Following two patches have been divided into three. First two are already
     merged, the third one is renamed to "powerpc/pci: Create pci_dn for VFs"
     and sent in this patch set.
     8ec20d6 powerpc/powernv: Use pci_dn, not device_node, in PCI config accessor
     a3460fc powerpc/pci: Refactor pci_dn
v15:
   * Add Ack from Bjorn
   * Add a more detailed comment for pnv_pci_vf_resource_shift()
v14:
   * call ppc_md.pcibios_fixup_sriov() in pcibios_add_device
   * add more explanation in change log
   * Following patches have been reordered to the beginning.
     8ec20d6 powerpc/powernv: Use pci_dn, not device_node, in PCI config accessor
     a3460fc powerpc/pci: Refactor pci_dn
     These two patches will be modified to merge with other patches which are
     under discussion/review on the ppc mailing list. Some changes may also be
     made in other patches, which I didn't include in this series so that the
     auto build robot could work on it.
     There may be several changes in the powerpc arch code which do not affect
     the PCI core. So after this patch set passes review in the PCI community,
     I will rebase this series on the ppc branch and send it out for comment.
   * use add_res->min_align as the alignment in reassign_resources_sorted()
   * some cleanup in the documentation
v13:
   * fix error in pcibios_iov_resource_alignment(), use pdev instead of dev
   * rename vf_num to num_vfs in pcibios_sriov_enable(),
     pnv_pci_vf_resource_shift(), pnv_pci_sriov_disable(),
     pnv_pci_sriov_enable(), pnv_pci_ioda2_setup_dma_pe()
   * add more explanation in commit "powerpc/pci: Don't unset PCI resources
     for VFs"
   * fix IOV BAR in hotplug path as well, and don't fixup an already added
     device
   * use roundup_pow_of_two() instead of __roundup_pow_of_two()
   * this is based on v4.0-rc1
v12:
   * remove "align" parameter from pcibios_iov_resource_alignment()
     default version returns pci_iov_resource_size() instead of the
     "align" parameter
   * in powerpc pcibios_iov_resource_alignment(), return
     pci_iov_resource_size() if there's no ppc_md function pointer
   * in pci_sriov_resource_alignment(), don't re-read base, since we
     saved the required alignment when reading it the first time
   * remove "vf_num" parameter from add_dev_pci_info() and
     remove_dev_pci_info(); use pci_sriov_get_totalvfs() instead
   * use dev_warn() instead of pr_warn() when possible
   * check to be sure IOV BAR is still in range after shifting, change
     pnv_pci_vf_resource_shift() from void to int
   * improve sriov_enable() error message
   * improve SR-IOV BAR sizing message
   * index IOV resources in conventional style
   * include preamble patches (refresh offset/stride when updating numVFs,
     calculate max buses required)
   * restructure pci_iov_max_bus_range() to return value instead of updating
     internally, rename to virtfn_max_buses()
   * fix typos & formatting
   * expand documentation
v11:
   * fix some compile warnings
v10:
   * remove weak function pcibios_iov_resource_size()
     the VF BAR size is stored in pci_sriov structure and retrieved from
     pci_iov_resource_size()
   * Use "Reserve additional" instead of "Expand" to be more accurate in the
     change log
   * add log message to show the PF's IOV BAR final size
   * add pcibios_sriov_enable/disable() weak functions in sriov_enable/disable()
     for arch setup before enabling VFs. For example, the arch could fix up
     the BDF for VFs, since a change of NumVFs would affect the BDF of VFs.
   * Add some explanation of PE on Power arch in the documentation
v9:
   * make the change log consistent in the terminology
     PF's IOV BAR -> the SRIOV BAR in PF
     VF's BAR -> the normal BAR in VF's view
   * rename all newly introduced function from _sriov_ to _iov_
   * rename the document to Documentation/powerpc/pci_iov_resource_on_powernv.txt
   * add the vendor id and device id of the tested devices
   * change return value from EINVAL to ENOSYS for pci_iov_virtfn_bus() and
     pci_iov_virtfn_devfn() when it is called on PF or SRIOV is not configured
   * rebase on 3.18-rc2 and tested
v8:
   * use weak function pcibios_sriov_resource_size() instead of some flag to
     retrieve the IOV BAR size.
   * add a document Documentation/powerpc/pci_resource.txt to explain the
     design.
   * make pci_iov_virtfn_bus()/pci_iov_virtfn_devfn() not inline.
   * extract a function res_to_dev_res(), so that it is more general to get
     additional size and alignment
   * fix one contention which was introduced in "powerpc/pci: Refactor pci_dn".
     the root cause is that pci_get_slot() takes pci_bus_sem, which leads to a
     deadlock.
v7:
   * add IORESOURCE_ARCH flag for IOV BAR on powernv platform.
   * when IOV BAR has IORESOURCE_ARCH flag, the size is retrieved from
     hardware directly. If not, calculate as usual.
   * reorder the patch set, group them by subsystem:
     PCI, powerpc, powernv
   * rebase it on 3.16-rc6
v6:
   * remove pcibios_enable_sriov()/pcibios_disable_sriov() weak functions
     similar functionality is moved to
     pnv_pci_enable_device_hook()/pnv_pci_disable_device_hook(). When the PF
     is enabled, the platform will try its best to allocate resources for VFs.
   * remove pcibios_sriov_resource_size weak function
   * VF BAR size is retrieved from hardware directly in virtfn_add()
v5:
   * merge those SRIOV related platform functions in machdep_calls
     wrap them in one CONFIG_PCI_IOV macro
   * define IODA_INVALID_M64 to replace (-1)
     use this value to indicate that m64_wins is not used
   * rename pnv_pci_release_dev_dma() to pnv_pci_ioda2_release_dma_pe()
     this function is a counterpart to pnv_pci_ioda2_setup_dma_pe()
   * change dev_info() to dev_dbg() in pnv_pci_ioda_fixup_iov_resources()
     to reduce kernel log output
   * release M64 window in pnv_pci_ioda2_release_dma_pe()
v4:
   * code format fixes, e.g. do not exceed 80 chars
   * in commit "ppc/pnv: Add function to deconfig a PE"
     check that the bus has a bridge before printing the name
     remove a PE from its own PELTV
   * change the function name for sriov resource size/alignment
   * rebase on 3.16-rc3
   * VFs will not rely on device node
     Per Grant Likely's comments, the kernel should be able to handle the
     lack of a device_node gracefully. Gavin restructured pci_dn so that a
     VF has a pci_dn even when its device_node is not provided by firmware.
   * clean up all the patch titles so that they follow one style
   * fix return value for pci_iov_virtfn_bus/pci_iov_virtfn_devfn
v3:
   * change the return type of virtfn_bus/virtfn_devfn to int
     change the name of these two functions to pci_iov_virtfn_bus/pci_iov_virtfn_devfn
   * remove the second parameter of pcibios_sriov_disable()
   * use data instead of pe in "ppc/pnv: allocate pe->iommu_table dynamically"
   * rename __pci_sriov_resource_size to pcibios_sriov_resource_size
   * rename __pci_sriov_resource_alignment to pcibios_sriov_resource_alignment
v2:
   * change the return value of virtfn_bus/virtfn_devfn to 0
   * move some TCE related macro definitions to
     arch/powerpc/platforms/powernv/pci.h
   * fix the __pci_sriov_resource_alignment on powernv platform
     During the sizing stage, the IOV BAR is truncated to 0, which
     affects the order of allocation. Fix this to make sure BARs are
     allocated in order of their alignment.
v1:
   * improve the change log for
     "PCI: Add weak __pci_sriov_resource_size() interface"
     "PCI: Add weak __pci_sriov_resource_alignment() interface"
     "PCI: take additional IOV BAR alignment in sizing and assigning"
   * wrap VF PE code in CONFIG_PCI_IOV
   * did regression test on P7.
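
The v10 note that a change of NumVFs can affect the BDF of VFs follows from
how SR-IOV routing IDs are formed: VF n sits at the PF's routing ID plus
First VF Offset plus n times VF Stride, and both values may change when
NumVFs is written. A minimal standalone sketch of that arithmetic follows.
struct pf_iov and the helper names are hypothetical and only model what
pci_iov_virtfn_bus() and pci_iov_virtfn_devfn() compute; no range checking
is done.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical snapshot of a PF's location and SR-IOV capability fields. */
struct pf_iov {
	uint8_t bus;     /* PF bus number */
	uint8_t devfn;   /* PF device/function, (dev << 3) | fn */
	uint16_t offset; /* First VF Offset, re-read after NumVFs changes */
	uint16_t stride; /* VF Stride, re-read after NumVFs changes */
};

/* Routing ID (bus << 8 | devfn) of VF vf_id, counting from 0. */
static uint32_t vf_routing_id(const struct pf_iov *pf, uint32_t vf_id)
{
	assert(pf != NULL);
	return (((uint32_t)pf->bus << 8) | pf->devfn) +
	       pf->offset + (uint32_t)pf->stride * vf_id;
}

/* Bus number the VF appears on; may be higher than the PF's bus. */
static uint8_t vf_bus(const struct pf_iov *pf, uint32_t vf_id)
{
	return (uint8_t)(vf_routing_id(pf, vf_id) >> 8);
}

/* Device/function number of the VF on that bus. */
static uint8_t vf_devfn(const struct pf_iov *pf, uint32_t vf_id)
{
	return (uint8_t)(vf_routing_id(pf, vf_id) & 0xff);
}
```

With a PF at 06:00.0, an offset of 0x80 and a stride of 2 (made-up values),
VF 0 is at bus 6 devfn 0x80 while VF 64 already falls on bus 7, so the VFs
can need more bus numbers than the PF itself.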

Bjorn Helgaas (2):
  PCI: Print more info in sriov_enable() error message
  PCI: Index IOV resources in the conventional style

Gavin Shan (1):
  powerpc/pci: Create pci_dn for VFs

Wei Yang (17):
  PCI: Print PF SR-IOV resource that contains all VF(n) BAR space
  PCI: Keep individual VF BAR size in struct pci_sriov
  PCI: Refresh First VF Offset and VF Stride when updating NumVFs
  PCI: Calculate maximum number of buses required for VFs
  PCI: Export pci_iov_virtfn_bus() and pci_iov_virtfn_devfn()
  PCI: Add pcibios_sriov_enable() and pcibios_sriov_disable()
  PCI: Add pcibios_iov_resource_alignment() interface
  PCI: Consider additional PF's IOV BAR alignment in sizing and
    assigning
  powerpc/pci: Don't unset PCI resources for VFs
  powerpc/powernv: Allocate struct pnv_ioda_pe iommu_table dynamically
  powerpc/powernv: Reserve additional space for IOV BAR according to
    the number of total_pe
  powerpc/powernv: Implement pcibios_iov_resource_alignment() on
    powernv
  powerpc/powernv: Shift VF resource with an offset
  powerpc/powernv: Reserve additional space for IOV BAR, with
    m64_per_iov supported
  powerpc/powernv: Group VF PE when IOV BAR is big on PHB3
  powerpc/pci: Remove unused struct pci_dn.pcidev field
  powerpc/pci: Add PCI resource alignment documentation

 .../powerpc/pci_iov_resource_on_powernv.txt        |  301 ++++++++
 arch/powerpc/include/asm/iommu.h                   |    3 +
 arch/powerpc/include/asm/machdep.h                 |    5 +
 arch/powerpc/include/asm/pci-bridge.h              |   13 +-
 arch/powerpc/kernel/pci-common.c                   |   20 +
 arch/powerpc/kernel/pci_dn.c                       |  129 ++++
 arch/powerpc/platforms/powernv/pci-ioda.c          |  770 +++++++++++++++++++-
 arch/powerpc/platforms/powernv/pci.c               |   18 +
 arch/powerpc/platforms/powernv/pci.h               |    9 +-
 drivers/pci/iov.c                                  |  155 ++--
 drivers/pci/pci.h                                  |    2 +
 drivers/pci/setup-bus.c                            |   95 ++-
 include/linux/pci.h                                |   15 +
 13 files changed, 1446 insertions(+), 89 deletions(-)
 create mode 100644 Documentation/powerpc/pci_iov_resource_on_powernv.txt

-- 
1.7.9.5


