[PATCH V11 16/17] powerpc/powernv: Reserve additional space for IOV BAR, with m64_per_iov supported

Bjorn Helgaas bhelgaas at google.com
Thu Feb 5 09:05:18 AEDT 2015


On Thu, Jan 15, 2015 at 10:28:06AM +0800, Wei Yang wrote:
> M64 aperture size is limited on PHB3. When the IOV BAR is too big, it
> will exceed that limit and fail to be assigned.
> 
> This patch introduces a different mechanism based on the IOV BAR size:
> 
> If the IOV BAR size is smaller than 64MB, expand it to total_pe.
> If the IOV BAR size is bigger than 64MB, round up to a power of two.

Can you elaborate on this a little more?  I assume this is talking about an
M64 BAR.  What is the size limit for this?

If I understand correctly, hardware always splits an M64 BAR into 256
segments.  If you wanted each 128MB VF BAR in a separate PE, the M64 BAR
would have to be 256 * 128MB = 32GB.  So maybe the idea is that instead of
consuming 32GB of address space, you let each VF BAR span several segments.
If each 128MB VF BAR spanned 4 segments, each segment would be 32MB and the
whole M64 BAR would be 256 * 32MB = 8GB.  But you would have to use
"companion" PEs so all 4 segments would be in the same "domain," and that
would reduce the number of VFs you could support from 256 to 256/4 = 64.
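
To make that arithmetic concrete, here is the trade-off as I understand
it in sketch form (M64_SEGMENTS and the helper are my names, nothing
from the patch):

    /* Hypothetical sketch of the M64 size trade-off described above. */
    #define M64_SEGMENTS 256  /* hardware splits an M64 BAR into 256 */

    static unsigned long m64_bar_size(unsigned long vf_bar_size,
                                      unsigned int segs_per_vf)
    {
            /* each VF BAR spans segs_per_vf segments, so one segment
             * is vf_bar_size / segs_per_vf, and the whole M64 BAR is
             * 256 of those segments */
            return M64_SEGMENTS * (vf_bar_size / segs_per_vf);
    }

    /*
     * m64_bar_size(128MB, 1) == 32GB: one VF per PE, up to 256 VFs
     * m64_bar_size(128MB, 4) ==  8GB: 4 segments per VF, 256/4 == 64 VFs
     */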

If that were the intent, I would think TotalVFs would be involved, but it's
not.  For a device with TotalVFs=8 and a 128MB VF BAR, it would make sense
to dedicate 4 segments to each of those BARs because you could only use 8*4
= 32 segments total.  If the device had TotalVFs=128, your only choices
would be 1 segment per BAR (256 segments * 128MB/segment = 32GB) or 2
segments per BAR (256 segments * 64MB/segment = 16GB).
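
In other words, the number of segments you can afford per VF BAR falls
out of TotalVFs.  Purely illustrative (nothing like this is in the
patch; it reuses the M64_SEGMENTS constant from the sketch above, and
rounddown_pow_of_two() is from <linux/log2.h>):

    static unsigned int max_segs_per_vf_bar(unsigned int total_vfs)
    {
            /* how many segments each VF BAR could span before the
             * 256 segments run out */
            unsigned int segs = M64_SEGMENTS / total_vfs;

            return segs ? rounddown_pow_of_two(segs) : 1;
    }

    /*
     * TotalVFs ==   8 -> up to 32 segments per VF BAR (4 is plenty)
     * TotalVFs == 128 -> at most 2 segments per VF BAR
     */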

If you use 2 segments per BAR and you want NumVFs=128, you can't shift
anything, so the PE assignments are completely fixed (VF0 in PE0, VF1 in
PE1, etc.)

It seems like you'd make different choices about pdn->max_vfs for these two
devices, but currently you only look at the VF BAR size, not at TotalVFs.

> Signed-off-by: Wei Yang <weiyang at linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/pci-bridge.h     |    2 ++
>  arch/powerpc/platforms/powernv/pci-ioda.c |   33 ++++++++++++++++++++++++++---
>  2 files changed, 32 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/pci-bridge.h b/arch/powerpc/include/asm/pci-bridge.h
> index d61c384..7156486 100644
> --- a/arch/powerpc/include/asm/pci-bridge.h
> +++ b/arch/powerpc/include/asm/pci-bridge.h
> @@ -174,6 +174,8 @@ struct pci_dn {
>  	u16     max_vfs;		/* number of VFs IOV BAR expanded */
>  	u16     vf_pes;			/* VF PE# under this PF */
>  	int     offset;			/* PE# for the first VF PE */
> +#define M64_PER_IOV 4
> +	int     m64_per_iov;
>  #define IODA_INVALID_M64        (-1)
>  	int     m64_wins[PCI_SRIOV_NUM_BARS];
>  #endif /* CONFIG_PCI_IOV */
> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> index 94fe6e1..23ea873 100644
> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> @@ -2180,6 +2180,7 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>  	int i;
>  	resource_size_t size;
>  	struct pci_dn *pdn;
> +	int mul, total_vfs;
>  
>  	if (!pdev->is_physfn || pdev->is_added)
>  		return;
> @@ -2190,6 +2191,32 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>  	pdn = pci_get_pdn(pdev);
>  	pdn->max_vfs = 0;
>  
> +	total_vfs = pci_sriov_get_totalvfs(pdev);
> +	pdn->m64_per_iov = 1;
> +	mul = phb->ioda.total_pe;
> +
> +	for (i = PCI_IOV_RESOURCES; i <= PCI_IOV_RESOURCE_END; i++) {
> +		res = &pdev->resource[i];
> +		if (!res->flags || res->parent)
> +			continue;
> +		if (!pnv_pci_is_mem_pref_64(res->flags)) {
> +			dev_warn(&pdev->dev, " non M64 IOV BAR %pR on %s\n",
> +					res, pci_name(pdev));
> +			continue;
> +		}
> +
> +		size = pci_iov_resource_size(pdev, i);
> +
> +		/* bigger than 64M */
> +		if (size > (1 << 26)) {
> +			dev_info(&pdev->dev, "PowerNV: VF BAR[%d] size "
> +					"is bigger than 64M, roundup power2\n", i);
> +			pdn->m64_per_iov = M64_PER_IOV;
> +			mul = __roundup_pow_of_two(total_vfs);

I think this deserves a more informative dmesg message.  "roundup power2"
doesn't really tell me anything, especially since you mention a BAR, but
you're actually rounding up total_vfs, not the BAR size.

Does this reduce the number of possible VFs?  We already can't set NumVFs
higher than TotalVFs.  Does this make it so we can't set NumVFs higher than
pdn->max_vfs?


> +			break;
> +		}
> +	}
> +
>  	for (i = PCI_IOV_RESOURCES; i <= PCI_IOV_RESOURCE_END; i++) {
>  		res = &pdev->resource[i];
>  		if (!res->flags || res->parent)
> @@ -2202,13 +2229,13 @@ static void pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
>  
>  		dev_dbg(&pdev->dev, " Fixing VF BAR[%d] %pR to\n", i, res);
>  		size = pci_iov_resource_size(pdev, i);
> -		res->end = res->start + size * phb->ioda.total_pe - 1;
> +		res->end = res->start + size * mul - 1;
>  		dev_dbg(&pdev->dev, "                       %pR\n", res);
>  		dev_info(&pdev->dev, "VF BAR%d: %pR (expanded to %d VFs for PE alignment)",
>  				i - PCI_IOV_RESOURCES,
> -				res, phb->ioda.total_pe);
> +				res, mul);
>  	}
> -	pdn->max_vfs = phb->ioda.total_pe;
> +	pdn->max_vfs = mul;

Maybe this is the part that makes it hard to compute the size in
sriov_init() -- you reduce pdn->max_vfs in some cases, and maybe that can
affect the space you reserve for IOV BARs?  E.g., maybe you reduce
pdn->max_vfs to 128 because VF BAR 3 is larger than 64MB, but you've
already expanded the IOV space for VF BAR 1 based on 256 PEs?

But then I'm still confused because the loop here in
pnv_pci_ioda_fixup_iov_resources() always expands the resource based on
phb->ioda.total_pe; that part doesn't depend on "mul" or pdn->max_vfs at
all.

Or maybe this is just a bug in the fixup loop, and it *should* depend on
"mul"?

>  }
>  
>  static void pnv_pci_ioda_fixup_sriov(struct pci_bus *bus)
> -- 
> 1.7.9.5
> 

