[PATCH] powernv/pci: Fix m64 checks for SR-IOV and window alignment

Benjamin Herrenschmidt benh at kernel.crashing.org
Mon Sep 19 20:45:31 AEST 2016


On Mon, 2016-09-19 at 16:37 +1000, Russell Currey wrote:
> On Wed, 2016-09-14 at 21:30 +1000, Gavin Shan wrote:
> > 
> > On Wed, Sep 14, 2016 at 05:51:08PM +1000, Benjamin Herrenschmidt wrote:
> > > 
> > > 
> > > On Wed, 2016-09-14 at 16:37 +1000, Russell Currey wrote:
> > > > 
> > > > 
> > > > Commit 5958d19a143e checks for prefetchable m64 BARs by comparing the
> > > > addresses instead of using resource flags.  This broke SR-IOV as the
> > > > m64
> > > > check in pnv_pci_ioda_fixup_iov_resources() fails.
> > > > 
> > > > The condition in pnv_pci_window_alignment() also changed to checking
> > > > only IORESOURCE_MEM_64 instead of both IORESOURCE_MEM_64 and
> > > > IORESOURCE_PREFETCH.
> > > 
> > > CC'ing Gavin who might have some insight in the matter.
> > > 
> > > Why do we check for prefetch ? On PCIe, any 64-bit BAR can live under a
> > > prefetchable region afaik... Gavin, any idea ?
> > > 
> > Ben, what I understood for long time: non-prefetchable BAR cannot live under
> > a prefetchable region (window), but any BAR can live under non-prefetchable
> > region (window).

That is actually no longer true on PCIe I think. I need to double check but I
believe PCIe allows it because PCIe bridges aren't allowed to prefetch. 

That being said, our alignment hook is for bridge regions, and in that case, well,
the only 64-bit window is prefetchable...

> > > 
> > > 
> > > 
> > > > 
> > > > 
> > > > Revert these cases to the previous behaviour, adding a new helper
> > > > function
> > > > to do so.  This is named pnv_pci_is_m64_flags() to make it clear this
> > > > function is only looking at resource flags and should not be relied
> > > > on for
> > > > non-SRIOV resources.
> > > > 
> > > > Fixes: 5958d19a143e ("Fix incorrect PE reservation attempt on some
> > > > 64-bit BARs")
> > > > > > > > Reported-by: Alexey Kardashevskiy <aik at ozlabs.ru>
> > > > > > > > Signed-off-by: Russell Currey <ruscur at russell.cc>
> > > > ---
> > > >  arch/powerpc/platforms/powernv/pci-ioda.c | 11 +++++++++--
> > > >  1 file changed, 9 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c
> > > > b/arch/powerpc/platforms/powernv/pci-ioda.c
> > > > index c16d790..2f25622 100644
> > > > --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> > > > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> > > > @@ -124,6 +124,13 @@ static inline bool pnv_pci_is_m64(struct pnv_phb
> > > > *phb, struct resource *r)
> > > > > > > >  		r->start < (phb->ioda.m64_base + phb-
> > > > > 
> > > > > 
> > > > > ioda.m64_size));
> > > >  }
> > > >  
> > > > +static inline bool pnv_pci_is_m64_flags(unsigned long
> > > > resource_flags)
> > > > +{
> > > > > > > > +	unsigned long flags = (IORESOURCE_MEM_64 |
> > > > IORESOURCE_PREFETCH);
> > > > +
> > > > > > > > +	return (resource_flags & flags) == flags;
> > > > +}
> > > > 
> > > I don't agree. See below.
> > > 
> > > > 
> > > > 
> > > >  static struct pnv_ioda_pe *pnv_ioda_init_pe(struct pnv_phb *phb, int
> > > > pe_no)
> > > >  {
> > > > > > > >  	phb->ioda.pe_array[pe_no].phb = phb;
> > > > @@ -2871,7 +2878,7 @@ static void
> > > > pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
> > > > > > > >  		res = &pdev->resource[i + PCI_IOV_RESOURCES];
> > > > > > > >  		if (!res->flags || res->parent)
> > > > > > > >  			continue;
> > > > > > > > -		if (!pnv_pci_is_m64(phb, res)) {
> > > > > > > > +		if (!pnv_pci_is_m64_flags(res->flags)) {
> > > > > > > >  			dev_warn(&pdev->dev, "Don't support SR-IOV
> > > > with"
> > > > > > > >  					" non M64 VF BAR%d: %pR.
> > > > \n",
> > > > > > > >  				 i, res);
> > > 
> > > What is that function actually doing ? Having IORESOURCE_64 and
> > > PREFETCHABLE is completely orthogonal to being in the M64 region. This
> > > is the bug my original patch was fixing in fact as it's possible for
> > > the allocator to put a 64-bit resource in the M32 region.
> > > 
> > 
> > This function is called before the resoureces are resized and assigned.
> > So using the resource's start/end addresses to judge it's in M64 or M32
> > windows are not reliable. Currently, all IOV BARs is required to have
> > (IORESOURCE_64 | PREFETCHABLE) which is covered by bridge's M64 window
> > and PHB's M64 windows (BARs).
> > 
> > > 
> > > 
> > > > 
> > > > 
> > > > @@ -3096,7 +3103,7 @@ static resource_size_t
> > > > pnv_pci_window_alignment(struct pci_bus *bus,
> > > > > > > >  	 * alignment for any 64-bit resource, PCIe doesn't care and
> > > > > > > >  	 * bridges only do 64-bit prefetchable anyway.
> > > > > > > >  	 */
> > > > > > > > -	if (phb->ioda.m64_segsize && (type & IORESOURCE_MEM_64))
> > > > > > > > +	if (phb->ioda.m64_segsize && pnv_pci_is_m64_flags(type))
> > > > > > > >  		return phb->ioda.m64_segsize;
> > > 
> > > I disagree similarly. 64-bit non-prefetchable resources should live in
> > > the M64 space as well.
> > > 
> > 
> > As I understood, 64-bits non-prefetchable BARs cannot live behind
> > M64 (64-bits prefetchable) windows.
> > 
> > > 
> > > 
> > > > 
> > > > 
> > > > > > > >  	if (type & IORESOURCE_MEM)
> > > > > > > >  		return phb->ioda.m32_segsize;
> > > 
> > > Something seems to be deeply wrong here and this patch looks to me that
> > > it's just papering over the problem in way that could bring back the
> > > bugs I've seen if the generic allocator decides to put things in the
> > > M32 window.
> > > 
> > > We need to look at this more closely and understand WTF that code
> > > intends means to do.
> > > 
> > 
> > Yeah, it seems it partially reverts your changes. The start/end addresses
> > are usable after resource resizing/assignment is finished. Before that,
> > we still need to use the flags.
> 
> I agree with Ben that we need to look at this more closely to find a proper fix
> rather than this hacky partial revert, but for now it's important that we fix
> SR-IOV and thus I think this patch should be carried forward.

Yes, this might be enough for 4.8

> This patch is a bandaid, but I believe completely fixing the underlying problem
> is not achievable given we're at rc7. 
> 
> As a side note, I am going to prototype a heavy refactor of the allocation code
> that simplifies things from an EEH perspective and allows us to use more generic
> PCI code.
> 
> > 
> > 
> > Thanks,
> > Gavin
> > 
> > 
> > > 
> > > 
> > > Cheers,
> > > Ben.
> > > 


More information about the Linuxppc-dev mailing list