[PATCH] powernv/pci: Fix m64 checks for SR-IOV and window alignment
Benjamin Herrenschmidt
benh at kernel.crashing.org
Mon Sep 19 20:45:31 AEST 2016
On Mon, 2016-09-19 at 16:37 +1000, Russell Currey wrote:
> On Wed, 2016-09-14 at 21:30 +1000, Gavin Shan wrote:
> >
> > On Wed, Sep 14, 2016 at 05:51:08PM +1000, Benjamin Herrenschmidt wrote:
> > >
> > >
> > > On Wed, 2016-09-14 at 16:37 +1000, Russell Currey wrote:
> > > >
> > > >
> > > > Commit 5958d19a143e checks for prefetchable m64 BARs by comparing the
> > > > addresses instead of using resource flags. This broke SR-IOV as the
> > > > m64
> > > > check in pnv_pci_ioda_fixup_iov_resources() fails.
> > > >
> > > > The condition in pnv_pci_window_alignment() also changed to checking
> > > > only IORESOURCE_MEM_64 instead of both IORESOURCE_MEM_64 and
> > > > IORESOURCE_PREFETCH.
> > >
> > > CC'ing Gavin who might have some insight in the matter.
> > >
> > > Why do we check for prefetch ? On PCIe, any 64-bit BAR can live under a
> > > prefetchable region afaik... Gavin, any idea ?
> > >
> > Ben, what I understood for long time: non-prefetchable BAR cannot live under
> > a prefetchable region (window), but any BAR can live under non-prefetchable
> > region (window).
That is actually no longer true on PCIe I think. I need to double check but I
believe PCIe allows it because PCIe bridges aren't allowed to prefetch.
That being said, our alignment hook is for bridge regions, and in that case, well,
the only 64-bit window is prefetchable...
> > >
> > >
> > >
> > > >
> > > >
> > > > Revert these cases to the previous behaviour, adding a new helper
> > > > function
> > > > to do so. This is named pnv_pci_is_m64_flags() to make it clear this
> > > > function is only looking at resource flags and should not be relied
> > > > on for
> > > > non-SRIOV resources.
> > > >
> > > > Fixes: 5958d19a143e ("Fix incorrect PE reservation attempt on some
> > > > 64-bit BARs")
> > > > > > > > Reported-by: Alexey Kardashevskiy <aik at ozlabs.ru>
> > > > > > > > Signed-off-by: Russell Currey <ruscur at russell.cc>
> > > > ---
> > > > arch/powerpc/platforms/powernv/pci-ioda.c | 11 +++++++++--
> > > > 1 file changed, 9 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c
> > > > b/arch/powerpc/platforms/powernv/pci-ioda.c
> > > > index c16d790..2f25622 100644
> > > > --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> > > > +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> > > > @@ -124,6 +124,13 @@ static inline bool pnv_pci_is_m64(struct pnv_phb
> > > > *phb, struct resource *r)
> > > > > > > > r->start < (phb->ioda.m64_base + phb-
> > > > >
> > > > >
> > > > > ioda.m64_size));
> > > > }
> > > >
> > > > +static inline bool pnv_pci_is_m64_flags(unsigned long
> > > > resource_flags)
> > > > +{
> > > > > > > > + unsigned long flags = (IORESOURCE_MEM_64 |
> > > > IORESOURCE_PREFETCH);
> > > > +
> > > > > > > > + return (resource_flags & flags) == flags;
> > > > +}
> > > >
> > > I don't agree. See below.
> > >
> > > >
> > > >
> > > > static struct pnv_ioda_pe *pnv_ioda_init_pe(struct pnv_phb *phb, int
> > > > pe_no)
> > > > {
> > > > > > > > phb->ioda.pe_array[pe_no].phb = phb;
> > > > @@ -2871,7 +2878,7 @@ static void
> > > > pnv_pci_ioda_fixup_iov_resources(struct pci_dev *pdev)
> > > > > > > > res = &pdev->resource[i + PCI_IOV_RESOURCES];
> > > > > > > > if (!res->flags || res->parent)
> > > > > > > > continue;
> > > > > > > > - if (!pnv_pci_is_m64(phb, res)) {
> > > > > > > > + if (!pnv_pci_is_m64_flags(res->flags)) {
> > > > > > > > dev_warn(&pdev->dev, "Don't support SR-IOV
> > > > with"
> > > > > > > > " non M64 VF BAR%d: %pR.
> > > > \n",
> > > > > > > > i, res);
> > >
> > > What is that function actually doing ? Having IORESOURCE_64 and
> > > PREFETCHABLE is completely orthogonal to being in the M64 region. This
> > > is the bug my original patch was fixing in fact as it's possible for
> > > the allocator to put a 64-bit resource in the M32 region.
> > >
> >
> > This function is called before the resoureces are resized and assigned.
> > So using the resource's start/end addresses to judge it's in M64 or M32
> > windows are not reliable. Currently, all IOV BARs is required to have
> > (IORESOURCE_64 | PREFETCHABLE) which is covered by bridge's M64 window
> > and PHB's M64 windows (BARs).
> >
> > >
> > >
> > > >
> > > >
> > > > @@ -3096,7 +3103,7 @@ static resource_size_t
> > > > pnv_pci_window_alignment(struct pci_bus *bus,
> > > > > > > > * alignment for any 64-bit resource, PCIe doesn't care and
> > > > > > > > * bridges only do 64-bit prefetchable anyway.
> > > > > > > > */
> > > > > > > > - if (phb->ioda.m64_segsize && (type & IORESOURCE_MEM_64))
> > > > > > > > + if (phb->ioda.m64_segsize && pnv_pci_is_m64_flags(type))
> > > > > > > > return phb->ioda.m64_segsize;
> > >
> > > I disagree similarly. 64-bit non-prefetchable resources should live in
> > > the M64 space as well.
> > >
> >
> > As I understood, 64-bits non-prefetchable BARs cannot live behind
> > M64 (64-bits prefetchable) windows.
> >
> > >
> > >
> > > >
> > > >
> > > > > > > > if (type & IORESOURCE_MEM)
> > > > > > > > return phb->ioda.m32_segsize;
> > >
> > > Something seems to be deeply wrong here and this patch looks to me that
> > > it's just papering over the problem in way that could bring back the
> > > bugs I've seen if the generic allocator decides to put things in the
> > > M32 window.
> > >
> > > We need to look at this more closely and understand WTF that code
> > > intends means to do.
> > >
> >
> > Yeah, it seems it partially reverts your changes. The start/end addresses
> > are usable after resource resizing/assignment is finished. Before that,
> > we still need to use the flags.
>
> I agree with Ben that we need to look at this more closely to find a proper fix
> rather than this hacky partial revert, but for now it's important that we fix
> SR-IOV and thus I think this patch should be carried forward.
Yes, this might be enough for 4.8
> This patch is a bandaid, but I believe completely fixing the underlying problem
> is not achievable given we're at rc7.
>
> As a side note, I am going to prototype a heavy refactor of the allocation code
> that simplifies things from an EEH perspective and allows us to use more generic
> PCI code.
>
> >
> >
> > Thanks,
> > Gavin
> >
> >
> > >
> > >
> > > Cheers,
> > > Ben.
> > >
More information about the Linuxppc-dev
mailing list