[PATCH 03/15] powerpc/powernv/pci: Add explicit tracking of the DMA setup state

Alexey Kardashevskiy aik at ozlabs.ru
Tue Jul 14 17:21:42 AEST 2020



On 14/07/2020 15:58, Oliver O'Halloran wrote:
> On Tue, Jul 14, 2020 at 3:37 PM Alexey Kardashevskiy <aik at ozlabs.ru> wrote:
>>
>> On 10/07/2020 15:23, Oliver O'Halloran wrote:
>>> There's an optimisation in the PE setup which skips performing DMA
>>> setup for a PE if we only have bridges in a PE. The assumption being
>>> that only "real" devices will DMA to system memory, which is probably
>>> fair. However, if we start off with only bridge devices in a PE and then
>>> add a non-bridge device, the new device won't be able to use DMA because
>>> we never configured it.
>>>
>>> Fix this (admittedly pretty weird) edge case by tracking whether we've done
>>> the DMA setup for the PE or not. If a non-bridge device is added to the PE
>>> (via rescan or hotplug, or whatever) we can set up DMA on demand.
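
For reference, the tracking described above boils down to something like the
sketch below. The struct field and helper names here are illustrative guesses,
not necessarily what the patch actually uses:

struct pnv_ioda_pe {
	/* ... existing fields ... */
	bool		dma_setup_done;	/* DMA (TCE) setup already performed? */
};

/* Hypothetical hook, run when a device in the PE is probed/added */
static void pnv_pci_dma_dev_setup(struct pci_dev *pdev)
{
	struct pnv_ioda_pe *pe = pnv_ioda_get_pe(pdev);

	if (!pe || pe->dma_setup_done)
		return;

	/*
	 * The PE started out with only bridges in it, so DMA setup was
	 * skipped. Now that a non-bridge device has appeared (hotplug,
	 * rescan, ...), do the setup on demand.
	 */
	if (!pci_is_bridge(pdev)) {
		pnv_pci_ioda_setup_dma_pe(pe);
		pe->dma_setup_done = true;
	}
}
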
>>
>> So hotplug does not work on powernv then, right? I thought you tested it
>> a while ago, or this patch is the result of that attempt? If it is, then
> 
> It mostly works. The only gap is the really niche case of hot plugging a
> bridge, then later hot plugging a device into the same bus; that wouldn't
> work.

Don't you have to have a slot (which is a bridge) in the first place to
hotplug the bridge?

> 
>> Reviewed-by: Alexey Kardashevskiy <aik at ozlabs.ru>
>>
>>
>>> This also means the only remaining user of the old "DMA Weight" code is
>>> the IODA1 DMA setup code that it was originally added for, which is good.
>>
>>
>> Is ditching IODA1 in the plan? :)
> 
> That or separating out the pci_controller_ops for IODA1 and IODA2 so
> we can stop any IODA2 specific changes from breaking it.
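
A rough sketch of what that split could look like, reusing the existing
struct pci_controller_ops; the ops tables and the IODA1/IODA2-specific
handlers named here are hypothetical:

static const struct pci_controller_ops pnv_ioda1_controller_ops = {
	.dma_dev_setup	= pnv_pci_ioda1_dma_dev_setup,
	/* ... IODA1-only callbacks ... */
};

static const struct pci_controller_ops pnv_ioda2_controller_ops = {
	.dma_dev_setup	= pnv_pci_ioda2_dma_dev_setup,
	/* ... IODA2-only callbacks ... */
};

	/* at PHB init time, pick the ops table for the hardware generation */
	hose->controller_ops = (phb->type == PNV_PHB_IODA1) ?
		pnv_ioda1_controller_ops : pnv_ioda2_controller_ops;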

Is IODA1 tested at all these days? Or is there anyone running upstream
kernels anywhere who shouts when they break on IODA1? Thanks,



> For the most
> part keeping around IODA1 support isn't hurting anyone, but I wanted
> to re-work how the BDFN->PE assignment works so that we'd delay
> assigning a BDFN to a PE until the device is probed. Right now when
> we're configuring the PE for a bus we map all 256 devfns to that PE.
> This is mostly fine, but if you do a bus rescan and there's no device
> present we'll get a spurious EEH on that PE since the PHB sees that
> there's no device responding to the CFG cycle. We stop the spurious
> EEH freeze today by only allowing config cycles if we can find a
> pci_dn for that bdfn, but I want to get rid of pci_dn.
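
For context, the workaround mentioned above is roughly the following shape
(simplified sketch, not the exact code):

static int pnv_pci_read_config(struct pci_bus *bus, unsigned int devfn,
			       int where, int size, u32 *val)
{
	struct pci_dn *pdn = pci_get_pdn_by_devfn(bus, devfn);

	/*
	 * No pci_dn for this BDFN means nothing was ever probed here, so
	 * refuse the config cycle rather than letting the PHB see a
	 * non-responding device and freeze the PE.
	 */
	if (!pdn)
		return PCIBIOS_DEVICE_NOT_FOUND;

	return pnv_pci_cfg_read(pdn, where, size, val);
}
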
> 
> Mapping each BDFN to a PE after the device is probed is easy enough to
> do on PHB3 and above since the mapping is handled by an in-memory
> table which is indexed by the BDFN. Earlier PHBs (i.e. IODA1) use a
> table of base & mask values which match on the BDFN, so assigning a
> whole bus at once is easy, but adding individual BDFNs is hard. It's
> still possible to do in the HW, but the way the OPAL API works makes
> it impossible.
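
For reference, the whole-bus case today is a single OPAL call along these
lines (roughly what pnv_ioda_configure_pe() does: match the bus number
exactly, wildcard the device and function numbers):

	rc = opal_pci_set_pe(phb->opal_id, pe->pe_number, pe->rid,
			     OpalPciBusAll,
			     OPAL_IGNORE_RID_DEVICE_NUMBER,
			     OPAL_IGNORE_RID_FUNCTION_NUMBER,
			     OPAL_MAP_PE);

Per-BDFN assignment would mean comparing on all three fields. On PHB3 that is
just one entry in the BDFN-indexed table, but the IODA1 base & mask segments
can't be managed per-device through the current OPAL interface.
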
> 
>>>
>>> Cc: Alexey Kardashevskiy <aik at ozlabs.ru>
>>> Signed-off-by: Oliver O'Halloran <oohall at gmail.com>
>>> ---
>>> Alexey, do we need to have the IOMMU API stuff set/clear this flag?
>>
>>
>> I'd say no, as that API only cares about devices that are already in a PE,
>> and for those the PE DMA setup optimization is skipped. Thanks,
> 
> Ok cool.
> 

-- 
Alexey

