[PATCH kernel v2 0/7] powerpc/powenv/ioda: Allow huge DMA window at 4GB

Wed Apr 22 16:49:06 AEST 2020

On 21/04/2020 16:35, Oliver O'Halloran wrote:
> On Tue, Apr 21, 2020 at 3:11 PM Alexey Kardashevskiy <aik at ozlabs.ru> wrote:
>>
>> One example of a problem device is AMD GPU with 64bit video PCI function
>> and 32bit audio, no?
>>
>> What PEs will they get assigned to now? Where will audio's MMIO go? It
>> cannot be the same 64bit MMIO segment, right? If so, it is a separate PE
>> already. If not, then I do not understand "we're free to assign whatever
>> PE number we want.
> 
> The BARs stay in the same place and as far as MMIO is concerned
> nothing has changed. For MMIO the PHB uses the MMIO address to find a
> PE via the M64 BAR table, but for DMA it uses a *completely* different
> mechanism. Instead it takes the BDFN (included in the DMA packet
> header) and the Requester Translation Table (RTT) to map the BDFN to a
> PE. Normally you would configure the PHB so the same PE used for MMIO
> and DMA, but you don't have to.

32bit MMIO is what puzzles me in this picture, how does it work?

>>> I think the key thing to realise is that we'd only be using the DMA-PE
>>> when a crippled DMA mask is set by the driver. In all other cases we
>>> can just use the "native PE" and when the driver unbinds we can de-
>>> allocate our DMA-PE and return the device to the PE containing it's
>>> MMIO BARs. I think we can keep things relatively sane that way and the
>>> real issue is detecting EEH events on the DMA-PE.
>>
>>
>> Oooor we could just have 1 DMA window (or, more precisely, a single
>> "TVE" as it is either window or bypass) per a PE and give every function
>> its own PE and create a window or a table when a device sets a DMA mask.
>> I feel I am missing something here though.
> 
> Yes, we could do that, but do we want to?
> 
> I was thinking we should try minimise the number of DMA-only PEs since
> it complicates the EEH freeze handling. When MMIO and DMA are mapped
> to the same PE an error on either will cause the hardware to stop
> both. When seperate PEs are used for DMA and MMIO you lose that
> atomicity. It's not a big deal if DMA is stopped and MMIO allowed
> since PAPR (sort-of) allows that, but having MMIO frozen with DMA
> unfrozen is a bit sketch.

You suggested using slave PEs for crippled functions - won't we have the
same problem then?
And is this "slave PE" something the hardware supports or it is a
software concept?

>>>> For the time being, this patchset is good for:
>>>> 1. weird hardware which has limited DMA mask (this is why the patchset
>>>> was written in the first place)
>>>> 2. debug DMA by routing it via IOMMU (even when 4GB hack is not enabled).
>>>
>>> Sure, but it's still dependent on having firmware which supports the
>>> 4GB hack and I don't think that's in any offical firmware releases yet.
>>
>> It's been a while :-/
> 
> There's been no official FW releases with a skiboot that supports the
> phb get/set option opal calls so the only systems that can actually
> take advantage of it are our lab systems. It might still be useful for
> future systems, but I'd rather something that doesn't depend on FW
> support.

Pensando folks use it ;)

-- 
Alexey