[PATCH] powerpc: Enhance pmem DMA bypass handling

Alexey Kardashevskiy aik at ozlabs.ru
Tue Oct 26 16:39:46 AEDT 2021



On 10/26/21 01:40, Brian King wrote:
> On 10/23/21 7:18 AM, Alexey Kardashevskiy wrote:
>>
>>
>> On 23/10/2021 07:18, Brian King wrote:
>>> On 10/22/21 7:24 AM, Alexey Kardashevskiy wrote:
>>>>
>>>>
>>>> On 22/10/2021 04:44, Brian King wrote:
>>>>> If ibm,pmemory is installed in the system, it can appear anywhere
>>>>> in the address space. This patch enhances how we handle DMA for devices when
>>>>> ibm,pmemory is present. In the case where we have enough DMA space to
>>>>> direct map all of RAM, but not ibm,pmemory, we use direct DMA for
>>>>> I/O to RAM and use the default window to dynamically map ibm,pmemory.
>>>>> In the case where we only have a single DMA window, this won't work,
>>>>> so if the window is not big enough to map the entire address range,
>>>>> we cannot direct map.
>>>>
>>>> but we want the pmem range to be mapped into the huge DMA window too if we can, why skip it?
>>>
>>> This patch should simply do what the comment in the commit linked below suggests, which says that
>>> ibm,pmemory can appear anywhere in the address space. If the DMA window is large enough
>>> to map all of MAX_PHYSMEM_BITS, we will indeed simply do direct DMA for everything,
>>> including the pmem. If we do not have a big enough window to do that, we will do
>>> direct DMA for DRAM and dynamic mapping for pmem.
>>
>>
>> Right, and this is what we do already, don't we? I am missing something here.
> 
> The upstream code does not work correctly, as far as I can see. If I boot an upstream kernel
> with an nvme device and vpmem assigned to the LPAR, and enable dev_dbg in arch/powerpc/platforms/pseries/iommu.c,
> I see the following in the logs:
> 
> [    2.157549] nvme 0121:50:00.0: ibm,query-pe-dma-windows(53) 500000 8000000 20000121 returned 0
> [    2.157561] nvme 0121:50:00.0: Skipping ibm,pmemory
> [    2.157567] nvme 0121:50:00.0: can't map partition max 0x8000000000000 with 16777216 65536-sized pages
> [    2.170150] nvme 0121:50:00.0: ibm,create-pe-dma-window(54) 500000 8000000 20000121 10 28 returned 0 (liobn = 0x70000121 starting addr = 8000000 0)
> [    2.170170] nvme 0121:50:00.0: created tce table LIOBN 0x70000121 for /pci@800000020000121/pci1014,683@0
> [    2.356260] nvme 0121:50:00.0: node is /pci@800000020000121/pci1014,683@0
> 
> This means we are heading down the leg in enable_ddw where we do not set direct_mapping to true. We
> create the DDW window, but don't do any direct DMA. This is because the window is not large enough to
> map 2PB of memory, which is what ddw_memory_hotplug_max returns without my patch.
> 
> With my patch applied, I get this in the logs:
> 
> [    2.204866] nvme 0121:50:00.0: ibm,query-pe-dma-windows(53) 500000 8000000 20000121 returned 0
> [    2.204875] nvme 0121:50:00.0: Skipping ibm,pmemory
> [    2.205058] nvme 0121:50:00.0: ibm,create-pe-dma-window(54) 500000 8000000 20000121 10 21 returned 0 (liobn = 0x70000121 starting addr = 8000000 0)
> [    2.205068] nvme 0121:50:00.0: created tce table LIOBN 0x70000121 for /pci@800000020000121/pci1014,683@0
> [    2.215898] nvme 0121:50:00.0: iommu: 64-bit OK but direct DMA is limited by 800000200000000
> 
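
For reference, the numbers in the "can't map partition max" message above
can be checked with a small userspace sketch. This is not the kernel code;
the helper names window_bytes() and window_covers() are made up for
illustration, and the only inputs are the values printed in the log.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Bytes covered by a DMA window of `pages` entries, each mapping
 * 1 << page_shift bytes. Hypothetical helper; assumes the result
 * fits in 64 bits.
 */
uint64_t window_bytes(uint64_t pages, unsigned int page_shift)
{
	return pages << page_shift;
}

/* True if the window is large enough to direct map `max_addr` bytes. */
bool window_covers(uint64_t pages, unsigned int page_shift, uint64_t max_addr)
{
	return window_bytes(pages, page_shift) >= max_addr;
}
```

With the logged values, window_bytes(16777216, 16) is 1ULL << 40 (1TB),
while the partition max 0x8000000000000 is 1ULL << 51 (2PB), so
window_covers() is false and direct mapping is not attempted.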


ah I see. then...


> 
> Thanks,
> 
> Brian
> 
> 
>>
>>>
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/arch/powerpc/platforms/pseries/iommu.c?id=bf6e2d562bbc4d115cf322b0bca57fe5bbd26f48
>>>
>>>
>>> Thanks,
>>>
>>> Brian
>>>
>>>
>>>>
>>>>
>>>>>
>>>>> Signed-off-by: Brian King <brking at linux.vnet.ibm.com>
>>>>> ---
>>>>>    arch/powerpc/platforms/pseries/iommu.c | 19 ++++++++++---------
>>>>>    1 file changed, 10 insertions(+), 9 deletions(-)
>>>>>
>>>>> diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
>>>>> index 269f61d519c2..d9ae985d10a4 100644
>>>>> --- a/arch/powerpc/platforms/pseries/iommu.c
>>>>> +++ b/arch/powerpc/platforms/pseries/iommu.c
>>>>> @@ -1092,15 +1092,6 @@ static phys_addr_t ddw_memory_hotplug_max(void)
>>>>>        phys_addr_t max_addr = memory_hotplug_max();
>>>>>        struct device_node *memory;
>>>>> -    /*
>>>>> -     * The "ibm,pmemory" can appear anywhere in the address space.
>>>>> -     * Assuming it is still backed by page structs, set the upper limit
>>>>> -     * for the huge DMA window as MAX_PHYSMEM_BITS.
>>>>> -     */
>>>>> -    if (of_find_node_by_type(NULL, "ibm,pmemory"))
>>>>> -        return (sizeof(phys_addr_t) * 8 <= MAX_PHYSMEM_BITS) ?
>>>>> -            (phys_addr_t) -1 : (1ULL << MAX_PHYSMEM_BITS);
>>>>> -
>>>>>        for_each_node_by_type(memory, "memory") {
>>>>>            unsigned long start, size;
>>>>>            int n_mem_addr_cells, n_mem_size_cells, len;
>>>>> @@ -1341,6 +1332,16 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
>>>>>         */
>>>>>        len = max_ram_len;
>>>>>        if (pmem_present) {
>>>>> +        if (default_win_removed) {
>>>>> +            /*
>>>>> +             * If we only have one DMA window and have pmem present,
>>>>> +             * then we need to be able to map the entire address
>>>>> +             * range in order to be able to do direct DMA to RAM.
>>>>> +             */
>>>>> +            len = order_base_2((sizeof(phys_addr_t) * 8 <= MAX_PHYSMEM_BITS) ?
>>>>> +                    (phys_addr_t) -1 : (1ULL << MAX_PHYSMEM_BITS));


... len = (sizeof(phys_addr_t) * 8 <= MAX_PHYSMEM_BITS) ? 31 :
MAX_PHYSMEM_BITS  ?

Or actually simply drop this hunk and only leave the first one and add
this instead:


diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 591ec9e94edb..68bfcd2227d9 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -1518,7 +1518,7 @@ static bool enable_ddw(struct pci_dev *dev, struct device_node *pdn)
         * as RAM, then we failed to create a window to cover persistent
         * memory and need to set the DMA limit.
         */
-       if (pmem_present && ddw_enabled && direct_mapping && len == max_ram_len)
+       if (pmem_present && ddw_enabled && direct_mapping)

?
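
To illustrate what the DMA limit set under that condition looks like, here
is a userspace sketch. It assumes the limit is the window's bus offset plus
the window size (1 << len); direct_dma_limit() is a made-up name for
illustration, not a kernel function.

```c
#include <stdint.h>

/*
 * Upper bound for direct DMA, assuming it is the bus address where the
 * window starts plus the window size. `len` is the window size as a
 * power of two and must be below 64.
 */
uint64_t direct_dma_limit(uint64_t dma_offset, unsigned int len)
{
	return dma_offset + (1ULL << len);
}
```

With the values from the patched log (starting addr 8000000 0, i.e.
0x0800000000000000, and len 0x21) this gives 0x800000200000000, which
matches the "direct DMA is limited by" message.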

Thanks,



>>>>> +        }
>>>>> +
>>>>>            if (query.largest_available_block >=
>>>>>                (1ULL << (MAX_PHYSMEM_BITS - page_shift)))
>>>>>                len = MAX_PHYSMEM_BITS;
>>>>>
>>>>
>>>
>>>
>>
> 
> 

-- 
Alexey

