[Linuxppc-users] DMA fails to reserved memory

Benjamin Herrenschmidt benh at au1.ibm.com
Tue Jun 5 07:53:07 AEST 2018


On Mon, 2018-06-04 at 13:08 -0600, Brian Varney wrote:
> Thanks for responding.
> 
> I'm not opposed to sharing snippets of code, but not sure what you
> would need to see at this point.
> 
> I'm not using the kernel's DMA API.  It wasn't necessary with x86
> architecture where the code is working fine.  Note that for x86
> architecture, I am using the "memmap=" kernel parameter to reserve
> memory instead of removing memory from the devicetree passed into the
> kernel.
> 
> After your response, I did try the DMA API's
> "dma_map_resource" function but it returned the same address I sent
> it.  So no translation is necessary I guess.  It still failed the
> same way.

dma_map_resource() seems to be a new call that was added to the API
recently that we don't yet support or implement on powerpc. It also
looks rather ... broken, as it doesn't fail if the backend doesn't
support it.

It also doesn't work for normal memory.

What are you trying to do?

If you are trying to write to memory, use dma_map_single/sg or
dma_alloc_coherent, and DMA to that.
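
A minimal sketch of the dma_alloc_coherent() route, assuming a PCI
driver with a probe routine. The names my_probe_dma(), my_remove_dma(),
MY_BUF_SIZE and struct my_dev are hypothetical, and how the returned
dma_addr_t gets programmed into the adapter is device specific and
left out:

```c
#include <linux/pci.h>
#include <linux/dma-mapping.h>

#define MY_BUF_SIZE (64 * 1024)	/* hypothetical buffer size */

/* Hypothetical per-device state for this sketch. */
struct my_dev {
	void *cpu_addr;		/* kernel virtual address of the buffer */
	dma_addr_t dma_addr;	/* address the adapter must use on the bus */
};

/* Call from the driver's probe routine after pci_enable_device(). */
static int my_probe_dma(struct pci_dev *pdev, struct my_dev *md)
{
	int rc;

	/* Declare that the adapter can generate 64-bit DMA addresses. */
	rc = dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64));
	if (rc)
		return rc;

	/* Allocate the buffer and obtain both views of it. */
	md->cpu_addr = dma_alloc_coherent(&pdev->dev, MY_BUF_SIZE,
					  &md->dma_addr, GFP_KERNEL);
	if (!md->cpu_addr)
		return -ENOMEM;

	pci_set_master(pdev);	/* allow the adapter to master the bus */

	/* Program md->dma_addr (not a physical address) into the adapter. */
	return 0;
}

/* Call from the driver's remove routine once the adapter is quiesced. */
static void my_remove_dma(struct pci_dev *pdev, struct my_dev *md)
{
	dma_free_coherent(&pdev->dev, MY_BUF_SIZE, md->cpu_addr, md->dma_addr);
}
```

The adapter is handed dma_addr, which is not guaranteed to equal the
physical address of the buffer, while the CPU accesses the buffer
through cpu_addr.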

> I did get a PCIe analyzer hooked up now and I was able to get a PCIe
> trace of it:
> https://ent.box.com/s/vx8nex5zo83xs562usd6bfeckoxbn3c8
> 
> The last transaction shown is my PCIe adapter writing to 0x81166000
> -- within the memory I reserved.  The transaction gets ACK'ed but
> then no other traffic is shown.  It seems the system has cut itself
> off from this PCIe adapter at this point.
> 
> Any ideas?
> 
> Thanks,
> -Brian V.
> 
> 
> 
> 
> 
> On Fri, Jun 1, 2018 at 1:50 PM, Brian King <brking at linux.vnet.ibm.com> wrote:
> > Any code you can share? On an LC922 system running bare metal, as long as your
> > adapter is capable of 64 bits of DMA address space, and not all adapters are,
> > then you would not be using the IOMMU. However, that does not mean that the
> > physical address the host sees equals the address that is used on the PCIe
> > link by the adapter. You need to make sure you are using the DMA-API
> > as defined in the kernel documentation to allocate DMA'able memory to be given
> > to the adapter. This API will give you a virtual address you can use to access
> > the memory in the kernel as well as a dma_addr_t which is the token you give
> > to the adapter as the DMA address. In cases where an IOMMU is in use, this would
> > set up the translation control entry (TCE) in the IOMMU. In your case, where you
> > are not using an IOMMU, it will do a simple translation to an address that
> > can be used on the PCIe link.
> > 
> > What you are seeing in the log below is an EEH error, which is an error correction
> > feature of the Power PCI host bridge, which allows the platform to recover from
> > various PCIe errors. In this case, it's a DMA write to an invalid address. In your
> > case the invalid address is 81166000, which is not a valid DMA address on an
> > LC922.
> > 
> > Thanks,
> > 
> > Brian
> > 
> > On 06/01/2018 11:56 AM, Brian Varney wrote:
> > > Hello all,
> > > 
> > > I have an LC922 system running Fedora 28 (4.16.10-300.fc28.ppc64le) and I am reserving memory by modifying the device tree passed in to the kernel as described by this forum entry: https://lists.ozlabs.org/pipermail/linuxppc-users/2017-September/000112.html
> > > 
> > > I have a PCIe adapter plugged into the system that I am testing.  When the adapter performs a DMA operation to this reserved memory, things start to go south.  All reads with the adapter's BAR space start returning all FF's.  I suspect the reads aren't actually making it to the adapter but I don't have a PCIe analyzer on there to verify.  Then I get the following in dmesg:
> > > 
> > > [  340.316599] EEH: Frozen PHB#34-PE#0 detected
> > > [  340.316645] EEH: PE location: WIO Slot2, PHB location: N/A
> > > [  340.316675] CPU: 133 PID: 5380 Comm: mr Tainted: P           OE    4.16.10-300.fc28.ppc64le #1
> > > [  340.316676] Call Trace:
> > > [  340.316682] [c0002004287b7a40] [c000000000bec5d0] dump_stack+0xb4/0x104 (unreliable)
> > > [  340.316686] [c0002004287b7a80] [c00000000003f9d0] eeh_dev_check_failure+0x4b0/0x5b0
> > > [  340.316689] [c0002004287b7b20] [c0000000000b3ae8] pnv_pci_read_config+0x138/0x170
> > > [  340.316692] [c0002004287b7b70] [c0000000006c7e14] pci_user_read_config_byte+0x84/0x160
> > > [  340.316693] [c0002004287b7bc0] [c0000000006df1fc] pci_read_config+0x12c/0x2d0
> > > [  340.316696] [c0002004287b7c50] [c0000000004bceb4] sysfs_kf_bin_read+0x94/0xf0
> > > [  340.316698] [c0002004287b7c90] [c0000000004bbd30] kernfs_fop_read+0x130/0x2a0
> > > [  340.316699] [c0002004287b7ce0] [c0000000003e721c] __vfs_read+0x6c/0x1e0
> > > [  340.316701] [c0002004287b7d80] [c0000000003e744c] vfs_read+0xbc/0x1b0
> > > [  340.316703] [c0002004287b7dd0] [c0000000003e7ed4] SyS_pread64+0xc4/0x120
> > > [  340.316705] [c0002004287b7e30] [c00000000000b8e0] system_call+0x58/0x6c
> > > [  340.316734] EEH: Detected PCI bus error on PHB#34-PE#0
> > > [  340.316739] EEH: This PCI device has failed 1 times in the last hour
> > > [  340.316739] EEH: Notify device drivers to shutdown
> > > [  340.316746] EEH: Collect temporary log
> > > [  340.316780] EEH: of node=0034:01:00.0
> > > [  340.316783] EEH: PCI device/vendor: 00d11000
> > > [  340.316786] EEH: PCI cmd/status register: 00100146
> > > [  340.316787] EEH: PCI-E capabilities and status follow:
> > > [  340.316802] EEH: PCI-E 00: 0002b010 112c8023 00002950 00437d03
> > > [  340.316813] EEH: PCI-E 10: 10830000 00000000 00000000 00000000
> > > [  340.316815] EEH: PCI-E 20: 00000000
> > > [  340.316816] EEH: PCI-E AER capability register set follows:
> > > [  340.316828] EEH: PCI-E AER 00: 14820001 00000000 00400000 00462030
> > > [  340.316839] EEH: PCI-E AER 10: 00000000 0000e000 000001e0 00000000
> > > [  340.316849] EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000
> > > [  340.316853] EEH: PCI-E AER 30: 00000000 00000000
> > > [  340.316855] PHB4 PHB#52 Diag-data (Version: 1)
> > > [  340.316855] brdgCtl:    00000002
> > > [  340.316857] RootSts:    00000040 00402000 a0830008 00100107 00000000
> > > [  340.316859] PhbSts:     0000001c00000000 0000001c00000000
> > > [  340.316860] Lem:        0000000010000000 0000000000000000 0000000010000000
> > > [  340.316862] PhbErr:     0000080000000000 0000080000000000 2148000098000240 a008400000000000
> > > [  340.316864] RxeArbErr:  0000000800000000 0000000800000000 7f1a01000000001b 0000000081166000
> > > [  340.316865] RegbErr:    0040000000000000 0000000000000000 a2000a4018000000 1800000000000000
> > > [  340.316868] PE[000] A/B: 8000802301000000 8000000081166000
> > > [  340.316871] PE[100] A/B: 80000000ff275c00 80000000300d088b
> > > [  340.316872] EEH: Reset with hotplug activity
> > > [  340.316898] iommu: Removing device 0034:01:00.0 from group 4
> > > 
> > > 
> > > Any idea why this is happening?  My suspicion is that there is IOMMU hardware that is not allowing my adapter to access this memory, but I am not familiar with the POWER9 architecture.  Is there a way to disable the IOMMU completely, or are there kernel functions to call to give my adapter "permission" to DMA to a memory range?
> > > 
> > > Thanks,
> > > Brian V.
> > > 
> > > 
> > > 
> > > _______________________________________________
> > > Linuxppc-users mailing list
> > > Linuxppc-users at lists.ozlabs.org
> > > https://lists.ozlabs.org/listinfo/linuxppc-users
> > > 


