[Linuxppc-users] DMA fails to reserved memory
Brian Varney
brian.varney at broadcom.com
Tue Jun 5 05:08:22 AEST 2018
Thanks for responding.
I'm not opposed to sharing snippets of code, but I'm not sure what you
would need to see at this point.
I'm not using the kernel's DMA API. It wasn't necessary on the x86
architecture, where the code is working fine. Note that on x86 I am using
the "memmap=" kernel parameter to reserve memory instead of removing
memory from the device tree passed to the kernel.
After your response, I did try the DMA API's "dma_map_resource" function,
but it returned the same address I passed in, so I guess no translation is
necessary. It still failed the same way.
I now have a PCIe analyzer hooked up and was able to capture a PCIe trace
of the failure:
https://ent.box.com/s/vx8nex5zo83xs562usd6bfeckoxbn3c8
The last transaction shown is my PCIe adapter writing to 0x81166000, which
is within the memory I reserved. The transaction gets ACK'ed, but then no
other traffic is shown. It seems the system has cut itself off from this
PCIe adapter at this point.
Any ideas?
Thanks,
-Brian V.
On Fri, Jun 1, 2018 at 1:50 PM, Brian King <brking at linux.vnet.ibm.com>
wrote:
> Any code you can share? On an LC922 system running bare metal, as long as
> your adapter is capable of 64 bits of DMA address space, and not all
> adapters are, then you would not be using the IOMMU. However, that does not
> mean that the physical address the host sees equals the address that is
> used on the PCIe link by the adapter. You need to make sure you are using
> the DMA-API as defined in the kernel documentation to allocate DMA'able
> memory to be given to the adapter. This API will give you a virtual address
> you can use to access the memory in the kernel as well as a dma_addr_t,
> which is the token you give to the adapter as the DMA address. In cases
> where an IOMMU is in use, this would set up the translation control entry
> (TCE) in the IOMMU. In your case, where you are not using an IOMMU, it will
> do a simple translation to an address that can be used on the PCIe link.
>
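A minimal sketch of the "simple translation" described above, as plain arithmetic rather than kernel code. It assumes the powernv 64-bit bypass window is selected by PCI address bit 59; that constant is not stated anywhere in this thread and should be checked against the kernel source for the platform. The names BYPASS_BASE, phys_to_bus and bus_to_phys are hypothetical helpers for illustration only.

```python
# Assumption (not from this thread): the powernv 64-bit bypass DMA window
# is selected by PCI address bit 59. Verify against your kernel source.
BYPASS_BASE = 1 << 59


def phys_to_bus(phys_addr: int, bypass_base: int = BYPASS_BASE) -> int:
    """Return the address the adapter would use on the PCIe link."""
    if phys_addr < 0 or phys_addr >= bypass_base:
        raise ValueError("physical address does not fit below the bypass base")
    return bypass_base + phys_addr


def bus_to_phys(bus_addr: int, bypass_base: int = BYPASS_BASE) -> int:
    """Inverse of phys_to_bus, for reading addresses off an analyzer trace."""
    if bus_addr < bypass_base:
        raise ValueError("bus address is below the bypass window")
    return bus_addr - bypass_base


if __name__ == "__main__":
    # 0x81166000 is the address from the log in this thread.
    print(f"{phys_to_bus(0x81166000):#018x}")
```

Under that assumption, the host physical address 0x81166000 would appear on the link as 0x0800000081166000. In a real driver the value comes back from the DMA API as a dma_addr_t and is not computed by hand.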
> What you are seeing in the log below is an EEH error. EEH is an error
> correction feature of the Power PCI host bridge that allows the platform
> to recover from various PCIe errors. In this case, it's a DMA write to an
> invalid address: 81166000 is not a valid DMA address on an LC922.
>
> Thanks,
>
> Brian
>
> On 06/01/2018 11:56 AM, Brian Varney wrote:
> > Hello all,
> >
> > I have an LC922 system running Fedora 28 (4.16.10-300.fc28.ppc64le) and I
> > am reserving memory by modifying the device tree passed in to the kernel,
> > as described by this forum entry:
> > https://lists.ozlabs.org/pipermail/linuxppc-users/2017-September/000112.html
> >
> > I have a PCIe adapter plugged into the system that I am testing. When the
> > adapter performs a DMA operation to this reserved memory, things start to
> > go south. All reads from the adapter's BAR space start returning all FF's.
> > I suspect the reads aren't actually making it to the adapter, but I don't
> > have a PCIe analyzer on there to verify. Then I get the following in
> > dmesg:
> >
> > [ 340.316599] EEH: Frozen PHB#34-PE#0 detected
> > [ 340.316645] EEH: PE location: WIO Slot2, PHB location: N/A
> > [ 340.316599] CPU: 133 PID: 5380 Comm: mr Tainted: P OE 4.16.10-300.fc28.ppc64le #1
> > [ 340.316676] Call Trace:
> > [ 340.316682] [c0002004287b7a40] [c000000000bec5d0] dump_stack+0xb4/0x104 (unreliable)
> > [ 340.316686] [c0002004287b7a80] [c00000000003f9d0] eeh_dev_check_failure+0x4b0/0x5b0
> > [ 340.316689] [c0002004287b7b20] [c0000000000b3ae8] pnv_pci_read_config+0x138/0x170
> > [ 340.316692] [c0002004287b7b70] [c0000000006c7e14] pci_user_read_config_byte+0x84/0x160
> > [ 340.316693] [c0002004287b7bc0] [c0000000006df1fc] pci_read_config+0x12c/0x2d0
> > [ 340.316696] [c0002004287b7c50] [c0000000004bceb4] sysfs_kf_bin_read+0x94/0xf0
> > [ 340.316698] [c0002004287b7c90] [c0000000004bbd30] kernfs_fop_read+0x130/0x2a0
> > [ 340.316699] [c0002004287b7ce0] [c0000000003e721c] __vfs_read+0x6c/0x1e0
> > [ 340.316701] [c0002004287b7d80] [c0000000003e744c] vfs_read+0xbc/0x1b0
> > [ 340.316703] [c0002004287b7dd0] [c0000000003e7ed4] SyS_pread64+0xc4/0x120
> > [ 340.316705] [c0002004287b7e30] [c00000000000b8e0] system_call+0x58/0x6c
> > [ 340.316734] EEH: Detected PCI bus error on PHB#34-PE#0
> > [ 340.316739] EEH: This PCI device has failed 1 times in the last hour
> > [ 340.316739] EEH: Notify device drivers to shutdown
> > [ 340.316746] EEH: Collect temporary log
> > [ 340.316780] EEH: of node=0034:01:00.0
> > [ 340.316783] EEH: PCI device/vendor: 00d11000
> > [ 340.316786] EEH: PCI cmd/status register: 00100146
> > [ 340.316787] EEH: PCI-E capabilities and status follow:
> > [ 340.316802] EEH: PCI-E 00: 0002b010 112c8023 00002950 00437d03
> > [ 340.316813] EEH: PCI-E 10: 10830000 00000000 00000000 00000000
> > [ 340.316815] EEH: PCI-E 20: 00000000
> > [ 340.316816] EEH: PCI-E AER capability register set follows:
> > [ 340.316828] EEH: PCI-E AER 00: 14820001 00000000 00400000 00462030
> > [ 340.316839] EEH: PCI-E AER 10: 00000000 0000e000 000001e0 00000000
> > [ 340.316849] EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000
> > [ 340.316853] EEH: PCI-E AER 30: 00000000 00000000
> > [ 340.316855] PHB4 PHB#52 Diag-data (Version: 1)
> > [ 340.316855] brdgCtl: 00000002
> > [ 340.316857] RootSts: 00000040 00402000 a0830008 00100107 00000000
> > [ 340.316859] PhbSts: 0000001c00000000 0000001c00000000
> > [ 340.316860] Lem: 0000000010000000 0000000000000000 0000000010000000
> > [ 340.316862] PhbErr: 0000080000000000 0000080000000000 2148000098000240 a008400000000000
> > [ 340.316864] RxeArbErr: 0000000800000000 0000000800000000 7f1a01000000001b 0000000081166000
> > [ 340.316865] RegbErr: 0040000000000000 0000000000000000 a2000a4018000000 1800000000000000
> > [ 340.316868] PE[000] A/B: 8000802301000000 8000000081166000
> > [ 340.316871] PE[100] A/B: 80000000ff275c00 80000000300d088b
> > [ 340.316872] EEH: Reset with hotplug activity
> > [ 340.316898] iommu: Removing device 0034:01:00.0 from group 4
> >
> >
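The address in question can be pulled out of a dump like the one above with a short script. This is only a sketch: it assumes the last word on the "RxeArbErr:" line is the logged address, which is an inference from the fact that it matches the 0x81166000 discussed in this thread, not something documented here. The function name rxe_arb_err_words is hypothetical.

```python
import re

# The two lines below are copied from the log in this thread, including the
# mail quote markers and the wrapped second line.
SAMPLE = """\
> > [ 340.316864] RxeArbErr: 0000000800000000 0000000800000000
> 7f1a01000000001b 0000000081166000
> > [ 340.316865] RegbErr: 0040000000000000 0000000000000000
> a2000a4018000000 1800000000000000
"""


def rxe_arb_err_words(dmesg_text):
    """Return the 64-bit words that follow 'RxeArbErr:' as a list of ints."""
    # Strip quote markers and join wrapped lines so the words sit together.
    flat = " ".join(
        line.lstrip("> ").strip() for line in dmesg_text.splitlines()
    )
    match = re.search(r"RxeArbErr:((?:\s+[0-9a-fA-F]{16})+)", flat)
    if match is None:
        return []
    return [int(word, 16) for word in match.group(1).split()]


if __name__ == "__main__":
    words = rxe_arb_err_words(SAMPLE)
    if words:
        # Assumption: the final word is the address the PHB logged.
        print(hex(words[-1]))
```

Joining the wrapped lines before matching matters here because the mail client split each register dump across two lines.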
> > Any idea why this is happening? My suspicion is that there is IOMMU
> > hardware that is not allowing my adapter to access this memory, but I am
> > not familiar with the POWER9 architecture. Is there a way to disable the
> > IOMMU completely, or kernel functions to call to give my adapter
> > "permission" to DMA to a memory range?
> >
> > Thanks,
> > Brian V.
>
>
> --
> Brian King
> Power Linux I/O
> IBM Linux Technology Center
>
>