yes! that is exactly what I needed. I made a reserved-memory node entry
in the devicetree rather than removing the memory completely, and now I
can use __va(), call dma_map_single(), and DMA to that memory.

Thank you both for your help.

-Brian V.

On Mon, Jun 4, 2018 at 5:15 PM, Benjamin Herrenschmidt <benh@au1.ibm.com> wrote:
On Mon, 2018-06-04 at 17:04 -0600, Brian Varney wrote:
> Okay, so in my haste to look up the DMA API, it looks like I ended up
> using the wrong function.
> 
> dma_map_single() and dma_map_page() take a kernel virtual pointer and
> a page *, respectively. I don't have either one of those. All I have
> is a physical address. Since I am modifying the devicetree to hide
> this memory before booting the kernel, the kernel doesn't really know
> this memory exists.
> 
> What I'm trying to do is reserve a huge chunk of contiguous memory
> (i.e., several GB). I want this memory to be DMA-able from my PCIE
> adapter and to be able to access it in userspace. Is there a better
> way to accomplish this? Probably, but I'm trying to port a big code
> base that we had working on x86 to PPC without changing too much.

Rather than completely "remove" it from the /memory node of the
device-tree, instead make sure you put it in the reserved regions; that
way it will still be mapped by the kernel, just not used.

That way you can get a kernel virtual address by just doing __va() on
the physical address. A bit hackish but should work.
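
Something along these lines (untested sketch -- the node name, address
and size below are made up for illustration). Note the absence of a
"no-map" property, so the region stays in the kernel's linear mapping
and __va() keeps working:

	reserved-memory {
		#address-cells = <2>;
		#size-cells = <2>;
		ranges;

		/* example: 4GB carve-out at the 32GB boundary */
		dma_buf: buffer@800000000 {
			reg = <0x8 0x00000000 0x1 0x00000000>;
		};
	};

Then in the driver, roughly (MYBUF_PHYS, len and pdev are
placeholders):

	void *vaddr = __va(MYBUF_PHYS);	/* linear-map virtual address */
	dma_addr_t handle = dma_map_single(&pdev->dev, vaddr, len,
					   DMA_FROM_DEVICE);

	if (dma_mapping_error(&pdev->dev, handle))
		return -EIO;	/* mapping failed */
	/* program 'handle' into the adapter as the DMA target address */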
> Reserving that memory by modifying the devicetree passed into the
> kernel works great. I can then access that memory by calling
> remap_pfn_range() on it. The only thing I am missing is being able to
> DMA to that memory.
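> 
> The mmap hook for that is just the usual remap_pfn_range() pattern,
> something like this (simplified sketch -- MYBUF_PHYS stands in for
> the physical base we reserved, and error handling is trimmed):
> 
> 	static int mybuf_mmap(struct file *filp, struct vm_area_struct *vma)
> 	{
> 		unsigned long size = vma->vm_end - vma->vm_start;
> 
> 		/* map the reserved physical range into the process */
> 		return remap_pfn_range(vma, vma->vm_start,
> 				       MYBUF_PHYS >> PAGE_SHIFT,
> 				       size, vma->vm_page_prot);
> 	}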
> 
> Thanks,
> Brian V.
> 
> On Mon, Jun 4, 2018 at 3:53 PM, Benjamin Herrenschmidt <benh@au1.ibm.com> wrote:
> > On Mon, 2018-06-04 at 13:08 -0600, Brian Varney wrote:
> > > Thanks for responding.
> > > 
> > > I'm not opposed to sharing snippets of code, but I'm not sure what
> > > you would need to see at this point.
> > > 
> > > I'm not using the kernel's DMA API. It wasn't necessary on x86,
> > > where the code is working fine. Note that on x86 I am using the
> > > "memmap=" kernel parameter to reserve memory, instead of removing
> > > memory from the devicetree passed into the kernel.
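> > > 
> > > (That's the memmap=nn[KMG]$ss[KMG] form, which marks the region
> > > from ss to ss+nn as reserved. For example, to reserve 4GB starting
> > > at the 4GB boundary -- the sizes here are just an example:
> > > 
> > > 	memmap=4G$0x100000000
> > > 
> > > with the $ escaped however the bootloader requires.)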
> > > 
> > > After your response, I did try using the API's dma_map_resource()
> > > function, but it returned the same address I sent it, so I guess
> > > no translation is necessary. It still failed the same way.
> > 
> > dma_map_resource() seems to be a new call that was added to the API
> > recently and that we don't yet support or implement on powerpc. It
> > also looks rather ... broken, as it doesn't fail if the backend
> > doesn't support it.
> > 
> > It also doesn't work for normal memory.
> > 
> > What are you trying to do?
> > 
> > If you are trying to write to memory, use dma_map_single/sg or
> > dma_alloc_coherent, and DMA to that.
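> > 
> > i.e. something like this (sketch only -- pdev and size here are
> > placeholders, and error handling is omitted):
> > 
> > 	dma_addr_t dma_handle;
> > 	void *buf;
> > 
> > 	/* buf is the CPU virtual address; dma_handle is what you
> > 	 * program into the adapter as the DMA target address */
> > 	buf = dma_alloc_coherent(&pdev->dev, size, &dma_handle,
> > 			       GFP_KERNEL);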
> > 
> > > I did get a PCIE analyzer hooked up now and I was able to get a
> > > PCIE trace of it:
> > > https://ent.box.com/s/vx8nex5zo83xs562usd6bfeckoxbn3c8
> > > 
> > > The last transaction shown is my PCIE adapter writing to 0x81166000
> > > -- within the memory I reserved. The transaction gets ACK'ed but
> > > then no other traffic is shown. It seems the system has cut itself
> > > off from this PCIE adapter at this point.
> > > 
> > > Any ideas?
> > > 
> > > Thanks,
> > > -Brian V.
> > > 
> > > On Fri, Jun 1, 2018 at 1:50 PM, Brian King <brking@linux.vnet.ibm.com> wrote:
> > > > Any code you can share? On an LC922 system running bare metal, as
> > > > long as your adapter is capable of 64 bits of DMA address space,
> > > > and not all adapters are, then you would not be using the IOMMU.
> > > > However, that does not mean that the physical address the host
> > > > sees equals the address that is used on the PCIe link by the
> > > > adapter. You need to make sure you are using the DMA API, as
> > > > defined in the kernel documentation, to allocate DMA'able memory
> > > > to be given to the adapter. This API will give you a virtual
> > > > address you can use to access the memory in the kernel, as well as
> > > > a dma_addr_t, which is the token you give to the adapter as the
> > > > DMA address. In cases where an IOMMU is in use, this would set up
> > > > the translation control entry (TCE) in the IOMMU. In your case,
> > > > where you are not using an IOMMU, it will do a simple translation
> > > > to an address that can be used on the PCIe link.
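> > > > 
> > > > In code, that flow looks roughly like this (illustrative sketch
> > > > only; pdev is a placeholder for your device):
> > > > 
> > > > 	/* advertise 64-bit DMA capability; adapters that cannot do
> > > > 	 * 64-bit addressing fall back to a 32-bit mask and get
> > > > 	 * translated through the IOMMU */
> > > > 	if (dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(64)))
> > > > 		dma_set_mask_and_coherent(&pdev->dev, DMA_BIT_MASK(32));
> > > > 
> > > > 	/* subsequent dma_alloc_coherent()/dma_map_single() calls then
> > > > 	 * return the dma_addr_t token you hand to the adapter */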
> > > > 
> > > > What you are seeing in the log below is an EEH error. EEH is an
> > > > error detection and recovery feature of the Power PCI host
> > > > bridge, which allows the platform to recover from various PCIe
> > > > errors. In this case, it's a DMA write to an invalid address. In
> > > > your case, the invalid address is 81166000, which is not a valid
> > > > DMA address on an LC922.
> > > > 
> > > > Thanks,
> > > > 
> > > > Brian
> > > > 
> > > > On 06/01/2018 11:56 AM, Brian Varney wrote:
> > > > > Hello all,
> > > > > 
> > > > > I have an LC922 system running Fedora 28 (4.16.10-300.fc28.ppc64le)
> > > > > and I am reserving memory by modifying the device tree passed in
> > > > > to the kernel, as described by this forum entry:
> > > > > https://lists.ozlabs.org/pipermail/linuxppc-users/2017-September/000112.html
> > > > > 
> > > > > I have a PCIE adapter plugged into the system that I am testing.
> > > > > When the adapter performs a DMA operation to this reserved memory,
> > > > > things start to go south. All reads from the adapter's BAR space
> > > > > start returning all FF's. I suspect the reads aren't actually
> > > > > making it to the adapter, but I don't have a PCIE analyzer on
> > > > > there to verify. Then I get the following in dmesg:
> > > > > 
> > > > > [ 340.316599] EEH: Frozen PHB#34-PE#0 detected
> > > > > [ 340.316645] EEH: PE location: WIO Slot2, PHB location: N/A
> > > > > [ 340.316675] CPU: 133 PID: 5380 Comm: mr Tainted: P OE 4.16.10-300.fc28.ppc64le #1
> > > > > [ 340.316676] Call Trace:
> > > > > [ 340.316682] [c0002004287b7a40] [c000000000bec5d0] dump_stack+0xb4/0x104 (unreliable)
> > > > > [ 340.316686] [c0002004287b7a80] [c00000000003f9d0] eeh_dev_check_failure+0x4b0/0x5b0
> > > > > [ 340.316689] [c0002004287b7b20] [c0000000000b3ae8] pnv_pci_read_config+0x138/0x170
> > > > > [ 340.316692] [c0002004287b7b70] [c0000000006c7e14] pci_user_read_config_byte+0x84/0x160
> > > > > [ 340.316693] [c0002004287b7bc0] [c0000000006df1fc] pci_read_config+0x12c/0x2d0
> > > > > [ 340.316696] [c0002004287b7c50] [c0000000004bceb4] sysfs_kf_bin_read+0x94/0xf0
> > > > > [ 340.316698] [c0002004287b7c90] [c0000000004bbd30] kernfs_fop_read+0x130/0x2a0
> > > > > [ 340.316699] [c0002004287b7ce0] [c0000000003e721c] __vfs_read+0x6c/0x1e0
> > > > > [ 340.316701] [c0002004287b7d80] [c0000000003e744c] vfs_read+0xbc/0x1b0
> > > > > [ 340.316703] [c0002004287b7dd0] [c0000000003e7ed4] SyS_pread64+0xc4/0x120
> > > > > [ 340.316705] [c0002004287b7e30] [c00000000000b8e0] system_call+0x58/0x6c
> > > > > [ 340.316734] EEH: Detected PCI bus error on PHB#34-PE#0
> > > > > [ 340.316739] EEH: This PCI device has failed 1 times in the last hour
> > > > > [ 340.316739] EEH: Notify device drivers to shutdown
> > > > > [ 340.316746] EEH: Collect temporary log
> > > > > [ 340.316780] EEH: of node=0034:01:00.0
> > > > > [ 340.316783] EEH: PCI device/vendor: 00d11000
> > > > > [ 340.316786] EEH: PCI cmd/status register: 00100146
> > > > > [ 340.316787] EEH: PCI-E capabilities and status follow:
> > > > > [ 340.316802] EEH: PCI-E 00: 0002b010 112c8023 00002950 00437d03
> > > > > [ 340.316813] EEH: PCI-E 10: 10830000 00000000 00000000 00000000
> > > > > [ 340.316815] EEH: PCI-E 20: 00000000
> > > > > [ 340.316816] EEH: PCI-E AER capability register set follows:
> > > > > [ 340.316828] EEH: PCI-E AER 00: 14820001 00000000 00400000 00462030
> > > > > [ 340.316839] EEH: PCI-E AER 10: 00000000 0000e000 000001e0 00000000
> > > > > [ 340.316849] EEH: PCI-E AER 20: 00000000 00000000 00000000 00000000
> > > > > [ 340.316853] EEH: PCI-E AER 30: 00000000 00000000
> > > > > [ 340.316855] PHB4 PHB#52 Diag-data (Version: 1)
> > > > > [ 340.316855] brdgCtl: 00000002
> > > > > [ 340.316857] RootSts: 00000040 00402000 a0830008 00100107 00000000
> > > > > [ 340.316859] PhbSts: 0000001c00000000 0000001c00000000
> > > > > [ 340.316860] Lem: 0000000010000000 0000000000000000 0000000010000000
> > > > > [ 340.316862] PhbErr: 0000080000000000 0000080000000000 2148000098000240 a008400000000000
> > > > > [ 340.316864] RxeArbErr: 0000000800000000 0000000800000000 7f1a01000000001b 0000000081166000
> > > > > [ 340.316865] RegbErr: 0040000000000000 0000000000000000 a2000a4018000000 1800000000000000
> > > > > [ 340.316868] PE[000] A/B: 8000802301000000 8000000081166000
> > > > > [ 340.316871] PE[100] A/B: 80000000ff275c00 80000000300d088b
> > > > > [ 340.316872] EEH: Reset with hotplug activity
> > > > > [ 340.316898] iommu: Removing device 0034:01:00.0 from group 4
> > > > > 
> > > > > Any idea why this is happening? My suspicion is that there is
> > > > > IOMMU hardware that is not allowing my adapter to access this
> > > > > memory, but I am not familiar with the POWER9 architecture. Is
> > > > > there a way to disable the IOMMU completely, or are there kernel
> > > > > functions to call to give my adapter "permission" to DMA to a
> > > > > memory range?
> > > > > 
> > > > > Thanks,
> > > > > Brian V.