[PATCH 0/6] drm+dma: cache support for arm, etc
Rob Clark
robdclark at chromium.org
Sat Aug 17 07:04:35 AEST 2019
On Thu, Aug 15, 2019 at 10:53 AM Christoph Hellwig <hch at lst.de> wrote:
>
> On Thu, Aug 15, 2019 at 06:54:39AM -0700, Rob Clark wrote:
> > On Wed, Aug 14, 2019 at 11:51 PM Christoph Hellwig <hch at lst.de> wrote:
> > >
> > > As said before, I don't think these low-level helpers are the
> > > right API to export, but even if they were, you'd just cover a tiny
> > > subset of the architectures.
> >
> > Are you thinking instead something like:
> >
> > void dma_sync_sg_for_{cpu,device}(struct device *dev, struct scatterlist *sgl,
> >                 int nents, enum dma_data_direction dir)
> > {
> >         for_each_sg(sgl, sg, nents, i) {
> >                 arch_sync_dma_for_..(dev, sg_phys(sg), sg->length, dir);
> >         }
> > }
> > EXPORT_SYMBOL_GPL(dma_sync_sg_for_..)
> >
> > or did you have something else in mind?
>
> No. We really need an interface that says please give me uncached
> memory (for some definition of uncached that includes what graphics
> drivers call write combine), and then let the architecture do the right
> thing. Basically dma_alloc_coherent with DMA_ATTR_NO_KERNEL_MAPPING
> is superficially close to what you want, except that the way the drm
> drivers work you can't actually use it.

I don't disagree about needing an API to get uncached memory (or
ideally just something outside of the linear map). But I think this
is a separate problem.

What I was hoping for, for v5.4, is a way to stop abusing dma_map/sync
for cache ops to get rid of the hack I had to make for v5.3. And also
to fix vgem on non-x86. (Unfortunately changing vgem to use cached
mappings breaks x86 CI, but fixes CI on arm/arm64..) We can do that
without any changes in allocation. There is still the possibility for
problems due to cached aliases, but that has been a problem this whole
time, it isn't something new.

BR,
-R

> The reason for that is that, if we can, we really need to avoid creating
> another uncachable alias and instead change the page attributes in place.
> On x86 we can and must do that for example, and based on the
> conversation with Will arm64 could do that fairly easily. arm32 can
> right now only do that for CMA, though.
>
> The big question is what API do we want. I had a pretty similar
> discussion with Christian on doing such an allocation for amdgpu,
> where the device normally is cache coherent, but they actually want
> to turn it into non-coherent by using PCIe unsnooped transactions.
>
> Here is my high level plan, which still has a few loose ends:
>
> (1) provide a new API:
>
> struct page *dma_alloc_pages(struct device *dev, unsigned nr_pages,
>                 gfp_t gfp, unsigned long flags);
> void dma_free_pages(struct device *dev, unsigned nr_pages,
>                 unsigned long flags);
>
> These give you back page backed memory that is guaranteed to be
> addressable by the device (no swiotlb or similar). The memory can
> then be mapped using dma_map*, including unmap and dma_sync to
> bounce ownership around. This is the replacement for the current
> dma_alloc_attrs with DMA_ATTR_NON_CONSISTENT API, that is rather
> badly defined.
>
> (2) Add support for DMA_ATTR_NO_KERNEL_MAPPING to this new API instead
> of dma_alloc_attrs. The initial difference with that flag is just
> that we allow highmem, but in the future we could also unmap this
> memory from the kernel linear mapping entirely on architectures
> where we can easily do that.
>
> (3) Add a dma_pages_map/dma_pages_unmap or similar API that allows you
> to get a kernel mapping for parts or all of a
> DMA_ATTR_NO_KERNEL_MAPPING allocation. This is to replace things
> like your open-coded vmap in msm (or similarly elsewhere in dma-buf
> providers).
>
> (4) Add support for a DMA_ATTR_UNCACHABLE flag (or similar) to the new
> API, that maps the pages as uncachable iff they have a kernel
> mapping, including invalidating the caches at time of this page
> attribute change (or creation of a new mapping). This API will fail
> if the architecture does not allow in-place remapping. Note that for
> arm32 we could always dip into the CMA pool if one is present to not
> fail. We'll also need some helper to map from the DMA_ATTR_* flags
> to a pgprot for mapping the page to userspace. There are also a few
> other weird bits here, e.g. on architectures like mips that use an
> uncached segment we'll have to fail use with the plain
> DMA_ATTR_UNCACHABLE flag, but it could be supported with
> DMA_ATTR_UNCACHABLE | DMA_ATTR_NO_KERNEL_MAPPING.
>
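To make the failure rules in (4) concrete, here is a minimal userspace sketch of how the two flags might compose. It is only a model, not kernel code. The SKETCH_* flag values, struct sketch_arch and sketch_check_alloc() are hypothetical names made up for illustration; only DMA_ATTR_UNCACHABLE and DMA_ATTR_NO_KERNEL_MAPPING come from the plan above. It also assumes that an allocation with no kernel mapping never needs in-place remapping, which the plan only states explicitly for the mips-style case.

```c
#include <stdbool.h>

/* Hypothetical flag values, for illustration only. */
#define SKETCH_ATTR_NO_KERNEL_MAPPING (1UL << 0)
#define SKETCH_ATTR_UNCACHABLE        (1UL << 1)

/* Hypothetical summary of what an architecture can do. */
struct sketch_arch {
	bool can_remap_in_place; /* page attributes can be changed in place */
	bool has_cma_pool;       /* arm32-style fallback pool is present */
	bool uncached_segment;   /* mips-style fixed uncached segment */
};

/* Returns 0 if the request could be honoured, -1 if it would have to fail. */
static int sketch_check_alloc(const struct sketch_arch *arch,
			      unsigned long flags)
{
	/* Cached pages never need an attribute change. */
	if (!(flags & SKETCH_ATTR_UNCACHABLE))
		return 0;

	/* Assumption: without a kernel mapping there is nothing to remap. */
	if (flags & SKETCH_ATTR_NO_KERNEL_MAPPING)
		return 0;

	/* Plain UNCACHABLE cannot work with an uncached segment. */
	if (arch->uncached_segment)
		return -1;

	/* Otherwise we need in-place remapping, or a CMA pool to dip into. */
	if (arch->can_remap_in_place || arch->has_cma_pool)
		return 0;

	return -1;
}
```

Under this model, a mips-like architecture rejects plain SKETCH_ATTR_UNCACHABLE but accepts it combined with SKETCH_ATTR_NO_KERNEL_MAPPING, and an arm32-like one only succeeds when a CMA pool is present.
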
> I was hoping to get most of this done for this merge window, but I'm
> probably lucky if I get at least parts done. Too much distraction.
>
> > Hmm, not entirely sure why.. you should be on the cc list for each
> > individual patch.
>
> They finally made it, although even with the delay they only ended up
> in the spam mailbox. I still can't see them on the various mailing
> lists.