[PATCH 0/6] drm+dma: cache support for arm, etc

Sat Aug 17 07:04:35 AEST 2019

On Thu, Aug 15, 2019 at 10:53 AM Christoph Hellwig <hch at lst.de> wrote:
>
> On Thu, Aug 15, 2019 at 06:54:39AM -0700, Rob Clark wrote:
> > On Wed, Aug 14, 2019 at 11:51 PM Christoph Hellwig <hch at lst.de> wrote:
> > >
> > > As said before I don't think these low-level helpers are the
> > > right API to export, but even if they did you'd just cover a tiny
> > > subset of the architectures.
> >
> > Are you thinking instead something like:
> >
> > void dma_sync_sg_for_{cpu,device}(struct device *dev, struct scatterlist *sgl,
> >                                   int nents, enum dma_data_direction dir)
> > {
> >     for_each_sg(sgl, sg, nents, i) {
> >         arch_sync_dma_for_..(dev, sg_phys(sg), sg->length, dir);
> >     }
> > }
> > EXPORT_SYMBOL_GPL(dma_sync_sg_for_..)
> >
> > or did you have something else in mind?
>
> No.  We really need an interface thay says please give me uncached
> memory (for some definition of uncached that includes that grapics
> drivers call write combine), and then let the architecture do the right
> thing.  Basically dma_alloc_coherent with DMA_ATTR_NO_KERNEL_MAPPING
> is superficially close to what you want, except that the way the drm
> drivers work you can't actually use it.

I don't disagree about needing an API to get uncached memory (or
ideally just something outside of the linear map).  But I think this
is a separate problem.

What I was hoping for, for v5.4, is a way to stop abusing dma_map/sync
for cache ops to get rid of the hack I had to make for v5.3.  And also
to fix vgem on non-x86.  (Unfortunately changing vgem to used cached
mappings breaks x86 CI, but fixes CI on arm/arm64..)  We can do that
without any changes in allocation.  There is still the possibility for
problems due to cached alias, but that has been a problem this whole
time, it isn't something new.

BR,
-R

> The reason for that is if we can we really need to not create another
> uncachable alias, but instead change the page attributes in place.
> On x86 we can and must do that for example, and based on the
> conversation with Will arm64 could do that fairly easily.  arm32 can
> right now only do that for CMA, though.
>
> The big question is what API do we want.  I had a pretty similar
> discussion with Christian on doing such an allocation for amdgpu,
> where the device normally is cache coherent, but they actually want
> to turn it into non-coherent by using PCIe unsnooped transactions.
>
> Here is my high level plan, which still has a few lose end:
>
>  (1) provide a new API:
>
>         struct page *dma_alloc_pages(struct device *dev, unsigned nr_pages,
>                         gfp_t gfp, unsigned long flags);
>         void dma_free_pages(struct device *dev, unsigned nr_pages,
>                         unsigned long flags);
>
>      These give you back page backed memory that is guaranteed to be
>      addressable by the device (no swiotlb or similar).  The memory can
>      then be mapped using dma_map*, including unmap and dma_sync to
>      bounce ownership around.  This is the replacement for the current
>      dma_alloc_attrs with DMA_ATTR_NON_CONSISTENT API, that is rather
>      badly defined.
>
>  (2) Add support for DMA_ATTR_NO_KERNEL_MAPPING to this new API instead
>      of dma_alloc_attrs.  The initial difference with that flag is just
>      that we allow highmem, but in the future we could also unmap this
>      memory from the kernel linear mapping entirely on architectures
>      where we can easily do that.
>
>  (3) Add a dma_pages_map/dma_pages_unmap or similar API that allows you
>      to get a kernel mapping for parts or all of a
>      DMA_ATTR_NO_KERNEL_MAPPING allocation.  This is to replace things
>      like your open-coded vmap in msm (or similarly elsewhere in dma-buf
>      providers).
>
>  (4) Add support for a DMA_ATTR_UNCACHABLE flags (or similar) to the new
>      API, that maps the pages as uncachable iff they have a kernel
>      mapping, including invalidating the caches at time of this page
>      attribute change (or creation of a new mapping).  This API will fail
>      if the architecture does not allow in-place remapping.  Note that for
>      arm32 we could always dip into the CMA pool if one is present to not
>      fail.  We'll also need some helper to map from the DMA_ATTR_* flags
>      to a pgprot for mapping the page to userspace.  There is also a few
>      other weird bits here, e.g. on architectures like mips that use an
>      uncached segment we'll have to fail use with the plain
>      DMA_ATTR_UNCACHABLE flag, but it could be supported with
>      DMA_ATTR_UNCACHABLE | DMA_ATTR_NO_KERNEL_MAPPING.
>
> I was hoping to get most of this done for this merge window, but I'm
> probably lucky if I get at least parts done.  Too much distraction.
>
> > Hmm, not entirely sure why.. you should be on the cc list for each
> > individual patch.
>
> They finally made it, although even with the delay they only ended up
> in the spam mailbox.  I still can't see them on the various mailing
> lists.