[PATCH 0/6] drm+dma: cache support for arm, etc
Christoph Hellwig
hch at lst.de
Fri Aug 16 03:53:46 AEST 2019
On Thu, Aug 15, 2019 at 06:54:39AM -0700, Rob Clark wrote:
> On Wed, Aug 14, 2019 at 11:51 PM Christoph Hellwig <hch at lst.de> wrote:
> >
> > As said before I don't think these low-level helpers are the
> > right API to export, but even if they did you'd just cover a tiny
> > subset of the architectures.
>
> Are you thinking instead something like:
>
> void dma_sync_sg_for_{cpu,device}(struct device *dev, struct scatterlist *sgl,
>		int nents, enum dma_data_direction dir)
> {
>	struct scatterlist *sg;
>	int i;
>
>	for_each_sg(sgl, sg, nents, i)
>		arch_sync_dma_for_..(dev, sg_phys(sg), sg->length, dir);
> }
> EXPORT_SYMBOL_GPL(dma_sync_sg_for_..)
>
> or did you have something else in mind?
No.  We really need an interface that says please give me uncached
memory (for some definition of uncached that includes what graphics
drivers call write-combine), and then let the architecture do the right
thing.  Basically dma_alloc_coherent with DMA_ATTR_NO_KERNEL_MAPPING
is superficially close to what you want, except that the way the drm
drivers work you can't actually use it.
The reason for that is that, where we can, we really need to avoid
creating another uncachable alias and instead change the page
attributes in place.  On x86 we can and must do that, for example,
and based on the conversation with Will, arm64 could do that fairly
easily.  arm32 can right now only do that for CMA, though.
The big question is what API do we want. I had a pretty similar
discussion with Christian on doing such an allocation for amdgpu,
where the device normally is cache coherent, but they actually want
to turn it into non-coherent by using PCIe unsnooped transactions.
Here is my high-level plan, which still has a few loose ends:
(1) provide a new API:
	struct page *dma_alloc_pages(struct device *dev, unsigned nr_pages,
			gfp_t gfp, unsigned long flags);
	void dma_free_pages(struct device *dev, struct page *page,
			unsigned nr_pages, unsigned long flags);
These give you back page backed memory that is guaranteed to be
addressable by the device (no swiotlb or similar). The memory can
then be mapped using dma_map*, including unmap and dma_sync to
bounce ownership around.  This is the replacement for the current
dma_alloc_attrs with DMA_ATTR_NON_CONSISTENT API, which is rather
badly defined.
(2) Add support for DMA_ATTR_NO_KERNEL_MAPPING to this new API instead
of dma_alloc_attrs. The initial difference with that flag is just
that we allow highmem, but in the future we could also unmap this
memory from the kernel linear mapping entirely on architectures
where we can easily do that.
(3) Add a dma_pages_map/dma_pages_unmap or similar API that allows you
to get a kernel mapping for parts or all of a
DMA_ATTR_NO_KERNEL_MAPPING allocation. This is to replace things
like your open-coded vmap in msm (or similarly elsewhere in dma-buf
providers).
(4) Add support for a DMA_ATTR_UNCACHABLE flag (or similar) to the new
    API, that maps the pages as uncachable iff they have a kernel
    mapping, including invalidating the caches at the time of this page
    attribute change (or creation of a new mapping).  This API will fail
    if the architecture does not allow in-place remapping.  Note that for
    arm32 we could always dip into the CMA pool if one is present to
    avoid failing.  We'll also need some helper to map from the
    DMA_ATTR_* flags to a pgprot for mapping the page to userspace.
    There are also a few other weird bits here, e.g. on architectures
    like mips that use an uncached segment we'll have to fail uses of
    the plain DMA_ATTR_UNCACHABLE flag, but it could be supported with
    DMA_ATTR_UNCACHABLE | DMA_ATTR_NO_KERNEL_MAPPING.
I was hoping to get most of this done for this merge window, but I'll
probably be lucky to get even parts done.  Too much distraction.
> Hmm, not entirely sure why.. you should be on the cc list for each
> individual patch.
They finally made it, although even with the delay they only ended up
in the spam mailbox. I still can't see them on the various mailing
lists.
More information about the Linuxppc-dev mailing list