[PATCH] dma-mapping: fix page attributes for dma_mmap_*

Russell King - ARM Linux admin linux at armlinux.org.uk
Wed Aug 7 02:48:16 AEST 2019


On Tue, Aug 06, 2019 at 05:45:03PM +0100, Russell King - ARM Linux admin wrote:
> On Tue, Aug 06, 2019 at 05:08:54PM +0100, Will Deacon wrote:
> > On Sat, Aug 03, 2019 at 08:48:12AM +0200, Christoph Hellwig wrote:
> > > On Fri, Aug 02, 2019 at 11:38:03AM +0100, Will Deacon wrote:
> > > > 
> > > > So this boils down to a terminology mismatch. The Arm architecture doesn't have
> > > > anything called "write combine", so in Linux we instead provide what the Arm
> > > > architecture calls "Normal non-cacheable" memory for pgprot_writecombine().
> > > > Amongst other things, this memory type permits speculation, unaligned accesses
> > > > and merging of writes. I found something in the architecture spec about
> > > > non-cachable memory, but it's written in Armglish[1].
> > > > 
> > > > pgprot_noncached(), on the other hand, provides what the architecture calls
> > > > Strongly Ordered or Device-nGnRnE memory. This is intended for mapping MMIO
> > > > (i.e. PCI config space) and therefore forbids speculation, preserves access
> > > > size, requires strict alignment and also forces write responses to come from
> > > > the endpoint.
> > > > 
> > > > I think the naming mismatch is historical, but on arm64 we wanted to use the
> > > > same names as arm32 so that any drivers using these things directly would get
> > > > the same behaviour.
> > > 
> > > That all makes sense, but it totally needs a comment.  I'll try to draft
> > > one based on this.  I've also looked at the arm32 code a bit more, and
> > > it seems arm always (?) supported Normal non-cacheable attribute, but
> > > Linux only optionally uses it for arm v6+ because of fears of drivers
> > > missing barriers.
> > 
> > I think it was also to do with aliasing, but I don't recall all of the
> > details.
> 
> ARMv6+ is where the architecture significantly changed to introduce
> the idea of [Normal, Device, Strongly Ordered] where Normal has the
> cache attributes.
> 
> Before that, we had just "uncached/unbuffered, uncached/buffered,
> cached/unbuffered, cached/buffered" modes.
> 
> The write buffer (enabled by buffered modes) has no architected
> guarantees about how long writes will sit in it, and there is only
> the "drain write buffer" instruction to push writes out.
> 
> Up to and including ARMv5, we took the easy approach of just using
> the "uncached/unbuffered" mode since that is (a) the safest, and (b)
> avoids write buffers that alias when there are multiple different
> mappings.
> 
> We could have used a different approach, making all IO writes contain
> a "drain write buffer" instruction, and map DMA memory as "buffered",
> but as there were no Linux barriers defined to order memory accesses
> to DMA memory (so, for example, ring buffers can be updated in the
> correct order) back in those days, using the uncached/unbuffered mode
> was the sanest and most reliable solution.
> 
> > 
> > > The other really weird things is that in arm32
> > > pgprot_dmacoherent incudes the L_PTE_XN bit, which from my understanding
> > > is the no-execture bit, but pgprot_writecombine does not.  This seems to
> > > not very unintentional.  So minus that the whole DMA_ATTR_WRITE_COMBІNE
> > > seems to be about flagging old arm specific drivers as having the proper
> > > barriers in places and otherwise is a no-op.
> > 
> > I think it only matters for Armv7 CPUs, but yes, we should probably be
> > setting L_PTE_XN for both of these memory types.
> 
> Conventionally, pgprot_writecombine() has only been used to change
> the memory type and not the permissions.  Since writecombine memory
> is still capable of being executed, I don't see any reason to set XN
> for it.
> 
> If the user wishes to mmap() using PROT_READ|PROT_EXEC, then is there
> really a reason for writecombine to set XN overriding the user?
> 
> That said, pgprot_writecombine() is mostly used for framebuffers, which
> arguably shouldn't be executable anyway - but who'd want to mmap() the
> framebuffer with PROT_EXEC?
> 
> > 
> > > Here is my tentative plan:
> > > 
> > >  - respin this patch with a small fix to handle the
> > >    DMA_ATTR_NON_CONSISTENT (as in ignore it unless actually supported),
> > >    but keep the name as-is to avoid churn.  This should allow 5.3
> > >    inclusion and backports
> > >  - remove DMA_ATTR_WRITE_COMBINE support from mips, probably also 5.3
> > >    material.
> > >  - move all architectures but arm over to just define
> > >    pgprot_dmacoherent, including a comment with the above explanation
> > >    for arm64.
> > 
> > That would be great, thanks.
> > 
> > >  - make DMA_ATTR_WRITE_COMBINE a no-op and schedule it for removal,
> > >    thus removing the last instances of arch_dma_mmap_pgprot
> > 
> > All sounds good to me, although I suppose 32-bit Arm platforms without
> > CONFIG_ARM_DMA_MEM_BUFFERABLE may run into issues if DMA_ATTR_WRITE_COMBINE
> > disappears. Only one way to find out...
> 
> Looking at the results of grep, I think only OMAP2+ and Exynos may be
> affected.
> 
> However, removing writecombine support from the DMA API is going to
> have a huge impact for framebuffers on earlier ARMs - that's where we
> do expect framebuffers to be mapped "uncached/buffered" for performance
> reasons and not "uncached/unbuffered".  It's quite literally the
> difference between console scrolling being usable and totally unusable.
> 
> Given what I've said above, switching to using buffered mode for normal
> DMA mappings is data-corrupting risky - as in your filesystem could get
> fried.  I don't think we should play fast and loose with people's data
> by randomly changing that "because we'd like to", and I don't see that
> screwing the console is really an option either.

Sorry, I forgot to explain - the reason is dma_alloc_writecombine()
internally uses DMA_ATTR_WRITE_COMBINE, which I'd forgotten about
when grepping - so there's potentially way more users than my greps
above found.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


More information about the Linuxppc-dev mailing list