[PATCH 20/21] ARM: dma-mapping: split out arch_dma_mark_clean() helper

Arnd Bergmann arnd at arndb.de
Sat Apr 1 04:20:10 AEDT 2023


On Fri, Mar 31, 2023, at 17:12, Robin Murphy wrote:
> On 31/03/2023 3:00 pm, Arnd Bergmann wrote:
>> On Mon, Mar 27, 2023, at 14:48, Robin Murphy wrote:
>> 
>> To be on the safe side, I'd have to pass a flag into
>> arch_dma_mark_clean() about coherency, to let the arm
>> implementation still require the extra dcache flush
>> for coherent DMA, while ia64 can ignore that flag.
>
> Coherent DMA on Arm is assumed to be inner-shareable, so a coherent DMA 
> write should be pretty much equivalent to a coherent write by another 
> CPU (or indeed the local CPU itself) - nothing says that it *couldn't* 
> dirty a line in a data cache above the level of unification, so in 
> general the assumption must be that, yes, if coherent DMA is writing 
> data intended to be executable, then it's going to want a Dcache clean 
> to PoU and an Icache invalidate to PoU before trying to execute it. By 
> comparison, a non-coherent DMA transfer will inherently have to 
> invalidate the Dcache all the way to PoC in its dma_unmap, thus cannot 
> leave dirty data above the PoU, so only the Icache maintenance is 
> required in the executable case.

Ok, makes sense. I've already started reworking my patch for it.

> (FWIW I believe the Armv8 IDC/DIC features can safely be considered 
> irrelevant to 32-bit kernels)
>
> I don't know a great deal about IA-64, but it appears to be using its 
> PG_arch_1 flag in a subtly different manner to Arm, namely to optimise 
> out the *Icache* maintenance. So if anything, it seems IA-64 is the 
> weirdo here (who'd have guessed?) where DMA manages to be *more* 
> coherent than the CPUs themselves :)

I checked this in the ia64 manual, and as far as I can tell, it originally
only had one cacheflush instruction that flushes the dcache and invalidates
the icache at the same time. So flush_icache_range() actually does
both and flush_dcache_page() instead just marks the page as dirty to
ensure flush_icache_range() does not get skipped after a writing a
page from the kernel.

On later Itaniums, there is apparently a separate icache flush
instruction that gets used in flush_icache_range(), but that
still works for the DMA case that is allowed to skip the flush.

> This is all now making me think we need some careful consideration of 
> whether the benefits of consolidating code outweigh the confusion of 
> conflating multiple different meanings of "clean" together...

The difference in usage of PG_dcache_clean/PG_dcache_dirty/PG_arch_1
across architectures is certainly big enough that we can't just
define a a common arch_dma_mark_clean() across architectures, but
I think the idea of having a common entry point for
arch_dma_mark_clean() to be called from the dma-mapping code
to do something architecture specific after a DMA is clean still
makes sense, 

     Arnd


More information about the Linuxppc-dev mailing list