[PATCH 00/21] dma-mapping: unify support for cache flushes
Arnd Bergmann
arnd at kernel.org
Mon Mar 27 23:12:56 AEDT 2023
From: Arnd Bergmann <arnd at arndb.de>
After a long discussion about adding SoC-specific semantics for when
to flush caches in drivers/soc/ drivers, an approach that we determined
to be fundamentally flawed[1], I volunteered to try to move that logic
into architecture-independent code and make all existing architectures
do the same thing.
As we had determined earlier, the behavior is wildly different across
architectures, but most of the differences come down to either bugs
(when required flushes are missing) or extra flushes that are harmless
but might hurt performance.
I finally found the time to come up with an implementation of this, which
starts by replacing every outlier with one of the three common options:
1. architectures without speculative prefetching (hexagon, m68k,
   openrisc, sh, sparc, and certain armv4 and xtensa implementations)
   only flush their caches before a DMA, by cleaning write-back caches
   (if any) before a DMA to the device, and by invalidating the caches
   before a DMA from a device.
2. arc, microblaze, mips, nios2, sh and later xtensa now follow the
   normal 32-bit arm model and invalidate their writeback caches
   again after a DMA from the device, to remove stale cache lines
   that got prefetched during the DMA. arc, csky and mips used to
   invalidate buffers also before the bidirectional DMA, but this
   is now skipped whenever we know it gets invalidated again
   after the DMA.
3. parisc, powerpc and riscv already flushed buffers before
   a DMA_FROM_DEVICE, and these get moved to the arm64 behavior
   that does the writeback before and invalidate after both
   DMA_FROM_DEVICE and DMA_BIDIRECTIONAL in order to avoid the
   problem of accidentally leaking stale data if the DMA does
   not actually happen[2].
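For readers who want the three options side by side, here is a rough
sketch of them as a lookup. Every identifier below (enum policy,
op_for_device(), op_for_cpu() and so on) is made up for illustration
and is not taken from the series. The handling of bidirectional
mappings under option 1 is also an assumption, inferred only from the
title of the openrisc patch ("flush bidirectional mappings").

```c
#include <assert.h>

/* Hypothetical names for illustration only, not the series' API. */
enum dma_dir { TO_DEVICE, FROM_DEVICE, BIDIRECTIONAL };

enum cache_op {
    OP_NONE,
    OP_CLEAN,   /* write back dirty lines */
    OP_INVAL,   /* discard lines without writing back */
    OP_FLUSH,   /* write back, then discard */
};

enum policy {
    POLICY_NO_PREFETCH = 1,   /* option 1: no speculative prefetching */
    POLICY_INVAL_AFTER = 2,   /* option 2: 32-bit arm model */
    POLICY_CLEAN_BEFORE = 3,  /* option 3: arm64 model */
};

/* Cache operation before ownership passes to the device. */
static enum cache_op op_for_device(enum policy p, enum dma_dir dir)
{
    assert(p >= POLICY_NO_PREFETCH && p <= POLICY_CLEAN_BEFORE);

    switch (dir) {
    case TO_DEVICE:
        return OP_CLEAN;
    case FROM_DEVICE:
        /* Option 3 writes back instead of invalidating, so stale
         * data cannot leak if the DMA never happens. */
        return p == POLICY_CLEAN_BEFORE ? OP_CLEAN : OP_INVAL;
    case BIDIRECTIONAL:
        /* Assumption: option 1 has no later invalidate, so it must
         * both clean and invalidate here. Options 2 and 3 skip the
         * invalidate because it is repeated after the DMA. */
        return p == POLICY_NO_PREFETCH ? OP_FLUSH : OP_CLEAN;
    }
    return OP_NONE;
}

/* Cache operation after ownership returns to the CPU. */
static enum cache_op op_for_cpu(enum policy p, enum dma_dir dir)
{
    assert(p >= POLICY_NO_PREFETCH && p <= POLICY_CLEAN_BEFORE);

    /* Nothing to drop if the device only read the buffer, or if
     * the CPU cannot have prefetched stale lines during the DMA. */
    if (dir == TO_DEVICE || p == POLICY_NO_PREFETCH)
        return OP_NONE;
    return OP_INVAL;
}
```

Under these assumptions, options 2 and 3 differ only in what
op_for_device() returns for FROM_DEVICE.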
The last patch in the series replaces the architecture specific code
with a shared version that implements all three based on architecture
specific parameters that are almost always determined at compile time.
The difference between cases 1. and 2. is hardware specific, while between
2. and 3. we need to decide which semantics we want, but I explicitly
avoid this question in my series and leave it to be decided later.
Another difference that I do not address here is what cache invalidation
does for partial cache lines. On arm32, arm64 and powerpc, a partial
cache line always gets written back before invalidation in order to
ensure that data before or after the buffer is not discarded. On all
other architectures, the assumption is that cache lines are never shared
between a DMA buffer and data that is accessed by the CPU. If we end up
always writing back dirty cache lines before a DMA (option 3 above),
then this point becomes moot. Otherwise we should probably address this
in a follow-up series to document one behavior or the other and implement
it consistently.
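To make the partial cache line case concrete, the sketch below works
out which lines an invalidation of a buffer touches and whether the
first or last line is only partly covered by the buffer. Those are
the lines that arm32, arm64 and powerpc write back before
invalidating. struct inval_plan and plan_invalidate() are hypothetical
names for illustration; nothing here is code from the series, and the
line size is passed in as a plain parameter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of an invalidation over [start, start + size). */
struct inval_plan {
    uintptr_t first_line;  /* address of the first cache line touched */
    uintptr_t end_line;    /* address one past the last cache line touched */
    bool flush_head;       /* first line also holds bytes before the buffer */
    bool flush_tail;       /* last line also holds bytes after the buffer */
};

static struct inval_plan plan_invalidate(uintptr_t start, size_t size,
                                         size_t line)
{
    assert(size > 0);
    assert(line != 0 && (line & (line - 1)) == 0);  /* power of two */

    uintptr_t mask = (uintptr_t)line - 1;
    uintptr_t end = start + size;
    struct inval_plan p;

    p.first_line = start & ~mask;         /* round start down */
    p.end_line = (end + mask) & ~mask;    /* round end up */

    /* A misaligned start or end means the line is shared with
     * neighbouring data, which a plain invalidate would discard. */
    p.flush_head = (start & mask) != 0;
    p.flush_tail = (end & mask) != 0;
    return p;
}
```

For a 0x100-byte buffer at 0x1010 with 64-byte lines, both flags are
set and the touched range is 0x1000 to 0x1140. For a line-aligned
buffer neither flag is set, which is the case the other architectures
assume always holds.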
Please review!
Arnd
[1] https://lore.kernel.org/all/20221212115505.36770-1-prabhakar.mahadev-lad.rj@bp.renesas.com/
[2] https://lore.kernel.org/all/20220606152150.GA31568@willie-the-truck/
Arnd Bergmann (21):
openrisc: dma-mapping: flush bidirectional mappings
xtensa: dma-mapping: use normal cache invalidation rules
sparc32: flush caches in dma_sync_*for_device
microblaze: dma-mapping: skip extra DMA flushes
powerpc: dma-mapping: split out cache operation logic
powerpc: dma-mapping: minimize for_cpu flushing
powerpc: dma-mapping: always clean cache in _for_device() op
riscv: dma-mapping: only invalidate after DMA, not flush
riscv: dma-mapping: skip invalidation before bidirectional DMA
csky: dma-mapping: skip invalidating before DMA from device
mips: dma-mapping: skip invalidating before bidirectional DMA
mips: dma-mapping: split out cache operation logic
arc: dma-mapping: skip invalidating before bidirectional DMA
parisc: dma-mapping: use regular flush/invalidate ops
ARM: dma-mapping: always invalidate WT caches before DMA
ARM: dma-mapping: bring back dmac_{clean,inv}_range
ARM: dma-mapping: use arch_sync_dma_for_{device,cpu}() internally
ARM: drop SMP support for ARM11MPCore
ARM: dma-mapping: use generic form of arch_sync_dma_* helpers
ARM: dma-mapping: split out arch_dma_mark_clean() helper
dma-mapping: replace custom code with generic implementation
arch/arc/mm/dma.c | 66 ++------
arch/arm/Kconfig | 4 +
arch/arm/include/asm/cacheflush.h | 21 +++
arch/arm/include/asm/glue-cache.h | 4 +
arch/arm/mach-oxnas/Kconfig | 4 -
arch/arm/mach-oxnas/Makefile | 1 -
arch/arm/mach-oxnas/headsmp.S | 23 ---
arch/arm/mach-oxnas/platsmp.c | 96 -----------
arch/arm/mach-versatile/platsmp-realview.c | 4 -
arch/arm/mm/Kconfig | 19 ---
arch/arm/mm/cache-fa.S | 4 +-
arch/arm/mm/cache-nop.S | 6 +
arch/arm/mm/cache-v4.S | 13 +-
arch/arm/mm/cache-v4wb.S | 4 +-
arch/arm/mm/cache-v4wt.S | 22 ++-
arch/arm/mm/cache-v6.S | 35 +---
arch/arm/mm/cache-v7.S | 6 +-
arch/arm/mm/cache-v7m.S | 4 +-
arch/arm/mm/dma-mapping-nommu.c | 36 ++--
arch/arm/mm/dma-mapping.c | 181 ++++++++++-----------
arch/arm/mm/proc-arm1020.S | 4 +-
arch/arm/mm/proc-arm1020e.S | 4 +-
arch/arm/mm/proc-arm1022.S | 4 +-
arch/arm/mm/proc-arm1026.S | 4 +-
arch/arm/mm/proc-arm920.S | 4 +-
arch/arm/mm/proc-arm922.S | 4 +-
arch/arm/mm/proc-arm925.S | 4 +-
arch/arm/mm/proc-arm926.S | 4 +-
arch/arm/mm/proc-arm940.S | 4 +-
arch/arm/mm/proc-arm946.S | 4 +-
arch/arm/mm/proc-feroceon.S | 8 +-
arch/arm/mm/proc-macros.S | 2 +
arch/arm/mm/proc-mohawk.S | 4 +-
arch/arm/mm/proc-xsc3.S | 4 +-
arch/arm/mm/proc-xscale.S | 6 +-
arch/arm64/mm/dma-mapping.c | 28 ++--
arch/csky/mm/dma-mapping.c | 46 +++---
arch/hexagon/kernel/dma.c | 44 ++---
arch/m68k/kernel/dma.c | 43 +++--
arch/microblaze/kernel/dma.c | 38 ++---
arch/mips/mm/dma-noncoherent.c | 75 +++------
arch/nios2/mm/dma-mapping.c | 57 +++----
arch/openrisc/kernel/dma.c | 62 ++++---
arch/parisc/include/asm/cacheflush.h | 6 +-
arch/parisc/kernel/pci-dma.c | 33 +++-
arch/powerpc/mm/dma-noncoherent.c | 76 +++++----
arch/riscv/mm/dma-noncoherent.c | 51 +++---
arch/sh/kernel/dma-coherent.c | 43 +++--
arch/sparc/Kconfig | 2 +-
arch/sparc/kernel/ioport.c | 38 +++--
arch/xtensa/Kconfig | 1 -
arch/xtensa/include/asm/cacheflush.h | 6 +-
arch/xtensa/kernel/pci-dma.c | 47 +++---
include/linux/dma-sync.h | 107 ++++++++++++
54 files changed, 721 insertions(+), 699 deletions(-)
delete mode 100644 arch/arm/mach-oxnas/headsmp.S
delete mode 100644 arch/arm/mach-oxnas/platsmp.c
create mode 100644 include/linux/dma-sync.h
--
2.39.2
Cc: Vineet Gupta <vgupta at kernel.org>
Cc: Russell King <linux at armlinux.org.uk>
Cc: Neil Armstrong <neil.armstrong at linaro.org>
Cc: Linus Walleij <linus.walleij at linaro.org>
Cc: Catalin Marinas <catalin.marinas at arm.com>
Cc: Will Deacon <will at kernel.org>
Cc: Guo Ren <guoren at kernel.org>
Cc: Brian Cain <bcain at quicinc.com>
Cc: Geert Uytterhoeven <geert at linux-m68k.org>
Cc: Michal Simek <monstr at monstr.eu>
Cc: Thomas Bogendoerfer <tsbogend at alpha.franken.de>
Cc: Dinh Nguyen <dinguyen at kernel.org>
Cc: Stafford Horne <shorne at gmail.com>
Cc: Helge Deller <deller at gmx.de>
Cc: Michael Ellerman <mpe at ellerman.id.au>
Cc: Christophe Leroy <christophe.leroy at csgroup.eu>
Cc: Paul Walmsley <paul.walmsley at sifive.com>
Cc: Palmer Dabbelt <palmer at dabbelt.com>
Cc: Rich Felker <dalias at libc.org>
Cc: John Paul Adrian Glaubitz <glaubitz at physik.fu-berlin.de>
Cc: "David S. Miller" <davem at davemloft.net>
Cc: Max Filippov <jcmvbkbc at gmail.com>
Cc: Christoph Hellwig <hch at lst.de>
Cc: Robin Murphy <robin.murphy at arm.com>
Cc: Lad Prabhakar <prabhakar.mahadev-lad.rj at bp.renesas.com>
Cc: Conor Dooley <conor.dooley at microchip.com>
Cc: linux-snps-arc at lists.infradead.org
Cc: linux-kernel at vger.kernel.org
Cc: linux-arm-kernel at lists.infradead.org
Cc: linux-oxnas at groups.io
Cc: linux-csky at vger.kernel.org
Cc: linux-hexagon at vger.kernel.org
Cc: linux-m68k at lists.linux-m68k.org
Cc: linux-mips at vger.kernel.org
Cc: linux-openrisc at vger.kernel.org
Cc: linux-parisc at vger.kernel.org
Cc: linuxppc-dev at lists.ozlabs.org
Cc: linux-riscv at lists.infradead.org
Cc: linux-sh at vger.kernel.org
Cc: sparclinux at vger.kernel.org
Cc: linux-xtensa at linux-xtensa.org