[PATCH 00/16] mm: Introduce MAP_BELOW_HINT
Lorenzo Stoakes
lorenzo.stoakes at oracle.com
Thu Aug 29 04:19:36 AEST 2024
On Tue, Aug 27, 2024 at 10:49:06PM GMT, Charlie Jenkins wrote:
> Some applications rely on placing data in free bits addresses allocated
> by mmap. Various architectures (eg. x86, arm64, powerpc) restrict the
> address returned by mmap to be less than the maximum address space,
> unless the hint address is greater than this value.
>
> On arm64 this barrier is at 52 bits and on x86 it is at 56 bits. This
> flag allows applications a way to specify exactly how many bits they
> want to be left unused by mmap. This eliminates the need for
> applications to know the page table hierarchy of the system to be able
> to reason which addresses mmap will be allowed to return.
>
> ---
> riscv made this feature of mmap returning addresses less than the hint
> address the default behavior. This was in contrast to the implementation
> of x86/arm64 that have a single boundary at the 5-level page table
> region. However this restriction proved too great -- the reduced
> address space when using a hint address was too small.
>
> A patch for riscv [1] reverts the behavior that broke userspace. This
> series serves to make this feature available to all architectures.
I'm a little confused as to the justification for this - you broke RISC V by
doing this, and have now reverted it, but now offer the same behaviour that
broke RISC V to all other architectures?
I mean this is how this reads, so I might be being ungenerous here :) but would
be good to clarify what the value-add is here.
I also wonder at use of a new MAP_ flag, they're a limited resource and we
should only ever add them if we _really_ need to. This seems a bit niche and
specific to be making such a big change for including touching a bunch of pretty
sensitive arch-specific code.
We have the ability to change how mmap() functions through 'personalities'
though of course this would impact every mmap() call in the process.
Overall I'm really not hugely convinced by this, it feels like userland
could find better ways of doing this (mostly you'd do a PROT_NONE mmap() to
reserve a domain and mprotect() it on allocation or mmap() over it).
So I just struggle to see the purpose myself. BUT absolutely I may be
missing context/others may have a view on the value of this. So happy to
stand corrected.
>
> I have only tested on riscv and x86. There is a tremendous amount of
Yeah, OK this is crazy, you can't really submit something as non-RFC that
touches every single arch and not test it.
I also feel like we need more justification than 'this is a neat thing that
we use in RISC V sometimes' conceptually for such a big change.
Also your test program is currently completely broken afaict (have
commented on it directly). I also feel like your test program is a little
rudimentary, and should test some edge cases close to the limit etc.
So I think this is a NACK until there is testing across the board and a little
more justification.
Feel free to respin, but I think any future revisions should be RFC until
we're absolutely sure on testing/justification.
I appreciate your efforts here so sorry to be negative, but just obviously
want to make sure this is functional and trades off added complexity for
value for the kernel and userland :)
Thanks!
> duplicated code in mmap so the implementations across architectures I
> believe should be mostly consistent. I added this feature to all
> architectures that implement either
> arch_get_mmap_end()/arch_get_mmap_base() or
> arch_get_unmapped_area_topdown()/arch_get_unmapped_area(). I also added
> it to the default behavior for arch_get_mmap_end()/arch_get_mmap_base().
>
> Link: https://lore.kernel.org/lkml/20240826-riscv_mmap-v1-2-cd8962afe47f@rivosinc.com/T/ [1]
>
> To: Arnd Bergmann <arnd at arndb.de>
> To: Paul Walmsley <paul.walmsley at sifive.com>
> To: Palmer Dabbelt <palmer at dabbelt.com>
> To: Albert Ou <aou at eecs.berkeley.edu>
> To: Catalin Marinas <catalin.marinas at arm.com>
> To: Will Deacon <will at kernel.org>
> To: Michael Ellerman <mpe at ellerman.id.au>
> To: Nicholas Piggin <npiggin at gmail.com>
> To: Christophe Leroy <christophe.leroy at csgroup.eu>
> To: Naveen N Rao <naveen at kernel.org>
> To: Muchun Song <muchun.song at linux.dev>
> To: Andrew Morton <akpm at linux-foundation.org>
> To: Liam R. Howlett <Liam.Howlett at oracle.com>
> To: Vlastimil Babka <vbabka at suse.cz>
> To: Lorenzo Stoakes <lorenzo.stoakes at oracle.com>
> To: Thomas Gleixner <tglx at linutronix.de>
> To: Ingo Molnar <mingo at redhat.com>
> To: Borislav Petkov <bp at alien8.de>
> To: Dave Hansen <dave.hansen at linux.intel.com>
> To: x86 at kernel.org
> To: H. Peter Anvin <hpa at zytor.com>
> To: Huacai Chen <chenhuacai at kernel.org>
> To: WANG Xuerui <kernel at xen0n.name>
> To: Russell King <linux at armlinux.org.uk>
> To: Thomas Bogendoerfer <tsbogend at alpha.franken.de>
> To: James E.J. Bottomley <James.Bottomley at HansenPartnership.com>
> To: Helge Deller <deller at gmx.de>
> To: Alexander Gordeev <agordeev at linux.ibm.com>
> To: Gerald Schaefer <gerald.schaefer at linux.ibm.com>
> To: Heiko Carstens <hca at linux.ibm.com>
> To: Vasily Gorbik <gor at linux.ibm.com>
> To: Christian Borntraeger <borntraeger at linux.ibm.com>
> To: Sven Schnelle <svens at linux.ibm.com>
> To: Yoshinori Sato <ysato at users.sourceforge.jp>
> To: Rich Felker <dalias at libc.org>
> To: John Paul Adrian Glaubitz <glaubitz at physik.fu-berlin.de>
> To: David S. Miller <davem at davemloft.net>
> To: Andreas Larsson <andreas at gaisler.com>
> To: Shuah Khan <shuah at kernel.org>
> To: Alexandre Ghiti <alexghiti at rivosinc.com>
> Cc: linux-arch at vger.kernel.org
> Cc: linux-kernel at vger.kernel.org
> Cc: Palmer Dabbelt <palmer at rivosinc.com>
> Cc: linux-riscv at lists.infradead.org
> Cc: linux-arm-kernel at lists.infradead.org
> Cc: linuxppc-dev at lists.ozlabs.org
> Cc: linux-mm at kvack.org
> Cc: loongarch at lists.linux.dev
> Cc: linux-mips at vger.kernel.org
> Cc: linux-parisc at vger.kernel.org
> Cc: linux-s390 at vger.kernel.org
> Cc: linux-sh at vger.kernel.org
> Cc: sparclinux at vger.kernel.org
> Cc: linux-kselftest at vger.kernel.org
> Signed-off-by: Charlie Jenkins <charlie at rivosinc.com>
>
> ---
> Charlie Jenkins (16):
> mm: Add MAP_BELOW_HINT
> riscv: mm: Do not restrict mmap address based on hint
> mm: Add flag and len param to arch_get_mmap_base()
> mm: Add generic MAP_BELOW_HINT
> riscv: mm: Support MAP_BELOW_HINT
> arm64: mm: Support MAP_BELOW_HINT
> powerpc: mm: Support MAP_BELOW_HINT
> x86: mm: Support MAP_BELOW_HINT
> loongarch: mm: Support MAP_BELOW_HINT
> arm: mm: Support MAP_BELOW_HINT
> mips: mm: Support MAP_BELOW_HINT
> parisc: mm: Support MAP_BELOW_HINT
> s390: mm: Support MAP_BELOW_HINT
> sh: mm: Support MAP_BELOW_HINT
> sparc: mm: Support MAP_BELOW_HINT
> selftests/mm: Create MAP_BELOW_HINT test
>
> arch/arm/mm/mmap.c | 10 ++++++++
> arch/arm64/include/asm/processor.h | 34 ++++++++++++++++++++++----
> arch/loongarch/mm/mmap.c | 11 +++++++++
> arch/mips/mm/mmap.c | 9 +++++++
> arch/parisc/include/uapi/asm/mman.h | 1 +
> arch/parisc/kernel/sys_parisc.c | 9 +++++++
> arch/powerpc/include/asm/task_size_64.h | 36 +++++++++++++++++++++++-----
> arch/riscv/include/asm/processor.h | 32 -------------------------
> arch/s390/mm/mmap.c | 10 ++++++++
> arch/sh/mm/mmap.c | 10 ++++++++
> arch/sparc/kernel/sys_sparc_64.c | 8 +++++++
> arch/x86/kernel/sys_x86_64.c | 25 ++++++++++++++++---
> fs/hugetlbfs/inode.c | 2 +-
> include/linux/sched/mm.h | 34 ++++++++++++++++++++++++--
> include/uapi/asm-generic/mman-common.h | 1 +
> mm/mmap.c | 2 +-
> tools/arch/parisc/include/uapi/asm/mman.h | 1 +
> tools/include/uapi/asm-generic/mman-common.h | 1 +
> tools/testing/selftests/mm/Makefile | 1 +
> tools/testing/selftests/mm/map_below_hint.c | 29 ++++++++++++++++++++++
> 20 files changed, 216 insertions(+), 50 deletions(-)
> ---
> base-commit: 5be63fc19fcaa4c236b307420483578a56986a37
> change-id: 20240827-patches-below_hint_mmap-b13d79ae1c55
> --
> - Charlie
>
>
> _______________________________________________
> linux-riscv mailing list
> linux-riscv at lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
More information about the Linuxppc-dev
mailing list