[PATCH 00/16] mm: Introduce MAP_BELOW_HINT

Charlie Jenkins charlie at rivosinc.com
Thu Aug 29 06:59:18 AEST 2024


On Wed, Aug 28, 2024 at 02:31:42PM -0400, Liam R. Howlett wrote:
> * Charlie Jenkins <charlie at rivosinc.com> [240828 01:49]:
> > Some applications rely on placing data in free bits addresses allocated
> > by mmap. Various architectures (eg. x86, arm64, powerpc) restrict the
> > address returned by mmap to be less than the maximum address space,
> > unless the hint address is greater than this value.
> 
> Wait, what arch(s) allows for greater than the max?  The passed hint
> should be where we start searching, but we go to the lower limit then
> start at the hint and search up (or vice-versa on the directions).
> 

I worded this awkwardly. On arm64 there is a page-table boundary at 48
bits and at 52 bits. On x86 the boundaries are at 48 bits and 57 bits.
The largest address mmap is able to return on arm64 uses 48 bits if the
hint address uses 48 bits or fewer, even if the architecture supports
5-level paging and thus addresses can be 52 bits. Applications can opt
in to using up to 52 bits in an address by passing a hint address
greater than 48 bits. x86 has the same behavior but with 57 bits
instead of 52.
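
To make the hint behavior concrete, here is a rough sketch of how an
application could observe it from userspace. The helper names
addr_bits() and probe_hint() are made up for illustration, and the
sketch assumes a 64-bit system; what it reports depends on the hardware
and kernel configuration.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on strict C compilers */
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Number of significant bits in an address (0 for address 0). */
static unsigned int addr_bits(uint64_t addr)
{
	unsigned int bits = 0;

	while (addr) {
		bits++;
		addr >>= 1;
	}
	return bits;
}

/*
 * Map one anonymous page using the given hint and report how many bits
 * the returned address uses. Returns 0 if the mapping failed.
 */
static unsigned int probe_hint(void *hint)
{
	size_t len = (size_t)sysconf(_SC_PAGESIZE);
	void *p = mmap(hint, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	unsigned int bits;

	if (p == MAP_FAILED)
		return 0;

	bits = addr_bits((uint64_t)(uintptr_t)p);
	munmap(p, len);
	return bits;
}
```

Comparing probe_hint(NULL) with probe_hint() called on a hint above the
48-bit boundary shows whether the kernel handed out an address beyond
48 bits on that machine.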

The reason this exists is that some applications arbitrarily replace
bits in virtual addresses with data, on the assumption that the address
will not use any of the bits above bit 48 in the virtual address. As
hardware with larger address spaces was released, x86 decided to build
safety guards into the kernel so that applications that made this
assumption would continue to work on the new hardware.
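
As an illustration of the kind of application code in question, here is
a minimal sketch of packing a 16-bit tag into the upper bits of an
address. The names pack_tag(), unpack_addr() and unpack_tag() and the
48-bit ADDR_BITS constant are assumptions for this example, not taken
from any particular application.

```c
#include <stdint.h>

#define ADDR_BITS 48 /* assumed number of address bits */
#define ADDR_MASK ((UINT64_C(1) << ADDR_BITS) - 1)

/* Store a 16-bit tag in bits 48-63, discarding any address bits there. */
static inline uint64_t pack_tag(uint64_t addr, uint16_t tag)
{
	return (addr & ADDR_MASK) | ((uint64_t)tag << ADDR_BITS);
}

/* Recover the low 48 bits of the address. */
static inline uint64_t unpack_addr(uint64_t packed)
{
	return packed & ADDR_MASK;
}

/* Recover the 16-bit tag. */
static inline uint16_t unpack_tag(uint64_t packed)
{
	return (uint16_t)(packed >> ADDR_BITS);
}
```

The round trip only preserves the address if mmap returned one that fits
in 48 bits. An address that uses any higher bit comes back truncated,
which is the silent breakage the safety guards are meant to prevent.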

This causes all applications that use a hint address to silently be
restricted to 48-bit addresses. The goal of this flag is to give
applications a way to explicitly request how many bits they want mmap
to use.

> I don't understand how unmapping works on a higher address; we would
> fail to free it on termination of the application.
> 
> Also, there are archs that map outside of the VMAs, which are freed by
> freeing from the prev->vm_end to next->vm_start, so I don't understand
> what that looks like in this reality as well.
> 
> > 
> > On arm64 this barrier is at 52 bits and on x86 it is at 56 bits. This
> > flag allows applications a way to specify exactly how many bits they
> > want to be left unused by mmap. This eliminates the need for
> > applications to know the page table hierarchy of the system to be able
> > to reason which addresses mmap will be allowed to return.
> 
> But, why do they need to know today?  We have a limit for this don't we?

The limit is different for different architectures. On x86 the limit is
57 bits, and on arm64 it is 52 bits. So in the theoretical case that an
application requires 10 bits free in a virtual address, the application
would always work on arm64 regardless of the hint address, but on x86 if
the hint address is greater than 48 bits then the application will not
work.
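
The 10-bit case can be written down as a small check. This is only a
sketch, and leaves_free_bits() is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if the top free_bits bits of a 64-bit address are all zero. */
static bool leaves_free_bits(uint64_t addr, unsigned int free_bits)
{
	if (free_bits == 0)
		return true;
	if (free_bits >= 64)
		return addr == 0;
	return (addr >> (64 - free_bits)) == 0;
}
```

With 10 bits required free, the address may use at most 54 bits. The
largest 52-bit address passes the check, while an address that uses all
57 bits does not.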

The goal of this flag is to have consistent and tunable behavior of
mmap() when it is desired to ensure that mmap() only returns addresses
that use some number of bits.

> 
> Also, these upper limits are how some archs use the upper bits that you
> are trying to use.
> 

This flag does not eliminate the existing behavior of the architectures
that place these upper limits; it instead provides a way to have
consistent behavior across all architectures.

> > 
> > ---
> > riscv made this feature of mmap returning addresses less than the hint
> > address the default behavior. This was in contrast to the implementation
> > of x86/arm64 that have a single boundary at the 5-level page table
> > region. However this restriction proved too great -- the reduced
> > address space when using a hint address was too small.
> 
> Yes, the hint is used to group things close together so it would
> literally be random chance on if you have enough room or not (aslr and
> all).
> 
> > 
> > A patch for riscv [1] reverts the behavior that broke userspace. This
> > series serves to make this feature available to all architectures.
> 
> I don't fully understand this statement, you say it broke userspace so
> now you are porting it to everyone?  This reads as if you are braking
> the userspace on all architectures :)

It was the default for mmap on riscv. The difference here is that it is now
enabled by a flag instead. Instead of making the flag specific to riscv,
I figured that other architectures might find it useful as well.

> 
> If you fail to find room below, then your application fails as there is
> no way to get the upper bits you need.  It would be better to fix this
> in userspace - if your application is returned too high an address, then
> free it and exit because it's going to fail anyways.
> 

This flag is trying to define an API that is more robust than the
current behavior on x86 and arm64, which implicitly restricts mmap()
addresses to 48 bits. A solution could be to just write in the docs that
mmap() will always exhaust all addresses below the hint address before
returning an address that is above the hint address. However, a flag
that defines this behavior seems more intuitive.

> > 
> > I have only tested on riscv and x86.
> 
> This should be an RFC then.

Fair enough.

> 
> > There is a tremendous amount of
> > duplicated code in mmap so the implementations across architectures I
> > believe should be mostly consistent. I added this feature to all
> > architectures that implement either
> > arch_get_mmap_end()/arch_get_mmap_base() or
> > arch_get_unmapped_area_topdown()/arch_get_unmapped_area(). I also added
> > it to the default behavior for arch_get_mmap_end()/arch_get_mmap_base().
> 
> Way too much duplicate code.  We should be figuring out how to make this
> all work with the same code.
> 
> This is going to make the cloned code problem worse.

That would require standardizing every architecture with the generic
mmap() framework that arm64 has developed. That is far outside the scope
of this patch, but would be a great area to research for each of the
architectures that do not use the generic framework.

- Charlie

> 
> > 
> > Link: https://lore.kernel.org/lkml/20240826-riscv_mmap-v1-2-cd8962afe47f@rivosinc.com/T/ [1]
> > 
> > To: Arnd Bergmann <arnd at arndb.de>
> > To: Paul Walmsley <paul.walmsley at sifive.com>
> > To: Palmer Dabbelt <palmer at dabbelt.com>
> > To: Albert Ou <aou at eecs.berkeley.edu>
> > To: Catalin Marinas <catalin.marinas at arm.com>
> > To: Will Deacon <will at kernel.org>
> > To: Michael Ellerman <mpe at ellerman.id.au>
> > To: Nicholas Piggin <npiggin at gmail.com>
> > To: Christophe Leroy <christophe.leroy at csgroup.eu>
> > To: Naveen N Rao <naveen at kernel.org>
> > To: Muchun Song <muchun.song at linux.dev>
> > To: Andrew Morton <akpm at linux-foundation.org>
> > To: Liam R. Howlett <Liam.Howlett at oracle.com>
> > To: Vlastimil Babka <vbabka at suse.cz>
> > To: Lorenzo Stoakes <lorenzo.stoakes at oracle.com>
> > To: Thomas Gleixner <tglx at linutronix.de>
> > To: Ingo Molnar <mingo at redhat.com>
> > To: Borislav Petkov <bp at alien8.de>
> > To: Dave Hansen <dave.hansen at linux.intel.com>
> > To: x86 at kernel.org
> > To: H. Peter Anvin <hpa at zytor.com>
> > To: Huacai Chen <chenhuacai at kernel.org>
> > To: WANG Xuerui <kernel at xen0n.name>
> > To: Russell King <linux at armlinux.org.uk>
> > To: Thomas Bogendoerfer <tsbogend at alpha.franken.de>
> > To: James E.J. Bottomley <James.Bottomley at HansenPartnership.com>
> > To: Helge Deller <deller at gmx.de>
> > To: Alexander Gordeev <agordeev at linux.ibm.com>
> > To: Gerald Schaefer <gerald.schaefer at linux.ibm.com>
> > To: Heiko Carstens <hca at linux.ibm.com>
> > To: Vasily Gorbik <gor at linux.ibm.com>
> > To: Christian Borntraeger <borntraeger at linux.ibm.com>
> > To: Sven Schnelle <svens at linux.ibm.com>
> > To: Yoshinori Sato <ysato at users.sourceforge.jp>
> > To: Rich Felker <dalias at libc.org>
> > To: John Paul Adrian Glaubitz <glaubitz at physik.fu-berlin.de>
> > To: David S. Miller <davem at davemloft.net>
> > To: Andreas Larsson <andreas at gaisler.com>
> > To: Shuah Khan <shuah at kernel.org>
> > To: Alexandre Ghiti <alexghiti at rivosinc.com>
> > Cc: linux-arch at vger.kernel.org
> > Cc: linux-kernel at vger.kernel.org
> > Cc: Palmer Dabbelt <palmer at rivosinc.com>
> > Cc: linux-riscv at lists.infradead.org
> > Cc: linux-arm-kernel at lists.infradead.org
> > Cc: linuxppc-dev at lists.ozlabs.org
> > Cc: linux-mm at kvack.org
> > Cc: loongarch at lists.linux.dev
> > Cc: linux-mips at vger.kernel.org
> > Cc: linux-parisc at vger.kernel.org
> > Cc: linux-s390 at vger.kernel.org
> > Cc: linux-sh at vger.kernel.org
> > Cc: sparclinux at vger.kernel.org
> > Cc: linux-kselftest at vger.kernel.org
> > Signed-off-by: Charlie Jenkins <charlie at rivosinc.com>
> > 
> > ---
> > Charlie Jenkins (16):
> >       mm: Add MAP_BELOW_HINT
> >       riscv: mm: Do not restrict mmap address based on hint
> >       mm: Add flag and len param to arch_get_mmap_base()
> >       mm: Add generic MAP_BELOW_HINT
> >       riscv: mm: Support MAP_BELOW_HINT
> >       arm64: mm: Support MAP_BELOW_HINT
> >       powerpc: mm: Support MAP_BELOW_HINT
> >       x86: mm: Support MAP_BELOW_HINT
> >       loongarch: mm: Support MAP_BELOW_HINT
> >       arm: mm: Support MAP_BELOW_HINT
> >       mips: mm: Support MAP_BELOW_HINT
> >       parisc: mm: Support MAP_BELOW_HINT
> >       s390: mm: Support MAP_BELOW_HINT
> >       sh: mm: Support MAP_BELOW_HINT
> >       sparc: mm: Support MAP_BELOW_HINT
> >       selftests/mm: Create MAP_BELOW_HINT test
> > 
> >  arch/arm/mm/mmap.c                           | 10 ++++++++
> >  arch/arm64/include/asm/processor.h           | 34 ++++++++++++++++++++++----
> >  arch/loongarch/mm/mmap.c                     | 11 +++++++++
> >  arch/mips/mm/mmap.c                          |  9 +++++++
> >  arch/parisc/include/uapi/asm/mman.h          |  1 +
> >  arch/parisc/kernel/sys_parisc.c              |  9 +++++++
> >  arch/powerpc/include/asm/task_size_64.h      | 36 +++++++++++++++++++++++-----
> >  arch/riscv/include/asm/processor.h           | 32 -------------------------
> >  arch/s390/mm/mmap.c                          | 10 ++++++++
> >  arch/sh/mm/mmap.c                            | 10 ++++++++
> >  arch/sparc/kernel/sys_sparc_64.c             |  8 +++++++
> >  arch/x86/kernel/sys_x86_64.c                 | 25 ++++++++++++++++---
> >  fs/hugetlbfs/inode.c                         |  2 +-
> >  include/linux/sched/mm.h                     | 34 ++++++++++++++++++++++++--
> >  include/uapi/asm-generic/mman-common.h       |  1 +
> >  mm/mmap.c                                    |  2 +-
> >  tools/arch/parisc/include/uapi/asm/mman.h    |  1 +
> >  tools/include/uapi/asm-generic/mman-common.h |  1 +
> >  tools/testing/selftests/mm/Makefile          |  1 +
> >  tools/testing/selftests/mm/map_below_hint.c  | 29 ++++++++++++++++++++++
> >  20 files changed, 216 insertions(+), 50 deletions(-)
> > ---
> > base-commit: 5be63fc19fcaa4c236b307420483578a56986a37
> > change-id: 20240827-patches-below_hint_mmap-b13d79ae1c55
> > -- 
> > - Charlie
> > 

