[PATCH RFC v2 0/4] mm: Introduce MAP_BELOW_HINT
Palmer Dabbelt
palmer at dabbelt.com
Fri Aug 30 03:00:45 AEST 2024
On Thu, 29 Aug 2024 02:02:34 PDT (-0700), vbabka at suse.cz wrote:
> Such a large recipient list and no linux-api. CC'd, please include it on
> future postings.
>
> On 8/29/24 09:15, Charlie Jenkins wrote:
>> Some applications rely on placing data in the free bits of addresses allocated
>> by mmap. Various architectures (eg. x86, arm64, powerpc) restrict the
>> address returned by mmap to be less than the 48-bit address space,
>> unless the hint address uses more than 47 bits (the 48th bit is reserved
>> for the kernel address space).
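A minimal C sketch of what "placing data in the free bits" of an address means. It assumes every address mmap returns fits in the low 47 bits, and it uses plain uint64_t values rather than real pointers; ADDR_BITS and the helper names are hypothetical, not taken from OpenJDK or Go:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical assumption: mmap never returns an address above 47 bits,
 * matching the default x86-64/arm64 user window described above. */
#define ADDR_BITS 47
#define ADDR_MASK ((UINT64_C(1) << ADDR_BITS) - 1)
#define TAG_BITS (64 - ADDR_BITS) /* 17 bits left over for metadata */

/* Store a small tag in the bits above ADDR_BITS. This is only safe while
 * the kernel never hands out an address with any of those bits set. */
static uint64_t tag_addr(uint64_t addr, uint64_t tag)
{
    assert((addr & ~ADDR_MASK) == 0);        /* address must fit in 47 bits */
    assert(tag < (UINT64_C(1) << TAG_BITS)); /* tag must fit in 17 bits */
    return addr | (tag << ADDR_BITS);
}

/* Recover the original address by masking the tag bits off. */
static uint64_t untag_addr(uint64_t tagged)
{
    return tagged & ADDR_MASK;
}

/* Recover the tag stored in the upper bits. */
static uint64_t addr_tag(uint64_t tagged)
{
    return tagged >> ADDR_BITS;
}
```

If the kernel ever returns an address with bit 47 or higher set, the first assertion fires; without it, the tag would silently corrupt the address.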
>>
>> The riscv architecture needs a way to similarly restrict the virtual
>> address space. On the riscv port of OpenJDK an error is thrown if it
>> is run on the 57-bit address space, called sv57 [1]. golang
>> has a comment that sv57 support is not complete, but there are some
>> workarounds to get it to mostly work [2].
>>
>> These applications work on x86 because x86 does an implicit 47-bit
>> restriction of mmap() addresses when the hint address is less than
>> 48 bits.
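A simplified C model of that implicit x86 rule, for illustration only. It is not the kernel's code: it assumes 5-level paging and 4 KiB pages, and the MODEL_* constants are hypothetical stand-ins that roughly correspond to x86's DEFAULT_MAP_WINDOW and TASK_SIZE_MAX:

```c
#include <stdint.h>

/* Hypothetical stand-ins for the x86-64 user windows with 5-level paging. */
#define MODEL_PAGE_SIZE UINT64_C(4096)
#define MODEL_WINDOW_47 ((UINT64_C(1) << 47) - MODEL_PAGE_SIZE)
#define MODEL_WINDOW_56 ((UINT64_C(1) << 56) - MODEL_PAGE_SIZE)

/* Upper bound of the region mmap() searches for a given hint. A hint of 0
 * (no hint) or any hint inside the 47-bit window keeps the search below
 * 2^47; only a hint above that window opens up the full 56-bit range. */
static uint64_t model_x86_high_limit(uint64_t hint)
{
    return hint > MODEL_WINDOW_47 ? MODEL_WINDOW_56 : MODEL_WINDOW_47;
}
```

This is the opt-out shape the cover letter contrasts with: applications get the restricted window by default and must pass a high hint to escape it.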
>>
>> Instead of implicitly restricting the address space on riscv (or any
>> current/future architecture), a flag would allow users to opt-in to this
>> behavior rather than opt-out as is done on other architectures. This is
>> desirable because it is a small class of applications that do pointer
>> masking.
>
> I doubt it's desirable to have different behavior depending on architecture.
> Also you could say it's a small class of applications that need more than 47
> bits.
We're sort of stuck with the architecture-dependent behavior here: for
the first few years RISC-V only had 39-bit VAs, so the de facto uABI ended
up being that userspace can ignore way more bits. While 48 bits might
be enough for everyone, 39 doesn't seem to be -- or at least IIRC when
we tried restricting the default to that, we broke stuff. There's also
some other wrinkles like arbitrary bit boundaries in pointer masking and
vendor-specific paging formats, but at some point we just end up down a
rabbit hole of insanity there...
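A small C check of why "ignoring" upper VA bits breaks as the address space grows. It assumes mmap only returns addresses strictly below some exclusive limit; the function name is hypothetical, and the limits used with it are the user halves of Sv39, Sv48 and Sv57 (2^38, 2^47 and 2^56):

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if a program that keeps pointers in the low assumed_va_bits
 * (and reuses the bits above them for tags) is safe when mmap only returns
 * addresses strictly below mmap_limit. */
static bool tag_scheme_safe(unsigned assumed_va_bits, uint64_t mmap_limit)
{
    if (assumed_va_bits >= 64)
        return true; /* no bits are borrowed for tags */
    return mmap_limit <= (UINT64_C(1) << assumed_va_bits);
}
```

With those limits, a 39-bit assumption is safe under Sv39 but not under Sv48, and a 48-bit assumption is safe under Sv48 but not under Sv57: the same breakage repeats at each transition.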
FWIW, I think that userspace depending on just tossing some VA bits
because some kernels happened to never allocate from them is just
broken, but it seems like other ports worked around the 48->57 bit
transition and we're trying to do something similar for 39->48 (and that
works with 48->57, as we'll have to deal with that eventually).
So that's basically how we ended up with this sort of thing: trying to
do something similar without a flag broke userspace because we were
trying to jam too much into the hints. I couldn't really figure out a
way to satisfy all the userspace constraints by just implicitly
retrofitting behavior based on the hints, so we figured having an
explicit flag to control the behavior would be the sanest way to go.
That said: I'm not opposed to just saying "depending on 39-bit VAs is
broken" and just forcing people to fix it.
>> This flag will also allow seamless compatibility between all
>> architectures, so applications like Go and OpenJDK that use bits in a
>> virtual address can request the exact number of bits they need in a
>> generic way. The flag can be checked inside of vm_unmapped_area() so
>> that this flag does not have to be handled individually by each
>> architecture.
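A minimal C model of checking the flag once in generic code instead of in each architecture. It assumes MAP_BELOW_HINT means the whole mapping must end at or below the hint, and that a zero hint leaves the search window unchanged; struct model_area_info and MODEL_MAP_BELOW_HINT are hypothetical stand-ins for vm_unmapped_area_info and the flag value the series defines, not the kernel's code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag value; the real one is whatever the series adds to
 * include/uapi/asm-generic/mman-common.h. */
#define MODEL_MAP_BELOW_HINT 0x1u

/* Hypothetical stand-in for the search parameters plus the hint and
 * mmap_flags fields the series adds. */
struct model_area_info {
    uint64_t low_limit;  /* lowest acceptable start address */
    uint64_t high_limit; /* exclusive upper bound for the mapping end */
    uint64_t length;     /* size of the requested mapping */
    uint64_t hint;       /* addr argument passed to mmap() */
    unsigned mmap_flags; /* flags argument passed to mmap() */
};

/* With the flag set, lower the top of the search window to the hint so
 * the entire mapping ends at or below it (assumed semantics). */
static uint64_t model_effective_high_limit(const struct model_area_info *info)
{
    uint64_t high = info->high_limit;

    if ((info->mmap_flags & MODEL_MAP_BELOW_HINT) && info->hint != 0 &&
        info->hint < high)
        high = info->hint;
    return high;
}

/* Does the (possibly clamped) window have room for the mapping? */
static bool model_window_fits(const struct model_area_info *info)
{
    uint64_t high = model_effective_high_limit(info);

    return high > info->low_limit && high - info->low_limit >= info->length;
}
```

Under that assumption, a program that needs 47-bit pointers would pass (void *)(1UL << 47) as the hint together with MAP_BELOW_HINT, which only has an effect on a kernel carrying this series.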
>>
>> Link: https://github.com/openjdk/jdk/blob/f080b4bb8a75284db1b6037f8c00ef3b1ef1add1/src/hotspot/cpu/riscv/vm_version_riscv.cpp#L79 [1]
>> Link: https://github.com/golang/go/blob/9e8ea567c838574a0f14538c0bbbd83c3215aa55/src/runtime/tagptr_64bit.go#L47 [2]
>>
>> To: Arnd Bergmann <arnd at arndb.de>
>> To: Richard Henderson <richard.henderson at linaro.org>
>> To: Ivan Kokshaysky <ink at jurassic.park.msu.ru>
>> To: Matt Turner <mattst88 at gmail.com>
>> To: Vineet Gupta <vgupta at kernel.org>
>> To: Russell King <linux at armlinux.org.uk>
>> To: Guo Ren <guoren at kernel.org>
>> To: Huacai Chen <chenhuacai at kernel.org>
>> To: WANG Xuerui <kernel at xen0n.name>
>> To: Thomas Bogendoerfer <tsbogend at alpha.franken.de>
>> To: James E.J. Bottomley <James.Bottomley at HansenPartnership.com>
>> To: Helge Deller <deller at gmx.de>
>> To: Michael Ellerman <mpe at ellerman.id.au>
>> To: Nicholas Piggin <npiggin at gmail.com>
>> To: Christophe Leroy <christophe.leroy at csgroup.eu>
>> To: Naveen N Rao <naveen at kernel.org>
>> To: Alexander Gordeev <agordeev at linux.ibm.com>
>> To: Gerald Schaefer <gerald.schaefer at linux.ibm.com>
>> To: Heiko Carstens <hca at linux.ibm.com>
>> To: Vasily Gorbik <gor at linux.ibm.com>
>> To: Christian Borntraeger <borntraeger at linux.ibm.com>
>> To: Sven Schnelle <svens at linux.ibm.com>
>> To: Yoshinori Sato <ysato at users.sourceforge.jp>
>> To: Rich Felker <dalias at libc.org>
>> To: John Paul Adrian Glaubitz <glaubitz at physik.fu-berlin.de>
>> To: David S. Miller <davem at davemloft.net>
>> To: Andreas Larsson <andreas at gaisler.com>
>> To: Thomas Gleixner <tglx at linutronix.de>
>> To: Ingo Molnar <mingo at redhat.com>
>> To: Borislav Petkov <bp at alien8.de>
>> To: Dave Hansen <dave.hansen at linux.intel.com>
>> To: x86 at kernel.org
>> To: H. Peter Anvin <hpa at zytor.com>
>> To: Andy Lutomirski <luto at kernel.org>
>> To: Peter Zijlstra <peterz at infradead.org>
>> To: Muchun Song <muchun.song at linux.dev>
>> To: Andrew Morton <akpm at linux-foundation.org>
>> To: Liam R. Howlett <Liam.Howlett at oracle.com>
>> To: Vlastimil Babka <vbabka at suse.cz>
>> To: Lorenzo Stoakes <lorenzo.stoakes at oracle.com>
>> To: Shuah Khan <shuah at kernel.org>
>> Cc: linux-arch at vger.kernel.org
>> Cc: linux-kernel at vger.kernel.org
>> Cc: linux-alpha at vger.kernel.org
>> Cc: linux-snps-arc at lists.infradead.org
>> Cc: linux-arm-kernel at lists.infradead.org
>> Cc: linux-csky at vger.kernel.org
>> Cc: loongarch at lists.linux.dev
>> Cc: linux-mips at vger.kernel.org
>> Cc: linux-parisc at vger.kernel.org
>> Cc: linuxppc-dev at lists.ozlabs.org
>> Cc: linux-s390 at vger.kernel.org
>> Cc: linux-sh at vger.kernel.org
>> Cc: sparclinux at vger.kernel.org
>> Cc: linux-mm at kvack.org
>> Cc: linux-kselftest at vger.kernel.org
>> Signed-off-by: Charlie Jenkins <charlie at rivosinc.com>
>>
>> Changes in v2:
>> - Added much greater detail to cover letter
>> - Removed all code that touched architecture specific code and was able
>> to factor this out into all generic functions, except for flags that
>> needed to be added to vm_unmapped_area_info
>> - Made this an RFC since I have only tested it on riscv and x86
>> - Link to v1: https://lore.kernel.org/r/20240827-patches-below_hint_mmap-v1-0-46ff2eb9022d@rivosinc.com
>>
>> ---
>> Charlie Jenkins (4):
>> mm: Add MAP_BELOW_HINT
>> mm: Add hint and mmap_flags to struct vm_unmapped_area_info
>> mm: Support MAP_BELOW_HINT in vm_unmapped_area()
>> selftests/mm: Create MAP_BELOW_HINT test
>>
>> arch/alpha/kernel/osf_sys.c | 2 ++
>> arch/arc/mm/mmap.c | 3 +++
>> arch/arm/mm/mmap.c | 7 ++++++
>> arch/csky/abiv1/mmap.c | 3 +++
>> arch/loongarch/mm/mmap.c | 3 +++
>> arch/mips/mm/mmap.c | 3 +++
>> arch/parisc/kernel/sys_parisc.c | 3 +++
>> arch/powerpc/mm/book3s64/slice.c | 7 ++++++
>> arch/s390/mm/hugetlbpage.c | 4 ++++
>> arch/s390/mm/mmap.c | 6 ++++++
>> arch/sh/mm/mmap.c | 6 ++++++
>> arch/sparc/kernel/sys_sparc_32.c | 3 +++
>> arch/sparc/kernel/sys_sparc_64.c | 6 ++++++
>> arch/sparc/mm/hugetlbpage.c | 4 ++++
>> arch/x86/kernel/sys_x86_64.c | 6 ++++++
>> arch/x86/mm/hugetlbpage.c | 4 ++++
>> fs/hugetlbfs/inode.c | 4 ++++
>> include/linux/mm.h | 2 ++
>> include/uapi/asm-generic/mman-common.h | 1 +
>> mm/mmap.c | 9 ++++++++
>> tools/include/uapi/asm-generic/mman-common.h | 1 +
>> tools/testing/selftests/mm/Makefile | 1 +
>> tools/testing/selftests/mm/map_below_hint.c | 32 ++++++++++++++++++++++++++++
>> 23 files changed, 120 insertions(+)
>> ---
>> base-commit: 5be63fc19fcaa4c236b307420483578a56986a37
>> change-id: 20240827-patches-below_hint_mmap-b13d79ae1c55