[PATCH v2 1/4] mm: Add optional close() to struct vm_special_mapping
Nathan Chancellor
nathan at kernel.org
Tue Aug 20 04:52:53 AEST 2024
Hi Michael,
On Mon, Aug 12, 2024 at 06:26:02PM +1000, Michael Ellerman wrote:
> Add an optional close() callback to struct vm_special_mapping. It will
> be used, by powerpc at least, to handle unmapping of the VDSO.
>
> Although support for unmapping the VDSO was initially added
> for CRIU[1], it is not desirable to guard that support behind
> CONFIG_CHECKPOINT_RESTORE.
>
> There are other known users of unmapping the VDSO which are not related
> to CRIU, eg. Valgrind [2] and void-ship [3].
>
> The powerpc arch_unmap() hook has been in place for ~9 years, with no
> ifdef, so there may be other unknown users that have come to rely on
> unmapping the VDSO. Even if the code was behind an ifdef, major distros
> enable CHECKPOINT_RESTORE so users may not realise unmapping the VDSO
> depends on that configuration option.
>
> It's also undesirable to have such core mm behaviour behind a relatively
> obscure CONFIG option.
>
> Longer term the unmap behaviour should be standardised across
> architectures, however that is complicated by the fact the VDSO pointer
> is stored differently across architectures. There was a previous attempt
> to unify that handling [4], which could be revived.
>
> See [5] for further discussion.
>
> [1]: commit 83d3f0e90c6c ("powerpc/mm: tracking vDSO remap")
> [2]: https://sourceware.org/git/?p=valgrind.git;a=commit;h=3a004915a2cbdcdebafc1612427576bf3321eef5
> [3]: https://github.com/insanitybit/void-ship
> [4]: https://lore.kernel.org/lkml/20210611180242.711399-17-dima@arista.com/
> [5]: https://lore.kernel.org/linuxppc-dev/shiq5v3jrmyi6ncwke7wgl76ojysgbhrchsk32q4lbx2hadqqc@kzyy2igem256
>
> Suggested-by: Linus Torvalds <torvalds at linux-foundation.org>
> Signed-off-by: Michael Ellerman <mpe at ellerman.id.au>
> Reviewed-by: David Hildenbrand <david at redhat.com>
> ---
> include/linux/mm_types.h | 3 +++
> mm/mmap.c | 6 ++++++
> 2 files changed, 9 insertions(+)
>
> v2:
> - Add some blank lines as requested.
> - Expand special_mapping_close() comment.
> - Add David's reviewed-by.
> - Expand change log to capture review discussion.
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 485424979254..78bdfc59abe5 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -1313,6 +1313,9 @@ struct vm_special_mapping {
>
> int (*mremap)(const struct vm_special_mapping *sm,
> struct vm_area_struct *new_vma);
> +
> + void (*close)(const struct vm_special_mapping *sm,
> + struct vm_area_struct *vma);
> };
>
> enum tlb_flush_reason {
> diff --git a/mm/mmap.c b/mm/mmap.c
> index d0dfc85b209b..af4dbf0d3bd4 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -3620,10 +3620,16 @@ void vm_stat_account(struct mm_struct *mm, vm_flags_t flags, long npages)
> static vm_fault_t special_mapping_fault(struct vm_fault *vmf);
>
> /*
> + * Close hook, called for unmap() and on the old vma for mremap().
> + *
> * Having a close hook prevents vma merging regardless of flags.
> */
> static void special_mapping_close(struct vm_area_struct *vma)
> {
> + const struct vm_special_mapping *sm = vma->vm_private_data;
> +
> + if (sm->close)
> + sm->close(sm, vma);
> }
>
> static const char *special_mapping_name(struct vm_area_struct *vma)
> --
> 2.45.2
>
This change is now in -next and I bisected a crash that our CI sees with
ARCH=um to it:
$ make -skj"$(nproc)" ARCH=um CROSS_COMPILE=x86_64-linux- defconfig linux
$ ./linux ubd0=$PWD/rootfs.ext4
...
Linux version 6.11.0-rc4-next-20240819 (nathan at thelio-3990X) (x86_64-linux-gcc (GCC) 14.2.0, GNU ld (GNU Binutils) 2.42) #1 Mon Aug 19 11:42:20 MST 2024
...
Run /sbin/init as init process
Modules linked in:
Pid: 24, comm: mount Not tainted 6.11.0-rc4-next-20240819
RIP: 0033:0x68006f6c
RSP: 000000006c8bfc68 EFLAGS: 00010206
RAX: 0000000068006f6c RBX: 0000000068a0aa18 RCX: 00000000600d8b09
RDX: 0000000000000000 RSI: 0000000068a0aa18 RDI: 0000000068805120
RBP: 000000006c8bfc70 R08: 0000000000000001 R09: 0000000068ae0308
R10: 000000000000000e R11: ffffffffffffffff R12: 0000000000000001
R13: 0000000068a0aa18 R14: 0000000000000015 R15: 0000000068944a88
Kernel panic - not syncing: Segfault with no mm
CPU: 0 UID: 0 PID: 24 Comm: mount Not tainted 6.11.0-rc4-next-20240819 #1
Stack:
600caeff 6c8bfc90 600d8b2a 68944a80
00000047 6c8bfda0 600cbfd9 6c8bfd50
68944ad0 68944a88 7f7ffff000 7f7fffffff
Call Trace:
[<600caeff>] ? special_mapping_close+0x16/0x19
[<600d8b2a>] remove_vma+0x21/0x59
[<600cbfd9>] exit_mmap+0x1f3/0x2bc
[<60032a0c>] ? unblock_signals+0x0/0xbd
[<600329fd>] ? block_signals+0x0/0xf
[<6003831c>] __mmput+0x24/0x94
[<60067262>] ? up_read+0x0/0x2c
[<600383a1>] mmput+0x15/0x18
[<6003ce97>] do_exit+0x381/0x9b8
[<600e4b8d>] ? kfree+0x107/0x11b
[<6003d752>] sys_exit_group+0x0/0x16
[<6003d768>] pid_child_should_wake+0x0/0x42
[<60022e7a>] handle_syscall+0x79/0xa7
[<600358de>] userspace+0x4d3/0x505
[<60020927>] fork_handler+0x84/0x8b
Passing this through scripts/decode_stacktrace.sh results in
? special_mapping_close (mm/mmap.c:2056)
remove_vma (mm/vma.c:144)
exit_mmap (include/linux/sched.h:2049 mm/mmap.c:1947)
? unblock_signals (arch/um/os-Linux/signal.c:296)
? block_signals (arch/um/os-Linux/signal.c:282)
__mmput (kernel/fork.c:1349)
? up_read (arch/x86/include/asm/atomic64_64.h:79 (discriminator 5) include/linux/atomic/atomic-arch-fallback.h:2749 (discriminator 5) include/linux/atomic/atomic-long.h:184 (discriminator 5) include/linux/atomic/atomic-instrumented.h:3317 (discriminator 5) kernel/locking/rwsem.c:1347 (discriminator 5) kernel/locking/rwsem.c:1622 (discriminator 5))
mmput (kernel/fork.c:1370)
do_exit (arch/um/include/asm/thread_info.h:46 kernel/exit.c:572 kernel/exit.c:926)
? kfree (mm/slub.c:4482 (discriminator 2) mm/slub.c:4522 (discriminator 2) mm/slub.c:4669 (discriminator 2))
sys_exit_group (kernel/exit.c:1099 kernel/exit.c:1097)
pid_child_should_wake (kernel/exit.c:1106 kernel/exit.c:1565)
handle_syscall (arch/um/kernel/skas/syscall.c:45 (discriminator 1))
userspace (arch/um/os-Linux/skas/process.c:466)
fork_handler (arch/um/kernel/process.c:137)
This change seems pretty innocuous but the bisect log does not lie :) I
am guessing UML is just special here somehow?
# bad: [367b5c3d53e57d51a5878816804652963da90950] Add linux-next specific files for 20240816
# good: [e724918b3786252b985b0c2764c16a57d1937707] Merge tag 'hardening-v6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
git bisect start '367b5c3d53e57d51a5878816804652963da90950' 'e724918b3786252b985b0c2764c16a57d1937707'
# bad: [b12bdbe2615f5426953ae1e64d74176674618edb] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next.git
git bisect bad b12bdbe2615f5426953ae1e64d74176674618edb
# bad: [9ad9c8d6eea9063fe7309cdc8e76bd12377cd613] Merge branch 'for-next' of https://github.com/sophgo/linux.git
git bisect bad 9ad9c8d6eea9063fe7309cdc8e76bd12377cd613
# bad: [57c53c832b28ca79eddca47c5b599036be10d347] Merge branch 'perf-tools-next' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git
git bisect bad 57c53c832b28ca79eddca47c5b599036be10d347
# bad: [cbaf19e941bcd83cf50f569b3888f7db6dcaccfc] foo
git bisect bad cbaf19e941bcd83cf50f569b3888f7db6dcaccfc
# good: [cdb0e8eb648858f37bbe1d6245c3a3c49f265c1c] fixup! selftests/mm: Add mseal test for no-discard madvise
git bisect good cdb0e8eb648858f37bbe1d6245c3a3c49f265c1c
# bad: [4fdacc9ec44f04a9edc4ddd0c782ab698cd15257] mm: shmem: support large folio allocation for shmem_replace_folio()
git bisect bad 4fdacc9ec44f04a9edc4ddd0c782ab698cd15257
# good: [90f91965eee8256ffad811a6da097bc13b66aa2e] mm: reduce deferred struct page init ifdeffery
git bisect good 90f91965eee8256ffad811a6da097bc13b66aa2e
# good: [5ae759160c5df466f4ae7cb89c05cd963e91cc3c] mm: introduce a pageflag for partially mapped folios
git bisect good 5ae759160c5df466f4ae7cb89c05cd963e91cc3c
# good: [03683572685d2f8febfc022b758fdb4bddf8d783] maple_tree: fix comment typo with corresponding maple_status
git bisect good 03683572685d2f8febfc022b758fdb4bddf8d783
# bad: [74ef5018120b2a441428400a5f92891307d41b82] powerpc/vdso: refactor error handling
git bisect bad 74ef5018120b2a441428400a5f92891307d41b82
# bad: [5077f828c08424b81279341813a18b8923ebd42e] mm: add optional close() to struct vm_special_mapping
git bisect bad 5077f828c08424b81279341813a18b8923ebd42e
# good: [0ebac8817b5dce7b3a1afd6ff7197a75829d50ad] kfence: save freeing stack trace at calling time instead of freeing time
git bisect good 0ebac8817b5dce7b3a1afd6ff7197a75829d50ad
# first bad commit: [5077f828c08424b81279341813a18b8923ebd42e] mm: add optional close() to struct vm_special_mapping
The rootfs is available from [1] in case it matters
(x86_64-rootfs.ext4.zst, decompress it with zstd first); it just shuts
down the machine on boot.
Cheers,
Nathan
[1]: https://github.com/ClangBuiltLinux/boot-utils/releases/latest
More information about the Linuxppc-dev
mailing list