[PATCH v2 1/4] mm: Add optional close() to struct vm_special_mapping

Nathan Chancellor nathan at kernel.org
Tue Aug 20 04:52:53 AEST 2024


Hi Michael,

On Mon, Aug 12, 2024 at 06:26:02PM +1000, Michael Ellerman wrote:
> Add an optional close() callback to struct vm_special_mapping. It will
> be used, by powerpc at least, to handle unmapping of the VDSO.
> 
> Although support for unmapping the VDSO was initially added
> for CRIU[1], it is not desirable to guard that support behind
> CONFIG_CHECKPOINT_RESTORE.
> 
> There are other known users of unmapping the VDSO which are not related
> to CRIU, eg. Valgrind [2] and void-ship [3].
> 
> The powerpc arch_unmap() hook has been in place for ~9 years, with no
> ifdef, so there may be other unknown users that have come to rely on
> unmapping the VDSO. Even if the code was behind an ifdef, major distros
> enable CHECKPOINT_RESTORE so users may not realise unmapping the VDSO
> depends on that configuration option.
> 
> It's also undesirable to have such core mm behaviour behind a relatively
> obscure CONFIG option.
> 
> Longer term the unmap behaviour should be standardised across
> architectures, however that is complicated by the fact the VDSO pointer
> is stored differently across architectures. There was a previous attempt
> to unify that handling [4], which could be revived.
> 
> See [5] for further discussion.
> 
> [1]: commit 83d3f0e90c6c ("powerpc/mm: tracking vDSO remap")
> [2]: https://sourceware.org/git/?p=valgrind.git;a=commit;h=3a004915a2cbdcdebafc1612427576bf3321eef5
> [3]: https://github.com/insanitybit/void-ship
> [4]: https://lore.kernel.org/lkml/20210611180242.711399-17-dima@arista.com/
> [5]: https://lore.kernel.org/linuxppc-dev/shiq5v3jrmyi6ncwke7wgl76ojysgbhrchsk32q4lbx2hadqqc@kzyy2igem256
> 
> Suggested-by: Linus Torvalds <torvalds at linux-foundation.org>
> Signed-off-by: Michael Ellerman <mpe at ellerman.id.au>
> Reviewed-by: David Hildenbrand <david at redhat.com>
> ---
>  include/linux/mm_types.h | 3 +++
>  mm/mmap.c                | 6 ++++++
>  2 files changed, 9 insertions(+)
> 
> v2:
> - Add some blank lines as requested.
> - Expand special_mapping_close() comment.
> - Add David's reviewed-by.
> - Expand change log to capture review discussion.
> 
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 485424979254..78bdfc59abe5 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -1313,6 +1313,9 @@ struct vm_special_mapping {
>  
>  	int (*mremap)(const struct vm_special_mapping *sm,
>  		     struct vm_area_struct *new_vma);
> +
> +	void (*close)(const struct vm_special_mapping *sm,
> +		      struct vm_area_struct *vma);
>  };
>  
>  enum tlb_flush_reason {
> diff --git a/mm/mmap.c b/mm/mmap.c
> index d0dfc85b209b..af4dbf0d3bd4 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -3620,10 +3620,16 @@ void vm_stat_account(struct mm_struct *mm, vm_flags_t flags, long npages)
>  static vm_fault_t special_mapping_fault(struct vm_fault *vmf);
>  
>  /*
> + * Close hook, called for unmap() and on the old vma for mremap().
> + *
>   * Having a close hook prevents vma merging regardless of flags.
>   */
>  static void special_mapping_close(struct vm_area_struct *vma)
>  {
> +	const struct vm_special_mapping *sm = vma->vm_private_data;
> +
> +	if (sm->close)
> +		sm->close(sm, vma);
>  }
>  
>  static const char *special_mapping_name(struct vm_area_struct *vma)
> -- 
> 2.45.2
> 

This change is now in -next and I bisected a crash that our CI sees with
ARCH=um to it:

$ make -skj"$(nproc)" ARCH=um CROSS_COMPILE=x86_64-linux- defconfig linux

$ ./linux ubd0=$PWD/rootfs.ext4
...
Linux version 6.11.0-rc4-next-20240819 (nathan at thelio-3990X) (x86_64-linux-gcc (GCC) 14.2.0, GNU ld (GNU Binutils) 2.42) #1 Mon Aug 19 11:42:20 MST 2024
...
Run /sbin/init as init process

Modules linked in:
Pid: 24, comm: mount Not tainted 6.11.0-rc4-next-20240819
RIP: 0033:0x68006f6c
RSP: 000000006c8bfc68  EFLAGS: 00010206
RAX: 0000000068006f6c RBX: 0000000068a0aa18 RCX: 00000000600d8b09
RDX: 0000000000000000 RSI: 0000000068a0aa18 RDI: 0000000068805120
RBP: 000000006c8bfc70 R08: 0000000000000001 R09: 0000000068ae0308
R10: 000000000000000e R11: ffffffffffffffff R12: 0000000000000001
R13: 0000000068a0aa18 R14: 0000000000000015 R15: 0000000068944a88
Kernel panic - not syncing: Segfault with no mm
CPU: 0 UID: 0 PID: 24 Comm: mount Not tainted 6.11.0-rc4-next-20240819 #1
Stack:
 600caeff 6c8bfc90 600d8b2a 68944a80
 00000047 6c8bfda0 600cbfd9 6c8bfd50
 68944ad0 68944a88 7f7ffff000 7f7fffffff
Call Trace:
 [<600caeff>] ? special_mapping_close+0x16/0x19
 [<600d8b2a>] remove_vma+0x21/0x59
 [<600cbfd9>] exit_mmap+0x1f3/0x2bc
 [<60032a0c>] ? unblock_signals+0x0/0xbd
 [<600329fd>] ? block_signals+0x0/0xf
 [<6003831c>] __mmput+0x24/0x94
 [<60067262>] ? up_read+0x0/0x2c
 [<600383a1>] mmput+0x15/0x18
 [<6003ce97>] do_exit+0x381/0x9b8
 [<600e4b8d>] ? kfree+0x107/0x11b
 [<6003d752>] sys_exit_group+0x0/0x16
 [<6003d768>] pid_child_should_wake+0x0/0x42
 [<60022e7a>] handle_syscall+0x79/0xa7
 [<600358de>] userspace+0x4d3/0x505
 [<60020927>] fork_handler+0x84/0x8b

Passing this through scripts/decode_stacktrace.sh results in

? special_mapping_close (mm/mmap.c:2056)
remove_vma (mm/vma.c:144)
exit_mmap (include/linux/sched.h:2049 mm/mmap.c:1947)
? unblock_signals (arch/um/os-Linux/signal.c:296)
? block_signals (arch/um/os-Linux/signal.c:282)
__mmput (kernel/fork.c:1349)
? up_read (arch/x86/include/asm/atomic64_64.h:79 (discriminator 5) include/linux/atomic/atomic-arch-fallback.h:2749 (discriminator 5) include/linux/atomic/atomic-long.h:184 (discriminator 5) include/linux/atomic/atomic-instrumented.h:3317 (discriminator 5) kernel/locking/rwsem.c:1347 (discriminator 5) kernel/locking/rwsem.c:1622 (discriminator 5))
mmput (kernel/fork.c:1370)
do_exit (arch/um/include/asm/thread_info.h:46 kernel/exit.c:572 kernel/exit.c:926)
? kfree (mm/slub.c:4482 (discriminator 2) mm/slub.c:4522 (discriminator 2) mm/slub.c:4669 (discriminator 2))
sys_exit_group (kernel/exit.c:1099 kernel/exit.c:1097)
pid_child_should_wake (kernel/exit.c:1106 kernel/exit.c:1565)
handle_syscall (arch/um/kernel/skas/syscall.c:45 (discriminator 1))
userspace (arch/um/os-Linux/skas/process.c:466)
fork_handler (arch/um/kernel/process.c:137)

This change seems pretty innocuous but the bisect log does not lie :) I
am guessing UML is just special here somehow?

# bad: [367b5c3d53e57d51a5878816804652963da90950] Add linux-next specific files for 20240816
# good: [e724918b3786252b985b0c2764c16a57d1937707] Merge tag 'hardening-v6.11-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
git bisect start '367b5c3d53e57d51a5878816804652963da90950' 'e724918b3786252b985b0c2764c16a57d1937707'
# bad: [b12bdbe2615f5426953ae1e64d74176674618edb] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next.git
git bisect bad b12bdbe2615f5426953ae1e64d74176674618edb
# bad: [9ad9c8d6eea9063fe7309cdc8e76bd12377cd613] Merge branch 'for-next' of https://github.com/sophgo/linux.git
git bisect bad 9ad9c8d6eea9063fe7309cdc8e76bd12377cd613
# bad: [57c53c832b28ca79eddca47c5b599036be10d347] Merge branch 'perf-tools-next' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools-next.git
git bisect bad 57c53c832b28ca79eddca47c5b599036be10d347
# bad: [cbaf19e941bcd83cf50f569b3888f7db6dcaccfc] foo
git bisect bad cbaf19e941bcd83cf50f569b3888f7db6dcaccfc
# good: [cdb0e8eb648858f37bbe1d6245c3a3c49f265c1c] fixup! selftests/mm: Add mseal test for no-discard madvise
git bisect good cdb0e8eb648858f37bbe1d6245c3a3c49f265c1c
# bad: [4fdacc9ec44f04a9edc4ddd0c782ab698cd15257] mm: shmem: support large folio allocation for shmem_replace_folio()
git bisect bad 4fdacc9ec44f04a9edc4ddd0c782ab698cd15257
# good: [90f91965eee8256ffad811a6da097bc13b66aa2e] mm: reduce deferred struct page init ifdeffery
git bisect good 90f91965eee8256ffad811a6da097bc13b66aa2e
# good: [5ae759160c5df466f4ae7cb89c05cd963e91cc3c] mm: introduce a pageflag for partially mapped folios
git bisect good 5ae759160c5df466f4ae7cb89c05cd963e91cc3c
# good: [03683572685d2f8febfc022b758fdb4bddf8d783] maple_tree: fix comment typo with corresponding maple_status
git bisect good 03683572685d2f8febfc022b758fdb4bddf8d783
# bad: [74ef5018120b2a441428400a5f92891307d41b82] powerpc/vdso: refactor error handling
git bisect bad 74ef5018120b2a441428400a5f92891307d41b82
# bad: [5077f828c08424b81279341813a18b8923ebd42e] mm: add optional close() to struct vm_special_mapping
git bisect bad 5077f828c08424b81279341813a18b8923ebd42e
# good: [0ebac8817b5dce7b3a1afd6ff7197a75829d50ad] kfence: save freeing stack trace at calling time instead of freeing time
git bisect good 0ebac8817b5dce7b3a1afd6ff7197a75829d50ad
# first bad commit: [5077f828c08424b81279341813a18b8923ebd42e] mm: add optional close() to struct vm_special_mapping

The rootfs is available from [1] in case it matters
(x86_64-rootfs.ext4.zst, decompress it with zstd first); it just shuts
down the machine on boot.

Cheers,
Nathan

[1]: https://github.com/ClangBuiltLinux/boot-utils/releases/latest


More information about the Linuxppc-dev mailing list