[PATCH 2/4] powerpc/mm: Handle VDSO unmapping via close() rather than arch_unmap()

Jeff Xu jeffxu at google.com
Thu Aug 8 06:11:51 AEST 2024


On Wed, Aug 7, 2024 at 10:11 AM Liam R. Howlett <Liam.Howlett at oracle.com> wrote:
>
> * Jeff Xu <jeffxu at google.com> [240807 12:37]:
> > On Wed, Aug 7, 2024 at 8:56 AM Liam R. Howlett <Liam.Howlett at oracle.com> wrote:
> > >
> > > * Jeff Xu <jeffxu at google.com> [240807 11:44]:
> > > > On Wed, Aug 7, 2024 at 5:41 AM Michael Ellerman <mpe at ellerman.id.au> wrote:
> > > > >
> > > > > Add a close() callback to the VDSO special mapping to handle unmapping
> > > > > of the VDSO. That will make it possible to remove the arch_unmap() hook
> > > > > entirely in a subsequent patch.
> > > > >
> > > > > Suggested-by: Linus Torvalds <torvalds at linux-foundation.org>
> > > > > Signed-off-by: Michael Ellerman <mpe at ellerman.id.au>
> > > > > ---
> > > > >  arch/powerpc/include/asm/mmu_context.h |  4 ----
> > > > >  arch/powerpc/kernel/vdso.c             | 17 +++++++++++++++++
> > > > >  2 files changed, 17 insertions(+), 4 deletions(-)
> > > > >
> > > > > diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
> > > > > index 37bffa0f7918..9b8c1555744e 100644
> > > > > --- a/arch/powerpc/include/asm/mmu_context.h
> > > > > +++ b/arch/powerpc/include/asm/mmu_context.h
> > > > > @@ -263,10 +263,6 @@ extern void arch_exit_mmap(struct mm_struct *mm);
> > > > >  static inline void arch_unmap(struct mm_struct *mm,
> > > > >                               unsigned long start, unsigned long end)
> > > > >  {
> > > > > -       unsigned long vdso_base = (unsigned long)mm->context.vdso;
> > > > > -
> > > > > -       if (start <= vdso_base && vdso_base < end)
> > > > > -               mm->context.vdso = NULL;
> > > > >  }
> > > > >
> > > > >  #ifdef CONFIG_PPC_MEM_KEYS
> > > > > diff --git a/arch/powerpc/kernel/vdso.c b/arch/powerpc/kernel/vdso.c
> > > > > index 7a2ff9010f17..220a76cae7c1 100644
> > > > > --- a/arch/powerpc/kernel/vdso.c
> > > > > +++ b/arch/powerpc/kernel/vdso.c
> > > > > @@ -81,6 +81,21 @@ static int vdso64_mremap(const struct vm_special_mapping *sm, struct vm_area_str
> > > > >         return vdso_mremap(sm, new_vma, &vdso64_end - &vdso64_start);
> > > > >  }
> > > > >
> > > > > +static void vdso_close(const struct vm_special_mapping *sm, struct vm_area_struct *vma)
> > > > > +{
> > > > > +       struct mm_struct *mm = vma->vm_mm;
> > > > > +
> > > > > +       /*
> > > > > +        * close() is called for munmap() but also for mremap(). In the mremap()
> > > > > +        * case the vdso pointer has already been updated by the mremap() hook
> > > > > +        * above, so it must not be set to NULL here.
> > > > > +        */
> > > > > +       if (vma->vm_start != (unsigned long)mm->context.vdso)
> > > > > +               return;
> > > > > +
> > > > > +       mm->context.vdso = NULL;
> > > > > +}
> > > > > +
> > > > >  static vm_fault_t vvar_fault(const struct vm_special_mapping *sm,
> > > > >                              struct vm_area_struct *vma, struct vm_fault *vmf);
> > > > >
> > > > > @@ -92,11 +107,13 @@ static struct vm_special_mapping vvar_spec __ro_after_init = {
> > > > >  static struct vm_special_mapping vdso32_spec __ro_after_init = {
> > > > >         .name = "[vdso]",
> > > > >         .mremap = vdso32_mremap,
> > > > > +       .close = vdso_close,
> > > > IIUC, only CHECKPOINT_RESTORE requires this, and
> > > > CHECKPOINT_RESTORE is in init/Kconfig, with default N
> > > >
> > > > Can we add #ifdef CONFIG_CHECKPOINT_RESTORE here ?
> > > >
> > >
> > > No, these can be unmapped and it needs to be cleaned up.  Valgrind is
> > > one application that is known to unmap the vdso and runs into issues on
> > > platforms that do not handle the removal correctly.
> > >
> > Maybe Valgrind needs that exactly for checkpoint restore ? [1]
>
> Maybe, and maybe there are 100 other applications unmapping the vdso for
> some other reason?
>
When PPC added arch_munamp, it was for CRIU, wasn't it ? That was the original
intention.

Maybe there are more apps that have started unmapping vdso, it might
be interesting
to know those use cases before widely opening this without kernel config.

> >
> > "CRIU fails to restore applications that have unmapped the vDSO
> > segment. One such
> > application is Valgrind."
> >
> > Usually when the kernel accepts new functionality, the patch needs to
> > state the user case.
>
> This isn't new functionality, the arch_unmap() existed before and this
> is moving the functionality.  We can't just disable it because we assume
> we know it's only used once.
>
> I had planned to do this sort of patch set in a follow up to my patch
> set [1], so I was deep into looking at this before the mseal()
> regression - which I expected to happen and have been advocating to
> avoid extra walks... This would be fixed by my patch set by reducing the
> walk count.
>
I would rather leave mseal() in-loop check discussion to the other
email thread :-)

> > The only user case I found for .mremap and .close so far is the CRIU case.
> >
>
> In fact, this is broken on other archs for valgrind since they unmap the
> vdso and then crash [2].  There has been a fix in the works for a while,
> which I stepped in during the patch set [1], which can be seen here
> [3].  In the discussion, the issue with Valgrind is raised and a generic
> solution to solve for more than just ppc is discussed.  The solution
> avoids crashing if vdso is unmapped and that seems like the sane
> direction of this work.
>
> We also have users unmapping vdsos here [4] too.
This is a strange code. If the use case about clock_gettime is legit, then
kerne can provide an option to not unload vdso during elf loading.

>
> Why would we leave a dangling pointer in the mm struct based on a
> compile flag?
>
> [1] https://lore.kernel.org/linux-mm/20240717200709.1552558-18-Liam.Howlett@oracle.com/
> [2] https://lore.kernel.org/lkml/87imd5h5kb.fsf@mpe.ellerman.id.au/
> [3] https://lore.kernel.org/all/20210611180242.711399-17-dima@arista.com/
> [4] https://github.com/insanitybit/void-ship
>


More information about the Linuxppc-dev mailing list