[PATCH v3 12/12] mm/gup: Handle hugetlb in the generic follow_page_mask code

Sat Mar 23 13:15:20 AEDT 2024

On Fri, Mar 22, 2024 at 08:45:59PM -0400, Peter Xu wrote:
> On Fri, Mar 22, 2024 at 01:48:18PM -0700, Andrew Morton wrote:
> > On Thu, 21 Mar 2024 18:08:02 -0400 peterx at redhat.com wrote:
> > 
> > > From: Peter Xu <peterx at redhat.com>
> > > 
> > > Now follow_page() is ready to handle hugetlb pages in whatever form, and
> > > over all architectures.  Switch to the generic code path.
> > > 
> > > Time to retire hugetlb_follow_page_mask(), following the previous
> > > retirement of follow_hugetlb_page() in 4849807114b8.
> > > 
> > > There may be a slight difference of how the loops run when processing slow
> > > GUP over a large hugetlb range on cont_pte/cont_pmd supported archs: each
> > > loop of __get_user_pages() will resolve one pgtable entry with the patch
> > > applied, rather than relying on the size of hugetlb hstate, the latter may
> > > cover multiple entries in one loop.
> > > 
> > > A quick performance test on an aarch64 VM on M1 chip shows 15% degrade over
> > > a tight loop of slow gup after the path switched.  That shouldn't be a
> > > problem because slow-gup should not be a hot path for GUP in general: when
> > > page is commonly present, fast-gup will already succeed, while when the
> > > page is indeed missing and require a follow up page fault, the slow gup
> > > degrade will probably buried in the fault paths anyway.  It also explains
> > > why slow gup for THP used to be very slow before 57edfcfd3419 ("mm/gup:
> > > accelerate thp gup even for "pages != NULL"") lands, the latter not part of
> > > a performance analysis but a side benefit.  If the performance will be a
> > > concern, we can consider handle CONT_PTE in follow_page().
> > > 
> > > Before that is justified to be necessary, keep everything clean and simple.
> > > 
> > 
> > mm/gup.c:33:21: warning: 'follow_hugepd' declared 'static' but never defined [-Wunused-function]
> >    33 | static struct page *follow_hugepd(struct vm_area_struct *vma, hugepd_t hugepd,
> >       |                     ^~~~~~~~~~~~~
> > 
> > --- a/mm/gup.c~mm-gup-handle-hugepd-for-follow_page-fix
> > +++ a/mm/gup.c
> > @@ -30,10 +30,12 @@ struct follow_page_context {
> >  	unsigned int page_mask;
> >  };
> >  
> > +#ifdef CONFIG_HAVE_FAST_GUP
> >  static struct page *follow_hugepd(struct vm_area_struct *vma, hugepd_t hugepd,
> >  				  unsigned long addr, unsigned int pdshift,
> >  				  unsigned int flags,
> >  				  struct follow_page_context *ctx);
> > +#endif
> >  
> >  static inline void sanity_check_pinned_pages(struct page **pages,
> >  					     unsigned long npages)
> > _
> > 
> > 
> > This looks inelegant.
> > 
> > That's two build issues so far.  Please be more expansive in the
> > Kconfig variations when testing.  Especially when mucking with pgtable
> > macros.
> 
> Andrew,
> 
> Apologies for that, and also for a slightly late response.  Yeah it's time
> I'll need my set of things to do serious build tests, and I'll at least
> start to cover a few error prone configs/archs in with that.
> 
> I was trying to rely on the build bot in many of previous such cases, as
> that was quite useful to me to cover many build issues without investing my
> own test setups, but I think for some reason it retired and stopped working
> for a while.  Maybe I shouldn't have relied on it at all.
> 
> For this specific issue, I'm not sure if CONFIG_HAVE_FAST_GUP is proper?
> As follow_hugepd() is used in slow gup not fast.  So maybe we can put that
> under CONFIG_MMU below that code (and I think we can drop "static" too as I
> don't think it's anything useful).  My version of fixup attached at the end

the static is useful; below patch did pass on m68k but won't on
x86.. ignore that please.

> of email, and I verified it on m68k build.
> 
> I do plan to post a small fixup series to fix these issues (so far it may
> contain 1 formal patch to touch up vmstat_item_print_in_thp, and 2 fixups
> where I'll mark the subject with "fixup!" properly).  Either you can pick
> up below or you can wait for my small patchset, should be there either
> today or tomorrow.

I changed plan here too; I found more users of HPAGE_PMD_NR assuming it's
defined even if !CONFIG_MMU.  That's weird, as CONFIG_MMU doesn't even
define PMD_SHIFT...  To fix this I decided to use the old trick on using
BUILD_BUG() like it used to work before; frankly I don't know how that
didn't throw warnings, but i'll make sure it passes all known builds (ps: I
still haven't got my build harness ready, so that will still be limited but
should solve known issues).

In short: please wait for my fixup series.  Thanks.

> 
> Thanks,
> 
> ===8<===
> diff --git a/mm/gup.c b/mm/gup.c
> index 4cd349390477..a2ed8203495a 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -30,11 +30,6 @@ struct follow_page_context {
>         unsigned int page_mask;
>  };
>  
> -static struct page *follow_hugepd(struct vm_area_struct *vma, hugepd_t hugepd,
> -                                 unsigned long addr, unsigned int pdshift,
> -                                 unsigned int flags,
> -                                 struct follow_page_context *ctx);
> -
>  static inline void sanity_check_pinned_pages(struct page **pages,
>                                              unsigned long npages)
>  {
> @@ -505,6 +500,12 @@ static inline void mm_set_has_pinned_flag(unsigned long *mm_flags)
>  }
>  
>  #ifdef CONFIG_MMU
> +
> +struct page *follow_hugepd(struct vm_area_struct *vma, hugepd_t hugepd,
> +                          unsigned long addr, unsigned int pdshift,
> +                          unsigned int flags,
> +                          struct follow_page_context *ctx);
> +
>  static struct page *no_page_table(struct vm_area_struct *vma,
>                                   unsigned int flags, unsigned long address)
>  {
> ===8<===
> 
> -- 
> Peter Xu

-- 
Peter Xu