Question on follow_page_mask
Hugh Dickins
hughd at google.com
Wed Feb 24 08:07:53 AEDT 2016
On Tue, 23 Feb 2016, Kirill A. Shutemov wrote:
> On Tue, Feb 23, 2016 at 06:45:05PM +0530, Anshuman Khandual wrote:
> > Not able to understand the first code block of follow_page_mask
> > function. follow_huge_addr function is expected to find the page
> > struct for the given address if it turns out to be a HugeTLB page
> > but then when it finds the page we bug on if it had been called
> > with FOLL_GET flag.
> >
> >     page = follow_huge_addr(mm, address, flags & FOLL_WRITE);
> >     if (!IS_ERR(page)) {
> >         BUG_ON(flags & FOLL_GET);
> >         return page;
> >     }
> >
> > do_move_page_to_node_array calls follow_page with FOLL_GET which
> > in turn calls follow_page_mask with FOLL_GET. On POWER, the
> > function follow_huge_addr is defined and does not return -EINVAL
> > like the generic one. It returns the page struct if it's a HugeTLB
> > page. Just curious to know what is the purpose behind the BUG_ON.
>
> I would guess requesting pin on non-reclaimable page is considered
> useless, meaning suspicious behavior. BUG_ON() is overkill, I think.
> WARN_ON_ONCE() would make it.

No, it's there to guard against abuse, until the correct functionality
is implemented: which has not so far been required, I think.

The problem is that a get_page() here is too late: it needs to be done
inside each arch's implementation of follow_huge_addr(), while holding
whatever is the appropriate lock, dropped by the time it returns here.

If you look through where FOLL_GET is usually implemented, such as in
follow_page_pte(), but pud and pmd cases too, I hope you'll still find
that they are careful to get the reference on the page while it's safe
in the pagetable.

But follow_huge_addr() would need some work to offer the same guarantees:
it's good for those "peep at a page without actually getting a reference"
cases, but not good enough for preventing a page from being put to some
other use completely, before we've secured it with our reference.
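
As a minimal userspace sketch of the ordering described above (an
illustration only, not kernel code): struct page, struct pagetable,
lookup_and_get(), lookup_peek() and unmap() are all hypothetical
stand-ins, and a pthread mutex plays the role of whatever lock the arch
code would hold. The first lookup takes its reference before dropping
the lock; the second only peeks, which is roughly the guarantee
follow_huge_addr() is described as giving here.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct page {
    int refcount;           /* 0 means the page is free for reuse */
};

struct pagetable {
    pthread_mutex_t lock;   /* plays the role of the page table lock */
    struct page *entry;     /* NULL once the mapping is gone */
};

/* Safe: the reference is taken while the lock keeps the entry stable. */
static struct page *lookup_and_get(struct pagetable *pt)
{
    struct page *page;

    pthread_mutex_lock(&pt->lock);
    page = pt->entry;
    if (page)
        page->refcount++;
    pthread_mutex_unlock(&pt->lock);
    return page;
}

/* Peek only: the lock is dropped before the caller can take a reference. */
static struct page *lookup_peek(struct pagetable *pt)
{
    struct page *page;

    pthread_mutex_lock(&pt->lock);
    page = pt->entry;
    pthread_mutex_unlock(&pt->lock);
    return page;
}

/* Tear down the mapping and drop its reference; true if the page was freed. */
static bool unmap(struct pagetable *pt)
{
    struct page *page;
    bool freed = false;

    pthread_mutex_lock(&pt->lock);
    page = pt->entry;
    pt->entry = NULL;
    if (page)
        freed = (--page->refcount == 0);
    pthread_mutex_unlock(&pt->lock);
    return freed;
}
```

With lookup_and_get(), a later unmap() drops the mapping's reference but
the count stays at 1, so the page is still ours. With lookup_peek(), the
same unmap() takes the count to 0, and any get_page() done afterwards
would be on a page that may already have been put to some other use.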

Unless something's changed: the last time I recall the issue coming up,
was when Naoya Horiguchi was working on hugetlbfs page migration: see
linux-kernel/linux-mm mail thread "BUG at mm/memory.c:1489!" from
28 May 2014; and the resolution there was not to support the
follow_huge_addr() case (which IIRC is peculiar to powerpc alone?).

Hugh