hugetlbfs for ppc440 - kernel BUG

Thu Oct 23 13:42:47 EST 2008

On Tue, Oct 21, 2008 at 03:50:30PM -0700, Satya wrote:
> On Tue, Oct 21, 2008 at 3:46 PM, Satya <satyakiran at gmail.com> wrote:
> 
> > Ben,
> > Look here: http://www-unix.mcs.anl.gov/zeptoos/hugepages/
> >
> > thanks,
> > ./satya
> >
> >
> > On Tue, Oct 21, 2008 at 1:47 PM, Benjamin Herrenschmidt <
> > benh at kernel.crashing.org> wrote:
> >
> >> On Tue, 2007-07-10 at 13:38 -0500, Satya wrote:
> >> > hello,
> >> > I am trying to implement hugetlbfs on the IBM Bluegene/L IO node
> >> > (ppc440) and I have a big problem as well as a few questions to ask
> >> > the group. I patched a 2.6.21.6 linux kernel (manually) with Edi
> >> > Shmueli's hugetlbfs implementation (found here:
> >> > http://patchwork.ozlabs.org/linuxppc/patch?id=8427) for this. I did
> >> > have to make slight changes (described at the end) to make it work.
> >> > My test program is a shortened version of a sys v shared memory
> >> > example described in Documentation/vm/hugetlbpage.txt
> >>
> >> Hi !
> >>
> >> The patchwork link unfortunately didn't survive the transition to
> >> patchwork 2.
> >>
> >> Do you know what's the status of Hugetlb support for 44x ? Is there any
> >> plan to release that for upstream inclusion ?
> >>
> >> Cheers,
> >> Ben.
> >>
> >>
> >>
> 
> whoops, sorry for top-posting. Here is a patch that worked at that time:
> http://www-unix.mcs.anl.gov/zeptoos/hugepages/hugetlbpage_44x.patch
> 
> I didn't follow up after this to get it merged upstream. Also I don't know
> if hugetlb core has changed to deal with PTEs in high memory.

Ok, had a look at this.  It's had some tweaks since I last looked at
the bluegene hugepage/440 patch.  It still has the rather ugly
approach of storing the hugepage PTEs always at the bottom level, and
duplicating them umpteen times (including pointing multiple PMDs at a
single PTE page when the hugepage size exceeds the area mapped by a
PMD).  It also has the most serious bug I remember from the old
version - the DIRTY and ACCESSED handling is completely bogus, because
it doesn't keep the copies of the bits in the many copies of the PTEs
in sync.  Between the TLB miss rewrite that's happened in the meantime
and my patch to handle these from hugetlb_fault(), it's at least now
easier to fix this bug.  Also, the patch is arch/ppc based.
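
To illustrate the DIRTY/ACCESSED problem, here is a minimal user-space
sketch in C.  It is not taken from the patch.  It assumes 4K base
pages, and the flag values and helper names (huge_pte_copies,
huge_pte_set_flags, huge_pte_test_flags) are hypothetical.  It models
one hugepage as N replicated bottom-level PTEs and shows that a flag
set in one copy is invisible through the others unless every copy is
updated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values for illustration; not the real 44x PTE layout. */
#define PTE_ACCESSED    0x1u
#define PTE_DIRTY       0x2u
#define BASE_PAGE_SHIFT 12u     /* assumption: 4K base pages */

typedef uint64_t pte_t;

/* Number of bottom-level PTE slots one replicated hugepage occupies. */
static size_t huge_pte_copies(unsigned int huge_shift)
{
    assert(huge_shift >= BASE_PAGE_SHIFT);
    return (size_t)1 << (huge_shift - BASE_PAGE_SHIFT);
}

/* Set the given flags in every replica so that all copies agree. */
static void huge_pte_set_flags(pte_t *first, unsigned int huge_shift,
                               pte_t flags)
{
    size_t n = huge_pte_copies(huge_shift);

    for (size_t i = 0; i < n; i++)
        first[i] |= flags;
}

/* Return 1 if any replica has any of the given flags set, else 0. */
static int huge_pte_test_flags(const pte_t *first, unsigned int huge_shift,
                               pte_t flags)
{
    size_t n = huge_pte_copies(huge_shift);

    for (size_t i = 0; i < n; i++)
        if (first[i] & flags)
            return 1;
    return 0;
}
```

Under these assumptions a 16M hugepage has 4096 copies, so code that
reads or writes the bits through a single copy sees only a fraction of
the state.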

I'll try to sort this out in the near future.  I guess the only big
question is whether it's important to support hugepage sizes < 2M.  For
hugepage sizes >=2M (16M and 256M) we can just make PMD pointers into
hugepage pointers with the addition of a suitable size field, as we do
for 40x.  For page sizes <2M things get more complicated because we
need some sort of second level hugepage tables (which may or may not
be distinct from the ordinary second level tables).
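
The 2M threshold can be sketched as follows, assuming 4K base pages
and 8-byte PTEs, so that one PTE page holds 512 entries and one PMD
entry covers 2M.  The encoding below (a huge marker in bit 0 and the
page shift in bits 1-6) and all of the names are hypothetical.  It is
only meant to show the idea of a PMD entry that points straight at a
hugepage and carries a size field, not the layout of any existing
code.

```c
#include <assert.h>
#include <stdint.h>

#define BASE_PAGE_SHIFT 12u     /* assumption: 4K base pages */
#define PTE_SIZE_SHIFT  3u      /* assumption: 8-byte PTEs */
/* One PTE page maps 2^(12 - 3) = 512 pages, so a PMD entry covers 2^21 = 2M. */
#define PMD_SHIFT       (BASE_PAGE_SHIFT + (BASE_PAGE_SHIFT - PTE_SIZE_SHIFT))

/* Hypothetical encoding of a PMD entry that points directly at a hugepage. */
#define PMD_HUGE        UINT64_C(0x01)  /* bit 0: entry maps a hugepage */
#define PMD_SIZE_SHIFT  1u              /* bits 1-6: log2 of hugepage size */
#define PMD_SIZE_MASK   UINT64_C(0x3f)
#define PMD_LOW_BITS    UINT64_C(0x7f)  /* all bits used by the fields above */

typedef uint64_t pmd_t;

/* A hugepage can be mapped by PMD entries alone if it covers at least one. */
static int fits_at_pmd_level(unsigned int huge_shift)
{
    return huge_shift >= PMD_SHIFT;
}

/* Build a huge PMD entry from a size-aligned physical address. */
static pmd_t make_huge_pmd(uint64_t phys, unsigned int huge_shift)
{
    assert(fits_at_pmd_level(huge_shift));
    assert(huge_shift < 64);
    assert((phys & ((UINT64_C(1) << huge_shift) - 1)) == 0);
    return phys | ((pmd_t)huge_shift << PMD_SIZE_SHIFT) | PMD_HUGE;
}

static int pmd_is_huge(pmd_t pmd)
{
    return (pmd & PMD_HUGE) != 0;
}

static unsigned int pmd_huge_shift(pmd_t pmd)
{
    return (unsigned int)((pmd >> PMD_SIZE_SHIFT) & PMD_SIZE_MASK);
}

static uint64_t pmd_huge_phys(pmd_t pmd)
{
    return pmd & ~PMD_LOW_BITS;
}
```

The low bits are free for the size field because any hugepage that
passes fits_at_pmd_level() is aligned to at least 2M.  A 1M hugepage
fails that check, which is the case that would need a second level of
tables.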

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson