[RFC] implicit hugetlb pages (hugetlb_implicit)
David Gibson
david at gibson.dropbear.id.au
Tue Jan 13 10:57:08 EST 2004
On Mon, Jan 12, 2004 at 03:38:00PM -0800, Adam Litke wrote:
> Thank you for your comments and suggestions. They are proving very
> helpful as I work to clean this up.
Glad to hear it :)
> On Sun, 2004-01-11 at 20:19, David Gibson wrote:
> > On Fri, Jan 09, 2004 at 01:27:20PM -0800, Adam Litke wrote:
> > >
> > > hugetlb_implicit (2.6.0):
> > > This patch includes the anonymous mmap work from Dave Gibson
> > > (right?)
> >
> > I'm not sure what you're referring to here. My patches for lbss
> > support also include support for copy-on-write of hugepages and
> > various other changes which can make them act kind of like anonymous
> > pages.
> >
> > But I don't see much in this patch that looks familiar.
>
> Hmm. Could the original author of hugetlb for anonymous mmap claim
> credit for the initial code?
I think I once knew who it was, but I've forgotten, sorry.
Incidentally, you probably do want to fold in my hugepage-COW stuff
(although it does mean some more generic changes). Otherwise
hugepages are always MAP_SHARED, which means with an implicit hugepage
mmap() certain regions of memory will silently have totally different
semantics to what you expect - it could get very weird across a
fork().
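To make the difference concrete, here's a minimal userspace sketch (not
from either patch - the mapping flags are the whole point, the rest is
scaffolding):

#define _GNU_SOURCE
/* Sketch only: what an application expects from an anonymous
 * MAP_PRIVATE mapping, versus what it silently gets if the mapping is
 * promoted to an always-MAP_SHARED hugepage mapping. */
#include <stdio.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	int flags = MAP_ANONYMOUS | MAP_PRIVATE; /* what the app asked for */
	/* int flags = MAP_ANONYMOUS | MAP_SHARED;  what it implicitly gets */
	int *p = mmap(NULL, getpagesize(), PROT_READ | PROT_WRITE,
		      flags, -1, 0);

	if (p == MAP_FAILED)
		return 1;
	*p = 1;
	if (fork() == 0) {
		*p = 2;			/* child scribbles on the page */
		_exit(0);
	}
	wait(NULL);
	printf("parent sees %d\n", *p);	/* 1 if private, 2 if shared */
	return 0;
}

An implicit conversion that always produces MAP_SHARED changes that
answer behind the application's back, which is the fork() weirdness I
mean.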
And for that matter there's at least one plain-old-bug in the current
hugepage code which is addressed in my patch (the LOW_HPAGES bit isn't
propagated correctly across a fork()).
I'll attach my patch, which also includes the hugepage ELF segment
stuff. I'm afraid I haven't had a chance to separate out those parts
of the patch yet.
> > > + /* Do we have enough free huge pages? */
> > > + if (!is_hugepage_mem_enough(len))
> > > + return 0;
> >
> > Is this test safe/necessary? i.e. a) is there any potential race
> > which could cause the mmap() to fail because it's short of memory
> > despite succeeding the test here, and b) can't we just let the mmap
> > fail and fall back then rather than checking beforehand?
>
> You're right. Now that safe fallback is working, we might as well
> defer this test to get_unmapped_area().
Ok.
> > Do we need/want any consideration of the given "hint" address here?
>
> I am trying to do what the kernel does for normal mmaps here. If
> someone hints at an address, they hopefully have a good reason for it.
> I wouldn't want to override it just so I can do implicit hugetlb. Most
> applications pass NULL for the hint, right?
That's kind of my point: what if someone gives a hugepage-aligned
size with a non-aligned hint address - currently the test is only on
the size. We either have to map at somewhere other than the hint
address (which is what the patch does now, I think), or only attempt a
hugepage map if the hint address is also aligned.
This is for the case where the hint really is a hint, of course - so
we don't have to obey it. If it's MAP_FIXED it's a different code
path, and we never attempt a hugepage mapping (unless it's explicitly
from hugetlbfs). Perhaps we should, though.
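Something like this in the implicit path is what I'm thinking of (just
a sketch - the helper name is invented):

/* Sketch only: attempt an implicit hugepage mapping only when the
 * length *and* any non-NULL hint address are hugepage aligned, so the
 * caller's hint is never quietly ignored. */
static inline int implicit_hugepage_hint_ok(unsigned long addr,
					    unsigned long len)
{
	if (len & ~HPAGE_MASK)
		return 0;
	if (addr && (addr & ~HPAGE_MASK))
		return 0;
	return 1;
}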
> > > + /* Explicit requests for huge pages are allowed to return errors */
> > > + if (*flags & MAP_HUGETLB) {
> > > + if (pre_error)
> > > + return pre_error;
> > > + return hugetlb_get_unmapped_area(NULL, addr, len, pgoff, *flags);
> > > + }
> > > +
> > > + /*
> > > + * When implicit request fails, return 0 so we can
> > > + * retry later with regular pages.
> > > + */
> > > + if (mmap_hugetlb_implicit(len)) {
> > > + if (pre_error)
> > > + goto out;
> > > + addr = hugetlb_get_unmapped_area(NULL, addr, len, pgoff, *flags);
> > > + if (IS_ERR((void *)addr))
> > > + goto out;
> > > + else {
> > > + *flags |= MAP_HUGETLB;
> > > + return addr;
> > > + }
> > > + }
> > > +
> > > +out:
> > > + *flags &= ~MAP_HUGETLB;
> > > + return 0;
> > > +}
> >
> > This does assume that 0 is never a valid address returned for
> > a hugepage range. That's true now, but it makes me slightly
> > uncomfortable, since there's no inherent reason we couldn't make
> > segment zero a hugepage segment.
>
> You definitely found an ugly part of the patch. Cleanup in progress.
Excellent.
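For what it's worth, one way to stop overloading 0 as "fall back" would
be to return a status and hand the address back through a pointer.
Just a sketch - the interface is invented, and the explicit
MAP_HUGETLB path (where errors must propagate) would still need
separate handling:

/* Sketch only, names invented: report "use regular pages" via the
 * return value, so that any address - including 0 - could one day be
 * a valid hugepage mapping.  *addr is only meaningful on success. */
static int try_implicit_hugetlb_area(unsigned long *addr, unsigned long len,
				     unsigned long pgoff, unsigned long *flags)
{
	unsigned long a;

	if (!mmap_hugetlb_implicit(len))
		return -ENOENT;			/* caller uses regular pages */

	a = hugetlb_get_unmapped_area(NULL, *addr, len, pgoff, *flags);
	if (IS_ERR((void *)a))
		return PTR_ERR((void *)a);	/* caller falls back */

	*flags |= MAP_HUGETLB;
	*addr = a;
	return 0;
}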
> > > +#ifdef CONFIG_HUGETLBFS
> > > +int shm_with_hugepages(int shmflag, size_t size)
> > > +{
> > > + /* flag specified explicitly */
> > > + if (shmflag & SHM_HUGETLB)
> > > + return 1;
> > > + /* Are we disabled? */
> > > + if (!shm_use_hugepages)
> > > + return 0;
> > > + /* Must be HPAGE aligned */
> > > + if (size & ~HPAGE_MASK)
> > > + return 0;
> > > + /* Are we under the max per file? */
> > > + if ((size >> HPAGE_SHIFT) > shm_hugepages_per_file)
> > > + return 0;
> >
> > I don't really understand this per-file restriction. More comments
> > below.
>
> Since hugetlb pages are a relatively scarce resource, this is a
> rudimentary method to ensure that one application doesn't allocate
> more than its fair share of hugetlb memory.
Ah, ok. It's probably worth adding a comment or two to that effect.
At the moment I don't think this is particularly necessary, since you
need root (well CAP_IPC_LOCK) to allocate hugepages. But we may well
want to change that, so some sort of limit is probably a good idea. I
wonder if there is a more direct way of accomplishing this.
> > > + /* Do we have enough free huge pages? */
> > > + if (!is_hugepage_mem_enough(size))
> > > + return 0;
> >
> > Same concerns with this test as in the mmap case.
>
> You're right. This is racy. I haven't given the shared mem part of the
> patch nearly as much attention as the mmap part. I am going to leave
> this partially broken until I clean up the fallback code for mmaps so
> I can put that here as well.
Fair enough.
> > > @@ -501,8 +505,17 @@ unsigned long do_mmap_pgoff(struct file
> > >
> > > /* Obtain the address to map to. we verify (or select) it and ensure
> > > * that it represents a valid section of the address space.
> > > + * VM_HUGETLB will never appear in vm_flags when CONFIG_HUGETLB is
> > > + * unset.
> > > */
> > > - addr = get_unmapped_area(file, addr, len, pgoff, flags);
> > > +#ifdef CONFIG_HUGETLBFS
> > > + addr = try_hugetlb_get_unmapped_area(NULL, addr, len, pgoff, &flags);
> > > + if (IS_ERR((void *)addr))
> > > + return addr;
> >
> > This doesn't look right - we don't fall back if try_hugetlb...()
> > fails. But it can fail if we don't have the right permissions, for
> > one thing, in which case we certainly do want to fall back.
>
> I admit this is messy and I am working on cleaning it up.
Great.
--
David Gibson | For every complex problem there is a
david AT gibson.dropbear.id.au | solution which is simple, neat and
| wrong.
http://www.ozlabs.org/people/dgibson
-------------- next part --------------
diff -urN ppc64-linux-2.5/arch/ppc64/mm/hugetlbpage.c linux-gogogo/arch/ppc64/mm/hugetlbpage.c
--- ppc64-linux-2.5/arch/ppc64/mm/hugetlbpage.c 2003-10-14 22:33:33.000000000 +1000
+++ linux-gogogo/arch/ppc64/mm/hugetlbpage.c 2003-11-25 17:04:25.000000000 +1100
@@ -118,6 +118,16 @@
#define hugepte_page(x) pfn_to_page(hugepte_pfn(x))
#define hugepte_none(x) (!(hugepte_val(x) & _HUGEPAGE_PFN))
+#define hugepte_write(x) (hugepte_val(x) & _HUGEPAGE_RW)
+#define hugepte_same(A,B) \
+ (((hugepte_val(A) ^ hugepte_val(B)) & ~_HUGEPAGE_HPTEFLAGS) == 0)
+
+static inline hugepte_t hugepte_mkwrite(hugepte_t pte)
+{
+ hugepte_val(pte) |= _HUGEPAGE_RW;
+ return pte;
+}
+
static void free_huge_page(struct page *page);
static void flush_hash_hugepage(mm_context_t context, unsigned long ea,
@@ -219,20 +229,6 @@
pmd_clear((pmd_t *)(ptep+i));
}
-/*
- * This function checks for proper alignment of input addr and len parameters.
- */
-int is_aligned_hugepage_range(unsigned long addr, unsigned long len)
-{
- if (len & ~HPAGE_MASK)
- return -EINVAL;
- if (addr & ~HPAGE_MASK)
- return -EINVAL;
- if (! is_hugepage_only_range(addr, len))
- return -EINVAL;
- return 0;
-}
-
static void do_slbia(void *unused)
{
asm volatile ("isync; slbia; isync":::"memory");
@@ -251,8 +247,11 @@
/* Check no VMAs are in the region */
vma = find_vma(mm, TASK_HPAGE_BASE_32);
- if (vma && (vma->vm_start < TASK_HPAGE_END_32))
+ if (vma && (vma->vm_start < TASK_HPAGE_END_32)) {
+ printk(KERN_DEBUG "Low HTLB region busy: PID=%d vma @ %lx-%lx\n",
+ current->pid, vma->vm_start, vma->vm_end);
return -EBUSY;
+ }
/* Clean up any leftover PTE pages in the region */
spin_lock(&mm->page_table_lock);
@@ -293,6 +292,43 @@
return 0;
}
+int is_aligned_hugepage_range(unsigned long addr, unsigned long len)
+{
+ if (len & ~HPAGE_MASK)
+ return -EINVAL;
+ if (addr & ~HPAGE_MASK)
+ return -EINVAL;
+ if (! is_hugepage_only_range(addr, len))
+ return -EINVAL;
+ return 0;
+}
+
+int is_potential_hugepage_range(unsigned long addr, unsigned long len)
+{
+ if (len & ~HPAGE_MASK)
+ return -EINVAL;
+ if (addr & ~HPAGE_MASK)
+ return -EINVAL;
+ if (! is_hugepage_potential_range(addr, len))
+ return -EINVAL;
+ return 0;
+}
+
+
+int prepare_hugepage_range(unsigned long addr, unsigned long len)
+{
+ int ret;
+
+ BUG_ON(is_potential_hugepage_range(addr, len) != 0);
+
+ if (is_hugepage_low_range(addr, len)) {
+ ret = open_32bit_htlbpage_range(current->mm);
+ if (ret)
+ return ret;
+ }
+ return 0;
+}
+
int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
struct vm_area_struct *vma)
{
@@ -300,6 +336,16 @@
struct page *ptepage;
unsigned long addr = vma->vm_start;
unsigned long end = vma->vm_end;
+ cpumask_t tmp;
+ int cow;
+ int local;
+
+ /* XXX are there races with checking cpu_vm_mask? - Anton */
+ tmp = cpumask_of_cpu(smp_processor_id());
+ if (cpus_equal(vma->vm_mm->cpu_vm_mask, tmp))
+ local = 1;
+
+ cow = (vma->vm_flags & (VM_SHARED | VM_MAYWRITE)) == VM_MAYWRITE;
while (addr < end) {
BUG_ON(! in_hugepage_area(src->context, addr));
@@ -310,6 +356,17 @@
return -ENOMEM;
src_pte = hugepte_offset(src, addr);
+
+ if (cow) {
+ entry = __hugepte(hugepte_update(src_pte,
+ _HUGEPAGE_RW
+ | _HUGEPAGE_HPTEFLAGS,
+ 0));
+ if ((addr % HPAGE_SIZE) == 0)
+ flush_hash_hugepage(src->context, addr,
+ entry, local);
+ }
+
entry = *src_pte;
if ((addr % HPAGE_SIZE) == 0) {
@@ -483,12 +540,16 @@
struct mm_struct *mm = current->mm;
unsigned long addr;
int ret = 0;
+ int writable;
WARN_ON(!is_vm_hugetlb_page(vma));
BUG_ON((vma->vm_start % HPAGE_SIZE) != 0);
BUG_ON((vma->vm_end % HPAGE_SIZE) != 0);
spin_lock(&mm->page_table_lock);
+
+ writable = (vma->vm_flags & VM_WRITE) && (vma->vm_flags & VM_SHARED);
+
for (addr = vma->vm_start; addr < vma->vm_end; addr += HPAGE_SIZE) {
unsigned long idx;
hugepte_t *pte = hugepte_alloc(mm, addr);
@@ -518,15 +579,25 @@
ret = -ENOMEM;
goto out;
}
- ret = add_to_page_cache(page, mapping, idx, GFP_ATOMIC);
- unlock_page(page);
+ /* This is a new page, all full of zeroes. If
+ * we're MAP_SHARED, the page needs to go into
+ * the page cache. If it's MAP_PRIVATE it
+ * might as well be made "anonymous" now or
+ * we'll just have to copy it on the first
+ * write. */
+ if (vma->vm_flags & VM_SHARED) {
+ ret = add_to_page_cache(page, mapping, idx, GFP_ATOMIC);
+ unlock_page(page);
+ } else {
+ writable = (vma->vm_flags & VM_WRITE);
+ }
if (ret) {
hugetlb_put_quota(mapping);
free_huge_page(page);
goto out;
}
}
- setup_huge_pte(mm, page, pte, vma->vm_flags & VM_WRITE);
+ setup_huge_pte(mm, page, pte, writable);
}
out:
spin_unlock(&mm->page_table_lock);
@@ -659,10 +730,9 @@
if (!in_hugepage_area(mm->context, ea))
return -1;
- ea &= ~(HPAGE_SIZE-1);
-
/* We have to find the first hugepte in the batch, since
* that's the one that will store the HPTE flags */
+ ea &= HPAGE_MASK;
ptep = hugepte_offset(mm, ea);
/* Search the Linux page table for a match with va */
@@ -683,7 +753,7 @@
* prevented then send the problem up to do_page_fault.
*/
is_write = access & _PAGE_RW;
- if (unlikely(is_write && !(hugepte_val(*ptep) & _HUGEPAGE_RW)))
+ if (unlikely(is_write && !hugepte_write(*ptep)))
return 1;
/*
@@ -886,10 +956,11 @@
spin_unlock(&htlbpage_lock);
}
htlbpage_max = htlbpage_free = htlbpage_total = i;
- printk("Total HugeTLB memory allocated, %d\n", htlbpage_free);
+ printk(KERN_INFO "Total HugeTLB memory allocated, %d\n",
+ htlbpage_free);
} else {
htlbpage_max = 0;
- printk("CPU does not support HugeTLB\n");
+ printk(KERN_INFO "CPU does not support HugeTLB\n");
}
return 0;
@@ -914,6 +985,121 @@
return (size + ~HPAGE_MASK)/HPAGE_SIZE <= htlbpage_free;
}
+static int hugepage_cow(struct mm_struct *mm, struct vm_area_struct *vma,
+ unsigned long address, hugepte_t *ptep, hugepte_t pte)
+{
+ struct page *old_page, *new_page;
+ int i;
+ cpumask_t tmp;
+ int local;
+
+ BUG_ON(!pfn_valid(hugepte_pfn(*ptep)));
+
+ old_page = hugepte_page(*ptep);
+
+ /* XXX are there races with checking cpu_vm_mask? - Anton */
+ tmp = cpumask_of_cpu(smp_processor_id());
+ if (cpus_equal(vma->vm_mm->cpu_vm_mask, tmp))
+ local = 1;
+
+ /* If no-one else is actually using this page, avoid the copy
+ * and just make the page writable */
+ if (!TestSetPageLocked(old_page)) {
+ int avoidcopy = (page_count(old_page) == 1);
+ unlock_page(old_page);
+ if (avoidcopy) {
+ for (i = 0; i < HUGEPTE_BATCH_SIZE; i++)
+ set_hugepte(ptep+i, hugepte_mkwrite(pte));
+
+
+ pte = __hugepte(hugepte_update(ptep, _HUGEPAGE_HPTEFLAGS, 0));
+ if (hugepte_val(pte) & _HUGEPAGE_HASHPTE)
+ flush_hash_hugepage(mm->context, address,
+ pte, local);
+ spin_unlock(&mm->page_table_lock);
+ return VM_FAULT_MINOR;
+ }
+ }
+
+ page_cache_get(old_page);
+
+ spin_unlock(&mm->page_table_lock);
+
+ new_page = alloc_hugetlb_page();
+ if (! new_page) {
+ page_cache_release(old_page);
+
+ /* Logically this is OOM, not a SIGBUS, but an OOM
+ * could cause the kernel to go killing other
+ * processes which won't help the hugepage situation
+ * at all (?) */
+ return VM_FAULT_SIGBUS;
+ }
+
+ for (i = 0; i < HPAGE_SIZE/PAGE_SIZE; i++)
+ copy_user_highpage(new_page + i, old_page + i, address + i*PAGE_SIZE);
+
+ spin_lock(&mm->page_table_lock);
+
+ /* XXX are there races with checking cpu_vm_mask? - Anton */
+ tmp = cpumask_of_cpu(smp_processor_id());
+ if (cpus_equal(vma->vm_mm->cpu_vm_mask, tmp))
+ local = 1;
+
+ ptep = hugepte_offset(mm, address);
+ if (hugepte_same(*ptep, pte)) {
+ /* Break COW */
+ for (i = 0; i < HUGEPTE_BATCH_SIZE; i++)
+ hugepte_update(ptep, ~0,
+ hugepte_val(mk_hugepte(new_page, 1)));
+
+ if (hugepte_val(pte) & _HUGEPAGE_HASHPTE)
+ flush_hash_hugepage(mm->context, address,
+ pte, local);
+
+ /* Make the old page be freed below */
+ new_page = old_page;
+ }
+ page_cache_release(new_page);
+ page_cache_release(old_page);
+ spin_unlock(&mm->page_table_lock);
+ return VM_FAULT_MINOR;
+}
+
+int handle_hugetlb_mm_fault(struct mm_struct *mm, struct vm_area_struct * vma,
+ unsigned long address, int write_access)
+{
+ hugepte_t *ptep;
+ int rc = VM_FAULT_SIGBUS;
+
+ spin_lock(&mm->page_table_lock);
+
+ ptep = hugepte_offset(mm, address & HPAGE_MASK);
+
+ if ( (! ptep) || hugepte_none(*ptep))
+ goto fail;
+
+ /* Otherwise, there ought to be a real hugepte here */
+ BUG_ON(hugepte_bad(*ptep));
+
+ rc = VM_FAULT_MINOR;
+
+ if (! (write_access && !hugepte_write(*ptep))) {
+ printk(KERN_WARNING "Unexpected hugepte fault (wr=%d hugepte=%08x\n",
+ write_access, hugepte_val(*ptep));
+ goto fail;
+ }
+
+ /* The only faults we should actually get are COWs */
+ /* this drops the page_table_lock */
+ return hugepage_cow(mm, vma, address, ptep, *ptep);
+
+ fail:
+ spin_unlock(&mm->page_table_lock);
+
+ return rc;
+}
+
/*
* We cannot handle pagefaults against hugetlb pages at all. They cause
* handle_mm_fault() to try to instantiate regular-sized pages in the
diff -urN ppc64-linux-2.5/arch/ppc64/mm/init.c linux-gogogo/arch/ppc64/mm/init.c
--- ppc64-linux-2.5/arch/ppc64/mm/init.c 2003-10-24 09:50:18.000000000 +1000
+++ linux-gogogo/arch/ppc64/mm/init.c 2003-11-25 14:29:53.000000000 +1100
@@ -549,7 +549,11 @@
++ptep;
} while (start < pmd_end);
} else {
- WARN_ON(pmd_hugepage(*pmd));
+ /* We don't need to flush huge
+ * pages here, because that's
+ * done in
+ * copy_hugetlb_page_range()
+ * if necessary */
start = pmd_end;
}
++pmd;
diff -urN ppc64-linux-2.5/fs/binfmt_elf.c linux-gogogo/fs/binfmt_elf.c
--- ppc64-linux-2.5/fs/binfmt_elf.c 2003-10-23 08:29:46.000000000 +1000
+++ linux-gogogo/fs/binfmt_elf.c 2003-11-27 15:58:12.000000000 +1100
@@ -265,11 +265,81 @@
#ifndef elf_map
+#ifdef CONFIG_HUGETLBFS
+#include <linux/hugetlb.h>
+
+static unsigned long elf_htlb_map(struct file *filep, unsigned long addr,
+ struct elf_phdr *eppnt, int prot, int type)
+{
+ struct file *htlbfile;
+ unsigned long start, len;
+ unsigned long map_addr;
+ int retval;
+
+ printk(KERN_DEBUG "Found HTLB ELF segment %lx-%lx\n",
+ addr, addr + eppnt->p_memsz);
+ start = addr & HPAGE_MASK;
+ len = ALIGN(eppnt->p_memsz + (addr & ~HPAGE_MASK), HPAGE_SIZE);
+
+ /* If we have data from the file to put in the segment, we
+ * have to make it writable, so that we can read it in there
+ * (mprotect() doesn't work on hugepages */
+ if (eppnt->p_filesz != 0)
+ prot |= PROT_WRITE;
+
+ if (is_potential_hugepage_range(start, len) != 0) {
+ printk(KERN_WARNING "HTLB ELF segment is not a valid hugepage range\n");
+ return -EINVAL;
+ }
+
+ htlbfile = hugetlb_zero_setup(eppnt->p_memsz);
+ if (IS_ERR(htlbfile)) {
+ printk(KERN_WARNING "Unable to allocate HTLB ELF segment (%ld)\n",
+ PTR_ERR(htlbfile));
+ return PTR_ERR(htlbfile);
+ }
+ set_file_hugepages(htlbfile);
+ down_write(&current->mm->mmap_sem);
+ map_addr = do_mmap(htlbfile, start, len, prot, type, 0);
+ up_write(&current->mm->mmap_sem);
+ fput(htlbfile);
+
+ if (eppnt->p_filesz != 0) {
+ loff_t pos = eppnt->p_offset;
+
+ printk("Reading %lu bytes of file data into HTLB segment\n",
+ (unsigned long) eppnt->p_filesz);
+ retval = vfs_read(filep, (void __user *)addr, eppnt->p_filesz, &pos);
+ printk("HTLB read returned %d\n", retval);
+ if (retval < 0) {
+ extern asmlinkage long sys_munmap(unsigned long, size_t);
+ sys_munmap(start, len);
+ return retval;
+ }
+ }
+
+
+ return map_addr;
+}
+#else
+static inline int elf_htlb_map(struct file *filep, unsigned long addr,
+ struct elf_phdr *eppnt, int prot, int type)
+{
+ return -ENOSYS;
+}
+#endif
static unsigned long elf_map(struct file *filep, unsigned long addr,
struct elf_phdr *eppnt, int prot, int type)
{
unsigned long map_addr;
+ if (eppnt->p_flags & PF_LINUX_HTLB) {
+ map_addr = elf_htlb_map(filep, addr, eppnt, prot, type);
+ if (map_addr < (unsigned long)(-1024))
+ return map_addr;
+ printk(KERN_DEBUG "Falling back to non HTLB allocation\n");
+ }
+
down_write(&current->mm->mmap_sem);
map_addr = do_mmap(filep, ELF_PAGESTART(addr),
eppnt->p_filesz + ELF_PAGEOFFSET(eppnt->p_vaddr), prot, type,
diff -urN ppc64-linux-2.5/include/asm-ppc64/mmu_context.h linux-gogogo/include/asm-ppc64/mmu_context.h
--- ppc64-linux-2.5/include/asm-ppc64/mmu_context.h 2003-09-12 21:06:51.000000000 +1000
+++ linux-gogogo/include/asm-ppc64/mmu_context.h 2003-11-25 13:07:49.000000000 +1100
@@ -80,6 +80,8 @@
{
long head;
unsigned long flags;
+ /* This does the right thing across a fork (I hope) */
+ unsigned long low_hpages = mm->context & CONTEXT_LOW_HPAGES;
spin_lock_irqsave(&mmu_context_queue.lock, flags);
@@ -90,6 +92,7 @@
head = mmu_context_queue.head;
mm->context = mmu_context_queue.elements[head];
+ mm->context |= low_hpages;
head = (head < LAST_USER_CONTEXT-1) ? head+1 : 0;
mmu_context_queue.head = head;
diff -urN ppc64-linux-2.5/include/asm-ppc64/page.h linux-gogogo/include/asm-ppc64/page.h
--- ppc64-linux-2.5/include/asm-ppc64/page.h 2003-09-12 21:06:51.000000000 +1000
+++ linux-gogogo/include/asm-ppc64/page.h 2003-11-24 18:00:54.000000000 +1100
@@ -37,11 +37,22 @@
#define TASK_HPAGE_END_32 (0xc0000000UL)
#define ARCH_HAS_HUGEPAGE_ONLY_RANGE
+#define ARCH_HAS_PREPARE_HUGEPAGE_RANGE
+
+#define is_hugepage_low_range(addr, len) \
+ (((addr) > (TASK_HPAGE_BASE_32-(len))) && ((addr) < TASK_HPAGE_END_32))
+#define is_hugepage_high_range(addr, len) \
+ (((addr) > (TASK_HPAGE_BASE-(len))) && ((addr) < TASK_HPAGE_END))
+
+#define is_hugepage_potential_range(addr, len) \
+ (is_hugepage_high_range(addr, len) || is_hugepage_low_range(addr, len))
#define is_hugepage_only_range(addr, len) \
- ( ((addr > (TASK_HPAGE_BASE-len)) && (addr < TASK_HPAGE_END)) || \
- ((current->mm->context & CONTEXT_LOW_HPAGES) && \
- (addr > (TASK_HPAGE_BASE_32-len)) && (addr < TASK_HPAGE_END_32)) )
+ (is_hugepage_high_range((addr), (len)) || \
+ ( (current->mm->context & CONTEXT_LOW_HPAGES) && \
+ is_hugepage_low_range((addr), (len)) ) )
+
#define HAVE_ARCH_HUGETLB_UNMAPPED_AREA
+#define ARCH_HANDLES_HUGEPAGE_FAULTS
#define in_hugepage_area(context, addr) \
((cur_cpu_spec->cpu_features & CPU_FTR_16M_PAGE) && \
diff -urN ppc64-linux-2.5/include/linux/elf.h linux-gogogo/include/linux/elf.h
--- ppc64-linux-2.5/include/linux/elf.h 2003-10-07 11:38:42.000000000 +1000
+++ linux-gogogo/include/linux/elf.h 2003-11-18 16:46:12.000000000 +1100
@@ -271,6 +271,11 @@
#define PF_W 0x2
#define PF_X 0x1
+#define PF_MASKOS 0x0ff00000
+#define PF_MASKPROC 0xf0000000
+
+#define PF_LINUX_HTLB 0x00100000
+
typedef struct elf32_phdr{
Elf32_Word p_type;
Elf32_Off p_offset;
diff -urN ppc64-linux-2.5/include/linux/hugetlb.h linux-gogogo/include/linux/hugetlb.h
--- ppc64-linux-2.5/include/linux/hugetlb.h 2003-09-27 22:48:37.000000000 +1000
+++ linux-gogogo/include/linux/hugetlb.h 2003-11-25 15:04:35.000000000 +1100
@@ -41,6 +41,22 @@
#define is_hugepage_only_range(addr, len) 0
#endif
+#ifndef ARCH_HAS_PREPARE_HUGEPAGE_RANGE
+#define is_potential_hugepage_range(addr, len) \
+ (is_aligned_hugepage_range((addr), (len)))
+#define prepare_hugepage_range(addr, len) (0)
+#else
+int is_potential_hugepage_range(unsigned long addr, unsigned long len);
+int prepare_hugepage_range(unsigned long addr, unsigned long len);
+#endif
+
+#ifndef ARCH_HANDLES_HUGEPAGE_FAULTS
+#define handle_hugetlb_mm_fault(mm, vma, a, w) (VM_FAULT_SIGBUS)
+#else
+int handle_hugetlb_mm_fault(struct mm_struct *mm, struct vm_area_struct * vma,
+ unsigned long address, int write_access);
+#endif
+
#else /* !CONFIG_HUGETLB_PAGE */
static inline int is_vm_hugetlb_page(struct vm_area_struct *vma)
@@ -61,6 +77,8 @@
#define mark_mm_hugetlb(mm, vma) do { } while (0)
#define follow_huge_pmd(mm, addr, pmd, write) 0
#define is_aligned_hugepage_range(addr, len) 0
+#define is_allowed_hugepage_range(addr, len) 0
+#define prepare_hugepage_range(addr, len) (-EINVAL)
#define pmd_huge(x) 0
#define is_hugepage_only_range(addr, len) 0
diff -urN ppc64-linux-2.5/mm/memory.c linux-gogogo/mm/memory.c
--- ppc64-linux-2.5/mm/memory.c 2003-11-17 11:20:18.000000000 +1100
+++ linux-gogogo/mm/memory.c 2003-11-18 12:42:34.000000000 +1100
@@ -1603,7 +1603,8 @@
inc_page_state(pgfault);
if (is_vm_hugetlb_page(vma))
- return VM_FAULT_SIGBUS; /* mapping truncation does this. */
+ /* mapping truncation can do this. */
+ return handle_hugetlb_mm_fault(mm, vma, address, write_access);
/*
* We need the page table lock to synchronize with kswapd
diff -urN ppc64-linux-2.5/mm/mmap.c linux-gogogo/mm/mmap.c
--- ppc64-linux-2.5/mm/mmap.c 2003-10-23 08:29:46.000000000 +1000
+++ linux-gogogo/mm/mmap.c 2003-11-25 15:04:49.000000000 +1100
@@ -787,7 +787,9 @@
/*
* Make sure that addr and length are properly aligned.
*/
- ret = is_aligned_hugepage_range(addr, len);
+ ret = is_potential_hugepage_range(addr, len);
+ if (ret == 0)
+ ret = prepare_hugepage_range(addr, len);
} else {
/*
* Ensure that a normal request is not falling in a