[PATCH V4 0/2] mm: FAULT_AROUND_ORDER patchset performance data for powerpc

Fri May 23 22:28:54 EST 2014

Andrew Morton wrote:
> On Wed, 21 May 2014 16:40:27 +0300 (EEST) "Kirill A. Shutemov" <kirill.shutemov at linux.intel.com> wrote:
> 
> > > Or something.  Can we please get some code commentary over
> > > do_fault_around() describing this design decision and explaining the
> > > reasoning behind it?
> > 
> > I'll do this. But if do_fault_around() rework is needed, I want to do that
> > first.
> 
> This sort of thing should be at least partially driven by observation
> and I don't have the data for that.  My seat of the pants feel is that
> after the first fault, accesses at higher addresses are more
> common/probable than accesses at lower addresses.

It's probably true for data, but the feature mostly targets code pages,
and with all the jumps the situation is not that obvious to me.

> But we don't need to do all that right now.  Let's get the current
> implementation wrapped up for 3.15: get the interface finalized (bytes,
> not pages!)

The patch earlier in this thread is okay for that, right?

> and get the current design decisions appropriately documented.

Here it is, based on the patch that converts order->bytes.

From: "Kirill A. Shutemov" <kirill.shutemov at linux.intel.com>
Date: Fri, 23 May 2014 15:16:47 +0300
Subject: [PATCH] mm: document do_fault_around() feature

Some clarification on how faultaround works.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov at linux.intel.com>
---
 mm/memory.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)
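
For illustration only (not part of the patch): a minimal userspace sketch of
the rounding and alignment that the new comments describe. The 4 KiB page
size, the sketch_* names, the body of the mask helper and the clamp to the
VMA start are assumptions made for this example; only the
rounddown_pow_of_two(fault_around_bytes) / PAGE_SIZE expression appears in
the diff itself.

```c
/* Assumed value for illustration only; the real one is per-architecture. */
#define SKETCH_PAGE_SIZE 4096UL
#define SKETCH_PAGE_MASK (~(SKETCH_PAGE_SIZE - 1))

/* Round n down to the nearest power of two; n must be non-zero. */
static unsigned long sketch_rounddown_pow_of_two(unsigned long n)
{
	unsigned long p = 1;

	while (p <= n / 2)
		p *= 2;
	return p;
}

/* Number of pages in the fault-around window for a given byte setting. */
static unsigned long sketch_fault_around_pages(unsigned long bytes)
{
	return sketch_rounddown_pow_of_two(bytes) / SKETCH_PAGE_SIZE;
}

/* Hypothetical mask helper: clears the offset within the window. */
static unsigned long sketch_fault_around_mask(unsigned long bytes)
{
	return ~(sketch_rounddown_pow_of_two(bytes) - 1) & SKETCH_PAGE_MASK;
}

/*
 * Start of the window: the fault address aligned down to the window size,
 * then clamped so that it never falls below the start of the VMA.
 */
static unsigned long sketch_window_start(unsigned long address,
					 unsigned long vm_start,
					 unsigned long bytes)
{
	unsigned long start = address & sketch_fault_around_mask(bytes);

	return start > vm_start ? start : vm_start;
}
```

With the default of 65536 bytes and the assumed 4 KiB pages, the window is
16 pages; a setting of 70000 bytes rounds down to the same 16 pages.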

diff --git a/mm/memory.c b/mm/memory.c
index 252b319e8cdf..8d723b8d3c86 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3404,6 +3404,10 @@ void do_set_pte(struct vm_area_struct *vma, unsigned long address,
 
 static unsigned long fault_around_bytes = 65536;
 
+/*
+ * fault_around_pages() and fault_around_mask() round fault_around_bytes down
+ * to the nearest page order. That is what do_fault_around() expects to see.
+ */
 static inline unsigned long fault_around_pages(void)
 {
 	return rounddown_pow_of_two(fault_around_bytes) / PAGE_SIZE;
@@ -3445,6 +3449,29 @@ static int __init fault_around_debugfs(void)
 late_initcall(fault_around_debugfs);
 #endif
 
+/*
+ * do_fault_around() tries to map a few pages around the fault address. The
+ * hope is that the pages will be needed soon and this would lower the number
+ * of faults to handle.
+ *
+ * It uses vm_ops->map_pages() to map the pages, which skips the page if it's
+ * not ready to be mapped: not up-to-date, locked, etc.
+ *
+ * This function is called with the page table lock taken. In the split ptlock
+ * case the page table lock protects only those entries which belong to the
+ * page table corresponding to the fault address.
+ *
+ * This function doesn't cross the VMA boundaries in order to call map_pages()
+ * only once.
+ *
+ * fault_around_pages() defines how many pages we'll try to map.
+ * do_fault_around() expects it to be a power of two and less than or equal to
+ * PTRS_PER_PTE.
+ *
+ * The virtual address of the area that we map is naturally aligned to
+ * fault_around_pages() (and therefore to page order). This way it's easier to
+ * guarantee that we don't cross the page table boundaries.
+ */
 static void do_fault_around(struct vm_area_struct *vma, unsigned long address,
 		pte_t *pte, pgoff_t pgoff, unsigned int flags)
 {
-- 
 Kirill A. Shutemov