[PATCH -V5 06/25] powerpc: Reduce PTE table memory wastage
David Gibson
dwg at au1.ibm.com
Wed Apr 10 17:04:03 EST 2013
On Wed, Apr 10, 2013 at 11:59:29AM +0530, Aneesh Kumar K.V wrote:
> David Gibson <dwg at au1.ibm.com> writes:
> > On Thu, Apr 04, 2013 at 11:27:44AM +0530, Aneesh Kumar K.V wrote:
[snip]
> >> @@ -97,13 +100,45 @@ void __destroy_context(int context_id)
> >> }
> >> EXPORT_SYMBOL_GPL(__destroy_context);
> >>
> >> +#ifdef CONFIG_PPC_64K_PAGES
> >> +static void destroy_pagetable_page(struct mm_struct *mm)
> >> +{
> >> + int count;
> >> + struct page *page;
> >> +
> >> + page = mm->context.pgtable_page;
> >> + if (!page)
> >> + return;
> >> +
> >> + /* drop all the pending references */
> >> + count = atomic_read(&page->_mapcount) + 1;
> >> + /* We allow PTE_FRAG_NR(16) fragments from a PTE page */
> >> + count = atomic_sub_return(16 - count, &page->_count);
> >
> > You should really move PTE_FRAG_NR to a header so you can actually use
> > it here rather than hard coding 16.
> >
> > It took me a fair while to convince myself that there is no race here
> > with something altering mapcount and count between the atomic_read()
> > and the atomic_sub_return(). It could do with a comment to explain
> > why that is safe.
> >
> > Re-using the mapcount field for your index also seems odd, and it took
> > me a while to convince myself that that's safe too. Wouldn't it be
> > simpler to store a pointer to the next sub-page in the mm_context
> > instead? You can get from that to the struct page easily enough with a
> > shift and pfn_to_page().
>
> I found using _mapcount simpler in this case. I was looking at it not
> as an index, but rather how may fragments are mapped/used already.
Except that it's actually (#fragments - 1). Using subpage pointer
makes the fragments calculation (very slightly) harder, but the
calculation of the table address easier. More importantly it avoids
adding effectively an extra variable - which is then shoehorned into a
structure not really designed to hold it.
> Using
> subpage pointer in mm->context.xyz means, we have to calculate the
> number of fragments used/mapped via the pointer. We need the fragment
> count so that we can drop page reference count correctly here.
>
>
> >
> >> + if (!count) {
> >> + pgtable_page_dtor(page);
> >> + reset_page_mapcount(page);
> >> + free_hot_cold_page(page, 0);
> >
> > It would be nice to use put_page() somehow instead of duplicating its
> > logic, though I realise the sparc code you've based this on does the
> > same thing.
>
> That is not exactly put_page. We can avoid lots of check in this
> specific case.
[snip]
> >> +static pte_t *__alloc_for_cache(struct mm_struct *mm, int kernel)
> >> +{
> >> + pte_t *ret = NULL;
> >> + struct page *page = alloc_page(GFP_KERNEL | __GFP_NOTRACK |
> >> + __GFP_REPEAT | __GFP_ZERO);
> >> + if (!page)
> >> + return NULL;
> >> +
> >> + spin_lock(&mm->page_table_lock);
> >> + /*
> >> + * If we find pgtable_page set, we return
> >> + * the allocated page with single fragement
> >> + * count.
> >> + */
> >> + if (likely(!mm->context.pgtable_page)) {
> >> + atomic_set(&page->_count, PTE_FRAG_NR);
> >> + atomic_set(&page->_mapcount, 0);
> >> + mm->context.pgtable_page = page;
> >> + }
> >
> > .. and in the unlikely case where there *is* a pgtable_page already
> > set, what then? Seems like you should BUG_ON, or at least return NULL
> > - as it is you will return the first sub-page of that page again,
> > which is very likely in use.
>
>
> As explained in the comment above, we return with the allocated page
> with fragment count set to 1. So we end up having only one fragment. The
> other option I had was to to free the allocated page and do a
> get_from_cache under the page_table_lock. But since we already allocated
> the page, why not use that ?. It also keep the code similar to
> sparc.
My point is that I can't see any circumstance under which we should
ever hit this case. Which means if we do something is badly messed up
and we should BUG() (or at least WARN()).
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 198 bytes
Desc: Digital signature
URL: <http://lists.ozlabs.org/pipermail/linuxppc-dev/attachments/20130410/8a5de1cf/attachment.sig>
More information about the Linuxppc-dev
mailing list