[RFC PATCH -V2 05/21] powerpc: Reduce PTE table memory wastage
Aneesh Kumar K.V
aneesh.kumar at linux.vnet.ibm.com
Sat Feb 23 04:20:49 EST 2013
Paul Mackerras <paulus at samba.org> writes:
I will reply to the other parts in a seperate email, but the below
>> +static void page_table_free_rcu(struct mmu_gather *tlb, unsigned long *table)
>> +{
>> + struct page *page;
>> + struct mm_struct *mm;
>> + unsigned int bit, mask;
>> +
>> + mm = tlb->mm;
>> + /* Free 2K page table fragment of a 64K page */
>> + page = virt_to_page(table);
>> + bit = 1 << ((__pa(table) & ~PAGE_MASK) / PTE_FRAG_SIZE);
>> + spin_lock(&mm->page_table_lock);
>> + /*
>> + * stash the actual mask in higher half, and clear the lower half
>> + * and selectively, add remove from pgtable list
>> + */
>> + mask = atomic_xor_bits(&page->_mapcount, bit | (bit << FRAG_MASK_BITS));
>> + if (!(mask & FRAG_MASK))
>> + list_del(&page->lru);
>> + else {
>> + /*
>> + * Add the page table page to pgtable_list so that
>> + * the free fragment can be used by the next alloc
>> + */
>> + list_del_init(&page->lru);
>> + list_add_tail(&page->lru, &mm->context.pgtable_list);
>> + }
>> + spin_unlock(&mm->page_table_lock);
>> + tlb_remove_table(tlb, table);
>> +}
>
> This looks like you're allowing a fragment that is being freed to be
> reallocated and used again during the grace period when we are waiting
> for any references to the fragment to disappear. Doesn't that allow a
> race where one CPU traversing the page table and using the fragment in
> its old location in the tree could see a PTE created after the
> fragment was reallocated? In other words, why is it safe to allow the
> fragment to be used during the grace period? If it is safe, it at
> least needs a comment explaining why.
>
We don't allow it to be reallocated during the grace period. The trick
is in the below lines of page_table_alloc()
/*
* Update with the higher order mask bits accumulated,
* added as a part of rcu free.
*/
mask = mask | (mask >> FRAG_MASK_BITS);
When checking for mask, we also look at the higher order bits.
The reason we add the page back to &mm->context.pgtable_list in
page_table_free_rcu is because we need to have access to struct
mm_struct. We don't have that in the rcu call back. So we add early and
make sure we don't reallocate them, until the grace period is over.
I will definitely add more comments around the code to clarify these details.
-aneesh
More information about the Linuxppc-dev
mailing list