[PATCH] [2.4] [RHEL] Backport of benh's PTE mgmt changes

Benjamin Herrenschmidt benh at kernel.crashing.org
Wed Jan 7 16:08:09 EST 2004


On Wed, 2004-01-07 at 11:01, olof at austin.ibm.com wrote:
> Below is a 2.4 backport of parts of benh's 2.6 pte_free rewrite. It's
> different in a few ways:
>
> 1. 2.4 has no RCU. Instead I just send a synchronous IPI to all processors.
> Since the IPI won't be delivered until a processor is out of hash_page, it
> can be used as a barrier between new and old traversals.

But it is also quite expensive...

> 2. There's no batching of TLB shootdowns, like in 2.6. So I had to hijack
> do_check_pgt_cache(). This is ugly, and I'm not too happy about it, but
> I think RedHat would be more likely to accept this than a change in
> generic code (at this point in the product cycle). Julie, feel free to
> prove me wrong. :-)
>
> 3. Because of the above reason, I had to add an extra per-cpu lock for the
> pte_freelist_batch structures.
>
> 4. The __hash_page locking is rougher than in 2.6. I left the hash locks
> there, since I believe they are still needed.
>
> 5. I recycled _PAGE_HASHNOIX, since it's never used. There were no other
> free bits available...

On 2.6 I moved bits around to make room: _PAGE_FILE can be relocated,
since it's only used when !_PAGE_PRESENT.
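To illustrate why that works, here is a minimal user-space sketch of one
bit position carrying two meanings depending on the present bit. The bit
values and the names PAGE_DUAL, pte_is_file and pte_has_hw_flag are
made up for the example; they are not the 2.6 layout.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical values, for illustration only. */
#define PAGE_PRESENT 0x001UL
#define PAGE_DUAL    0x002UL  /* "file" when !PRESENT, some other flag when PRESENT */

/* The bit means "non-linear file mapping" only for non-present PTEs. */
static bool pte_is_file(uint64_t pte)
{
    return !(pte & PAGE_PRESENT) && (pte & PAGE_DUAL);
}

/* For present PTEs the same bit position is free to mean something else. */
static bool pte_has_hw_flag(uint64_t pte)
{
    return (pte & PAGE_PRESENT) && (pte & PAGE_DUAL);
}
```

Since the two readers test the present bit first, they can never both
claim the same PTE.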
>
> (6. RedHat disabled the fast PTE/PMD/PGD allocator, so the patch won't
> apply cleanly to an ameslab or marcelo 2.4 tree, but the differences are
> pretty obvious.)
>
>
>
> I think that's it. Please provide feedback. We're working on a deadline
> with RedHat, so sooner is better than later. I'll be beating on this with
> the specweb benchmark over the next couple of days as well. :-)

Comments in the patch.

> +	/*
> +	 * Check the user's access rights to the page.  If access should be
> +	 * prevented then send the problem up to do_page_fault.
> +	 */
> +
>  	access |= _PAGE_PRESENT;
> -	if (unlikely(access & ~(pte_val(*ptep)))) {
> +
> +	/* We'll do access checking and _PAGE_BUSY setting in assembly, since
> +	 * it needs to be atomic.
> +	 */
> +
> +	__asm__ __volatile__ ("\n
> +	1:	ldarx	%0,0,%3\n
> +		# Check access rights (access & ~(pte_val(*ptep)))\n
> +		andc.	%1,%2,%0\n
> +		bne-	2f\n
> +		# Check if PTE is busy\n
> +		andi.	%1,%0,%4\n
> +		bne-	1b\n
> +		ori	%0,%0,%4\n
> +		# Write the linux PTE atomically (setting busy)\n
> +		stdcx.	%0,0,%3\n
> +		bne-	1b\n
> +		li      %1,1\n
> +                b	3f\n
> +	2:      stdcx.  %0,0,%3 # to clear the reservation\n
> +		li      %1,0\n
> +	3:"
> +	: "=r" (old_pte), "=r" (access_ok)
> +	: "r" (access), "r" (ptep), "i" (_PAGE_BUSY)
> +        : "cc", "memory");

 .../...

Heh, so you kept the C version and stuffed the atomic asm into
it :) Why not... It's definitely less invasive than what I did in
2.6, but also slower, since there I optimized the branches to the
ppc_md. hooks. That's probably ok for 2.4 though.
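For anyone following along without a PowerPC manual at hand, the logic
of that ldarx/stdcx. loop can be sketched in portable C11 atomics
roughly as below. This is only a user-space approximation, not the
kernel code: the bit values and the name pte_lock_checked are invented
for the example, and a compare-exchange stands in for the
load-reserve/store-conditional pair.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values; the real ones are defined by the patch. */
#define PAGE_PRESENT 0x001UL
#define PAGE_RW      0x004UL
#define PAGE_BUSY    0x800UL

/*
 * Check access rights and set the busy bit in one atomic step.
 * On success, returns true and stores the value that was written
 * (busy bit included) in *old_pte, as the asm does with %0.
 * Returns false and leaves the PTE unchanged if access is denied.
 */
static bool pte_lock_checked(_Atomic uint64_t *ptep, uint64_t access,
                             uint64_t *old_pte)
{
    uint64_t old = atomic_load_explicit(ptep, memory_order_relaxed);

    for (;;) {
        if (access & ~old)          /* missing permission bits: fault path */
            return false;
        if (old & PAGE_BUSY) {      /* held by someone else: reload and retry */
            old = atomic_load_explicit(ptep, memory_order_relaxed);
            continue;
        }
        /* On failure the compare-exchange refreshes 'old' for the next pass. */
        if (atomic_compare_exchange_weak_explicit(ptep, &old, old | PAGE_BUSY,
                                                  memory_order_acquire,
                                                  memory_order_relaxed)) {
            *old_pte = old | PAGE_BUSY;
            return true;
        }
    }
}
```

The access check and the busy test are both made on the same loaded
value that the compare-exchange then validates, which is the property
the asm gets from keeping everything inside one reservation.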

> +	/* Clear _PAGE_BUSY flag atomically. */
> +	__asm__ __volatile__ ("
> +	1:	ldarx	%0,0,%2\n
> +		andc.	%0,%0,%1\n
> +		stdcx.	%0,0,%2\n
> +		bne-	1b\n"
> +	: "=r" (new_pte)
> +        : "r" (tmp), "r" (ptep)
> +        : "cc", "memory");

I'm not sure we need to clear _PAGE_BUSY atomically... I definitely
don't in 2.6. But we do need to make sure the clear is ordered after
everything that was done before it.
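In C11 terms, what I have in mind looks roughly like the sketch below:
a plain load, then a store with release ordering. It assumes that
whoever holds _PAGE_BUSY is the only writer of that PTE until the bit
is cleared; if that assumption does not hold, the non-atomic
read-modify-write is not safe. The name pte_unlock and the bit value
are made up for the example.

```c
#include <stdatomic.h>
#include <stdint.h>

#define PAGE_BUSY 0x800UL  /* hypothetical value, for illustration only */

/*
 * Clear the busy bit without a load-reserve/store-conditional loop.
 * Assumption: the caller holds PAGE_BUSY and is the sole writer of
 * *ptep, so the load and the store cannot race with another update.
 * The release ordering keeps every earlier store visible before the
 * bit is seen as cleared.
 */
static void pte_unlock(_Atomic uint64_t *ptep)
{
    uint64_t v = atomic_load_explicit(ptep, memory_order_relaxed);

    atomic_store_explicit(ptep, v & ~PAGE_BUSY, memory_order_release);
}
```

On PowerPC a compiler would typically implement the release store with
a barrier such as lwsync ahead of a normal store, which is cheaper than
another ldarx/stdcx. loop.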

The rest is a bit scary, but it's 2.4 so... :) I suppose it should
work, though I would have to spend more time looking at the code path
in detail.

Ben.


** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/




