fsl booke MM vs. SMP questions

Tue May 22 13:03:41 EST 2007

On May 21, 2007, at 2:06 AM, Benjamin Herrenschmidt wrote:

> Hi Folks !
>
> I see that the fsl booke code has some #ifdef CONFIG_SMP bits here or
> there, thus I suppose there are some SMP implementations of these
> right ?

There will be. The SMP code that exists today is just some stuff I put
in w/o going through each case. The TLB mgmt code does need some fixup
for SMP.

- k

>
> I'm having some serious issues trying to figure out how the TLB
> management is made SMP safe however.
>
> There are at least two main issues I've spotted at this point (there's
> at least one more if there is HW threading, that is, if the TLB is shared
> between logical processors, but I'll ignore that for now since I don't
> think there is such a thing ... yet).
>
>  - How do you guys shield PTE flushing vs. TLB misses on another CPU ?
> That is, how do you prevent (if you do) the following scenario:
>
> 	cpu 0				cpu 1
> 	tlb miss			pte_clear (or similar)
> 	load PTE value
> 					write 0 to PTE (or replace)
> 					tlbivax (tlbie)
> 	tlbwe
>
> That scenario, as you can see, will leave you with stale entries in the
> TLB which will ultimately lead to all sorts of unpleasant/random
> behaviours.
>
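
To make the interleaving above concrete, here is a minimal single-threaded
C sketch that replays the four steps in order against a toy model.
Everything in it (struct model, the helper names, the PTE value, one TLB
slot per CPU, and tlbivax modeled as clearing every slot) is an assumption
made for illustration; it is not the fsl booke code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy model (hypothetical): one Linux PTE and one TLB slot per CPU. */
struct model {
    uint32_t pte;     /* PTE in memory; 0 means "not present" */
    uint32_t tlb[2];  /* per-CPU TLB slot; 0 means "invalid"  */
};

/* cpu 0, first half of the miss handler: load the PTE value. */
static uint32_t miss_load_pte(const struct model *m)
{
    return m->pte;
}

/* cpu 0, second half of the miss handler: write the TLB (tlbwe). */
static void miss_tlbwe(struct model *m, int cpu, uint32_t val)
{
    m->tlb[cpu] = val;
}

/* cpu 1: pte_clear followed by an invalidate, modeled as a broadcast. */
static void pte_clear_and_invalidate(struct model *m)
{
    m->pte = 0;      /* write 0 to PTE */
    m->tlb[0] = 0;   /* tlbivax: drop any existing entry ... */
    m->tlb[1] = 0;   /* ... on every CPU in this model */
}

/* Replay the interleaving; true means cpu 0 ends up with a stale entry. */
static bool replay_race(void)
{
    struct model m = { .pte = 0x1000u | 1u, .tlb = { 0, 0 } };

    uint32_t seen = miss_load_pte(&m);  /* cpu 0: load PTE value      */
    pte_clear_and_invalidate(&m);       /* cpu 1: clear + invalidate  */
    miss_tlbwe(&m, 0, seen);            /* cpu 0: tlbwe, too late     */

    return m.pte == 0 && m.tlb[0] != 0;
}
```

In this model replay_race() returns true: the invalidate runs before
cpu 0's tlbwe, so it finds nothing to remove, and the old translation
is installed after the PTE has already been cleared.
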
> If the answer is "oops ... we don't", then let's try to find out ways
> out of that since I may have a similar issue in a not too distant
> future :-) And I'm trying to find out a -fast- way to deal with that
> without bloating the fast path. My main problem is that I want to avoid
> taking a spin lock or equivalent atomic operation in the fast TLB reload
> path (which would solve the problem) since lwarx/stwcx. are generally
> real slow (hundreds of cycles on some processors).
>
>  - I see that your TLB miss handler is using a non-atomic store to write
> the _PAGE_ACCESSED bit back to the PTE. Don't you have a similar race
> where something would do:
>
> 	cpu 0				cpu 1
> 	tlb miss			pte_clear (or similar)
> 	load PTE value
> 					write 0 to PTE (or replace)
> 	write back PTE with _PAGE_ACCESSED
> 	tlbwe
>
> This is an extension of the previous race but it's a different problem
> so I listed it separately. In that case, the problem is worse, since not
> only do you have a stale TLB entry, but you -also- have corrupted the
> linux PTE by writing back the old value in it.
>
> At this point, I'm afraid you may have no choice but to go atomic, which
> means paying the cost of lwarx/stwcx. on TLB misses, though if you have
> a solution for the first problem, then you can avoid the atomic
> operation in the second problem if _PAGE_ACCESSED is already set.
>
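
The "go atomic, but only when _PAGE_ACCESSED is not already set" idea
can be sketched in portable C11, with a compare-and-swap standing in for
the lwarx/stwcx. loop. The bit values and the name pte_mark_accessed()
are hypothetical, and this only covers the PTE corruption; it does
nothing about the stale TLB entry from the first problem.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_PRESENT  0x001u  /* hypothetical bit layout */
#define PAGE_ACCESSED 0x100u

/*
 * Returns true and stores the value to load into the TLB in *out,
 * or false if the PTE is not present (fall back to the fault path).
 */
static bool pte_mark_accessed(_Atomic uint32_t *ptep, uint32_t *out)
{
    uint32_t old = atomic_load(ptep);

    for (;;) {
        if (!(old & PAGE_PRESENT))
            return false;            /* cleared under us: never write back */

        if (old & PAGE_ACCESSED) {   /* fast path: no atomic update needed */
            *out = old;
            return true;
        }

        /* On failure the CAS reloads 'old' with the current PTE value. */
        if (atomic_compare_exchange_weak(ptep, &old, old | PAGE_ACCESSED)) {
            *out = old | PAGE_ACCESSED;
            return true;
        }
    }
}
```

Because the update only succeeds if the PTE still holds the value that
was loaded, a concurrent pte_clear can no longer be overwritten with
the old contents.
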
> If not, you might have to use a _PAGE_BUSY bit similar to what the
> 64-bit code uses as a per-PTE lock, or use mmu_hash_lock... Unless you
> come up with a great idea or some HW black magic that makes the problem
> go away...
>
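
For reference, a per-PTE lock bit of the kind mentioned above might look
roughly like this in C11. The PAGE_BUSY value and the pte_lock() /
pte_unlock() names are made up for the sketch, and the real cost on
hardware would be the lwarx/stwcx. pair hidden behind the fetch-or.

```c
#include <stdatomic.h>
#include <stdint.h>

#define PAGE_PRESENT 0x001u  /* hypothetical bit layout */
#define PAGE_BUSY    0x800u  /* hypothetical per-PTE lock bit */

/* Spin until we own the busy bit; returns the PTE value without it. */
static uint32_t pte_lock(_Atomic uint32_t *ptep)
{
    for (;;) {
        uint32_t old = atomic_fetch_or_explicit(ptep, PAGE_BUSY,
                                                memory_order_acquire);
        if (!(old & PAGE_BUSY))
            return old;   /* the bit was clear before: lock is ours */
        /* Another CPU holds the lock; try again. */
    }
}

/* Publish the new PTE value and drop the busy bit in a single store. */
static void pte_unlock(_Atomic uint32_t *ptep, uint32_t newval)
{
    atomic_store_explicit(ptep, newval & ~PAGE_BUSY, memory_order_release);
}
```

The miss handler would hold the lock from the PTE load through tlbwe,
and pte_clear would hold it across the store and the invalidate, so the
two sequences can no longer interleave.
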
> In any case, I'm curious about how you have or intend to solve that
> since as I said above, I might be in a similar situation soon and am
> trying to keep the TLB miss handler as fast as humanly possible.
>
> Cheers,
> Ben.
>