ppc64/cell: local TLB flush with active SPEs

Mark Nutter mnutter at us.ibm.com
Thu Oct 13 08:09:26 EST 2005


For reference, the 2.6.3 bring-up kernel always issued global TLBIE.  This 
was a hack, and we very much wanted to improve performance if possible, 
particularly for the vast majority of PPC applications out there that 
don't use SPEs.

As long as we are thinking about a proper solution, the whole 
mm->cpu_vm_mask thing is broken, at least as a selector for local -vs- 
global TLBIE.  The problem, as I see it, is that memory regions can be shared 
among processes (via mmap/shmat), with each task bound to different 
processors.  If we are to continue using a cpumask as selector for TLBIE, 
then we really need a vma->cpu_vma_mask. 
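
To make the concern concrete, here is a minimal sketch in plain C, not kernel code. All names are hypothetical; sketch_region.cpu_vma_mask stands in for the proposed vma->cpu_vma_mask, and a 64-bit word stands in for a cpumask. Two tasks share one region while each runs on a different CPU.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-region mask, standing in for the proposed
 * vma->cpu_vma_mask. */
struct sketch_region {
    uint64_t cpu_vma_mask;   /* CPUs that touched this region via any task */
};

/* Hypothetical stand-in for an mm with one shared mapping. */
struct sketch_task_mm {
    uint64_t cpu_vm_mask;          /* CPUs that ran this mm */
    struct sketch_region *shared;  /* region shared via mmap/shmat, or NULL */
};

/* Record that the task owning `mm` ran on `cpu`. */
static void sketch_run_on(struct sketch_task_mm *mm, unsigned int cpu)
{
    assert(cpu < 64);
    uint64_t bit = (uint64_t)1 << cpu;

    mm->cpu_vm_mask |= bit;
    if (mm->shared)
        mm->shared->cpu_vma_mask |= bit;
}

/* True if no CPU other than `cpu` is set in `mask`. */
static bool sketch_only_cpu(uint64_t mask, unsigned int cpu)
{
    assert(cpu < 64);
    return (mask & ~((uint64_t)1 << cpu)) == 0;
}
```

With task A bound to CPU 0 and task B bound to CPU 1, sketch_only_cpu() returns true for each per-mm mask, so each mm looks like a candidate for local TLBIE, while the per-region mask holds both CPUs and would select global TLBIE.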
 
---
Mark Nutter
STI Design Center / IBM
email: mnutter at us.ibm.com
voice: 512-838-1612
fax: 512-838-1927
11400 Burnet Road
Mail Stop 906/3003B
Austin, TX 78758





Arnd Bergmann <arnd at arndb.de>
10/12/2005 01:03 PM
 
        To:     linuxppc64-dev at ozlabs.org, linux-mm at kvack.org
        cc:     Benjamin Herrenschmidt <benh at kernel.crashing.org>,
                Paul Mackerras <paulus at samba.org>,
                Mark Nutter/Austin/IBM at IBMUS,
                Michael Day/Austin/IBM at IBMUS,
                Ulrich Weigand <Ulrich.Weigand at de.ibm.com>
        Subject:        ppc64/cell: local TLB flush with active SPEs


I'm looking for a clean solution to detect the need for global
TLB flush when an mm_struct is only used on one logical PowerPC
CPU (PPE) and also mapped with the memory flow controller of an
SPE on the Cell CPU.

Normally, we set bits in mm_struct:cpu_vm_mask for each CPU that
accesses the mm and then do global flushes instead of local flushes
when CPUs other than the currently running one are marked as used
in that mask. When an SPE does DMA to that mm, it also gets local
TLB entries that are only flushed with a global tlbie broadcast.
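
The selection rule described above can be sketched roughly as follows. This is a simplified user-space model in plain C, not the actual ppc64 flush path; the struct, the function names, and the use of a 64-bit word in place of a cpumask are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NR_CPUS 64   /* hypothetical: one mask bit per logical CPU */

/* Hypothetical stand-in for the mm_struct field discussed here. */
struct sketch_mm {
    uint64_t cpu_vm_mask;   /* bit n set: CPU n has accessed this mm */
};

/* Record that `cpu` has used this mm. */
static void sketch_mark_cpu(struct sketch_mm *mm, unsigned int cpu)
{
    assert(cpu < SKETCH_NR_CPUS);
    mm->cpu_vm_mask |= (uint64_t)1 << cpu;
}

/* A local flush is chosen only when no CPU other than `this_cpu`
 * is marked in the mask; otherwise a global flush is needed. */
static bool sketch_flush_is_local(const struct sketch_mm *mm,
                                  unsigned int this_cpu)
{
    assert(this_cpu < SKETCH_NR_CPUS);
    uint64_t self = (uint64_t)1 << this_cpu;

    return (mm->cpu_vm_mask & ~self) == 0;
}
```

Note that nothing in this model records an SPE: an mm used on CPU 0 and also mapped by an SPE still reports a local flush, which is the gap this mail is about.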

The current hack is to always set cpu_vm_mask to all bits set
when we map an mm into an SPE to ensure receiving the broadcast,
but that is obviously not how it's meant to be used. In particular,
it doesn't work in UP configurations where the cpumask contains
only one bit.

One solution that might be better could be to introduce a new special
flag in addition to cpu_vm_mask for this purpose. We already have
a bit field in mm_struct for dumpable, so adding another bit there
at least does not waste space for other platforms, and it's likely
to be in the same cache line as cpu_vm_mask. However, I'm reluctant
to add more bit fields to such a prominent place, because it might
encourage other people to add more bit fields or think that they
are accepted coding practice.

Another idea would be to add a new field to mm_context_t, so it stays
in the architecture specific code. Again, adding an int here does
not waste space because there is currently padding in that place on
ppc64.
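
As a rough illustration of this second idea, the sketch below keeps a count of attached SPEs in an arch-private context and consults it next to the cpumask. It is a user-space model in plain C; the field name spe_users, the struct names, and the helper functions are all hypothetical, and a real implementation would need proper atomics or locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical arch-private context; spe_users is an assumed name. */
struct sketch_context {
    int spe_users;          /* number of SPEs currently mapping this mm */
};

/* Hypothetical stand-in for an mm carrying that context. */
struct sketch_spe_mm {
    uint64_t cpu_vm_mask;   /* bit n set: CPU n has accessed this mm */
    struct sketch_context context;
};

/* Called when an SPE starts using this mm. */
static void sketch_spe_attach(struct sketch_spe_mm *mm)
{
    mm->context.spe_users++;
}

/* Called when an SPE stops using this mm. */
static void sketch_spe_detach(struct sketch_spe_mm *mm)
{
    assert(mm->context.spe_users > 0);
    mm->context.spe_users--;
}

/* Global flush if another CPU is in the mask or any SPE is attached. */
static bool sketch_need_global_flush(const struct sketch_spe_mm *mm,
                                     unsigned int this_cpu)
{
    assert(this_cpu < 64);
    uint64_t self = (uint64_t)1 << this_cpu;

    return (mm->cpu_vm_mask & ~self) != 0 || mm->context.spe_users > 0;
}
```

Because the SPE state lives outside the cpumask, the check behaves the same when the mask holds only a single bit, as in a UP configuration.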

Or maybe there is a completely different solution.

Suggestions?

                 Arnd <><
