[PATCH 2/2] ppc: lazy flush_tlb_mm for nohash architectures
Dave Kleikamp
shaggy at linux.vnet.ibm.com
Tue Oct 19 08:57:01 EST 2010
On Thu, 2010-10-14 at 11:52 +1100, Benjamin Herrenschmidt wrote:
> On Fri, 2010-09-24 at 13:01 -0500, Dave Kleikamp wrote:
> > On PPC_MMU_NOHASH processors that support a large number of contexts,
> > implement a lazy flush_tlb_mm() that switches to a free context, marking
> > the old one stale. The tlb is only flushed when no free contexts are
> > available.
> >
> > The lazy tlb flushing is controlled by the global variable tlb_lazy_flush
> > which is set during init, dependent upon MMU_FTR_TYPE_47x.
>
> Unless I'm mistaken, there are some issues with that patch... sorry for
> the late review, I've been away for a couple of weeks.
>
> > +int tlb_lazy_flush;
> > +static int tlb_needs_flush[NR_CPUS];
> > +static unsigned long *context_available_map;
> > +static unsigned int nr_stale_contexts;
>
> Now I understand what you're doing here, but wouldn't it have been
> possible to re-use the existing stale map concept or do you reckon it
> would have been too messy ?
I didn't like the implementation of a per-core stale map. The existing
implementation flushes the core's tlb, but only clears a specific entry
from the stale map. It deals with the stale contexts one at a time,
whereas the new function accumulates many stale contexts with the
intent of doing a single tlb flush per core.
Since I originally intended the new function only to be enabled on the
47x, I left the context-stealing code as untouched as possible thinking
it wouldn't be exercised in 47x-land. This was probably narrow-minded,
and I should look at either 1) aligning the context-stealing code closer
to the new lazy flush code, or 2) dropping this code on the floor and
picking back up the new design that we worked on last year.
> At the very least, the "old style" stale map code and "new style" stale
> TLB code should be more in sync, you may end up flushing the TLB
> twice...
Yeah. If we enable this for 440, it is more likely to be an issue than
on 476.
> > @@ -189,6 +220,38 @@ static void context_check_map(void)
> > static void context_check_map(void) { }
> > #endif
> >
> > +/*
> > + * On architectures that support a large number of contexts, the tlb
> > + * can be flushed lazily by picking a new context and making the stale
> > + * context unusable until a lazy tlb flush has been issued.
> > + *
> > + * context_available_map keeps track of both active and stale contexts,
> > + * while context_map continues to track only active contexts. When the
> > + * lazy tlb flush is triggered, context_map is copied to
> > + * context_available_map, making the once-stale contexts available again
> > + */
> > +static void recycle_stale_contexts(void)
> > +{
> > + if (nr_free_contexts == 0 && nr_stale_contexts > 0) {
>
> Do an early return and avoid the indentation instead ?
Yeah, that makes sense.
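
Something like this, same logic as the posted hunk but restructured
with an early return (untested sketch):

static void recycle_stale_contexts(void)
{
        unsigned int cpu = smp_processor_id();
        unsigned int i;

        if (nr_free_contexts != 0 || nr_stale_contexts == 0)
                return;

        pr_hard("[%d] recycling stale contexts\n", cpu);

        /* Time to flush the TLB's: make every stale context available
         * again and remember which cores still need a flush. */
        memcpy(context_available_map, context_map, CTX_MAP_SIZE);
        nr_free_contexts = nr_stale_contexts;
        nr_stale_contexts = 0;
        for_each_online_cpu(i) {
                if ((i < cpu_first_thread_in_core(cpu)) ||
                    (i > cpu_last_thread_in_core(cpu)))
                        tlb_needs_flush[i] = 1;
                else
                        tlb_needs_flush[i] = 0; /* This core is flushed now */
        }
        _tlbil_all();
}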
> > + unsigned int cpu = smp_processor_id();
> > + unsigned int i;
> > +
> > + pr_hard("[%d] recycling stale contexts\n", cpu);
> > + /* Time to flush the TLB's */
> > + memcpy(context_available_map, context_map, CTX_MAP_SIZE);
> > + nr_free_contexts = nr_stale_contexts;
> > + nr_stale_contexts = 0;
> > + for_each_online_cpu(i) {
> > + if ((i < cpu_first_thread_in_core(cpu)) ||
> > + (i > cpu_last_thread_in_core(cpu)))
> > + tlb_needs_flush[i] = 1;
> > + else
> > + tlb_needs_flush[i] = 0; /* This core */
> > + }
> > + _tlbil_all();
> > + }
> > +}
> > +
> > void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next)
> > {
> > unsigned int i, id, cpu = smp_processor_id();
> > @@ -197,6 +260,8 @@ void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next)
> > /* No lockless fast path .. yet */
> > raw_spin_lock(&context_lock);
> >
> > + flush_recycled_contexts(cpu);
> > +
>
> Ok so here's the nasty one I think. You need to make sure that whenever
> you pick something off the context_available_map, you've done the above
> first within the same context_lock section right ? At least before you
> actually -use- said context.
Right.
> So far so good ... but steal_context can drop the lock iirc. So you may
> need to re-flush there. Not sure that can happen in practice but better
> safe than sorry. I would have preferred seeing that flush near the end
> of the function to avoid such problem.
I can fix this. On 476, even if steal_context() could be called, I
don't think it would drop the lock. But then again, if we enable this
for other architectures, it may be a possibility.
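
If I move the flush down as you suggest, it would end up near the end
of switch_mmu_context(), after any path through steal_context() that
may have dropped the lock. A rough sketch, assuming the function still
ends with set_context() followed by the unlock:

        /* Re-check for contexts recycled by another CPU while
         * context_lock may have been dropped in the steal path,
         * before we start running with the new id.
         */
        flush_recycled_contexts(cpu);

        /* Flick the MMU and release lock */
        set_context(id, next->pgd);
        raw_spin_unlock(&context_lock);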
> Then, you can end up in cases where you flush the TLB, but your context
> is marked stale due to stealing, and flush again. That's one of the
> reason I wonder if we can consolidate a bit the two orthogonal
> "staleness" concepts we have now.
>
> Granted, stealing on 47x is unlikely, but I have reasons to think that
> this lazy flushing will benefit 440 too.
>
> > pr_hard("[%d] activating context for mm @%p, active=%d, id=%d",
> > cpu, next, next->context.active, next->context.id);
> >
> > @@ -227,7 +292,12 @@ void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next)
> > id = next_context;
> > if (id > last_context)
> > id = first_context;
> > - map = context_map;
> > +
> > + if (tlb_lazy_flush) {
> > + recycle_stale_contexts();
> > + map = context_available_map;
> > + } else
> > + map = context_map;
> >
> > /* No more free contexts, let's try to steal one */
> > if (nr_free_contexts == 0) {
> > @@ -250,6 +320,13 @@ void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next)
> > if (id > last_context)
> > id = first_context;
> > }
> > + if (tlb_lazy_flush)
> > + /*
> > + * In the while loop above, we set the bit in
> > + * context_available_map, it also needs to be set in
> > + * context_map
> > + */
> > + __set_bit(id, context_map);
> > stolen:
> > next_context = id + 1;
> > context_mm[id] = next;
> > @@ -267,7 +344,7 @@ void switch_mmu_context(struct mm_struct *prev, struct mm_struct *next)
> > id, cpu_first_thread_in_core(cpu),
> > cpu_last_thread_in_core(cpu));
> >
> > - local_flush_tlb_mm(next);
> > + __local_flush_tlb_mm(next);
> >
> > /* XXX This clear should ultimately be part of local_flush_tlb_mm */
> > for (i = cpu_first_thread_in_core(cpu);
> > @@ -317,11 +394,61 @@ void destroy_context(struct mm_struct *mm)
> > mm->context.active = 0;
> > #endif
> > context_mm[id] = NULL;
> > - nr_free_contexts++;
> > +
> > + if (tlb_lazy_flush)
> > + nr_stale_contexts++;
> > + else
> > + nr_free_contexts++;
> > }
> > raw_spin_unlock_irqrestore(&context_lock, flags);
> > }
>
> Now...
>
> > +/*
> > + * This is called from flush_tlb_mm(). Mark the current context as stale
> > + * and grab an available one. The tlb will be flushed when no more
> > + * contexts are available
> > + */
> > +void lazy_flush_context(struct mm_struct *mm)
> > +{
> > + unsigned int id;
> > + unsigned long flags;
> > + unsigned long *map;
> > +
> > + raw_spin_lock_irqsave(&context_lock, flags);
> > +
> > + id = mm->context.id;
> > + if (unlikely(id == MMU_NO_CONTEXT))
> > + goto no_context;
>
> First thing is ... you reproduce quite a bit of logic from
> switch_mmu_context() here. Shouldn't it be abstracted in a separate
> function ?
I'm sure there's something I can do there.
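
Maybe a small helper along these lines (hypothetical name, sketch only)
that both switch_mmu_context() and lazy_flush_context() could call once
a free context is known to exist:

/* Hypothetical helper factoring out the id allocation duplicated in
 * switch_mmu_context() and lazy_flush_context().  Caller must hold
 * context_lock and guarantee nr_free_contexts > 0.
 */
static unsigned int grab_free_context(struct mm_struct *mm)
{
        unsigned long *map = tlb_lazy_flush ? context_available_map
                                            : context_map;
        unsigned int id = next_context;

        if (id > last_context)
                id = first_context;

        nr_free_contexts--;
        while (__test_and_set_bit(id, map)) {
                id = find_next_zero_bit(map, last_context + 1, id);
                if (id > last_context)
                        id = first_context;
        }
        if (tlb_lazy_flush)
                /* The loop above set the bit in context_available_map;
                 * it also needs to be set in context_map */
                __set_bit(id, context_map);
        next_context = id + 1;
        context_mm[id] = mm;
        mm->context.id = id;
        return id;
}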
> The other thing here is that another CPU might have done a
> recycle_stale_contexts() before you get here. IE. Your TLB may be stale.
> Shouldn't you do a flush here ? Since you are picking up a new PID from
> the context_available_map, it can potentially be stale if your tlb needs
> flushing due to another CPU having just done a recycle.
It looks like I missed that. It does seem that there should be a flush
in here.
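
Probably just something like this right after taking the lock, reusing
the flush_recycled_contexts() call that switch_mmu_context() already
makes (untested sketch):

        raw_spin_lock_irqsave(&context_lock, flags);

        /* Another CPU may have recycled the stale contexts since this
         * CPU last switched contexts; if so, our TLB still holds
         * translations for stale contexts and must be flushed before
         * we hand out a recycled id.
         */
        flush_recycled_contexts(smp_processor_id());

        id = mm->context.id;
        if (unlikely(id == MMU_NO_CONTEXT))
                goto no_context;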
> > + /*
> > + * Make the existing context stale. It remains in
> > + * context_available_map as long as nr_free_contexts remains non-zero
> > + */
> > + __clear_bit(id, context_map);
> > + context_mm[id] = NULL;
> > + nr_stale_contexts++;
> > +
> > + recycle_stale_contexts();
> > + BUG_ON(nr_free_contexts == 0);
> > +
> > + nr_free_contexts--;
> > + id = last_context;
> > + map = context_available_map;
> > + while (__test_and_set_bit(id, map)) {
> > + id = find_next_zero_bit(map, last_context+1, id);
> > + if (id > last_context)
> > + id = first_context;
> > + }
> > + set_bit(id, context_map);
> > + next_context = id + 1;
> > + context_mm[id] = mm;
> > + mm->context.id = id;
> > + if (current->active_mm == mm)
> > + set_context(id, mm->pgd);
> > +no_context:
> > + raw_spin_unlock_irqrestore(&context_lock, flags);
> > +}
> > +
> > #ifdef CONFIG_SMP
> >
> > static int __cpuinit mmu_context_cpu_notify(struct notifier_block *self,
> > @@ -407,6 +534,7 @@ void __init mmu_context_init(void)
> > } else if (mmu_has_feature(MMU_FTR_TYPE_47x)) {
> > first_context = 1;
> > last_context = 65535;
> > + tlb_lazy_flush = 1;
> > } else {
> > first_context = 1;
> > last_context = 255;
>
> Somebody should measure on 440, might actually improve perfs. Something
> like a kernel compile sounds like a good test here.
I think I'm going to dust off the newer implementation based on your and
Paul's design. I can probably get that in good working order without
too much more work, and it's something we need to look at eventually
anyway. If I find anything that really gets in my way, I might fix up
this patch in the meantime.
Thanks,
Shaggy
--
Dave Kleikamp
IBM Linux Technology Center