[PATCH] 8xx: get_mmu_context() for (very) FEW_CONTEXTS and KERNEL_PREEMPT race/starvation issue
Marcelo Tosatti
marcelo.tosatti at cyclades.com
Thu Jun 30 01:54:45 EST 2005
Hi Guillaume,
On Wed, Jun 29, 2005 at 11:32:19AM -0400, Guillaume Autran wrote:
>
> Benjamin Herrenschmidt wrote:
>
> >On Tue, 2005-06-28 at 09:42 -0400, Guillaume Autran wrote:
> >
> >
> >>Hi,
> >>
> >>I happened to notice a race condition in the mmu_context code for the 8xx
> >>with very few contexts (16 MMU contexts) and kernel preemption enabled. It
> >>is hard to reproduce, as it shows up only when many processes are
> >>created/destroyed and the system is doing a lot of IRQ processing.
> >>
> >>In short, one process is trying to steal a context that is in the
> >>process of being freed (mm->context == NO_CONTEXT) but not completely
> >>freed (nr_free_contexts == 0).
> >>The steal_context() function then does nothing, so the process stays
> >>in the loop forever.
> >>
> >>Anyway, I got a patch that fixes this part. It does not seem to affect
> >>scheduling latency at all.
> >>
> >>Comments are appreciated.
> >>
> >>
> >
> >Your patch seems to do a hell of a lot more than fix this race ... What
> >about just calling preempt_disable() in destroy_context() instead?
> >
> >
> I'm still a bit confused by "kernel preemption". One thing for sure is
> that disabling kernel preemption does indeed fix my problem.
> So, my question is: if a task in the middle of being scheduled gets
> preempted by an IRQ handler, where will this task resume execution?
> Back at the beginning of schedule() or where it left off?
Execution resumes exactly where it was interrupted.
> The idea behind my patch was to get rid of that nr_free_contexts counter,
> which is (I think) redundant with the context_map.
Apparently it's there precisely to avoid the spinlock on !FEW_CONTEXTS machines.
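Without the counter, you would presumably end up scanning context_map under
a lock just to know whether a free context exists. A hypothetical sketch of
that alternative (context_lock and context_available() are names I made up
purely for illustration):

/* Hypothetical alternative, made up for illustration: replace the
 * atomic counter with a locked scan of the context bitmap. */
static spinlock_t context_lock = SPIN_LOCK_UNLOCKED;

static inline int context_available(void)
{
	int free;

	spin_lock(&context_lock);
	/* A context is free iff some bit in the map is still clear. */
	free = find_first_zero_bit(context_map, LAST_CONTEXT + 1)
		<= LAST_CONTEXT;
	spin_unlock(&context_lock);
	return free;
}

The atomic counter answers the same question with a single atomic op, and
the whole FEW_CONTEXTS block compiles out on machines with plenty of
contexts.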
I suppose that what happens is that get_mmu_context() gets preempted after
stealing a context (so nr_free_contexts = 0), but before setting
next_mmu_context to the next entry:

	next_mmu_context = (ctx + 1) & LAST_CONTEXT;

So if the now-running higher-prio task calls switch_mm() (which is likely to
happen), it loops forever on atomic_dec_if_positive(&nr_free_contexts), while
steal_context() sees "mm->context == NO_CONTEXT" and does nothing.
I think that you should try a preempt_disable()/preempt_enable() pair at the
entry and exit of get_mmu_context() - I suppose around destroy_context() alone
is not enough (you can try that also).
Note that spin_lock() ends up calling preempt_disable().
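With the explicit pair, get_mmu_context() would look something like this
(untested sketch, same body as above):

static inline void get_mmu_context(struct mm_struct *mm)
{
	mm_context_t ctx;

	if (mm->context != NO_CONTEXT)
		return;
	/* Close the window: nothing can run between the steal and the
	 * next_mmu_context/mm->context updates below. */
	preempt_disable();
	while (atomic_dec_if_positive(&nr_free_contexts) < 0)
		steal_context();
	ctx = next_mmu_context;
	while (test_and_set_bit(ctx, context_map)) {
		ctx = find_next_zero_bit(context_map, LAST_CONTEXT + 1, ctx);
		if (ctx > LAST_CONTEXT)
			ctx = 0;
	}
	next_mmu_context = (ctx + 1) & LAST_CONTEXT;
	mm->context = ctx;
	context_mm[ctx] = mm;
	/* Counter, bitmap and next_mmu_context are consistent again. */
	preempt_enable();
}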