Worst case performance of up()

Sat Dec 2 21:35:54 EST 2006

On Mon, 2006-11-27 at 21:02 +0000, Adrian Cox wrote:
> On Sat, 2006-11-25 at 07:45 +1100, Benjamin Herrenschmidt wrote:
> > On Fri, 2006-11-24 at 16:21 +0000, Adrian Cox wrote:
> > > Does anybody have any ideas what could make up() take so long in this
> > > circumstance? I'd expect cache transfers to make the operation about 100
> > > times slower, but this looks like repeated cache ping-pong between the
> > > two CPUs.
> > 
> > Is it hung in up() (toplevel) or __up (low level) ?
> 
> Not yet proven.

Using an oscilloscope, I now have further data: the system is stuck on this
line of resched_task() in kernel/sched.c:
	set_tsk_thread_flag(p, TIF_NEED_RESCHED);

During this time, there is a great deal of ARTRY activity on the bus.
The sequence ends when the other CPU takes a timer tick.
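
For readers following along: set_tsk_thread_flag() ends up as an atomic
bit-set on the task's thread_info flags word, and on PowerPC atomic bit
operations are built from a lwarx/stwcx. retry loop. The sketch below is
only a user-space analogy in C11, not the kernel code: demo_set_flag()
and DEMO_NEED_RESCHED_BIT are made-up names, and a weak compare-exchange
stands in for the load-reserve/store-conditional pair. It shows why such
a loop has no fixed upper bound on iterations. If another CPU keeps
touching the same cache line, each attempt can fail and be retried,
which is one possible (unproven) explanation for the stall.

```c
#include <stdatomic.h>

/* Hypothetical bit number, chosen for illustration only. */
#define DEMO_NEED_RESCHED_BIT 2

/*
 * Atomically set one bit in *flags using a retry loop.
 * Returns the number of attempts needed, which is 1 when
 * uncontended but is unbounded in principle under contention.
 */
static unsigned demo_set_flag(atomic_ulong *flags, unsigned bit)
{
	unsigned long mask = 1UL << bit;
	unsigned long old = atomic_load_explicit(flags, memory_order_relaxed);
	unsigned attempts = 0;

	do {
		attempts++;
		/* On failure, 'old' is refreshed with the current value. */
	} while (!atomic_compare_exchange_weak_explicit(
			flags, &old, old | mask,
			memory_order_acq_rel, memory_order_relaxed));

	return attempts;
}
```

Calling demo_set_flag(&flags, DEMO_NEED_RESCHED_BIT) from two threads
that share one flags word, and logging the returned attempt count, would
be one way to observe the retries directly.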

I'll need to track down what the other CPU is doing at this point, but
my current hypothesis is that it's somewhere in schedule().

> > Have you tried some oprofile runs to catch the exact instruction where
> > the cycles appear to be wasted ?

Oprofile turned out to mask the error condition, because it increases the
interrupt rate on each CPU.  In the end a combination of lockmeter and
an oscilloscope did the trick.

-- 
Adrian Cox <adrian at humboldt.co.uk>