[PATCH] powerpc/pseries: Fix cpu hotplug
Sebastien Dugue
sebastien.dugue at bull.net
Fri Nov 28 21:04:46 EST 2008
Hi Nathan,
On Thu, 27 Nov 2008 18:14:33 -0600 Nathan Lynch <ntl at pobox.com> wrote:
> Hi, I have some questions about this patch.
>
> Sebastien Dugue wrote:
> >
> > Currently, pseries_cpu_die() calls msleep() while polling RTAS for
> > the status of the dying cpu.
> >
> > However if the cpu that is going down also happens to be the one doing
> > the tick then we're hosed as the tick_do_timer_cpu 'baton' is only passed
> > later on in tick_shutdown() when _cpu_down() does the CPU_DEAD notification.
> > Therefore jiffies won't be updated anymore.
>
> I confess unfamiliarity with the tick/timer code, but this sounds like
> something that should be addressed earlier in the process of taking
> down a CPU.
Maybe you're right, at least the tick_do_timer_cpu should be changed earlier
in the down process, but I'm not sure where we can do that.
>
>
> > This patch replaces that msleep() with a cpu_relax() to make sure we're
> > not going to schedule at that point.
>
> This is a significant change in behavior. With the msleep(), we poll
> for at least five seconds before giving up; with the cpu_relax(), the
> period will almost certainly be much shorter and we're likely to give
> up too soon in some circumstances.
Right, I realized a bit late that that would indeed change the
behaviour. On my test box (2 Power6) the msleep call is hit in ~10 % of the
cases and only loop once, but that may not be the case for all the pSeries
out there where a longer delay might be needed.
> Could be addressed by using
> mdelay(), but...
Yep, would be better. I'm still wondering why the hang is not systematic
when offlining the tick_do_timer_cpu, I must have missed something in
my analysis and there might be a race somewhere I failed to identify.
>
> It's just not clear to me how busy-waiting in the __cpu_die() path is
> a legitimate fix. Is sleeping in this path forbidden now?
In that case, I think it is if the cpu going down is also doing the
tick.
> (I notice
> at least native_cpu_die() in x86 does msleep(), btw.)
Right, as most other arches do, but the thing is that on those, you
cannot offline CPU0, whereas on power (and maybe otheres too) you can.
>
> As it can take several milliseconds for RTAS to report a CPU
> offline, and the maximum latency of the operation is unspecified, it
> seems inappropriate to tie up the waiting CPU this way.
Agreed.
>
>
> > With this patch my test box survives a 100k iterations hotplug stress
> > test on _all_ cpus, whereas without it, it quickly dies after ~50 iterations.
>
> What is the failure (e.g. stack trace, kernel messages)?
No stack trace, no kernel messages, nothing :( When that happens, the
cpus get stuck in idle and won't reschedule at all.
I verified that the decrementer is still ticking, hrtimers are still
running but regular timers are stuck. Changing the tick_do_timer_cpu manually
under xmon resurects the box.
Will have to look at this a bit more.
Thanks,
Sebastien.
More information about the Linuxppc-dev
mailing list