[PATCH v2 1/3] powerpc/powernv: Always stop secondaries before reboot/shutdown

Michael Ellerman mpe at ellerman.id.au
Sun Nov 12 22:57:28 AEDT 2017


Nicholas Piggin <npiggin at gmail.com> writes:

> On Fri, 10 Nov 2017 22:08:32 +1100
> Michael Ellerman <mpe at ellerman.id.au> wrote:
>
>> Nicholas Piggin <npiggin at gmail.com> writes:
>> 
>> > Currently powernv reboot and shutdown requests just leave secondaries
>> > to do their own things. This is undesirable because they can trigger
>> > any number of watchdogs while waiting for reboot, but also we don't
>> > know what else they might be doing, or they might be stuck somewhere
>> > causing trouble.
>> >
>> > The opal scheduled flash update code already ran into watchdog problems
>> > due to flashing taking a long time, but it's possible for regular
>> > reboots to trigger problems too (this is with watchdog_thresh set to 1,
>> > but I have seen it with watchdog_thresh at the default value once too):
>> >
>> >   reboot: Restarting system
>> >   [  360.038896709,5] OPAL: Reboot request...
>> >   Watchdog CPU:0 Hard LOCKUP
>> >   Watchdog CPU:44 detected Hard LOCKUP other CPUS:16
>> >   Watchdog CPU:16 Hard LOCKUP
>> >   watchdog: BUG: soft lockup - CPU#16 stuck for 3s! [swapper/16:0]
>> >
>> > So remove the special case for flash update, and unconditionally do
>> > smp_send_stop before rebooting.
>> >
>> > Return the CPUs to Linux stop loops rather than OPAL. The reason for
>> > this is that the path to firmware is longer, and the CPUs may have
>> > been interrupted from firmware, which may cause problems to re-enter
>> > it. It's better to put them into a simple spin loop to maximize the
> >> > chance of a successful reboot.
>> 
>> I always assumed we had to send the CPUs back to OPAL for the flashing
>> procedure. Is it OK to leave them in Linux?
>
> According to the comment and changelog of commit
> 2196c6f1ed66eef23df3b478cfe71661ae83726e, it was added just to keep
> secondaries from going silly. Vasant, can you remember details?

OK. My worry is that we've established an implicit contract with skiboot
on how we do this, and now we're looking to change it.

So I guess we just want to confirm that the skiboot code has not grown a
dependency on us returning CPUs to OPAL, and then we should probably
document what the expectations are in e.g. the OPAL_FLASH_UPDATE docs.

cheers

