[PATCH v6 20/29] init/main: Delay initialization of the lockup detector after smp_init()

Nicholas Piggin npiggin at gmail.com
Fri May 20 10:25:04 AEST 2022


Excerpts from Ricardo Neri's message of May 14, 2022 9:16 am:
> On Tue, May 10, 2022 at 08:38:22PM +1000, Nicholas Piggin wrote:
>> Excerpts from Ricardo Neri's message of May 6, 2022 9:59 am:
>> > Certain implementations of the hardlockup detector require support for
>> > Inter-Processor Interrupt shorthands. On x86, support for these can only
>> > be determined after all the possible CPUs have booted once (in
>> > smp_init()). Other architectures may not need such check.
>> > 
>> > lockup_detector_init() only performs the initializations of data
>> > structures of the lockup detector. Hence, there are no dependencies on
>> > smp_init().
>> 
> 
> Thank you for your feedback Nicholas!
> 
>> I think this is the only real thing which affects other watchdog types?
> 
> Also patches 18 and 19 that decouple the NMI watchdog functionality from
> perf.
> 
>> 
>> Not sure if it's a big problem, the secondary CPUs coming up won't
>> have their watchdog active until quite late, and the primary could
>> implement its own timeout in __cpu_up for secondary coming up, and
>> IPI it to get traces if necessary which is probably more robust.
> 
> Indeed that could work. Another alternative I have been pondering is to boot
> the system with the perf-based NMI watchdog enabled. Once all CPUs are up
> and running, switch to the HPET-based NMI watchdog and free the PMU counters.

Just to cover smp_init()? Unless you could move the watchdog 
significantly earlier, I'd say it's probably not worth bothering
with.

Yes, the boot CPU is doing *some* work that could lock up, but most of
the complexity is in the secondaries coming up, and they won't have their
own watchdog coverage for a good chunk of that anyway.

If anything, I would just add some timeout warning or IPI or something in
those wait loops in x86's __cpu_up code if you are worried about
catching issues here. Actually, the watchdog probably wouldn't catch any
of those anyway, because those loops either run with interrupts enabled
or call touch_nmi_watchdog()! So yeah, covering smp_init() with the
watchdog would be pretty pointless.
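
To make the "timeout warning or IPI in the wait loop" idea concrete, here
is a minimal userspace sketch, not kernel code and not what x86's
__cpu_up actually does. Every name in it (struct secondary,
wait_for_secondary(), report_stuck_cpu(), max_polls) is hypothetical and
only illustrates a bounded wait with a timeout hook, where a real
implementation might print a warning or send an IPI to get a backtrace
from the stuck CPU.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a secondary CPU's "I am online" flag. */
struct secondary {
	atomic_bool online;
};

enum wait_result { WAIT_OK = 0, WAIT_TIMED_OUT = 1 };

/* Records the last CPU that timed out, so callers can inspect it. */
static int last_timed_out_cpu = -1;

/*
 * Hypothetical timeout hook. Kernel code might warn here and IPI the
 * secondary to collect a trace; this sketch only logs and records.
 */
static void report_stuck_cpu(int cpu)
{
	last_timed_out_cpu = cpu;
	fprintf(stderr, "CPU %d did not come online in time\n", cpu);
}

/*
 * Poll the secondary's flag at most max_polls times. Returns WAIT_OK as
 * soon as the flag is set; otherwise calls on_timeout (if non-NULL) and
 * returns WAIT_TIMED_OUT. The poll budget stands in for a real time
 * limit, which kernel code would derive from a clock.
 */
static enum wait_result wait_for_secondary(struct secondary *s,
					   unsigned long max_polls,
					   void (*on_timeout)(int cpu),
					   int cpu)
{
	assert(s != NULL);

	for (unsigned long i = 0; i < max_polls; i++) {
		if (atomic_load(&s->online))
			return WAIT_OK;
		/* Kernel code would delay or relax the CPU between polls. */
	}

	if (on_timeout)
		on_timeout(cpu);
	return WAIT_TIMED_OUT;
}
```

The point of the sketch is that the primary bounds its own wait and
reports the failure itself, so it does not depend on a watchdog being
armed on a CPU that has not finished coming up.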

Thanks,
Nick

