[PATCH v4 09/17] watchdog/hardlockup: Add a "cpu" param to watchdog_hardlockup_check()
Doug Anderson
dianders at chromium.org
Sat May 20 03:21:50 AEST 2023
Hi,
On Thu, May 11, 2023 at 7:14 AM Petr Mladek <pmladek at suse.com> wrote:
>
> On Thu 2023-05-04 15:13:41, Douglas Anderson wrote:
> > In preparation for the buddy hardlockup detector where the CPU
> > checking for lockup might not be the currently running CPU, add a
> > "cpu" parameter to watchdog_hardlockup_check().
> >
> > --- a/kernel/watchdog.c
> > +++ b/kernel/watchdog.c
> > @@ -92,14 +92,14 @@ static DEFINE_PER_CPU(unsigned long, hrtimer_interrupts_saved);
> > static DEFINE_PER_CPU(bool, watchdog_hardlockup_processed);
> > static unsigned long watchdog_hardlockup_dumped_stacks;
> >
> > -static bool watchdog_hardlockup_is_lockedup(void)
> > +static bool watchdog_hardlockup_is_lockedup(unsigned int cpu)
> > {
> > - unsigned long hrint = __this_cpu_read(hrtimer_interrupts);
> > + unsigned long hrint = per_cpu(hrtimer_interrupts, cpu);
>
> My radar tells me that this should be
> READ_ONCE(per_cpu(hrtimer_interrupts, cpu)) when the value might
> be modified on another CPU. Otherwise, the compiler is allowed
> to split the read into more instructions.
>
> It will be needed for the buddy detector. And it will require
> also incrementing the value in watchdog_hardlockup_interrupt_count()
> an atomic way.
>
> Note that __this_cpu_inc_return() does not guarantee atomicity
> according to my understanding. In theory, the following should
> work because counter will never be incremented in parallel:
>
> static unsigned long watchdog_hardlockup_interrupt_count(void)
> {
> unsigned long count;
>
> count = __this_cpu_read(hrtimer_interrupts);
> count++;
> WRITE_ONCE(*raw_cpu_ptr(hrtimer_interrupts), count);
> }
>
> but it is nasty. A more elegant solution might be using atomic_t
> for hrtimer_interrupts counter.
I switched it over to atomic_t.
> > - if (__this_cpu_read(hrtimer_interrupts_saved) == hrint)
> > + if (per_cpu(hrtimer_interrupts_saved, cpu) == hrint)
> > return true;
> >
> > - __this_cpu_write(hrtimer_interrupts_saved, hrint);
> > + per_cpu(hrtimer_interrupts_saved, cpu) = hrint;
>
> IMHO, hrtimer_interrupts_saved might be handled this way.
> The value is read/written only by this function.
>
> The buddy watchdog should see consistent values even when
> the buddy CPU goes offline. This check should never race
> because this CPU should get touched when another buddy
> gets assigned.
>
> Well, it would deserve a comment.
I spent a bunch of time thinking about this too and I agree that for
hrtimer_interrupts_saved we don't need atomic_t nor even
READ_ONCE/WRITE_ONCE. I've add a comment and a note in the commit
message in v5.
More information about the Linuxppc-dev
mailing list