[PATCH v6 21/29] x86/nmi: Add an NMI_WATCHDOG NMI handler category

Wed May 18 04:41:54 AEST 2022

On Mon, May 09, 2022 at 03:59:40PM +0200, Thomas Gleixner wrote:
> On Thu, May 05 2022 at 17:00, Ricardo Neri wrote:
> > Add a NMI_WATCHDOG as a new category of NMI handler. This new category
> > is to be used with the HPET-based hardlockup detector. This detector
> > does not have a direct way of checking if the HPET timer is the source of
> > the NMI. Instead, it indirectly estimates it using the time-stamp counter.
> >
> > Therefore, we may have false-positives in case another NMI occurs within
> > the estimated time window. For this reason, we want the handler of the
> > detector to be called after all the NMI_LOCAL handlers. A simple way
> > of achieving this with a new NMI handler category.
> >
> > @@ -379,6 +385,10 @@ static noinstr void default_do_nmi(struct pt_regs *regs)
> >  	}
> >  	raw_spin_unlock(&nmi_reason_lock);
> >  
> > +	handled = nmi_handle(NMI_WATCHDOG, regs);
> > +	if (handled == NMI_HANDLED)
> > +		goto out;
> > +
> 
> How is this supposed to work reliably?
> 
> If perf is active and the HPET NMI and the perf NMI come in around the
> same time, then nmi_handle(LOCAL) can swallow the NMI and the watchdog
> won't be checked. Because MSI is strictly edge and the message is only
> sent once, this can result in a stale watchdog, no?

This is true. Instead, at the end of each NMI I should _also_ check if the TSC
is within the expected value of the HPET NMI watchdog. In this way, unrelated
NMIs (e.g., perf NMI) are handled and we don't miss the NMI from the HPET
channel.

Thanks and BR,
Ricardo