[PATCH 2/2] powernv/kdump: Fix cases where the kdump kernel can get HMI's

Balbir Singh bsingharora at gmail.com
Thu Dec 14 11:12:13 AEDT 2017


On Wed, 13 Dec 2017 20:51:01 +1000
Nicholas Piggin <npiggin at gmail.com> wrote:

> This is looking pretty nice now...
> 
> On Wed, 13 Dec 2017 19:08:28 +1100
> Balbir Singh <bsingharora at gmail.com> wrote:
> 
> > @@ -543,7 +543,25 @@ void smp_send_debugger_break(void)
> >  #ifdef CONFIG_KEXEC_CORE
> >  void crash_send_ipi(void (*crash_ipi_callback)(struct pt_regs *))
> >  {
> > +	int cpu;
> > +
> >  	smp_send_nmi_ipi(NMI_IPI_ALL_OTHERS, crash_ipi_callback, 1000000);
> > +	if (kdump_in_progress() && crash_wake_offline) {
> > +		for_each_present_cpu(cpu) {
> > +			if (cpu_online(cpu))
> > +				continue;
> > +			/*
> > +			 * crash_ipi_callback will wait for
> > +			 * all cpus, including offline CPUs.
> > +			 * We don't care about nmi_ipi_function.
> > +			 * Offline cpus will jump straight into
> > +			 * crash_ipi_callback, we can skip the
> > +			 * entire NMI dance and waiting for
> > +			 * cpus to clear pending mask, etc.
> > +			 */
> > +			do_smp_send_nmi_ipi(cpu);  
> 
> Still a little bit concerned about using NMI IPI for this.
>

OK -- for offline CPUs you mean?

> If you take an NMI IPI from stop, the idle code should do the
> right thing and we would just return the system reset wakeup
> reason in SRR1 here (which does not need to be cleared).
> 
> If you take the system reset anywhere else in the loop, it's
> going to go out via system_reset_exception. I guess that
> would end up doing the right thing, it probably gets to
> crash_ipi_callback from crash_kexec_secondary?

You mean if the CPU is online at the time we send the NMI? If so,
the original loop will NMI it into crash_ipi_callback anyway. We
don't expect this to occur for offline CPUs.
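
To make the split concrete, here is a rough userspace sketch, not
kernel code. The cpu_mask_t type and both helper names are made up
for illustration; the patch itself uses for_each_present_cpu() and
cpu_online(). It only shows that the broadcast and the direct wakeup
target disjoint sets of CPUs.

```c
#include <stdint.h>

/* Hypothetical model: bit N set means CPU N is in the mask (N < 64). */
typedef uint64_t cpu_mask_t;

/* CPUs reached by the broadcast NMI IPI: every online CPU but the sender. */
static cpu_mask_t broadcast_targets(cpu_mask_t online, unsigned int self)
{
	return online & ~((cpu_mask_t)1 << self);
}

/* CPUs that need the direct wakeup: present but not online. */
static cpu_mask_t direct_targets(cpu_mask_t present, cpu_mask_t online)
{
	return present & ~online;
}
```

With CPUs 0-3 present and only 0 and 2 online, direct_targets() picks
CPUs 1 and 3, and broadcast_targets() from CPU 0 picks CPU 2.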

> 
> It's just going to be a very untested code path :( What we
> gain I suppose is better ability to handle a CPU that's locked
> up somewhere in the cpu offline path. Assuming the uncommon
> case works...
> 
> Actually, if you *always* go via the system reset exception
> handler, then code paths will be shared. That might be the
> way to go. So I would check for system reset wakeup SRR1 reason
> and call replay_system_reset() for it. What do you think?
> 

We could do that, but it would call pnv_system_reset_exception, which
would try to call the NMI function. Since we have not used that path
to initiate the NMI, it should call the stale nmi_ipi_function, which
is crash_ipi_callback, and not go via the crash_kexec path.


I can't call smp_send_nmi_ipi because of nmi_ipi_busy_count, and
I'm worried about calling a stale nmi_ipi_function via the
system_reset_exception path. If we are OK with that, I can revisit
the code path.
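
A rough userspace sketch of the stale-callback concern, not kernel
code. All model_* names are made up for illustration, and the real
handler does more checking than this. The point is only that a send
which bypasses the bookkeeping runs whatever callback was installed
last.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel state discussed above. */
typedef void (*nmi_fn_t)(int cpu);

static nmi_fn_t model_nmi_ipi_function;	/* models nmi_ipi_function */
static int crash_calls;
static int other_calls;

static void model_crash_ipi_callback(int cpu) { (void)cpu; crash_calls++; }
static void model_other_callback(int cpu) { (void)cpu; other_calls++; }

/* Models the handler: run whatever is installed, report if it ran. */
static int model_handle_nmi(int cpu)
{
	nmi_fn_t fn = model_nmi_ipi_function;

	if (!fn)
		return 0;
	fn(cpu);
	return 1;
}

/* Tracked send: installs the callback before raising the NMI. */
static int model_tracked_send(int cpu, nmi_fn_t fn)
{
	model_nmi_ipi_function = fn;
	return model_handle_nmi(cpu);
}

/* Direct send: raises the NMI without touching the installed callback. */
static int model_direct_send(int cpu)
{
	return model_handle_nmi(cpu);
}
```

In this model a direct send after a tracked send of
model_crash_ipi_callback runs that callback again, which is the
behaviour we would be relying on. If some other callback had been
installed last, the direct send would run that one instead.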

Thanks,
Balbir Singh.

