[rfc] powernv/kdump: Fix cases where the kdump kernel can get HMI's

Nicholas Piggin npiggin at gmail.com
Mon Dec 4 14:10:33 AEDT 2017


On Mon, 4 Dec 2017 11:37:01 +1100
Balbir Singh <bsingharora at gmail.com> wrote:

> On Sun, Dec 3, 2017 at 1:36 PM, Nicholas Piggin <npiggin at gmail.com> wrote:
> > Seems like a reasonable approach. Why do we only do this for
> > powernv? It seems like a good idea in general to pull all
> > offlined CPUs out and into the same state for all platforms
> > and for all shutdown/restart/crash paths.
> >  
> 
> The reason is largely wake-up related: do we expect offline CPUs to wake
> up in the kdump kernel? The infrastructure allows us to selectively
> decide which platforms need this support. I did not want to break the world
> by enabling it across platforms (pseries for example) without good reason.

What happens if a pseries offlined CPU gets an exception for some reason
though? It seems like it would return into pseries_mach_cpu_die of the
old kernel which will go wrong.

Maybe the platform has stronger guarantees that it won't wake up there,
like requiring a specific hcall or something?

I was just thinking that moving all platforms to the same scheme would
be preferable, unless there is a good reason not to, just for the sake
of sharing code and behaviour.

> 
> > Also I wonder if there is anything we should do on the other
> > side of the equation for the kdump kernel to pull CPUs into a
> > known state rather than rely on the crash kernel to do it for
> > us. We might have a better ability to do that with system
> > reset IPIs now.
> >  
> 
> Yes, but do we need to do that or quickly dump the vmcore to a file
> and exit?

Well if the previous kernel did not shut them down properly, we need
to do that. Don't we? My point is the previous kernel crashed somehow,
we should be trying to fix everything up rather than hoping it crashed
"nicely" for us.

Yes, we should disturb things as little as possible, but we've booted
an entire new kernel in its own reserved memory, so I'm not sure
it's such a concern to try fixing up wayward CPUs.
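
To make the idea concrete, here is a rough userspace sketch of what
"pull every CPU into a known state from the kdump side" could look like.
It is only a model, not kernel code: the state names, park_cpu() and
park_all_secondary_cpus() are hypothetical, and park_cpu() stands in for
whatever real mechanism would be used (a system reset IPI, for example).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical CPU states for illustration only; not kernel definitions. */
enum cpu_state {
	CPU_ONLINE,       /* still running code from the crashed kernel */
	CPU_OFFLINE_IDLE, /* offlined, sitting in the old kernel's idle/die loop */
	CPU_PARKED        /* known state that the kdump kernel can rely on */
};

/*
 * Stand-in for the real mechanism that would force a CPU into the
 * known state. In this model it always succeeds.
 */
static bool park_cpu(enum cpu_state *cpu)
{
	*cpu = CPU_PARKED;
	return true;
}

/*
 * Walk all CPUs and park every one that is not already parked, without
 * trusting the previous kernel to have done it. The CPU running the
 * kdump kernel (boot_cpu) is left alone.
 * Returns the number of CPUs that had to be parked here.
 */
size_t park_all_secondary_cpus(enum cpu_state *cpus, size_t ncpus,
			       size_t boot_cpu)
{
	size_t parked = 0;

	for (size_t i = 0; i < ncpus; i++) {
		if (i == boot_cpu || cpus[i] == CPU_PARKED)
			continue;
		if (park_cpu(&cpus[i]))
			parked++;
	}
	return parked;
}
```

The point of the model is that the same loop handles both CPUs that were
online at crash time and CPUs that were offlined, so the new kernel does
not depend on how "nicely" the old one went down.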

Thanks,
Nick

