a3b2cb30 broken panic reporting for qemu guests

Michael Ellerman mpe at ellerman.id.au
Fri Dec 1 22:11:50 AEDT 2017


David Gibson <david at gibson.dropbear.id.au> writes:

> On Wed, Nov 29, 2017 at 02:23:43PM +1000, Nicholas Piggin wrote:
>> On Wed, 29 Nov 2017 15:06:52 +1100
>> David Gibson <david at gibson.dropbear.id.au> wrote:
>> 
>> > a3b2cb30 "powerpc: Do not call ppc_md.panic in fadump panic notifier"
>> > purports to fix a problem when the kernel panics with fadump not
>> > registered, but it breaks something else instead.  I _think_ it was
>> > working on the incorrect assumption that ppc_md.panic was (or should
>> > be) only used with fadump, but I'm not really sure.
>> > 
>> > Panic works with kdump enabled, and (I think) with fadump enabled).
>> > However, with neither of these enabled, we always go to the generic
>> > panic logic.
>> 
>> Yeah thanks, I can't remember what assumption I was working on tbh.
>>  
>> > That's incorrect for PAPR guests - they should call ibm,os-term via
>> > RTAS.  Under qemu this leads to a "GUEST_PANICKED" event notification
>> > which higher-level management pays attention to.  Since a3b2cb30 we
>> > now reboot instead of reporting that.
>> > 
>> > I believe it will also break panic for PS3 machines, but since that
>> > platform basically no longer exists, we probably don't care.
>> 
>> I (hope) it should just go down to the normal panic path and not do
>> much worse than it already does -- although it won't print out that
>> message.
>> 
>> > I'm not entirely sure how to fix this.  I _think_ what we want is to
>> > call ppc_md.panic from a late panic notifier, the way this patch does
>> > for fadump_panic_event() if fadump is registered.
>> 
>> The problem I had there is that some of the printk and console stuff
>> wasn't getting flushed out, so I was getting a blank screen. This was
>> probably in conjunction with panicing from NMI context that we're now
>> starting to introduce.
>> 
>> So it's a bit annoying. There's other ugliness we have for being unable
>> to control panic code well enough from arch code
>> (arch/powerpc/platforms/powernv/opal.c)
>> 
>> I guess a really minimal fix is to put an #ifdef powerpc down the bottom
>> there (/me *cries*).
>
> Um.. right.  I'm not really sure from that how to go forward from
> here.  We want to fix this for RHEL7.5, which doesn't give us a lot of
> time.
>
> Adding the #ifdef at the bottom of the generic panic code is gross,
> but there's already a bunch of that, so maybe adequate until a better
> solution can be found?

I think you mean put an #ifdef at the bottom of panic(). If so that
won't work. Our default panic_timeout is 180 so we never get to the
bottom of panic(), we call emergency_restart().

You *could* put an #ifdef powerpc before that, but that's even more
gross because it's in a different place to the sparc/s390 #ifdefs.

I notice we don't implement machine_emergency_restart(), it just
becomes machine_restart(NULL).

So it seems like that's the place we should be hooking,
machine_emergency_restart(). That's what x86 does.

But panic() is not the only caller of emergency_restart(), so it's not
an entirely straight forward conversion.

So I think for 4.15 and 4.14 I'm inclined to revert. Then we can do a
bigger rework for 4.16.

Nick that shouldn't break your original aim too badly I think? ie. worst
case is we panic() but don't see the output if we came from NMI?

cheers


More information about the Linuxppc-dev mailing list