[OpenPower-Firmware] Generate a dump of the Linux kernel on host OS (P8)

Artem Senichev artemsen at gmail.com
Wed Feb 13 03:20:05 AEDT 2019


On Fri, Apr 13, 2018 at 01:56:17PM +1000, Nicholas Piggin wrote:
> On Fri, 13 Apr 2018 13:26:58 +1000
> Stewart Smith <stewart at linux.ibm.com> wrote:
>
> > Artem Senichev <artemsen at gmail.com> writes:
> > > I need the ability to generate a dump of the Linux kernel on host OS
> > > using a command from BMC.
> > >
> > > As far as I know, the dump is generated as part of NMI handler in the
> > > Linux Kernel.
> > > The NMI handler is configured at startup and depends on
> > > OPAL_SIGNAL_SYSTEM_RESET support flag:
> > > https://github.com/torvalds/linux/blob/master/arch/powerpc/platforms/powernv/smp.c#L391
> > >
> > > Currently, only P9 supports the system reset signal feature:
> > > https://github.com/open-power/skiboot/blob/master/core/direct-controls.c#L810
> > > But the function p9_sreset_cpu() makes calls that branch on the CPU
> > > generation; for example, dctl_sreset() calls p8_sreset_thread() or
> > > p9_sreset_thread():
> > > https://github.com/open-power/skiboot/blob/master/core/direct-controls.c#L636
>
> The dump will be initiated when we get a crash or sreset. We can kick
> off a dump without using sreset. The benefit of sreset is that it can
> be generated from the BMC, and that the host CPUs can't block it if they
> have crashed with interrupts off.
>
> > >
> > > So, I have a couple of questions:
> > > 1. Is there an IPMI command that I can send to OPAL to
> > > generate the dump?
> >
> > I don't think we have the IPMI NMI command hooked up to anything
> > currently, and I haven't really thought at all about how we should hook
> > that up.
>
> My thought is that we could use libpdbg to send the sreset to the host.
> If we could get ipmi wired up to use that for the nmi command, it should
> work.
>
> >
> > > 2. Is it possible to implement P8 support for OPAL_SIGNAL_SYSTEM_RESET
> > > feature?
> >
> > Alistair has a patch from a while ago that implements it using
> > instruction ramming: https://patchwork.ozlabs.org/patch/694794/
> >
> > This uses some scom register writes to force a cpu core to execute
> > instructions we tell it to (which is basically jump to 0x100).
> >
> > This patch will need a bit of cleanup/rebase to current skiboot, which
> > you could either give a go or we bribe Alistair to do it with a few
> > beers :)
> >
>
> We have just been talking about this a bit more. Ramming is a bit
> complex and has some restrictions. On P8 we can actually send a sreset,
> but the SRR1 register may end up being incorrect. This means we can not
> return from the interrupt and continue, but we should be able to go on
> to take a crash dump and restart the machine.
>
> Most of the P8 code is already there in skiboot to do this for fast
> reboot as an IPI with OPAL_SIGNAL_SYSTEM_RESET (core/direct-controls.c),
> and pdbg on the BMC has the sreset command.
>
> It would be a matter of wiring things up, testing them, seeing
> what breaks, and perhaps adding some P8-specific workarounds to the Linux
> system reset handler (e.g., to recognize that SRR1 is corrupted).
>
> Thanks,
> Nick

Let me return to the subject.

We have a couple of patches:
* for pdbg (support SRESET for P8): http://patchwork.ozlabs.org/patch/1038525/
* and for skiboot, just a modification of several lines in the file
  core/direct-controls.c:
  -       if (proc_gen != proc_gen_p9)
  +       if (proc_gen != proc_gen_p9 && proc_gen != proc_gen_p8)
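
To illustrate the effect of that condition change, here is a minimal
standalone sketch. The enum and the two helper functions are hypothetical
stand-ins written for this mail, not skiboot's real definitions; only the
two conditions are taken from the diff above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for skiboot's processor generation enum. */
enum proc_gen { proc_gen_unknown, proc_gen_p8, proc_gen_p9 };

/* Original gate: the request is rejected unless the CPU is a P9. */
bool sreset_allowed_before(enum proc_gen gen)
{
	if (gen != proc_gen_p9)
		return false;
	return true;
}

/* Patched gate: the request is rejected unless the CPU is a P8 or a P9. */
bool sreset_allowed_after(enum proc_gen gen)
{
	if (gen != proc_gen_p9 && gen != proc_gen_p8)
		return false;
	return true;
}
```

With the patched condition a P8 passes the generation check; whether the
code behind the check then behaves correctly on P8 is a separate question.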

This solution works fine under normal conditions (with low host CPU load):
we send sreset from the BMC (`pdbg -p 0 -c 1 -t 0 sreset`) and the host
kernel initiates kdump.

But if I load all host cores to 100% (`stress -c 256`) and send sreset,
nothing happens: no crash dump, no kernel logs, no OPAL logs, even with
debug trace enabled. I can stop stress with Ctrl-C and the system lets me
continue working. The NMI seems to be simply ignored. How is this
possible?


--
Regards,
Artem Senichev
Software Engineer, YADRO.

