[OpenPower-Firmware] Generate a dump of the Linux kernel on host OS (P8)

Artem Senichev artemsen at gmail.com
Tue Feb 19 22:22:23 AEDT 2019


On Fri, Apr 13, 2018 at 01:56:17PM +1000, Nicholas Piggin wrote:
> > Artem Senichev <artemsen at gmail.com> writes:
> > > I need the ability to generate a dump of the Linux kernel on host OS
> > > using a command from BMC.
> 
> The dump will be initiated when we get a crash or sreset. We can kick
> off a dump without using sreset. The benefits of sreset is that it can
> be generated from the BMC, and that the host CPUs can't block it if they
> have crashed with interrupts off.
> 
> My thought is that we could use libpdbg to send the sreset to the host.
> If we could get ipmi wired up to use that for the nmi command, it should
> work.
> 
> We have just been talking about this a bit more. Ramming is a bit
> complex and has some restrictions. On P8 we can actually send a sreset,
> but the SRR1 register may end up being incorrect. This means we can not
> return from the interrupt and continue, but we should be able to go on
> to take a crash dump and restart the machine.
> 
> Most of the P8 code is already there in skiboot to do this for fast
> reboot as an IPI with OPAL_SIGNAL_SYSTEM_RESET (core/direct-controls.c),
> and pdbg on the BMC has the sreset command.

Yes, in fact we don't need any patches for skiboot to get the NMI/SRESET
functionality. The existing code works fine in most cases and handles the
SRESET signal correctly.

The entire solution consists of a single patch for pdbg, which allows us to
send the SRESET signal from the OpenBMC console:
http://patchwork.ozlabs.org/patch/1038525/

The only problem I have is when the CPU thread that should handle the SRESET
signal is under load. If I understand correctly, we should send SRESET to only
one thread on the host CPU. But if that thread is 100% busy, it can't handle
the signal.
Steps to reproduce:
1. On the host side: run `stress` pinned to the first thread of CPU0:
   # taskset 01 stress -c 1
2. From OpenBMC: send the SRESET signal to the host's first thread:
   # pdbg --backend=i2c --device=/dev/i2c-4 -p 0 -c 1 -t 0 sreset
In this scenario the SRESET signal is ignored, and there are no messages in
either the OPAL or the kernel logs. If I stop `stress` with Ctrl-C, the system
continues to work as usual. After that, I can resend SRESET and everything
works as expected: the kernel runs its 'System Reset' handler and initiates a
kernel reload to create the memory dump.
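
For convenience, the two reproduction steps above can be wrapped in small
helpers. This is only a sketch: the function names are made up for
illustration, the pdbg and stress options are copied verbatim from the steps
above, and the helpers print the commands instead of running them, so the
output can be pasted on the host or on the BMC as appropriate. The only thing
added is the affinity mask calculation, relying on taskset taking a
hexadecimal CPU bitmask.

```shell
#!/bin/sh
# Sketch only: prints the commands from the reproduction steps, does not run them.
# Helper names are hypothetical; all flags are copied from the steps above.

# Hex affinity mask for a single Linux CPU number, as taskset expects it.
cpu_mask() {
    printf '%x\n' $((1 << $1))
}

# Host-side command: pin one busy `stress` worker to the given Linux CPU.
host_load_cmd() {
    echo "taskset $(cpu_mask "$1") stress -c 1"
}

# BMC-side command: send SRESET to one processor/core/thread via pdbg.
bmc_sreset_cmd() {
    echo "pdbg --backend=i2c --device=/dev/i2c-4 -p $1 -c $2 -t $3 sreset"
}

host_load_cmd 0        # command to run on the host
bmc_sreset_cmd 0 1 0   # command to run on the BMC
```

The mask is hexadecimal, so `cpu_mask 0` gives `1` (equivalent to the `01`
used above) and `cpu_mask 4` gives `10`. How Linux CPU numbers map to the
pdbg -p/-c/-t indices depends on the machine and is not handled here.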

Test environment:
VESNIN server, 4 x POWER8 CPU (8-cores).
OpenPOWER firmware (built from https://github.com/open-power/op-build,
vesnin_defconfig):
open-power-vesnin-v2.2-rc1-85-g7b5f1ef
	buildroot-2018.11.1-7-g5d7cc8c
	skiboot-v6.2
	hostboot-p8-a3b0cb9-p4cc4a16
	occ-p8-28f2cec-p1729bf8
	linux-4.19.13-openpower1-pcf1113c
	petitboot-1.10.0
	machine-xml-c994a18
	hostboot-binaries-hw012919a.930
	capp-ucode-p9-dd2-v4

Host OS: Ubuntu 18.04 (Linux kernel 4.15), but the problem also reproduces on
Linux kernel 5.0-rc6.

Nick, Stewart,
could you help us solve this issue?

-- 
Regards,
Artem Senichev
Software Engineer, YADRO.

