[OpenPower-Firmware] Generate a dump of the Linux kernel on host OS (P8)

Artem Senichev artemsen at gmail.com
Wed Feb 27 03:36:06 AEDT 2019


On Tue, Feb 26, 2019 at 04:36:18PM +1000, Nicholas Piggin wrote:
> Artem Senichev's on February 21, 2019 6:17 pm:
> > On Wed, Feb 20, 2019 at 10:19:18PM +1000, Nicholas Piggin wrote:
> >> Artem Senichev's on February 20, 2019 9:02 pm:
> >> > On Tue, Feb 19, 2019 at 11:47:43PM +1000, Nicholas Piggin wrote:
> >> >> Artem Senichev's on February 19, 2019 9:22 pm:
> >> >> > On Fri, Apr 13, 2018 at 01:56:17PM +1000, Nicholas Piggin wrote:
> >> >> >> > Artem Senichev <artemsen at gmail.com> writes:
> >> >> >> > > I need the ability to generate a dump of the Linux kernel on host OS
> >> >> >> > > using a command from BMC.
> >> >> >> 
> >> >> >> The dump will be initiated when we get a crash or sreset. We can kick
> >> >> >> off a dump without using sreset. The benefit of sreset is that it can
> >> >> >> be generated from the BMC, and that the host CPUs can't block it if they
> >> >> >> have crashed with interrupts off.
> >> >> >> 
> >> >> >> My thought is that we could use libpdbg to send the sreset to the host.
> >> >> >> If we could get ipmi wired up to use that for the nmi command, it should
> >> >> >> work.
> >> >> >> 
> >> >> >> We have just been talking about this a bit more. Ramming is a bit
> >> >> >> complex and has some restrictions. On P8 we can actually send a sreset,
> >> >> >> but the SRR1 register may end up being incorrect. This means we can not
> >> >> >> return from the interrupt and continue, but we should be able to go on
> >> >> >> to take a crash dump and restart the machine.
> >> >> >> 
> >> >> >> Most of the P8 code is already there in skiboot to do this for fast
> >> >> >> reboot as an IPI with OPAL_SIGNAL_SYSTEM_RESET (core/direct-controls.c),
> >> >> >> and pdbg on the BMC has the sreset command.
> >> >> > 
> >> >> > Yes, in fact we don't need any patches for skiboot to get the NMI/SRESET
> >> >> > functionality. Existing code works fine in most cases and handles
> >> >> > SRESET signal correctly.
> >> >> > 
> >> >> > The entire solution includes only one patch for PDBG, that allows us to
> >> >> > send SRESET signal from OpenBMC console:
> >> >> > http://patchwork.ozlabs.org/patch/1038525/
> >> >> > 
> >> >> > The only problem I have is the case when I load the CPU's thread that should
> >> >> > handle SRESET signal. If I understand right, we should send SRESET to only
> >> >> > one thread on the host's CPU.
> >> >> 
> >> >> Linux can deal with one or more threads taking sreset. You should sreset
> >> >> all, because if Linux does not see all threads getting sreset, it will 
> >> >> use IPIs to bring the remaining threads in. If you are going to use P8
> >> >> with no skiboot patch, then Linux will have no NMI IPI.
> >> >> 
> >> > 
> >> > I tried to send SRESET to all threads (with '-a' option of pdbg),
> >> > in this case I get a lot of kernel messages about system reset, one message
> >> > per logical CPU:
> >> > 
> >> > cpu 0x47: Vector: 100 (System Reset) at [c000003fcac4fbd8]
> >> > ...
> >> > 
> >> > but it stops working after that, kernel just hangs. Also, the last
> >> > message says that the last CPU that received sreset is 71 (0x47),
> >> > but I have 256 logical CPUs in the system.
> >> 
> >> Okay. It's not supposed to of course, and guest kernels under hypervisor 
> >> (PowerVM or KVM) get a 0x100 interrupt on every CPU when the HV gives a 
> >> crash or NMI signal.
> >> 
> >> Is this happening with an upstream kernel? Not running KVM?
> > 
> > It's a PowerNV machine, without KVM. I test the solution with vanilla
> > linux kernel 5.0-rc7.
> > 
> >> > 
> >> >> > signal.
> >> >> > Steps to reproduce:
> >> >> > 1. On the host's side: call `stress` for the first thread of CPU0:
> >> >> >    # taskset 01 stress -c 1
> >> >> > 2. From OpenBMC: send SRESET signal for the first host's thread:
> >> >> >    # pdbg --backend=i2c --device=/dev/i2c-4 -p 0 -c 1 -t 0 sreset
> >> >> > In this scenario the SRESET signal is ignored; there are no messages
> >> >> > in OPAL's or the kernel's logs. I can just stop `stress` with Ctrl-C
> >> >> > and the system continues to work as usual. After that, I can resend
> >> >> > SRESET and everything works as expected: the kernel starts the 'System
> >> >> > Reset' handler and initiates a kernel reload to create the memory dump.
> >> >> 
> >> >> You may need to stop the thread first with pdbg. P9 requires that I 
> >> >> think. Some documentation indicates it works without stopping first,
> >> >> but I don't think that's the case. P8 may be similar.
> >> >> 
> >> >> The stop sequence in pdbg for P8 does not exactly match the workbook 
> >> >> either, by the looks. It doesn't check for maint mode, it does some
> >> >> funny thing for RAM mode at the end, etc. If it does not work
> >> >> properly for sreset then it would be worth experimenting with that
> >> >> (I would try take out the last bit of code from p8_thread_stop() that
> >> >> sets the thread active).
> >> >> 
> >> > 
> >> > Nick, what do you mean, "stop the thread"?
> >> > Is it something like Alister suggested to do in the patch
> >> > "core/fast-reboot.c: Add sreset opal call":
> >> > https://patchwork.ozlabs.org/patch/694794/
> >> > By ramming an instruction sequence into an active thread?
> >> 
> >> No, I meant stop with pdbg.
> > 
> > That trick doesn't work, if I send sreset to the stopped thread, the signal
> > is not handled.
> 
> Maybe try removing this from p8_thread_stop
> 
>         /* Make the threads RAM thread active */
>         CHECK_ERR(pib_read(&chip->target, THREAD_ACTIVE_REG, &val));
>         val |= PPC_BIT(8) >> thread->id;
>         CHECK_ERR(pib_write(&chip->target, THREAD_ACTIVE_REG, val));
> 
> Also try setting the thread to "prenap", see 
> skiboot/core/direct-controls.c.
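
For reference, a minimal sketch of the bit arithmetic in the quoted fragment.
It assumes the usual IBM MSB-0 definition of PPC_BIT (bit 0 is the most
significant bit of a 64-bit word) and at most 8 threads per P8 core. The
helper names are hypothetical and are not part of pdbg; only the expression
"PPC_BIT(8) >> thread->id" comes from the quoted code.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed definition: IBM (MSB-0) numbering, bit 0 is the top bit. */
#define PPC_BIT(bit) (0x8000000000000000ULL >> (bit))

/* Hypothetical helper: the mask selected by "PPC_BIT(8) >> thread->id". */
static uint64_t ram_thread_active_mask(unsigned int thread_id)
{
    assert(thread_id < 8); /* assumption: SMT8, thread ids 0..7 */
    return PPC_BIT(8) >> thread_id;
}

/* Hypothetical helper: the value the quoted code would write back
 * after reading reg_val from the register. */
static uint64_t set_ram_thread_active(uint64_t reg_val, unsigned int thread_id)
{
    return reg_val | ram_thread_active_mask(thread_id);
}
```

In other words, thread N maps to IBM bit 8 + N of the register value, and
removing the fragment leaves that bit as it was read.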

I've tried all combinations; the only one that works is simply sending
SRESET, without stopping the thread.

Here are the results (with 'prenap' set and the code mentioned above removed
from p8_thread_stop):

1. 0% CPU load, send sreset without stopping the thread: everything works as
   expected, I get "cpu 0x0: Vector: 100 (System Reset)";

2. 0% CPU load, stop the thread and send sreset: signal is ignored by the
   kernel;

3. 100% CPU load, send sreset without stopping the thread: the SRESET signal
   is ignored and the user-mode application (stress, which loads the CPU)
   fails with "cpu 0x0: Vector: 400 (Instruction Access)";

4. 100% CPU load, stop the thread and send sreset: same as the previous case,
   but with another exception: "cpu 0x0: Vector: e40 (Emulation Assist)".

If I stop all threads and send sreset, the system (kernel) hangs.

> The P8 sreset code there for fast reboots is relatively well tested.

Sorry, I can't agree with that. Fast reboot on P8 has been broken since Linux
kernel 4.17:
https://github.com/open-power/skiboot/issues/185

> 
> > 
> >> > Because if I stop the thread from pdbg (with the 'stop' command), the SRESET
> >> > signal is not handled by the host; it has the same effect as using `stress`.
> >> 
> >> I'm not sure what stress is. Does nothing appear to happen? It could be
> >> due to the ramming thing in the p8 stop sequence in pdbg.
> > 
> > `stress` is a small utility to perform stress tests (load CPUs in
> > my case):
> > http://manpages.ubuntu.com/manpages/cosmic/man1/stress-ng.1.html
> > 
> > If I load the first core up to 80%, everything works fine. At 90% it
> > works from time to time. At 100% load it doesn't work at all.
> > Anyway, how is it possible that NMI is ignored by kernel?
> 
> NMI can be "ignored" by the kernel if it hits when in a nap state and 
> wakes up without the correct SRR1 reason bit set. Kernel can then wake 
> up but not do anything with the sreset.
> 
> I don't think that sounds like the case here, more like the scom
> sequences are not correct. It can be quite strange behaviour if you 
> don't have everything exactly right, especially when you add power 
> saving to the mix.
> 
> > Is it possible to get an acknowledgment on BMC side, that NMI has been sent
> > or handled?
> 
> I'm not aware of anything like a synchronous notification, but if you 
> STOP a thread, confirm it to be stopped, and then SRESET it and see that 
> it is no longer stopped, then you can be quite sure the sreset has been 
> delivered.
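
The check described above (stop, confirm stopped, sreset, see that the thread
is no longer stopped) could be sketched roughly as follows. This is only an
illustration of the decision logic: the thread_ops hooks and the fake_* stand-in
are hypothetical names made up for this sketch, not pdbg or skiboot API, and a
real version would have to back them with the actual stop, status and sreset
operations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical hardware hooks; none of these are real pdbg API names. */
struct thread_ops {
    int  (*stop)(void *ctx);        /* returns 0 on success */
    bool (*is_stopped)(void *ctx);
    int  (*sreset)(void *ctx);      /* returns 0 on success */
};

enum sreset_result {
    SRESET_DELIVERED,     /* confirmed stopped, then no longer stopped */
    SRESET_NOT_CONFIRMED, /* thread is still stopped after the sreset */
    SRESET_STOP_FAILED,   /* could not reach a confirmed stopped state */
    SRESET_SEND_FAILED    /* the sreset operation itself reported an error */
};

static enum sreset_result sreset_and_verify(const struct thread_ops *ops, void *ctx)
{
    assert(ops && ops->stop && ops->is_stopped && ops->sreset);

    if (ops->stop(ctx) != 0 || !ops->is_stopped(ctx))
        return SRESET_STOP_FAILED;
    if (ops->sreset(ctx) != 0)
        return SRESET_SEND_FAILED;
    return ops->is_stopped(ctx) ? SRESET_NOT_CONFIRMED : SRESET_DELIVERED;
}

/* In-memory stand-in for a hardware thread, for dry runs only. */
struct fake_thread {
    bool stopped;
    bool honours_sreset; /* whether a sreset makes the thread run again */
};

static int fake_stop(void *ctx)
{
    ((struct fake_thread *)ctx)->stopped = true;
    return 0;
}

static bool fake_is_stopped(void *ctx)
{
    return ((struct fake_thread *)ctx)->stopped;
}

static int fake_sreset(void *ctx)
{
    struct fake_thread *t = ctx;
    if (t->honours_sreset)
        t->stopped = false;
    return 0;
}

static const struct thread_ops fake_ops = { fake_stop, fake_is_stopped, fake_sreset };
```

Note that this only tells the BMC side that the thread left the stopped state;
it says nothing about whether the kernel then handled the 0x100 interrupt.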

-- 
Regards,
Artem Senichev
Software Engineer, YADRO.

