[OpenPower-Firmware] Generate a dump of the Linux kernel on host OS (P8)

Stewart Smith stewart at linux.ibm.com
Wed Mar 6 17:47:47 AEDT 2019


Artem Senichev <artemsen at gmail.com> writes:
> On Tue, Feb 26, 2019 at 04:36:18PM +1000, Nicholas Piggin wrote:
>> Artem Senichev's on February 21, 2019 6:17 pm:
>> > On Wed, Feb 20, 2019 at 10:19:18PM +1000, Nicholas Piggin wrote:
>> >> Artem Senichev's on February 20, 2019 9:02 pm:
>> >> > On Tue, Feb 19, 2019 at 11:47:43PM +1000, Nicholas Piggin wrote:
>> >> >> Artem Senichev's on February 19, 2019 9:22 pm:
>> >> >> > On Fri, Apr 13, 2018 at 01:56:17PM +1000, Nicholas Piggin wrote:
>> >> >> >> > Artem Senichev <artemsen at gmail.com> writes:
>> >> >> >> > > I need the ability to generate a dump of the Linux kernel on host OS
>> >> >> >> > > using a command from BMC.
>> >> >> >> 
>> >> >> >> The dump will be initiated when we get a crash or sreset. We can kick
>> >> >> >> off a dump without using sreset. The benefit of sreset is that it can
>> >> >> >> be generated from the BMC, and that the host CPUs can't block it if they
>> >> >> >> have crashed with interrupts off.
>> >> >> >> 
>> >> >> >> My thought is that we could use libpdbg to send the sreset to the host.
>> >> >> >> If we could get ipmi wired up to use that for the nmi command, it should
>> >> >> >> work.
>> >> >> >> 
>> >> >> >> We have just been talking about this a bit more. Ramming is a bit
>> >> >> >> complex and has some restrictions. On P8 we can actually send a sreset,
>> >> >> >> but the SRR1 register may end up being incorrect. This means we can not
>> >> >> >> return from the interrupt and continue, but we should be able to go on
>> >> >> >> to take a crash dump and restart the machine.
>> >> >> >> 
>> >> >> >> Most of the P8 code is already there in skiboot to do this for fast
>> >> >> >> reboot as an IPI with OPAL_SIGNAL_SYSTEM_RESET (core/direct-controls.c),
>> >> >> >> and pdbg on the BMC has the sreset command.
>> >> >> > 
>> >> >> > Yes, in fact we don't need any patches for skiboot to get the NMI/SRESET
>> >> >> > functionality. Existing code works fine in most cases and handles
>> >> >> > SRESET signal correctly.
>> >> >> > 
>> >> >> > The entire solution includes only one patch for PDBG, that allows us to
>> >> >> > send SRESET signal from OpenBMC console:
>> >> >> > http://patchwork.ozlabs.org/patch/1038525/
>> >> >> > 
>> >> >> > The only problem I have is the case when the CPU thread that should
>> >> >> > handle the SRESET signal is under load. If I understand right, we should
>> >> >> > send SRESET to only one thread on the host's CPU.
>> >> >> 
>> >> >> Linux can deal with one or more threads taking sreset. You should sreset
>> >> >> all, because if Linux does not see all threads getting sreset, it will 
>> >> >> use IPIs to bring the remaining threads in. If you are going to use P8
>> >> >> with no skiboot patch, then Linux will have no NMI IPI.
>> >> >> 
>> >> > 
>> >> > I tried to send SRESET to all threads (with '-a' option of pdbg),
>> >> > in this case I get a lot of kernel messages about system reset, one message
>> >> > per logical CPU:
>> >> > 
>> >> > cpu 0x47: Vector: 100 (System Reset) at [c000003fcac4fbd8]
>> >> > ...
>> >> > 
>> >> > but it stops working after that, kernel just hangs. Also, the last
>> >> > message says that the last CPU that received sreset is 71 (0x47),
>> >> > but I have 256 logical CPUs in the system.
>> >> 
>> >> Okay. It's not supposed to of course, and guest kernels under hypervisor 
>> >> (PowerVM or KVM) get a 0x100 interrupt on every CPU when the HV gives a 
>> >> crash or NMI signal.
>> >> 
>> >> Is this happening with an upstream kernel? Not running KVM?
>> > 
>> > It's a PowerNV machine, without KVM. I test the solution with vanilla
>> > linux kernel 5.0-rc7.
>> > 
>> >> > 
>> >> >> > Step to reproduce:
>> >> >> > 1. On the host's side: call `stress` for the first thread of CPU0:
>> >> >> >    # taskset 01 stress -c 1
>> >> >> > 2. From OpenBMC: send SRESET signal for the first host's thread:
>> >> >> >    # pdbg --backend=i2c --device=/dev/i2c-4 -p 0 -c 1 -t 0 sreset
>> >> >> > In this scenario the SRESET signal is ignored, and there are no
>> >> >> > messages in the OPAL or kernel logs. I can just stop `stress` with
>> >> >> > Ctrl-C and the system continues to work as usual. After that, I can resend
>> >> >> > SRESET and everything works as expected: the kernel starts the 'System Reset'
>> >> >> > handler and initiates a kernel reload to create the memory dump.
>> >> >> 
>> >> >> You may need to stop the thread first with pdbg. P9 requires that I 
>> >> >> think. Some documentation indicates it works without stopping first,
>> >> >> but I don't think that's the case. P8 may be similar.
>> >> >> 
>> >> >> The stop sequence in pdbg for P8 does not exactly match the workbook 
>> >> >> either, by the looks. It doesn't check for maint mode, it does some
>> >> >> funny thing for RAM mode at the end, etc. If it does not work
>> >> >> properly for sreset then it would be worth experimenting with that
>> >> >> (I would try take out the last bit of code from p8_thread_stop() that
>> >> >> sets the thread active).
>> >> >> 
>> >> > 
>> >> > Nick, what do you mean, "stop the thread"?
>> >> > Is it something like Alister suggested to do in the patch
>> >> > "core/fast-reboot.c: Add sreset opal call":
>> >> > https://patchwork.ozlabs.org/patch/694794/
>> >> > By ramming an instruction sequence into an active thread?
>> >> 
>> >> No, I meant stop with pdbg.
>> > 
>> > That trick doesn't work, if I send sreset to the stopped thread, the signal
>> > is not handled.
>> 
>> Maybe try removing this from p8_thread_stop
>> 
>>         /* Make the threads RAM thread active */
>>         CHECK_ERR(pib_read(&chip->target, THREAD_ACTIVE_REG, &val));
>>         val |= PPC_BIT(8) >> thread->id;
>>         CHECK_ERR(pib_write(&chip->target, THREAD_ACTIVE_REG, val));
>> 
>> Also try setting the thread to "prenap", see 
>> skiboot/core/direct-controls.c.
>
> I've tried all combinations; the only one that worked is simply sending
> SRESET, even without stopping the thread.
>
> Here are the results (with 'prenap' set and the mentioned code removed from
> p8_thread_stop):
>
> 1. 0% CPU load, send sreset without stopping the thread: everything works as
>    expected, I get "cpu 0x0: Vector: 100 (System Reset)";
>
> 2. 0% CPU load, stop the thread and send sreset: signal is ignored by the
>    kernel;
>
> 3. 100% CPU load, send sreset without stopping the thread: SRESET signal is
>    ignored, the usermode application (stress that loads the CPU) failed:
>    "cpu 0x0: Vector: 400 (Instruction Access)";
>
> 4. 100% CPU load, stop the thread and send sreset: same as previous, but with
>    another exception: "cpu 0x0: Vector: e40 (Emulation Assist)".
>
> If I stop all threads and send sreset - system (kernel) hangs.
>
>> The P8 sreset code there for fast reboots is relatively well tested.
>
> Sorry, I can't agree with that. Fast reboot on P8 has been broken since Linux
> kernel 4.17:
> https://github.com/open-power/skiboot/issues/185

So, in trying to delve into this, I went and modified op-test to do
something that I suspected could be similar (or at least I'd like to
rule out): https://github.com/open-power/op-test-framework/pull/437

So, the good news is I've managed to reproduce the problem in a
relatively simple setting.

The bad news is that I've also made it go away without changing anything
that should affect it... So... umm... fun times :)

I'll keep digging and keep people updated.

-- 
Stewart Smith
OPAL Architect, IBM.


