Boot flakiness with QEMU 3.1.0 and Clang built kernels
Cédric Le Goater
clg at kaod.org
Sun Apr 12 22:03:01 AEST 2020
On 4/11/20 3:57 PM, Nicholas Piggin wrote:
> Nicholas Piggin's on April 11, 2020 7:32 pm:
>> Nathan Chancellor's on April 11, 2020 10:53 am:
>>> The tt.config values are needed to reproduce but I did not verify that
>>> ONLY tt.config was needed. Other than that, no, we are just building
>>> either pseries_defconfig or powernv_defconfig with those configs and
>>> letting it boot up with a simple initramfs, which prints the version
>>> string then shuts the machine down.
>>>
>>> Let me know if you need any more information, cheers!
>>
>> Okay I can reproduce it. Sometimes it eventually recovers after a long
>> pause, and some keyboard input often helps it along. So that seems like
>> it might be a lost interrupt.
>>
>> POWER8 vs POWER9 might just be a timing thing if P9 is still hanging
>> sometimes. I wasn't able to reproduce it with defconfig+tt.config, I
>> needed your other config with various other debug options.
>>
>> Thanks for the very good report. I'll let you know what I find.
>
> It looks like a qemu bug. Booting with '-d int' shows the decrementer
> simply stops firing at the point of the hang, even though MSR[EE]=1 and
> the DEC register is wrapping. Linux appears to be doing the right thing
> as far as I can tell (not losing interrupts).
>
> This qemu patch fixes the boot hang for me. I don't know that qemu
> really has the right idea of "context synchronizing" as defined in the
> powerpc architecture -- mtmsrd L=1 is not context synchronizing but that
> does not mean it can avoid looking at exceptions until the next such
> event. It looks like the decrementer exception goes high but the
> execution of mtmsrd L=1 is ignoring it.
>
> Prior to the Linux patch 3282a3da25b you bisected to, interrupt replay
> code would return with an 'rfi' instruction as part of interrupt return,
> which probably helped to get things moving along a bit. However it would
> not be foolproof, and Cedric did say he encountered some mysterious
> lockups under load with qemu powernv before that patch was merged, so
> maybe it's the same issue?

Nope :/ but this is a fix for an important problem reported by Anton in
November. Attached is the test case.

Thanks,

C.
-------------- next part --------------
/*
Mikey and I noticed that the decrementer isn't firing when
it should. If a decrementer is pending and an mtmsrd(MSR_EE) is
executed then we should take the decrementer exception. From the PPC AS:
If MSR[EE] = 0 and an External, Decrementer, or Performance
Monitor exception is pending, executing an mtmsrd instruction
that sets MSR[EE] to 1 will cause the interrupt to occur before
the next instruction is executed, if no higher priority
exception exists.
A test case is below. r31 is incremented for every decrementer
exception.
powerpc64le-linux-gcc -c test.S
powerpc64le-linux-ld -Ttext=0x0 -o test.elf test.o
powerpc64le-linux-objcopy -O binary test.elf test.bin
qemu-system-ppc64 -M powernv -cpu POWER9 -nographic -bios test.bin
"info registers" shows it looping in the lower loop, i.e. the
decrementer exception was never taken and r31 never moves.
If I build with:
powerpc64le-linux-gcc -DFIX_BROKEN -c test.S
I see r31 move.
*/
#include <ppc-asm.h>
/* Load an immediate 64-bit value into a register */
#define LOAD_IMM64(r, e) \
lis r,(e)@highest; \
ori r,r,(e)@higher; \
rldicr r,r, 32, 31; \
oris r,r, (e)@h; \
ori r,r, (e)@l;
#define FIXUP_ENDIAN \
tdi 0,0,0x48; /* Reverse endian of b . + 8 */ \
b 191f; /* Skip trampoline if endian is good */ \
.long 0xa600607d; /* mfmsr r11 */ \
.long 0x01006b69; /* xori r11,r11,1 */ \
.long 0x05009f42; /* bcl 20,31,$+4 */ \
.long 0xa602487d; /* mflr r10 */ \
.long 0x14004a39; /* addi r10,r10,20 */ \
.long 0xa64b5a7d; /* mthsrr0 r10 */ \
.long 0xa64b7b7d; /* mthsrr1 r11 */ \
.long 0x2402004c; /* hrfid */ \
191:
.= 0x0
.globl _start
_start:
b 1f
.= 0x10
FIXUP_ENDIAN
b 1f
.= 0x100
1:
FIXUP_ENDIAN
b __initialize
#define EXCEPTION(nr) \
.= nr ;\
b .
/* More exception stubs */
EXCEPTION(0x300)
EXCEPTION(0x380)
EXCEPTION(0x400)
EXCEPTION(0x480)
EXCEPTION(0x500)
EXCEPTION(0x600)
EXCEPTION(0x700)
EXCEPTION(0x800)
/* 0x900: decrementer handler. Re-arm DEC, count the interrupt in r31, return. */
.= 0x900
LOAD_IMM64(r0, 0x1000000)
mtdec r0
addi r31,r31,1
rfid
EXCEPTION(0x980)
EXCEPTION(0xa00)
EXCEPTION(0xb00)
EXCEPTION(0xc00)
EXCEPTION(0xd00)
EXCEPTION(0xe00)
EXCEPTION(0xe20)
EXCEPTION(0xe40)
EXCEPTION(0xe60)
EXCEPTION(0xe80)
EXCEPTION(0xf00)
EXCEPTION(0xf20)
EXCEPTION(0xf40)
EXCEPTION(0xf60)
EXCEPTION(0xf80)
EXCEPTION(0x1000)
EXCEPTION(0x1100)
EXCEPTION(0x1200)
EXCEPTION(0x1300)
EXCEPTION(0x1400)
EXCEPTION(0x1500)
EXCEPTION(0x1600)
__initialize:
/* SF, HV, EE, RI, LE */
LOAD_IMM64(r0, 0x9000000000008003)
mtmsrd r0
/* HID0: HILE */
LOAD_IMM64(r0, 0x800000000000000)
mtspr 0x3f0,r0
LOAD_IMM64(r0, 0x1000000)
mtdec r0
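/* Enable external interrupts: mtmsrd L=1 with EE (0x8000) set */
1: LOAD_IMM64(r30,0x8000)
mtmsrd r30,1
/* We should take the decrementer here */
#ifdef FIX_BROKEN
/* Spin with MSR[EE] still set; with this build r31 is seen to move */
LOAD_IMM64(r29,0x100000000)
mtctr r29
2: bdnz 2b
#endif
/* Clear MSR[EE] again and repeat */
li r30,0x0
mtmsrd r30,1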
b 1b
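
For readers who want to reason about the loop above without running QEMU, here is a toy Python model of it. It is a sketch under stated assumptions, not QEMU code. The names `mtmsrd_l1` and `run` are made up for illustration, as is the rule that a misbehaving emulator samples pending interrupts only at branches, and the simplification that the decrementer expires once per loop iteration. The one architectural fact it relies on is that mtmsrd with L=1 updates only MSR[EE] and MSR[RI].

```python
# Toy model of the test loop. Illustrative only: this is not QEMU code, and
# the "sample pending interrupts only at branches" rule is an assumption.

MSR_EE = 1 << 15   # external interrupt enable (0x8000 in the test case)
MSR_RI = 1 << 1    # recoverable interrupt


def mtmsrd_l1(msr, value):
    """mtmsrd with L=1: only MSR[EE] and MSR[RI] are taken from value."""
    mask = MSR_EE | MSR_RI
    return (msr & ~mask) | (value & mask)


def run(deliver_on_mtmsrd, fix_broken, iterations=5, spin=3):
    """Return the final r31, i.e. the number of decrementer interrupts taken.

    deliver_on_mtmsrd: take a pending decrementer right after an mtmsrd that
        sets EE (the behaviour quoted from the architecture). When False,
        pending interrupts are sampled only at branches (assumed bug model).
    fix_broken: include the bdnz spin loop from the -DFIX_BROKEN build.
    """
    msr = 0x9000000000008003  # SF, HV, EE, RI, LE, as set in __initialize
    r31 = 0
    pending = False

    def sample():
        # Deliver the decrementer if it is pending and MSR[EE] is set.
        nonlocal r31, pending
        if pending and (msr & MSR_EE):
            r31 += 1         # handler at 0x900: addi r31,r31,1
            pending = False  # handler re-arms DEC with mtdec

    for _ in range(iterations):
        pending = True                 # assumption: DEC expired once per pass
        msr = mtmsrd_l1(msr, 0x8000)   # 1: mtmsrd r30,1 (EE on)
        if deliver_on_mtmsrd:
            sample()
        if fix_broken:
            for _ in range(spin):
                sample()               # 2: bdnz 2b (branch with EE set)
        msr = mtmsrd_l1(msr, 0x0)      # mtmsrd r30,1 (EE off)
        sample()                       # b 1b (branch with EE clear)
    return r31


if __name__ == "__main__":
    print("branch-only sampling, plain build:     ", run(False, False))
    print("branch-only sampling, FIX_BROKEN build:", run(False, True))
    print("deliver on mtmsrd, plain build:        ", run(True, False))
```

Under these assumptions, `run(False, False)` returns 0 (r31 never moves), while `run(False, True)` and `run(True, False)` both return 5. The first matches the reported hang, the second matches the FIX_BROKEN observation, and the third is what the quoted architecture text asks for.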