[Skiboot] [PATCH] external/mambo: add helper for machine checks
Nicholas Piggin
npiggin at gmail.com
Mon May 1 14:13:32 AEST 2017
On Mon, 01 May 2017 14:02:17 +1000
Stewart Smith <stewart at linux.vnet.ibm.com> wrote:
> Nicholas Piggin <npiggin at gmail.com> writes:
> > Add helpers to construct machine checks with registers set up properly.
> > exc_mce raises a machine check exception that can be stepped into. This
> > is useful for testing the machine check handler.
> >
> > Also add a similar exc_sreset for system reset.
> >
> > inject_mce does the same but runs immediately and stops when execution
> > returns to the interrupted NIP (which can get tangled up if the machine
> > check re-enters this code). This is useful for testing robustness to
> > interleaving machine checks.
> >
> > inject_mce_step allows injecting MCEs between each instruction and stepping
> > over them. inject_mce_step_ri does the same but only when MSR has RI set.
> > This can be useful to test correctness of low level code. For example,
> > testing system call vs machine check:
> >
> > systemsim % b 0xC000000000004c00
> > systemsim % c
> > 0xC000000000004C00 (0x0000000000004C00) Enc:0xA64BB17D : mtspr HSPRG1,r13
> > systemsim % inject_mce_step_ri 100
> > 0xC000000000004C04 (0x0000000000004C04) Enc:0xA64AB07D : mfspr r13,HSPRG0
> > 0xC000000000004C08 (0x0000000000004C08) Enc:0x80002DF9 : std r9,0x80(r13)
> > 0xC000000000004C0C (0x0000000000004C0C) Enc:0xA6E2207D : mfspr r9,PPR
> > 0xC000000000004C10 (0x0000000000004C10) Enc:0x7813427C : mr r2,r2
> > 0xC000000000004C14 (0x0000000000004C14) Enc:0x88004DF9 : std r10,0x88(r13)
> > 0xC000000000004C18 (0x0000000000004C18) Enc:0xD8002DF9 : std r9,0xD8(r13)
> > 0xC000000000004C1C (0x0000000000004C1C) Enc:0x2600207D : mfcr r9
> > 0xC000000000004C20 (0x0000000000004C20) Enc:0xE8074D89 : lbz r10,0x7E8(r13)
> > 0xC000000000004C24 (0x0000000000004C24) Enc:0x00000A2C : cmpwi cr0,r10,0
> > 0xC000000000004C28 (0x0000000000004C28) Enc:0xA80F8240 : bne cr0,$+0xFA8 (bc 0x4,0x2,0xFA8,0,0)
> > 0xC000000000004C2C (0x0000000000004C2C) Enc:0xA64AB17D : mfspr r13,HSPRG1
> > 0xC000000000004C30 (0x0000000000004C30) Enc:0xBE1E202C : cmpdi cr0,r0,7870
> > 0xC000000000004C34 (0x0000000000004C34) Enc:0x2000C241 : beq cr0,$+0x20 (bc 0xE,0x2,0x20,0,0)
> > 0xC000000000004C38 (0x0000000000004C38) Enc:0x786BA97D : mr r9,r13
> > 0xC000000000004C3C (0x0000000000004C3C) Enc:0xA64AB07D : mfspr r13,HSPRG0
> > 0xC000000000004C40 (0x0000000000004C40) Enc:0xA6027A7D : mfspr r11,SRR0
> > 0xC000000000004C44 (0x0000000000004C44) Enc:0xA6029B7D : mfspr r12,SRR1
> > 0xC000000000004C48 (0x0000000000004C48) Enc:0x02004039 : li r10,2
> > 0xC000000000004C4C (0x0000000000004C4C) Enc:0x6401417D : mtmsrd r10,1
> > 0xC000000000004C50 (0x0000000000004C50) Enc:0xB0620048 : b $+0x62B0
> > 236380163: (212143620): Disabling lock debugging due to kernel taint
> > 0xC000000000004C50 (0x0000000000004C50) Enc:0xB0620048 : b $+0x62B0
> > 0xC00000000000AF00 (0x000000000000AF00) Enc:0xE1F78A79 : rldicl. r10,r12,30,63,63 (0x0000000000000001)
> > 0xC00000000000AF00 (0x000000000000AF00) Enc:0xE1F78A79 : rldicl. r10,r12,30,63,63 (0x0000000000000001)
> > [...]
> >
> > Every instruction after 0xC000000000004C4C is getting an interleaving
> > MCE, and continuing after this injection the kernel prints a lot of MCE
> > reports and continues working properly.
> >
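A note on why injection in the trace above only starts after 0xC000000000004C4C: the `li r10,2` / `mtmsrd r10,1` pair at 0x...4C48 and 0x...4C4C sets MSR[RI], and inject_mce_step_ri only injects while RI is set. The sketch below models just that bit logic. It is not the mambo Tcl helper from the patch; the function names and the sample MSR value are hypothetical, and the masks assume the usual Power ISA layout (RI = 0x2, EE = 0x8000).

```python
# Illustrative model only; the real helpers are Tcl scripts run inside mambo.
# Assumed masks (Power ISA MSR layout): RI is bit 62, EE is bit 48 (IBM numbering).
MSR_RI = 0x2
MSR_EE = 0x8000


def ri_set(msr: int) -> bool:
    """Return True when the MSR marks the current state as recoverable."""
    return bool(msr & MSR_RI)


def mtmsrd_l1(msr: int, rs: int) -> int:
    """Model mtmsrd with L=1: only the EE and RI bits are copied from RS."""
    mask = MSR_EE | MSR_RI
    return (msr & ~mask) | (rs & mask)


# Hypothetical MSR on entry to the handler: RI clear, so no injection yet.
msr_before = 0x9000000000001000
# li r10,2 ; mtmsrd r10,1  -> RI becomes set, EE stays clear.
msr_after = mtmsrd_l1(msr_before, 2)

print(ri_set(msr_before), ri_set(msr_after))
```

With these assumptions, a stepper gated on `ri_set` skips every instruction up to and including the `mtmsrd` and begins injecting at the following branch, which matches the repeated lines at 0x...4C50 and 0x...AF00 in the trace.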
> > Signed-off-by: Nicholas Piggin <npiggin at gmail.com>
> > ---
> > Hi,
> >
> > If anybody would find this useful or has a better way to do it, let me
> > know. This is a polished up and improved version of what I've been using
> > for testing.
> >
> > It should be noted that upstream mambo currently does not quite work
> > properly with this because of a quirk in how it injects MCE interrupts.
> > I was kind-of hacking around that in the script but took out that code
> > because the mambo developers will be fixing that or giving us an option
> > to change behaviour soon.
> >
> > Thanks,
> > Nick
>
> Thanks, merged to master as of 6e6d5417ec67797b1dc52c8e30e9b1b4cf64e74f
>
Thanks. BTW, in case it wasn't mentioned, this did actually find a bug in Linux.
This may need a couple of minor tweaks to account for the recent simulator
changes to exception injection that have been made for us, which are working
their way into the next sim release.
I think it should work, but I'll go through it again and re-test it all. I'll
also look at adding canned test cases for Linux that we can run as regression
tests so everything stays working.
Thanks,
Nick