[Skiboot] [PATCH] external/mambo: add helper for machine checks

Nicholas Piggin npiggin at gmail.com
Mon May 1 14:13:32 AEST 2017


On Mon, 01 May 2017 14:02:17 +1000
Stewart Smith <stewart at linux.vnet.ibm.com> wrote:

> Nicholas Piggin <npiggin at gmail.com> writes:
> > Add helpers to construct machine checks with registers set up properly.
> > exc_mce raises a machine check exception that can be stepped into. This
> > is useful for testing the machine check handler.
> >
> > Also add a similar exc_sreset for system reset.
> >
> > inject_mce does the same but runs immediately and stops when execution
> > returns to the NIP (which can get tangled up if the machine check
> > re-enters this code). This is useful for testing robustness to
> > interleaving machine checks.
> >
> > inject_mce_step allows injecting MCEs between each instruction and stepping
> > over them. inject_mce_step_ri does the same but only when MSR has RI set.
> > This can be useful to test correctness of low level code. For example,
> > testing system call vs machine check:
> >
> > systemsim % b 0xC000000000004c00
> > systemsim % c
> > 0xC000000000004C00 (0x0000000000004C00) Enc:0xA64BB17D : mtspr   HSPRG1,r13
> > systemsim % inject_mce_step_ri 100
> > 0xC000000000004C04 (0x0000000000004C04) Enc:0xA64AB07D : mfspr   r13,HSPRG0
> > 0xC000000000004C08 (0x0000000000004C08) Enc:0x80002DF9 : std     r9,0x80(r13)
> > 0xC000000000004C0C (0x0000000000004C0C) Enc:0xA6E2207D : mfspr   r9,PPR
> > 0xC000000000004C10 (0x0000000000004C10) Enc:0x7813427C : mr      r2,r2
> > 0xC000000000004C14 (0x0000000000004C14) Enc:0x88004DF9 : std     r10,0x88(r13)
> > 0xC000000000004C18 (0x0000000000004C18) Enc:0xD8002DF9 : std     r9,0xD8(r13)
> > 0xC000000000004C1C (0x0000000000004C1C) Enc:0x2600207D : mfcr    r9
> > 0xC000000000004C20 (0x0000000000004C20) Enc:0xE8074D89 : lbz     r10,0x7E8(r13)
> > 0xC000000000004C24 (0x0000000000004C24) Enc:0x00000A2C : cmpwi   cr0,r10,0
> > 0xC000000000004C28 (0x0000000000004C28) Enc:0xA80F8240 : bne     cr0,$+0xFA8  (bc 0x4,0x2,0xFA8,0,0)
> > 0xC000000000004C2C (0x0000000000004C2C) Enc:0xA64AB17D : mfspr   r13,HSPRG1
> > 0xC000000000004C30 (0x0000000000004C30) Enc:0xBE1E202C : cmpdi   cr0,r0,7870
> > 0xC000000000004C34 (0x0000000000004C34) Enc:0x2000C241 : beq     cr0,$+0x20  (bc 0xE,0x2,0x20,0,0)
> > 0xC000000000004C38 (0x0000000000004C38) Enc:0x786BA97D : mr      r9,r13
> > 0xC000000000004C3C (0x0000000000004C3C) Enc:0xA64AB07D : mfspr   r13,HSPRG0
> > 0xC000000000004C40 (0x0000000000004C40) Enc:0xA6027A7D : mfspr   r11,SRR0
> > 0xC000000000004C44 (0x0000000000004C44) Enc:0xA6029B7D : mfspr   r12,SRR1
> > 0xC000000000004C48 (0x0000000000004C48) Enc:0x02004039 : li      r10,2
> > 0xC000000000004C4C (0x0000000000004C4C) Enc:0x6401417D : mtmsrd  r10,1
> > 0xC000000000004C50 (0x0000000000004C50) Enc:0xB0620048 : b       $+0x62B0
> > 236380163: (212143620): Disabling lock debugging due to kernel taint
> > 0xC000000000004C50 (0x0000000000004C50) Enc:0xB0620048 : b       $+0x62B0
> > 0xC00000000000AF00 (0x000000000000AF00) Enc:0xE1F78A79 : rldicl. r10,r12,30,63,63 (0x0000000000000001)
> > 0xC00000000000AF00 (0x000000000000AF00) Enc:0xE1F78A79 : rldicl. r10,r12,30,63,63 (0x0000000000000001)
> > [...]
> >
> > Every instruction after 0xC000000000004C4C is getting an interleaving
> > MCE, and continuing after this injection the kernel prints a lot of MCE
> > reports and continues working properly.
> >
> > Signed-off-by: Nicholas Piggin <npiggin at gmail.com>
> > ---
> > Hi,
> >
> > If anybody would find this useful or has a better way to do it, let me
> > know. This is a polished up and improved version of what I've been using
> > for testing.
> >
> > It should be noted that upstream mambo currently does not quite work
> > properly with this because of a quirk in how it injects MCE interrupts.
> > I was kind-of hacking around that in the script but took out that code
> > because the mambo developers will be fixing that or giving us an option
> > to change behaviour soon.
> >
> > Thanks,
> > Nick  
> 
> Thanks, merged to master as of 6e6d5417ec67797b1dc52c8e30e9b1b4cf64e74f
> 

Thanks. BTW, in case it wasn't mentioned, this did actually find a bug in Linux.

This may just need a couple of minor tweaks to match the recent simulator
changes to exception injection that have been made for us, which are working
their way into the next sim release.

I think it should work, but I'll go through it again and re-test it all. I'll
also look at adding canned test cases for Linux that we can run as regression
tests so everything stays working.
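
As a rough illustration of what one such canned check could look like, here
is a minimal Python sketch that scans a captured trace in the format quoted
above and reports addresses that show up on two consecutive trace lines. It
assumes that a repeated NIP means the instruction was interrupted by an
injected MCE and then resumed, which is only my reading of the trace. The
names TRACE_RE, repeated_nips and SAMPLE are made up for this example and
are not part of mambo or skiboot.

```python
import re

# Matches a trace line in the format shown in the quoted output, e.g.
#   0xC000000000004C50 (0x0000000000004C50) Enc:0xB0620048 : b       $+0x62B0
# (hypothetical helper, not a mambo API)
TRACE_RE = re.compile(
    r"^(0x[0-9A-Fa-f]{16}) \(0x[0-9A-Fa-f]{16}\) Enc:0x[0-9A-Fa-f]{8} : "
)


def repeated_nips(trace_text):
    """Return addresses that appear on two consecutive trace lines.

    Lines that are not instruction trace lines (for example kernel console
    output) are skipped, so a console message printed between two trace
    lines does not hide a repeat. Assumption: a repeated address means the
    instruction was interrupted and later resumed.
    """
    repeats = []
    prev = None
    for line in trace_text.splitlines():
        match = TRACE_RE.match(line.strip())
        if match is None:
            continue
        nip = int(match.group(1), 16)
        # Record each run of repeats only once.
        if nip == prev and (not repeats or repeats[-1] != nip):
            repeats.append(nip)
        prev = nip
    return repeats


# Tail of the trace quoted earlier in this thread.
SAMPLE = """\
0xC000000000004C4C (0x0000000000004C4C) Enc:0x6401417D : mtmsrd  r10,1
0xC000000000004C50 (0x0000000000004C50) Enc:0xB0620048 : b       $+0x62B0
236380163: (212143620): Disabling lock debugging due to kernel taint
0xC000000000004C50 (0x0000000000004C50) Enc:0xB0620048 : b       $+0x62B0
0xC00000000000AF00 (0x000000000000AF00) Enc:0xE1F78A79 : rldicl. r10,r12,30,63,63 (0x0000000000000001)
0xC00000000000AF00 (0x000000000000AF00) Enc:0xE1F78A79 : rldicl. r10,r12,30,63,63 (0x0000000000000001)
"""

if __name__ == "__main__":
    for nip in repeated_nips(SAMPLE):
        print(f"repeated NIP: 0x{nip:016X}")
```

A regression test could then assert that repeats only appear after the
mtmsrd at 0xC000000000004C4C, i.e. once RI is set.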

Thanks,
Nick

