[RFC PATCH 2/9] powerpc: handle machine check in Linux host.

Anshuman Khandual khandual at linux.vnet.ibm.com
Thu Aug 8 15:01:00 EST 2013


On 08/07/2013 03:08 PM, Mahesh J Salgaonkar wrote:
> From: Mahesh Salgaonkar <mahesh at linux.vnet.ibm.com>
> 
> Move machine check entry point into Linux. So far we were dependent on
> firmware to decode MCE error details and handover the high level info to OS.
> 
> This patch introduces early machine check routine that saves the MCE
> information (srr1, srr0, dar and dsisr) to the emergency stack. We allocate
> stack frame on emergency stack and set the r1 accordingly. This allows us
> to be prepared to take another exception without losing context. One thing
> to note here is that, if we get another machine check while the ME bit is off,
> we risk a checkstop. Hence we restrict ourselves to saving only the MCE
> information before turning the ME bit on.
> 
> This is the code flow:
> 
> 		Machine Check Interrupt
> 			|
> 			V
> 		   0x200 vector				  ME=0, IR=0, DR=0
> 			|
> 			V
> 	+-----------------------------------------------+
> 	|machine_check_pSeries_early:			| ME=0, IR=0, DR=0
> 	|	Alloc frame on emergency stack		|
> 	|	Save srr1, srr0, dar and dsisr on stack |
> 	+-----------------------------------------------+
> 			|
> 		(ME=1, IR=0, DR=0, RFID)
> 			|
> 			V
> 		machine_check_handle_early		  ME=1, IR=0, DR=0
> 			|
> 			V
> 	+-----------------------------------------------+
> 	|	machine_check_early (r3=pt_regs)	| ME=1, IR=0, DR=0
> 	|	Things to do: (in next patches)		|
> 	|		Flush SLB for SLB errors	|
> 	|		Flush TLB for TLB errors	|
> 	|		Decode and save MCE info	|
> 	+-----------------------------------------------+
> 			|
> 	(Fall through existing exception handler routine.)
> 			|
> 			V
> 		machine_check_pSeries			  ME=1, IR=0, DR=0
> 			|
> 		(ME=1, IR=1, DR=1, RFID)
> 			|
> 			V
> 		machine_check_common			  ME=1, IR=1, DR=1
> 			.
> 			.
> 			.
> 
> 
> Signed-off-by: Mahesh Salgaonkar <mahesh at linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/exception-64s.h |   43 ++++++++++++++++++++++++++
>  arch/powerpc/kernel/exceptions-64s.S     |   50 +++++++++++++++++++++++++++++-
>  arch/powerpc/kernel/traps.c              |   12 +++++++
>  3 files changed, 104 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/exception-64s.h b/arch/powerpc/include/asm/exception-64s.h
> index 2386d40..c5d2cbc 100644
> --- a/arch/powerpc/include/asm/exception-64s.h
> +++ b/arch/powerpc/include/asm/exception-64s.h
> @@ -174,6 +174,49 @@ END_FTR_SECTION_NESTED(ftr,ftr,943)
>  #define EXCEPTION_PROLOG_1(area, extra, vec)				\
>  	__EXCEPTION_PROLOG_1(area, extra, vec)
> 
> +/*
> + * Register contents:
> + * R12		= interrupt vector
> + * R13		= PACA
> + * R9		= CR
> + * R11 & R12 are saved on PACA_EXMC
> + *
> + * Switch to emergency stack and handle re-entrancy (though we currently
> + * don't test for overflow). Save MCE registers srr1, srr0, dar and
> + * dsisr and then turn the ME bit on.
> + */
> +#define __EARLY_MACHINE_CHECK_HANDLER(area, label)			\
> +	/* Check if we are already using emergency stack. */		\
> +	ld	r10,PACAEMERGSP(r13);					\
> +	subi	r10,r10,THREAD_SIZE;					\
> +	rldicr	r10,r10,0,(63 - THREAD_SHIFT);				\
> +	rldicr	r11,r1,0,(63 - THREAD_SHIFT);				\
> +	cmpd	r10,r11;	/* Are we using emergency stack? */	\
> +	mr	r11,r1;			/* Save current stack pointer */\
> +	beq	0f;							\
> +	ld	r1,PACAEMERGSP(r13);	/* Use emergency stack */	\
> +0:	subi	r1,r1,INT_FRAME_SIZE;	/* alloc stack frame */		\
> +	std	r11,GPR1(r1);						\
> +	std	r11,0(r1);		/* make stack chain pointer */	\
> +	mfspr	r11,SPRN_SRR0;		/* Save SRR0 */			\
> +	std	r11,_NIP(r1);						\
> +	mfspr	r11,SPRN_SRR1;		/* Save SRR1 */			\
> +	std	r11,_MSR(r1);						\
> +	mfspr	r11,SPRN_DAR;		/* Save DAR */			\
> +	std	r11,_DAR(r1);						\
> +	mfspr	r11,SPRN_DSISR;		/* Save DSISR */		\
> +	std	r11,_DSISR(r1);						\
> +	mfmsr	r11;			/* get MSR value */		\
> +	ori	r11,r11,MSR_ME;		/* turn on ME bit */		\

You need to mention here that we are vulnerable to a core checkstop if we
get another machine check exception in the window between the occurrence
of the interrupt and the point where we set the ME bit on.
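
For reference, the emergency stack check at the top of the quoted macro can
be read as the following C sketch. This is only an illustration of the
masking logic, not kernel code: the THREAD_SHIFT value and the frame_size
parameter (standing in for INT_FRAME_SIZE) are assumptions, and it presumes
the emergency stack is THREAD_SIZE aligned with PACAEMERGSP pointing at its
top.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed value for illustration; the real one is a kernel config choice. */
#define THREAD_SHIFT 14
#define THREAD_SIZE  (UINT64_C(1) << THREAD_SHIFT)

/*
 * Mirrors the two rldicr instructions: clear the low THREAD_SHIFT bits of
 * (emergency_sp - THREAD_SIZE) and of r1, then compare the stack bases.
 */
static bool on_emergency_stack(uint64_t r1, uint64_t emergency_sp)
{
	uint64_t mask = ~(THREAD_SIZE - 1);

	return ((emergency_sp - THREAD_SIZE) & mask) == (r1 & mask);
}

/*
 * Mirrors the "beq 0f / ld r1 / subi r1" sequence: keep using the current
 * stack if it is already the emergency stack, otherwise start from its top,
 * then allocate one frame. frame_size is a hypothetical stand-in for
 * INT_FRAME_SIZE.
 */
static uint64_t mce_frame_pointer(uint64_t r1, uint64_t emergency_sp,
				  uint64_t frame_size)
{
	uint64_t sp = on_emergency_stack(r1, emergency_sp) ? r1 : emergency_sp;

	return sp - frame_size;
}
```

A nested machine check therefore stacks its frame below the previous one
rather than overwriting it, which is why the missing overflow test noted in
the comment matters.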