[PATCH v5 5/7] powerpc/pseries: flush SLB contents on SLB MCE errors.

Michal Suchánek msuchanek at suse.de
Thu Jul 12 23:41:13 AEST 2018


On Tue, 3 Jul 2018 08:08:14 +1000
"Nicholas Piggin" <npiggin at gmail.com> wrote:

> On Mon, 02 Jul 2018 11:17:06 +0530
> Mahesh J Salgaonkar <mahesh at linux.vnet.ibm.com> wrote:
> 
> > From: Mahesh Salgaonkar <mahesh at linux.vnet.ibm.com>
> > 
> > On pseries, the system currently crashes if we get a machine check
> > exception due to SLB errors. These are soft errors and can be fixed
> > by flushing the SLBs, so the kernel can continue to function instead
> > of crashing. We do this in real mode before turning on the MMU;
> > otherwise we would run into nested machine checks. This patch now
> > fetches the RTAS error log in real mode and flushes the SLBs on SLB
> > errors.
> > 
> > Signed-off-by: Mahesh Salgaonkar <mahesh at linux.vnet.ibm.com>
> > ---
> >  arch/powerpc/include/asm/book3s/64/mmu-hash.h |    1 
> >  arch/powerpc/include/asm/machdep.h            |    1 
> >  arch/powerpc/kernel/exceptions-64s.S          |   42 +++++++++++++++++++++
> >  arch/powerpc/kernel/mce.c                     |   16 +++++++-
> >  arch/powerpc/mm/slb.c                         |    6 +++
> >  arch/powerpc/platforms/powernv/opal.c         |    1 
> >  arch/powerpc/platforms/pseries/pseries.h      |    1 
> >  arch/powerpc/platforms/pseries/ras.c          |   51 +++++++++++++++++++++++++
> >  arch/powerpc/platforms/pseries/setup.c        |    1 
> >  9 files changed, 116 insertions(+), 4 deletions(-)
> 
> 
> > +TRAMP_REAL_BEGIN(machine_check_pSeries_early)
> > +BEGIN_FTR_SECTION
> > +	EXCEPTION_PROLOG_1(PACA_EXMC, NOTEST, 0x200)
> > +	mr	r10,r1			/* Save r1 */
> > +	ld	r1,PACAMCEMERGSP(r13)	/* Use MC emergency stack */
> > +	subi	r1,r1,INT_FRAME_SIZE	/* alloc stack frame		*/
> > +	mfspr	r11,SPRN_SRR0		/* Save SRR0 */
> > +	mfspr	r12,SPRN_SRR1		/* Save SRR1 */
> > +	EXCEPTION_PROLOG_COMMON_1()
> > +	EXCEPTION_PROLOG_COMMON_2(PACA_EXMC)
> > +	EXCEPTION_PROLOG_COMMON_3(0x200)
> > +	addi	r3,r1,STACK_FRAME_OVERHEAD
> > +	BRANCH_LINK_TO_FAR(machine_check_early) /* Function call ABI */
> 
> Is there any reason you can't use the existing
> machine_check_powernv_early code to do all this?
> 
> > diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
> > index efdd16a79075..221271c96a57 100644
> > --- a/arch/powerpc/kernel/mce.c
> > +++ b/arch/powerpc/kernel/mce.c
> > @@ -488,9 +488,21 @@ long machine_check_early(struct pt_regs *regs)
> >  {
> >  	long handled = 0;
> >  
> > -	__this_cpu_inc(irq_stat.mce_exceptions);
> > +	/*
> > +	 * For pSeries we count mce when we go into virtual mode machine
> > +	 * check handler. Hence skip it. Also, we can't access per cpu
> > +	 * variables in real mode for LPAR.
> > +	 */
> > +	if (early_cpu_has_feature(CPU_FTR_HVMODE))
> > +		__this_cpu_inc(irq_stat.mce_exceptions);
> >  
> > -	if (cur_cpu_spec && cur_cpu_spec->machine_check_early)
> > +	/*
> > +	 * See if platform is capable of handling machine check.
> > +	 * Otherwise fallthrough and allow CPU to handle this machine check.
> > +	 */
> > +	if (ppc_md.machine_check_early)
> > +		handled = ppc_md.machine_check_early(regs);
> > +	else if (cur_cpu_spec && cur_cpu_spec->machine_check_early)
> >  		handled = cur_cpu_spec->machine_check_early(regs);
> 
> Would be good to add a powernv ppc_md handler which does the
> cur_cpu_spec->machine_check_early() call, now that other platforms are
> calling this code. Those handlers aren't valid as a fallback call; they
> are specific to powernv.
> 

Something like this (untested)?

Subject: [PATCH] powerpc/powernv: define platform MCE handler.

---
 arch/powerpc/kernel/mce.c              |  3 ---
 arch/powerpc/platforms/powernv/setup.c | 11 +++++++++++
 2 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c
index 221271c96a57..ae17d8aa60c4 100644
--- a/arch/powerpc/kernel/mce.c
+++ b/arch/powerpc/kernel/mce.c
@@ -498,12 +498,9 @@ long machine_check_early(struct pt_regs *regs)
 
 	/*
 	 * See if platform is capable of handling machine check.
-	 * Otherwise fallthrough and allow CPU to handle this machine check.
 	 */
 	if (ppc_md.machine_check_early)
 		handled = ppc_md.machine_check_early(regs);
-	else if (cur_cpu_spec && cur_cpu_spec->machine_check_early)
-		handled = cur_cpu_spec->machine_check_early(regs);
 	return handled;
 }
 
diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c
index f96df0a25d05..b74c93bc2e55 100644
--- a/arch/powerpc/platforms/powernv/setup.c
+++ b/arch/powerpc/platforms/powernv/setup.c
@@ -431,6 +431,16 @@ static unsigned long pnv_get_proc_freq(unsigned int cpu)
 	return ret_freq;
 }
 
+static long pnv_machine_check_early(struct pt_regs *regs)
+{
+	long handled = 0;
+
+	if (cur_cpu_spec && cur_cpu_spec->machine_check_early)
+		handled = cur_cpu_spec->machine_check_early(regs);
+
+	return handled;
+}
+
 define_machine(powernv) {
 	.name			= "PowerNV",
 	.probe			= pnv_probe,
@@ -442,6 +452,7 @@ define_machine(powernv) {
 	.machine_shutdown	= pnv_shutdown,
 	.power_save             = NULL,
 	.calibrate_decr		= generic_calibrate_decr,
+	.machine_check_early	= pnv_machine_check_early,
 #ifdef CONFIG_KEXEC_CORE
 	.kexec_cpu_down		= pnv_kexec_cpu_down,
 #endif
-- 
2.13.7
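
For anyone who wants to check the dispatch logic outside the kernel, here is
a minimal user-space model of what the patch above is meant to do. It is only
a sketch: struct pt_regs, struct cpu_spec and struct machdep_calls are cut
down to the single field that matters here, and model_cpu_mce_handler() is a
hypothetical stand-in for a real CPU-specific handler. The point it shows is
that the generic machine_check_early() consults only the platform hook, and
only the powernv hook reaches into cur_cpu_spec.

```c
#include <stddef.h>

/* Stand-in for the kernel's struct pt_regs; only here so the sketch compiles. */
struct pt_regs { unsigned long nip; };

/* Cut-down model of the per-CPU-type descriptor (cur_cpu_spec in the kernel). */
struct cpu_spec {
	long (*machine_check_early)(struct pt_regs *regs);
};

/* Cut-down model of the per-platform descriptor (ppc_md in the kernel). */
struct machdep_calls {
	long (*machine_check_early)(struct pt_regs *regs);
};

static struct cpu_spec *cur_cpu_spec;
static struct machdep_calls ppc_md;

/* Hypothetical CPU-specific handler: always reports the error as handled. */
static long model_cpu_mce_handler(struct pt_regs *regs)
{
	(void)regs;
	return 1;
}

/* powernv hook: the only place that consults the CPU-specific handler. */
static long pnv_machine_check_early(struct pt_regs *regs)
{
	long handled = 0;

	if (cur_cpu_spec && cur_cpu_spec->machine_check_early)
		handled = cur_cpu_spec->machine_check_early(regs);

	return handled;
}

/* Generic entry point: defers to the platform hook, with no CPU fallback. */
static long machine_check_early(struct pt_regs *regs)
{
	long handled = 0;

	if (ppc_md.machine_check_early)
		handled = ppc_md.machine_check_early(regs);

	return handled;
}
```

In this model a platform that leaves ppc_md.machine_check_early unset gets
0 (not handled) back even when a CPU handler is registered, which is the
behaviour change relative to the fallback in the v5 patch.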


