[PATCH v2] powerpc: Handle MCE on POWER9 with only DSISR bit 33 set

Nicholas Piggin npiggin at gmail.com
Thu Sep 21 22:44:33 AEST 2017


On Thu, 21 Sep 2017 19:57:20 +1000
Michael Neuling <mikey at neuling.org> wrote:

> On Thu, 2017-09-21 at 18:18 +1000, Nicholas Piggin wrote:
> > On Thu, 21 Sep 2017 12:04:34 +1000
> > Michael Neuling <mikey at neuling.org> wrote:
> >   
> > > On POWER9 DD2.1 and below, it's possible to get Machine Check
> > > Exception (MCE) where only DSISR bit 33 is set. This will result in
> > > the linux MCE handler seeing an unknown event, which triggers linux to
> > > crash.
> > > 
> > > We change this by detecting unknown events in the MCE handler and
> > > marking them as handled so that we no longer crash. We do this only on
> > > chip revisions known to have this problem.
> > > 
> > > MCE that occurs like this is spurious, so we don't need to do anything
> > > in terms of servicing it. If there is something that needs to be
> > > serviced, the CPU will raise the MCE again with the correct DSISR so
> > > that it can be serviced properly.
> > > 
> > > Signed-off-by: Michael Neuling <mikey at neuling.org>
> > > ---
> > > v2 update commit message based on Balbir's comments
> > > ---
> > >  arch/powerpc/kernel/mce_power.c | 15 +++++++++++++++
> > >  1 file changed, 15 insertions(+)
> > > 
> > > diff --git a/arch/powerpc/kernel/mce_power.c b/arch/powerpc/kernel/mce_power.c
> > > index b76ca198e0..72ec667136 100644
> > > --- a/arch/powerpc/kernel/mce_power.c
> > > +++ b/arch/powerpc/kernel/mce_power.c
> > > @@ -595,6 +595,7 @@ static long mce_handle_error(struct pt_regs *regs,
> > >  	uint64_t addr;
> > >  	uint64_t srr1 = regs->msr;
> > >  	long handled;
> > > +	unsigned long pvr;
> > >  
> > >  	if (SRR1_MC_LOADSTORE(srr1))
> > >  		handled = mce_handle_derror(regs, dtable, &mce_err, &addr);
> > > @@ -604,6 +605,20 @@ static long mce_handle_error(struct pt_regs *regs,
> > >  	if (!handled && mce_err.error_type == MCE_ERROR_TYPE_UE)
> > >  		handled = mce_handle_ue_error(regs);
> > >  
> > > +	/*
> > > +	 * On POWER9 DD2.1 and below, it's possible to get machine
> > > +	 * check where only DSISR bit 33 is set. This will result in
> > > +	 * the MCE handler seeing an unknown event and us crashing.
> > > +	 * Change this to mark as handled on these revisions.
> > > +	 */
> > > +	pvr = mfspr(SPRN_PVR);
> > > +	if (((PVR_VER(pvr) == PVR_POWER9) &&
> > > +	     (PVR_CFG(pvr) == 2) &&
> > > +	     (PVR_MIN(pvr) <= 1)) || cpu_has_feature(CPU_FTR_POWER9_DD1))
> > > +		/* DD2.1 and below */
> > > +		if (mce_err.error_type == MCE_ERROR_TYPE_UNKNOWN)
> > > +		    handled = 1;  
> > 
> > I might be missing something, but can you just do
> > 
> >   if (regs->dsisr == 0x40000000)
> >       return 1;
> > 
> > In __machine_check_early_realmode_p9() ?  
> 
> You're right, thanks.

If you leave the PVR and DD1 checks in there, it would be a good
reminder for me to convert this into a quirk, if I can get the
version-specific quirks stuff going:

https://marc.info/?l=linuxppc-embedded&m=150597337720114&w=2
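
For reference, a rough userspace sketch of how the DSISR test and the
revision check could combine. This is not the actual patch. The PVR
field macros are assumed to mirror the kernel's definitions in
arch/powerpc/include/asm/reg.h, while IBM_BIT64(), p9_dd21_or_below()
and mce_is_spurious() are hypothetical names made up for illustration,
and is_dd1 stands in for cpu_has_feature(CPU_FTR_POWER9_DD1). It also
shows why "DSISR bit 33" is 0x40000000: IBM numbering counts bit 0 as
the most significant bit of the 64-bit register.

```c
#include <stdbool.h>
#include <stdint.h>

/* IBM bit numbering on a 64-bit register: bit 0 is the MSB. */
#define IBM_BIT64(n)	(1ULL << (63 - (n)))

/* Assumed to match the kernel's PVR field layout (asm/reg.h). */
#define PVR_VER(pvr)	(((pvr) >> 16) & 0xFFFF)	/* version */
#define PVR_CFG(pvr)	(((pvr) >> 8) & 0xF)		/* configuration */
#define PVR_MIN(pvr)	((pvr) & 0xF)			/* minor revision */
#define PVR_POWER9	0x004E

/*
 * Hypothetical helper: same condition as the quoted patch.
 * is_dd1 stands in for cpu_has_feature(CPU_FTR_POWER9_DD1).
 */
static bool p9_dd21_or_below(uint32_t pvr, bool is_dd1)
{
	if (is_dd1)
		return true;

	return PVR_VER(pvr) == PVR_POWER9 &&
	       PVR_CFG(pvr) == 2 &&
	       PVR_MIN(pvr) <= 1;
}

/*
 * Hypothetical helper: treat the MCE as spurious only when DSISR has
 * bit 33 set and nothing else, and only on the affected revisions.
 */
static bool mce_is_spurious(uint32_t pvr, bool is_dd1, uint64_t dsisr)
{
	return p9_dd21_or_below(pvr, is_dd1) && dsisr == IBM_BIT64(33);
}
```

Comparing DSISR for equality, rather than masking, means any MCE that
also reports a real cause still goes through the normal handler.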

Thanks,
Nick

