machine check exception

Todd Inglett tinglett at vnet.ibm.com
Sat Feb 1 09:47:16 EST 2003


Randy pointed out that we have a data integrity (DI) exposure on
pre-power4 hardware in the machine check handler.  The exception is not
synchronous there, so the current handler may attempt to send a SIGBUS
to a user process when the kernel was actually at fault.  This is bad.

Another point is that it is actually trivial to see via the FWNMI
handler if the machine check was recovered by firmware so we should take
advantage of that.

Here's the function after I reorganized it.  I can post a patch later,
but diff doesn't handle the reorg of the code very well :(.  Note that
we don't attempt to recover unless we have an fwnmi handler, which AFAIK
is always present on power4 and beyond.

-todd

WARNING: this code is untested...please review :)


void
MachineCheckException(struct pt_regs *regs)
{
	struct rtas_error_log *errhdr;
	int recoverable;
	siginfo_t info;

	if (fwnmi_active) {
		/* Use the errhdr declared above rather than shadowing it. */
		errhdr = FWNMI_get_errinfo(regs);
		recoverable = errhdr && errhdr->disposition == DISP_FULLY_RECOVERED;
		FWNMI_release_errinfo();
		if (recoverable)
			return;	/* easy recovery */
		else if (regs->msr & MSR_RI) {
			if (user_mode(regs)) {
				/* Only need to kill user process */
				info.si_signo = SIGBUS;
				info.si_errno = 0;
				info.si_code = BUS_ADRERR;
				info.si_addr = (void *)regs->nip;
				/* Signal number must match info.si_signo (SIGBUS). */
				_exception(SIGBUS, &info, regs);
				return;
			} else if (power4_handle_mce(regs)) {
				return;
			}
		}
	}

	if (debugger_fault_handler) {
		debugger_fault_handler(regs);
		return;
	}
	if (debugger)
		debugger(regs);

	console_verbose();
	spin_lock_irq(&die_lock);
	bust_spinlocks(1);
	printk("Machine check in kernel mode.\n");
	printk("Caused by (from SRR1=%lx): ", regs->msr);
	show_regs(regs);
	bust_spinlocks(0);
	spin_unlock_irq(&die_lock);
	panic("Unrecoverable Machine Check");
}
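
For review purposes, the order of the checks above can be modelled as a
small user-space function.  This is only a sketch: struct mce_state,
enum mce_action and mce_decide() are hypothetical names made up for
illustration and do not exist in the kernel.  The four booleans stand in
for fwnmi_active, the DISP_FULLY_RECOVERED test on the error log,
regs->msr & MSR_RI, and user_mode(regs).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the handler's outcomes; not kernel code. */
enum mce_action {
	MCE_RETURN_RECOVERED,    /* firmware fully recovered, just return   */
	MCE_KILL_USER_TASK,      /* send SIGBUS to the interrupted process  */
	MCE_TRY_KERNEL_RECOVERY, /* call power4_handle_mce(); if that fails */
	                         /* the real handler still ends in panic    */
	MCE_PANIC                /* debugger hooks, then panic              */
};

/* Hypothetical snapshot of the inputs the handler looks at. */
struct mce_state {
	bool fwnmi_active;     /* an FWNMI handler is registered            */
	bool fully_recovered;  /* errhdr && disposition is fully recovered  */
	bool msr_ri;           /* MSR_RI set in the saved MSR               */
	bool user_mode;        /* exception interrupted user space          */
};

/* Mirror the nesting of the if/else chain in MachineCheckException(). */
enum mce_action mce_decide(const struct mce_state *s)
{
	assert(s);

	if (!s->fwnmi_active)
		return MCE_PANIC;            /* no recovery attempted at all */
	if (s->fully_recovered)
		return MCE_RETURN_RECOVERED; /* easy recovery */
	if (!s->msr_ri)
		return MCE_PANIC;            /* interrupted state not recoverable */
	return s->user_mode ? MCE_KILL_USER_TASK : MCE_TRY_KERNEL_RECOVERY;
}
```

Written this way it is easy to see that without an fwnmi handler every
machine check takes the panic path, including one that arrives in user
mode, which is the intended behaviour for the asynchronous pre-power4
case.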


** Sent via the linuxppc64-dev mail list. See http://lists.linuxppc.org/




