powernv: Invoke opal_cec_reboot2() on unrecoverable machine check errors.

Michael Ellerman mpe at ellerman.id.au
Mon Aug 10 19:27:05 AEST 2015


On Fri, 2015-07-31 at 15:54:38 UTC, Mahesh Salgaonkar wrote:
> From: Mahesh Salgaonkar <mahesh at linux.vnet.ibm.com>
> 
> On non-recoverable MCE errors in kernel space, the Linux kernel panics
> and the system reboots. On BMC-based systems, opal-prd runs as a daemon
> in the host. Hence, a kernel crash may prevent opal-prd from detecting
> and analyzing the MCE error. This may land us in a situation where the
> faulty memory never gets de-configured and Linux keeps hitting the same
> MCE error again and again. If this happens at an early stage of kernel
> initialization, Linux will keep crashing and rebooting in a loop.
> 
> This patch fixes the issue by invoking the new opal_cec_reboot2() call
> with reboot type OPAL_REBOOT_PLATFORM_ERROR to inform the BMC/OCC about
> the error, so that the BMC can collect relevant data for error analysis
> and decide which component to de-configure before rebooting.
> 
> This patch depends on the OPAL patchset that introduces the
> opal_cec_reboot2() OPAL call, posted on the skiboot mailing list at
> https://lists.ozlabs.org/pipermail/skiboot/2015-July/001771.html
> 
> Signed-off-by: Mahesh Salgaonkar <mahesh at linux.vnet.ibm.com>

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/e784b6499d9cba83b7f3

cheers

