[PATCH v2] powerpc/mce: Fix access error in mce handler

Daniel Axtens dja at axtens.net
Fri Sep 17 16:39:45 AEST 2021


Hi Ganesh,

> We queue an irq work for deferred processing of mce event
> in realmode mce handler, where translation is disabled.
> Queuing of the work may result in accessing memory outside
> RMO region, such access needs the translation to be enabled
> for an LPAR running with hash mmu else the kernel crashes.
>
> After enabling translation in mce_handle_error() we used to
> leave it enabled to avoid crashing here, but now with the
> commit 74c3354bc1d89 ("powerpc/pseries/mce: restore msr before
> returning from handler") we are restoring the MSR to disable
> translation.
>
> Hence to fix this enable the translation before queuing the work.

[snip]

> Fixes: 74c3354bc1d89 ("powerpc/pseries/mce: restore msr before returning from handler")

That patch changes arch/powerpc/powerpc/platforms/pseries/ras.c just
below this comment:

    /*
     * Enable translation as we will be accessing per-cpu variables
     * in save_mce_event() which may fall outside RMO region, also
     * leave it enabled because subsequently we will be queuing work
     * to workqueues where again per-cpu variables accessed, besides
     * fwnmi_release_errinfo() crashes when called in realmode on
     * pseries.
     * Note: All the realmode handling like flushing SLB entries for
     *       SLB multihit is done by now.
     */

That suggests per-cpu variables need protection. In your patch, you
enable translations just around irq_work_queue:

> +	/* Queue irq work to process this event later. Before
> +	 * queuing the work enable translation for non radix LPAR,
> +	 * as irq_work_queue may try to access memory outside RMO
> +	 * region.
> +	 */
> +	if (!radix_enabled() && firmware_has_feature(FW_FEATURE_LPAR)) {
> +		msr = mfmsr();
> +		mtmsr(msr | MSR_IR | MSR_DR);
> +		irq_work_queue(&mce_event_process_work);
> +		mtmsr(msr);
> +	} else {
> +		irq_work_queue(&mce_event_process_work);
> +	}

However, just before that in the function, there are a few things that
access per-cpu variables via the local_paca, e.g.:

memcpy(&local_paca->mce_info->mce_event_queue[index],
       &evt, sizeof(evt));

Do we need to widen the window where translations are enabled in order
to protect accesses to local_paca?

Kind regards,
Daniel


More information about the Linuxppc-dev mailing list