[v3 PATCH 4/5] powerpc/pseries: Dump and flush SLB contents on SLB MCE errors.
Aneesh Kumar K.V
aneesh.kumar at linux.ibm.com
Wed Jun 13 14:06:54 AEST 2018
On 06/13/2018 09:36 AM, Michael Ellerman wrote:
> "Aneesh Kumar K.V" <aneesh.kumar at linux.ibm.com> writes:
>> On 06/12/2018 07:17 PM, Michael Ellerman wrote:
>>> Mahesh J Salgaonkar <mahesh at linux.vnet.ibm.com> writes:
>>>> diff --git a/arch/powerpc/platforms/pseries/ras.c b/arch/powerpc/platforms/pseries/ras.c
>>>> index 2edc673be137..e56759d92356 100644
>>>> --- a/arch/powerpc/platforms/pseries/ras.c
>>>> +++ b/arch/powerpc/platforms/pseries/ras.c
>>>> @@ -422,6 +422,31 @@ int pSeries_system_reset_exception(struct pt_regs *regs)
>>>> return 0; /* need to perform reset */
>>>> }
>>>>
>>>> +static int mce_handle_error(struct rtas_error_log *errp)
>>>> +{
>>>> +	struct pseries_errorlog *pseries_log;
>>>> +	struct pseries_mc_errorlog *mce_log;
>>>> +	int disposition = rtas_error_disposition(errp);
>>>> +	uint8_t error_type;
>>>> +
>>>> +	pseries_log = get_pseries_errorlog(errp, PSERIES_ELOG_SECT_ID_MCE);
>>>> +	if (pseries_log == NULL)
>>>> +		goto out;
>>>> +
>>>> +	mce_log = (struct pseries_mc_errorlog *)pseries_log->data;
>>>> +	error_type = rtas_mc_error_type(mce_log);
>>>> +
>>>> +	if ((disposition == RTAS_DISP_NOT_RECOVERED) &&
>>>> +			(error_type == PSERIES_MC_ERROR_TYPE_SLB)) {
>>>> +		slb_dump_contents();
>>>> +		slb_flush_and_rebolt();
>>>
>>> Aren't we back in virtual mode here?
>>>
>>> Don't we need to do the flush in real mode before turning the MMU back
>>> on. Otherwise we'll just take another multi-hit?
>>
>> slb_flush_and_rebolt does slbia, which keeps slb index 0. So kernel code
>> should not get another slb miss. We also make sure we don't touch stack
>> in slb_flush_and_rebolt(). So we flush everything and put vmalloc and
>> stack back. That should be ok with MMU on?
>
> I don't think so.
>
> Imagine we take a multi-hit accessing the paca. The machine check is
> delivered in real mode, so we can run and access the paca by its real
> address. But as soon as we turn the MMU back on, we'll take another
> multi-hit when we access the paca.
>
> If I'm reading the code right we are turning the MMU back on essentially
> straight away when we rfid to machine_check_common().
>
Yes, for the linear-mapped first 1TB we will take a multi-hit again.
-aneesh
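
To make the ordering problem discussed above concrete, here is a toy
user-space model in C. It is only a sketch and not kernel code: every
name in it (toy_slb, toy_slbia, toy_access, TOY_KERNEL_ESID, the
duplicate sitting at index 5) is hypothetical and chosen for
illustration. It assumes, as stated in the thread, that slbia leaves
SLB index 0 in place, and it models a real-mode access as bypassing
the SLB entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_SLB_ENTRIES 32
/* Hypothetical ESID standing in for the linear-map segment holding the paca. */
#define TOY_KERNEL_ESID 0xc00000000ULL

struct toy_slb_entry {
	bool valid;
	uint64_t esid;
};

enum toy_result { TOY_OK, TOY_MISS, TOY_MULTI_HIT };

static struct toy_slb_entry toy_slb[TOY_SLB_ENTRIES];

/* Install an entry at a given index, overwriting whatever was there. */
static void toy_slb_insert(unsigned int idx, uint64_t esid)
{
	assert(idx < TOY_SLB_ENTRIES);
	toy_slb[idx].valid = true;
	toy_slb[idx].esid = esid;
}

/* Invalidate every entry, including index 0 (start from a clean table). */
static void toy_slb_reset(void)
{
	for (unsigned int i = 0; i < TOY_SLB_ENTRIES; i++)
		toy_slb[i].valid = false;
}

/* Toy slbia: invalidate everything except index 0. */
static void toy_slbia(void)
{
	for (unsigned int i = 1; i < TOY_SLB_ENTRIES; i++)
		toy_slb[i].valid = false;
}

/* With the MMU off the SLB is not consulted; with it on, count the matches. */
static enum toy_result toy_access(uint64_t esid, bool mmu_on)
{
	unsigned int matches = 0;

	if (!mmu_on)
		return TOY_OK;

	for (unsigned int i = 0; i < TOY_SLB_ENTRIES; i++)
		if (toy_slb[i].valid && toy_slb[i].esid == esid)
			matches++;

	if (matches == 0)
		return TOY_MISS;
	return matches == 1 ? TOY_OK : TOY_MULTI_HIT;
}

/* Replay the scenario: real mode, MMU on before the flush, MMU on after it. */
static void toy_scenario(enum toy_result out[3])
{
	toy_slb_reset();
	toy_slb_insert(0, TOY_KERNEL_ESID);	/* bolted kernel entry */
	toy_slb_insert(5, TOY_KERNEL_ESID);	/* stale duplicate */

	out[0] = toy_access(TOY_KERNEL_ESID, false);
	out[1] = toy_access(TOY_KERNEL_ESID, true);
	toy_slbia();
	out[2] = toy_access(TOY_KERNEL_ESID, true);
}
```

In this model the real-mode access succeeds, the access with the MMU
on before the flush hits the duplicate again, and only the access
after the flush is clean. That is the reason the flush would have to
happen before translation is turned back on. The model says nothing
about what slb_flush_and_rebolt() itself touches while it runs.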