[PATCH -next v5 6/8] arm64: add support for machine check error safe
Tong Tiangen
tongtiangen at huawei.com
Mon Jun 20 11:53:26 AEST 2022
On 2022/6/18 20:52, Mark Rutland wrote:
> On Sat, Jun 18, 2022 at 05:18:55PM +0800, Tong Tiangen wrote:
>> On 2022/6/17 16:55, Mark Rutland wrote:
>>> On Sat, May 28, 2022 at 06:50:54AM +0000, Tong Tiangen wrote:
>>>> +static bool arm64_do_kernel_sea(unsigned long addr, unsigned int esr,
>>>> + struct pt_regs *regs, int sig, int code)
>>>> +{
>>>> + if (!IS_ENABLED(CONFIG_ARCH_HAS_COPY_MC))
>>>> + return false;
>>>> +
>>>> + if (user_mode(regs) || !current->mm)
>>>> + return false;
>>>
>>> What's the `!current->mm` check for?
>>
>> At first, I assumed that only user processes have the opportunity to
>> recover when they trigger a memory error.
>>
>> But this restriction seems unreasonable. When a kernel thread triggers a
>> memory error, it can also recover. For instance:
>>
>> https://lore.kernel.org/linux-mm/20220527190731.322722-1-jiaqiyan@google.com/
>>
>> And I think if (!current->mm) should be added below:
>>
>> if(!current->mm) {
>> set_thread_esr(0, esr);
>> arm64_force_sig_fault(...);
>> }
>> return true;
>
> Why does 'current->mm' have anything to do with this, though?
Sorry, that was a typo. My original logic was:
if(current->mm) {
[...]
}
>
> There can be kernel threads with `current->mm` set in unusual circumstances
> (and there's a lot of kernel code out there which handles that wrong), so if
> you want to treat user tasks differently, we should be doing something like
> checking PF_KTHREAD, or adding something like an is_user_task() helper.
>
OK, I do want to treat user tasks differently here, and I didn't take into
account what you said. This will be fixed in the next version according to
your suggestion, as follows:
if (!(current->flags & PF_KTHREAD)) {
set_thread_esr(0, esr);
arm64_force_sig_fault(...);
}
return true;
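
A minimal user-space sketch of why the two checks differ. It is not kernel
code: struct mock_task, struct mock_mm and has_mm() are hypothetical stand-ins
for task_struct, mm_struct and the `current->mm` test, and is_user_task() is
only one possible shape for the helper suggested above. The PF_KTHREAD value
is a mock constant for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock flag bit standing in for the kernel's PF_KTHREAD (illustrative value). */
#define PF_KTHREAD 0x00200000u

/* Hypothetical stand-ins for mm_struct and task_struct. */
struct mock_mm {
	int unused;
};

struct mock_task {
	unsigned int flags;	/* PF_* flags */
	struct mock_mm *mm;	/* may be non-NULL even for a kernel thread */
};

/* The check discussed above: a task is a user task if it is not a kthread. */
bool is_user_task(const struct mock_task *t)
{
	return !(t->flags & PF_KTHREAD);
}

/* The original check: only asks whether an mm is attached. */
bool has_mm(const struct mock_task *t)
{
	return t->mm != NULL;
}
```

For a kernel thread that has temporarily adopted an mm, has_mm() returns true
while is_user_task() returns false, which is the case the mm-based check gets
wrong.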
> [...]
>
>>>> +
>>>> + if (apei_claim_sea(regs) < 0)
>>>> + return false;
>>>> +
>>>> + if (!fixup_exception_mc(regs))
>>>> + return false;
>>>
>>> I thought we still wanted to signal the task in this case? Or do you expect to
>>> add that into `fixup_exception_mc()` ?
>>
>> Yeah, returning false here means the task is signalled in do_sea() ->
>> arm64_notify_die().
>
> I mean when we do the fixup.
>
> I thought the idea was to apply the fixup (to stop the kernel from crashing),
> but still to deliver a fatal signal to the user task since we can't do what the
> user task asked us to.
>
Yes, that's what I mean. :)
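
A rough user-space model of the control flow as I read it from this thread,
not the actual patch. struct sea_ctx and do_kernel_sea_sketch() are
hypothetical; each boolean field stands in for the result of the
corresponding kernel check (IS_ENABLED(CONFIG_ARCH_HAS_COPY_MC), user_mode(),
PF_KTHREAD, apei_claim_sea(), fixup_exception_mc()).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inputs and outputs of the handler, modelled as plain flags. */
struct sea_ctx {
	bool copy_mc_enabled;	/* IS_ENABLED(CONFIG_ARCH_HAS_COPY_MC) */
	bool user_mode;		/* fault taken from user mode */
	bool kthread;		/* current->flags & PF_KTHREAD */
	bool apei_claimed;	/* apei_claim_sea(regs) >= 0 */
	bool has_fixup;		/* fixup_exception_mc(regs) succeeded */
	bool signalled;		/* output: signal queued for the task */
};

/*
 * Returns true when the kernel-mode error was handled by the fixup.
 * A user task is additionally signalled, since the kernel could not
 * complete what the task asked for.
 */
bool do_kernel_sea_sketch(struct sea_ctx *c)
{
	c->signalled = false;

	if (!c->copy_mc_enabled)
		return false;
	if (c->user_mode)
		return false;
	if (!c->apei_claimed)
		return false;
	if (!c->has_fixup)
		return false;

	/* Fixup applied: the kernel keeps running; only user tasks get a signal. */
	if (!c->kthread)
		c->signalled = true;

	return true;
}
```

When the sketch returns false, the caller is expected to fall back to the
existing do_sea() -> arm64_notify_die() path mentioned above.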
>>>> +
>>>> + set_thread_esr(0, esr);
>>>
>>> Why are we not setting the address? Is that deliberate, or an oversight?
>>
>> Setting fault_address to 0 here follows the logic of arm64_notify_die().
>>
>> void arm64_notify_die(...)
>> {
>> if (user_mode(regs)) {
>> WARN_ON(regs != current_pt_regs());
>> current->thread.fault_address = 0;
>> current->thread.fault_code = err;
>>
>> arm64_force_sig_fault(signo, sicode, far, str);
>> } else {
>> die(str, regs, err);
>> }
>> }
>>
>> I don't know exactly why. Do you know why arm64_notify_die() does this? :)
>
> To be honest, I don't know, and that looks equally suspicious to me.
>
> Looking at the git history, that was added in commit:
>
> 9141300a5884b57c ("arm64: Provide read/write fault information in compat signal handlers")
>
> ... so maybe Catalin recalls why.
>
> Perhaps the assumption is just that this will be fatal and so unimportant? ...
> but in that case the same logic would apply to the ESR value, so it's not clear
> to me.
OK, let's proceed with setting it to 0. If this changes later, both places
should be changed together.
Thanks,
Tong.
>
> Mark.
>