[PATCH] powerpc/mce: check if event info is valid

Nicholas Piggin npiggin at gmail.com
Fri Oct 15 18:44:35 AEDT 2021


Excerpts from Michael Ellerman's message of October 7, 2021 10:09 pm:
> Ganesh <ganeshgr at linux.ibm.com> writes:
>> On 8/6/21 6:53 PM, Ganesh Goudar wrote:
>>
>>> Check if the event info is valid before printing the
>>> event information. When a fwnmi-enabled nested KVM guest
>>> hits a machine check exception, L0 and L2 generate
>>> machine check event info, but L1 does not, as it does not
>>> go through the 0x200 vector, and so it prints an unwanted
>>> message.
>>>
>>> To fix this, rename the 'in_use' variable in the machine check
>>> event info, which is no longer used, to 'valid', and check that
>>> the event information is valid before logging it.
>>>
>>> Without this patch, L1 would print the following message for
>>> exceptions encountered in L2, as the event structure will be
>>> empty in L1.
>>>
>>> "Machine Check Exception, Unknown event version 0".
>>>
>>> Signed-off-by: Ganesh Goudar <ganeshgr at linux.ibm.com>
>>> ---
>>
>> Hi mpe, any comments on this patch?
> 
> The variable rename is a bit of a distraction.
> 
> But ignoring that, how do we end up processing a machine_check_event
> that has in_use = 0?
> 
> You don't give much detail on what call path you're talking about. I
> guess we're coming in via the calls in the KVM code?
> 
> In the definition of kvm_vcpu_arch we have:
> 
> 	struct machine_check_event mce_evt; /* Valid if trap == 0x200 */
> 
> And you said we're not going via 0x200 in L1. So shouldn't we be
> teaching the KVM code not to use mce_evt when trap is not 0x200?

I'm not sure we want the MCE to skip the L1 either. It should match the 
L0 hypervisor behaviour as closely as reasonably possible.

We might have to teach the KVM pseries path to do something about the
0x200 before the common HV guest exit handler (which is where the L1
message comes from).
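
For illustration only, the two checks discussed in this thread (the
patch's 'valid' flag and the "Valid if trap == 0x200" rule) could be
sketched as below. This is a standalone sketch, not the kernel code:
the struct layouts, the TRAP_MACHINE_CHECK macro and should_log_mce()
are hypothetical stand-ins, and it says nothing about where in the KVM
pseries path such a check would belong.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct machine_check_event. */
struct mce_event_sketch {
	unsigned char version;
	unsigned char valid;	/* 'in_use' renamed to 'valid' in the patch */
};

/* Assumed name for the machine check vector discussed above. */
#define TRAP_MACHINE_CHECK	0x200

/* Hypothetical stand-in for the relevant fields of kvm_vcpu_arch. */
struct vcpu_sketch {
	unsigned int trap;
	struct mce_event_sketch mce_evt;	/* Valid if trap == 0x200 */
};

/*
 * Return true only when the event should be logged: the exit was a
 * machine check and the event structure was actually filled in.
 */
static bool should_log_mce(const struct vcpu_sketch *vcpu)
{
	if (vcpu->trap != TRAP_MACHINE_CHECK)
		return false;

	return vcpu->mce_evt.valid != 0;
}
```

With an empty event structure (valid == 0), as described for L1,
should_log_mce() returns false and nothing would be printed.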

Thanks,
Nick

