[PATCH v4 2/2] KVM: PPC: Exit guest upon MCE when FWNMI capability is enabled

Aravinda Prasad aravinda at linux.vnet.ibm.com
Thu Jan 12 21:14:12 AEDT 2017



On Thursday 12 January 2017 02:35 PM, Balbir Singh wrote:
> On Mon, Jan 09, 2017 at 05:10:45PM +0530, Aravinda Prasad wrote:

[ . . . ]


>> The reasons for this approach are: (i) it is not possible
>> to distinguish whether the exception occurred in the
>> guest or the host from the pt_regs passed to
>> machine_check_exception(). Hence machine_check_exception()
>> calls panic(), instead of passing the exception on to
>> the guest, if the machine check exception is not
>> recoverable. (ii) the approach introduced in this
>> patch gives the host kernel an opportunity to perform
>> actions in virtual mode before passing the exception on
>> to the guest. This approach does not require complex
>> tweaks to machine_check_fwnmi and friends.
> 
> It would be good to qualify the different types of MCE
> and what action we expect across hypervisor and guest.

The hypervisor performs actions depending on the type of MCE (SLB
multihit, UEs, etc.). If the hypervisor is unable to recover from the MCE
and the address in error belongs to the guest, this patch set forwards
the error to the guest kernel for handling.

The main goal of this patch set is to pass unrecoverable MCEs that hit
the guest address space on to the guest kernel, instead of crashing the
hypervisor. The actions taken by the hypervisor and by the guest kernel
upon an MCE remain unchanged.
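A minimal C sketch of the decision described above, for illustration only. All names (enum mce_action, decide_mce_action(), struct guest_mem) are hypothetical and are not taken from the patch or from kernel APIs; the guest's memory is assumed to be a single contiguous range purely to keep the example small.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MCE classes; not the kernel's struct machine_check_event. */
enum mce_kind { MCE_SLB_MULTIHIT, MCE_UE, MCE_OTHER };

/* Outcome after the hypervisor has tried its own recovery. */
enum mce_action {
	MCE_RESUME,           /* hypervisor recovered; nothing to forward */
	MCE_FORWARD_TO_GUEST, /* unrecovered, address belongs to the guest */
	MCE_HOST_PANIC        /* unrecovered, not attributable to a guest */
};

struct mce_event {
	enum mce_kind kind;
	bool recovered;       /* result of the hypervisor's recovery attempt */
	uint64_t addr;        /* physical address in error */
};

/* Assumption: the guest's memory is one contiguous host-physical range. */
struct guest_mem {
	uint64_t base;
	uint64_t size;
};

static bool addr_in_guest(const struct guest_mem *g, uint64_t addr)
{
	return addr >= g->base && addr - g->base < g->size;
}

static enum mce_action decide_mce_action(const struct mce_event *evt,
					 const struct guest_mem *g)
{
	assert(evt != NULL);
	if (evt->recovered)
		return MCE_RESUME;
	/* Without forwarding, every unrecovered case would end in panic(). */
	if (g != NULL && addr_in_guest(g, evt->addr))
		return MCE_FORWARD_TO_GUEST;
	return MCE_HOST_PANIC;
}
```

The only point the sketch tries to capture is the ordering: the hypervisor's own recovery comes first, forwarding to the guest second, and a host panic only when the error cannot be attributed to a guest.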

[ . . . ]

> 
> Shouldn't the host take action for example poison bad pages?
> 

We want to give the guest kernel a chance to recover the clean part of
the page before it is poisoned, because in the case of a UE only a few
bytes of a page are affected. Hence we don't immediately poison the bad
pages in the host.

The guest kernel is expected to poison the bad pages after performing
its recovery action. This prevents the guest from reusing the bad pages.
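A rough sketch of what recovering the clean part of a page and then poisoning it could look like on the guest side. It assumes the UE is confined to one 128-byte cache line (the granularity is an assumption) and that the guest salvages the data by copying it into a replacement page; struct sketch_page and salvage_page() are hypothetical, and a real guest's recovery would depend on what the page is used for.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size */
#define SKETCH_LINE_SIZE 128u  /* assumed UE granularity: one cache line */

struct sketch_page {
	unsigned char data[SKETCH_PAGE_SIZE];
	bool poisoned;         /* once set, the page must never be reused */
};

/*
 * Copy everything except the cache line containing bad_offset from src
 * into dst, zero-fill that line in dst, then poison src.
 * Returns the number of bytes salvaged.
 */
static size_t salvage_page(struct sketch_page *dst, struct sketch_page *src,
			   size_t bad_offset)
{
	size_t line_start, line_end;

	assert(bad_offset < SKETCH_PAGE_SIZE);
	line_start = bad_offset - bad_offset % SKETCH_LINE_SIZE;
	line_end = line_start + SKETCH_LINE_SIZE;

	/* Clean bytes before and after the bad line are copied as-is. */
	memcpy(dst->data, src->data, line_start);
	memset(dst->data + line_start, 0, SKETCH_LINE_SIZE);
	memcpy(dst->data + line_end, src->data + line_end,
	       SKETCH_PAGE_SIZE - line_end);
	dst->poisoned = false;

	/* Poison only after the clean data has been rescued. */
	src->poisoned = true;
	return SKETCH_PAGE_SIZE - SKETCH_LINE_SIZE;
}
```

With a bad byte at offset 300, the line covering offsets 256 to 383 is lost and the remaining 3968 bytes survive in the replacement page.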

However, the missing part is communicating back to the host once the
guest is done with the recovery. This is mainly to prevent the host from
reusing bad pages when the guest shuts down, reboots, crashes, or
migrates.

We are planning to address this part as a separate patch set.
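Purely as an illustration of the requirement (not of the planned design), the host-side bookkeeping might amount to remembering which pages the guest reported as poisoned and refusing to return them to the host's free pool on teardown. struct bad_page_log, note_guest_poisoned() and may_reuse_after_teardown() are hypothetical, as is the guest notification that would call the former.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_MAX_BAD_PAGES 16 /* arbitrary bound for the sketch */

/* Host frame numbers the guest has reported as poisoned. */
struct bad_page_log {
	uint64_t pfn[SKETCH_MAX_BAD_PAGES];
	size_t count;
};

/*
 * Hypothetical handler for a guest notification that it has finished
 * recovery and poisoned pfn. Returns false if the log is full.
 */
static bool note_guest_poisoned(struct bad_page_log *log, uint64_t pfn)
{
	assert(log != NULL);
	if (log->count == SKETCH_MAX_BAD_PAGES)
		return false;
	log->pfn[log->count++] = pfn;
	return true;
}

/*
 * Checked for each guest page on shutdown, reboot, crash or migration:
 * a logged page must not go back to the host's free pool.
 */
static bool may_reuse_after_teardown(const struct bad_page_log *log,
				     uint64_t pfn)
{
	size_t i;

	for (i = 0; i < log->count; i++)
		if (log->pfn[i] == pfn)
			return false;
	return true;
}
```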

Regards,
Aravinda

>>  	if (opal_recover_mce(regs, &evt))
>>  		return 1;
>>  
>>
> 
> Balbir Singh 
> 

-- 
Regards,
Aravinda



More information about the Linuxppc-dev mailing list