[PATCH 4/4] powerpc/eeh: Avoid event on passed PE

Alexander Graf agraf at suse.de
Tue May 20 22:14:56 EST 2014


On 20.05.14 13:56, Gavin Shan wrote:
> On Tue, May 20, 2014 at 01:25:11PM +0200, Alexander Graf wrote:
>> On 20.05.14 10:30, Gavin Shan wrote:
>>> If we detects frozen state on PE that has been passed to guest, we
>>> needn't handle it. Instead, we rely on the guest to detect and recover
>>> it. The patch avoid EEH event on the frozen passed PE so that the guest
>>> can have chance to handle that.
>>>
>>> Signed-off-by: Gavin Shan <gwshan at linux.vnet.ibm.com>
>> How does the guest learn about this failure? We'd need to inject an
>> error into it, no?
>>
> When error is existing in HW level, 0xFF's will be turned on reading
> PCI config space or memory BARs. Guest retrieves the failure state,
> which is captured by HW automatically, via RTAS call
> "ibm,read-slot-reset-state2" when seeing 0xFF's on reading PCI config
> space or memory BARs. If "ibm,read-slot-reset-state2" reports errors in HW,
> the guest kernel starts to recovery.
>
> It can be called as "passive" reporting. There possible has one case that
> the error can't be reported for ever: No device driver binding to the VFIO
> PCI device and no access to device's config space and memory BARs. However,
> it doesn't matter. As we don't use the device, we needn't detect and recover
> the error at all.

So if the guest is waiting for an interrupt to happen it will wait 
forever? Not really nice.

>> I think what you want is an irqfd that the in-kernel eeh code
>> notifies when it sees a failure. When such an fd exists, the kernel
>> skips its own error handling.
>>
> Yeah, it's a good idea and something for me to improve in phase II. We
> can discuss for more later.

I think it makes sense to at least walk into that direction immediately. 
The reason I brought it up in the context of this patch is that with an 
irqfd you wouldn't need the passed flag at all.

>   For now, what I have in my head is something
> like this:
>
>        [ Host ] -> Error detected -> irqfd (or eventfd) -> QEMU
>                                                             |
>                                     -------------(A)---------
>                                     |
>                          Send one EEH event to guest kernel
>                                     |
>                          Guest kernel starts the recovery
>
> (A): I didn't figure out one convienent way to do the EEH event injection yet.

How does the guest learn about errors in pHyp?


Alex



More information about the Linuxppc-dev mailing list