[PATCH v6 2/3] drivers/vfio: EEH support for VFIO PCI device

Gavin Shan gwshan at linux.vnet.ibm.com
Fri May 23 21:55:09 EST 2014

On Fri, May 23, 2014 at 11:58:22AM +0200, Alexander Graf wrote:
>On 23.05.14 09:37, Gavin Shan wrote:
>>On Fri, May 23, 2014 at 08:55:15AM +0200, Alexander Graf wrote:
>>>>Am 23.05.2014 um 06:37 schrieb Gavin Shan <gwshan at linux.vnet.ibm.com>:
>>>>>On Thu, May 22, 2014 at 09:10:53PM -0600, Alex Williamson wrote:
>>>>>>On Thu, 2014-05-22 at 18:23 +1000, Gavin Shan wrote:
>>>>>>The patch adds new IOCTL commands for VFIO PCI device to support
>>>>>>EEH functionality for PCI devices, which have been passed through
>>>>>>from host to somebody else via VFIO.
>>>>>>+ * Reset is the major step to recover problematic PE. The following
>>>>>>+ * command helps on that.
>>>>>>+ */
>>>>>>+struct vfio_eeh_pe_reset {
>>>>>>+    __u32 argsz;
>>>>>>+    __u32 option;
>>>>>>+#define VFIO_EEH_PE_RESET        _IO(VFIO_TYPE, VFIO_BASE + 24)
>>>>>>+ * One of the steps for recovery after PE reset is to configure the
>>>>>>+ * PCI bridges affected by the PE reset.
>>>>>>+ */
>>>>>>+#define VFIO_EEH_PE_CONFIGURE        _IO(VFIO_TYPE, VFIO_BASE + 25)
>>>>>What can the user do differently by making these separate ioctls?
>>>>Hrm, I didn't understand it either. Alex.G could have the explanation.
>>>Alex raised the same concern as me: why separate reset and configure? When we want to recover a device, we need a reset call anyway, right?
>>Ok. With the current ioctl commands, "reset+configure" is required to do
>>error recovery. Before the recovery, we also need to call "configure"
>>in order to retrieve the error log correctly.
>Well, the "configure" ioctl (which is a really bad name for what it
>does btw) currently only restores the BARs which doesn't sound like
>error log retrieval to me.

Could you please suggest a better name? I used VFIO_EEH_PE_CONFIGURE because
it corresponds to the RTAS call "ibm,configure-pe".

>>Also, they correspond to 2 separate RTAS services: "ibm,set-slot-reset"
>>and "ibm,configure-pe".
>Does a guest always issue both? What's the order it calls them in?

For one error, the following RTAS calls are generally issued:

< stop device drivers, no PCI traffic expected during recovery >
< error log retrieval >
< resume device drivers >

There are other scenarios as well. For example, when PE reset fails and the
permanent log has to be collected, "ibm,configure-pe" should be called before
collecting it.
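
To make the ordering constraints above concrete, here is a minimal sketch in
C. It is not code from the patch: the names eeh_op, eeh_pe_model,
eeh_pe_apply() and eeh_pe_recovered() are hypothetical. It only models the
two rules stated in this thread: "configure" has to come before the error log
is retrieved, and recovery needs "reset" followed by "configure". That a reset
invalidates an earlier configure is an assumption of the sketch.

```c
#include <stdbool.h>

/* Hypothetical operation names; they mirror the steps discussed in the
 * thread, not the actual ioctl or RTAS interfaces. */
enum eeh_op {
    EEH_OP_CONFIGURE,   /* stands in for "ibm,configure-pe"   */
    EEH_OP_GET_LOG,     /* stands in for error log retrieval  */
    EEH_OP_RESET        /* stands in for "ibm,set-slot-reset" */
};

/* Hypothetical per-PE bookkeeping used only by this sketch. */
struct eeh_pe_model {
    bool configured;    /* configure issued and not invalidated by a reset */
    bool reset_done;    /* at least one reset has been issued */
};

/* Apply one operation. Returns false if the operation is out of order
 * according to the rules stated in the thread. */
static bool eeh_pe_apply(struct eeh_pe_model *pe, enum eeh_op op)
{
    switch (op) {
    case EEH_OP_CONFIGURE:
        pe->configured = true;
        return true;
    case EEH_OP_GET_LOG:
        /* The log can only be retrieved correctly after configure. */
        return pe->configured;
    case EEH_OP_RESET:
        pe->reset_done = true;
        /* Assumption: a reset requires configure to be issued again. */
        pe->configured = false;
        return true;
    }
    return false;
}

/* Recovery is complete once reset has been followed by configure. */
static bool eeh_pe_recovered(const struct eeh_pe_model *pe)
{
    return pe->reset_done && pe->configured;
}
```

In this model the sequence configure, get-log, reset, configure is accepted
and ends in the recovered state, while get-log without a prior configure is
rejected.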


