[PATCH v7 3/3] drivers/vfio: EEH support for VFIO PCI device

Gavin Shan gwshan at linux.vnet.ibm.com
Wed May 28 22:49:47 EST 2014


On Wed, May 28, 2014 at 01:41:35PM +0200, Alexander Graf wrote:
>
>On 28.05.14 02:55, Gavin Shan wrote:
>>On Tue, May 27, 2014 at 12:15:27PM -0600, Alex Williamson wrote:
>>>On Tue, 2014-05-27 at 18:40 +1000, Gavin Shan wrote:
>>>>The patch adds new IOCTL commands for sPAPR VFIO container device
>>>>to support EEH functionality for PCI devices, which have been passed
>>>>through from host to somebody else via VFIO.
>>>>
>>>>Signed-off-by: Gavin Shan <gwshan at linux.vnet.ibm.com>
>>>>---
>>>>  Documentation/vfio.txt              | 92 ++++++++++++++++++++++++++++++++++++-
>>>>  drivers/vfio/pci/Makefile           |  1 +
>>>>  drivers/vfio/pci/vfio_pci.c         | 20 +++++---
>>>>  drivers/vfio/pci/vfio_pci_eeh.c     | 46 +++++++++++++++++++
>>>>  drivers/vfio/pci/vfio_pci_private.h |  5 ++
>>>>  drivers/vfio/vfio_iommu_spapr_tce.c | 85 ++++++++++++++++++++++++++++++++++
>>>>  include/uapi/linux/vfio.h           | 66 ++++++++++++++++++++++++++
>>>>  7 files changed, 308 insertions(+), 7 deletions(-)
>>>>  create mode 100644 drivers/vfio/pci/vfio_pci_eeh.c
>
>[...]
>
>>>>diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>>>>index cb9023d..c5fac36 100644
>>>>--- a/include/uapi/linux/vfio.h
>>>>+++ b/include/uapi/linux/vfio.h
>>>>@@ -455,6 +455,72 @@ struct vfio_iommu_spapr_tce_info {
>>>>  #define VFIO_IOMMU_SPAPR_TCE_GET_INFO	_IO(VFIO_TYPE, VFIO_BASE + 12)
>>>>+/*
>>>>+ * EEH functionality can be enabled or disabled on one specific device.
>>>>+ * Also, the DMA or IO frozen state can be removed from the frozen PE
>>>>+ * if required.
>>>>+ */
>>>>+struct vfio_eeh_pe_set_option {
>>>>+	__u32 argsz;
>>>>+	__u32 flags;
>>>>+	__u32 option;
>>>>+#define VFIO_EEH_PE_SET_OPT_DISABLE	0	/* Disable EEH	*/
>>>>+#define VFIO_EEH_PE_SET_OPT_ENABLE	1	/* Enable EEH	*/
>>>>+#define VFIO_EEH_PE_SET_OPT_IO		2	/* Enable IO	*/
>>>>+#define VFIO_EEH_PE_SET_OPT_DMA		3	/* Enable DMA	*/
>>>This is more of a "command" than an "option" isn't it?  Each of these
>>>probably needs a more significant description.
>>>
>>Yeah, it would be regarded as "opcode" and I'll add more description about
>>them in next revision.
>
>Please just call them commands.
>

Ok. I guess you want me to rename the macros like this?

#define VFIO_EEH_CMD_DISABLE	0	/* Disable EEH functionality	*/
#define VFIO_EEH_CMD_ENABLE	1	/* Enable EEH functionality	*/
#define VFIO_EEH_CMD_ENABLE_IO	2	/* Enable IO for frozen PE	*/
#define VFIO_EEH_CMD_ENABLE_DMA	3	/* Enable DMA for frozen PE	*/
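
To show how the renamed commands would read at a call site, here is a rough
sketch. The two helper functions are made up for illustration only and are not
part of the patch; only the macro names and values come from the list above.

```c
#include <assert.h>
#include <stdint.h>

/* Command values as proposed in this mail. */
#define VFIO_EEH_CMD_DISABLE	0	/* Disable EEH functionality	*/
#define VFIO_EEH_CMD_ENABLE	1	/* Enable EEH functionality	*/
#define VFIO_EEH_CMD_ENABLE_IO	2	/* Enable IO for frozen PE	*/
#define VFIO_EEH_CMD_ENABLE_DMA	3	/* Enable DMA for frozen PE	*/

/* Hypothetical helper: nonzero if cmd is one of the four commands. */
static int eeh_cmd_valid(uint32_t cmd)
{
	return cmd <= VFIO_EEH_CMD_ENABLE_DMA;
}

/*
 * Hypothetical helper: nonzero if cmd removes a frozen state (IO or DMA)
 * from a PE rather than switching EEH itself on or off.
 */
static int eeh_cmd_unfreezes_pe(uint32_t cmd)
{
	assert(eeh_cmd_valid(cmd));
	return cmd == VFIO_EEH_CMD_ENABLE_IO ||
	       cmd == VFIO_EEH_CMD_ENABLE_DMA;
}
```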

>>
>>>>+};
>>>>+
>>>>+#define VFIO_EEH_PE_SET_OPTION		_IO(VFIO_TYPE, VFIO_BASE + 21)
>>>>+
>>>>+/*
>>>>+ * Each EEH PE should have unique address to be identified. PE's
>>>>+ * sharing mode is also useful information as well.
>>>>+ */
>>>>+#define VFIO_EEH_PE_GET_ADDRESS		0	/* Get address	*/
>>>>+#define VFIO_EEH_PE_GET_MODE		1	/* Query mode	*/
>>>>+#define VFIO_EEH_PE_MODE_NONE		0	/* Not a PE	*/
>>>>+#define VFIO_EEH_PE_MODE_NOT_SHARED	1	/* Exclusive	*/
>>>>+#define VFIO_EEH_PE_MODE_SHARED		2	/* Shared mode	*/
>>>>+
>>>>+/*
>>>>+ * EEH PE might have been frozen because of PCI errors. Also, it might
>>>>+ * be experiencing reset for error recovery. The following command helps
>>>>+ * to get the state.
>>>>+ */
>>>>+struct vfio_eeh_pe_get_state {
>>>>+	__u32 argsz;
>>>>+	__u32 flags;
>>>>+	__u32 state;
>>>>+};
>>>Should state be a union to better describe the value returned?  What
>>>exactly is the address and why does the user need to know it?  Does this
>>>need user input or could we just return the address and mode regardless?
>>>
>>Ok. I think you want enum (not union) for state. I'll have macros for the
>>state in next revision as I did that for other cases.
>>
>>Those macros defined for "address" just for ABI stuff as Alex.G mentioned.
>>There isn't corresponding ioctl command for host to get address any more
>>because QEMU (user) will have to figure it out by himself. The "address"
>>here means PE address and user has to figure it out according to PE
>>segmentation.
>
>Why would the user ever need the address?
>

I will remove those "address" related macros in the next revision because the
address is user-level business and no longer concerns the host kernel.

If the user is QEMU + guest, we need the address to identify the PE, although
the PHB BUID could serve the same purpose by identifying the PHB, which maps
one-to-one to an IOMMU group on the sPAPR platform. However, once the PE address
has been built and returned to the guest, the guest will pass it as an input
parameter in subsequent RTAS calls.

If the user is some kind of application that just uses the ioctl() without
supporting RTAS calls, we don't need to care about the PE address.
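
For that second kind of user, a minimal sketch of preparing the request could
look like the following. The struct layout is copied from the v7 patch quoted
above and may still change; the struct and helper names here are made up, and
setting argsz to the structure size is just the usual VFIO convention.

```c
#include <stdint.h>
#include <string.h>

/* Layout copied from the v7 patch quoted above; it may still change. */
struct eeh_pe_set_option_sketch {
	uint32_t argsz;
	uint32_t flags;
	uint32_t option;
};

/* Hypothetical helper: build a zeroed request carrying one command. */
static struct eeh_pe_set_option_sketch eeh_build_request(uint32_t cmd)
{
	struct eeh_pe_set_option_sketch req;

	memset(&req, 0, sizeof(req));
	req.argsz = sizeof(req);	/* usual VFIO convention */
	req.option = cmd;
	return req;
}

/*
 * With the v7 header applied, the application would then issue the ioctl
 * on the sPAPR VFIO container, roughly:
 *
 *	ioctl(container_fd, VFIO_EEH_PE_SET_OPTION, &req);
 *
 * No PE address is passed anywhere in this path.
 */
```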

Thanks,
Gavin


