[PATCH v6 2/3] drivers/vfio: EEH support for VFIO PCI device
Alexander Graf
agraf at suse.de
Thu May 22 19:55:29 EST 2014
On 22.05.14 10:23, Gavin Shan wrote:
> The patch adds new IOCTL commands for VFIO PCI device to support
> EEH functionality for PCI devices, which have been passed through
> from host to somebody else via VFIO.
>
> Signed-off-by: Gavin Shan <gwshan at linux.vnet.ibm.com>
This already looks a *lot* more sane than the previous versions. We're
slowly getting there I think ;).
Ben, could you please check through the exported EEH interface itself
and verify whether it does all the locking correctly, uses the API
properly, and doesn't allow a user space program to blow up the system?
> ---
> Documentation/vfio.txt | 88 ++++++++++-
> arch/powerpc/include/asm/eeh.h | 17 +++
> arch/powerpc/kernel/eeh.c | 321 +++++++++++++++++++++++++++++++++++++++++
> drivers/vfio/pci/vfio_pci.c | 131 ++++++++++++++++-
> include/uapi/linux/vfio.h | 53 +++++++
> 5 files changed, 603 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/vfio.txt b/Documentation/vfio.txt
> index b9ca023..dd13db6 100644
> --- a/Documentation/vfio.txt
> +++ b/Documentation/vfio.txt
> @@ -305,7 +305,10 @@ faster, the map/unmap handling has been implemented in real mode which provides
> an excellent performance which has limitations such as inability to do
> locked pages accounting in real time.
>
> -So 3 additional ioctls have been added:
> +4) PPC64 guests detect PCI errors and recover from them via EEH RTAS services,
> +which work on the basis of additional ioctl commands.
> +
> +So 8 additional ioctls have been added:
>
> VFIO_IOMMU_SPAPR_TCE_GET_INFO - returns the size and the start
> of the DMA window on the PCI bus.
> @@ -316,6 +319,20 @@ So 3 additional ioctls have been added:
>
> VFIO_IOMMU_DISABLE - disables the container.
>
> + VFIO_EEH_PE_SET_OPTION - enables or disables EEH functinality on the
functionality
> + specified device. Also, it can be used to remove IO or DMA
> + stopped state on the frozen PE.
> +
> + VFIO_EEH_PE_GET_ADDR - retrieve the unique address of the specified
> + PE or query PE sharing mode.
What is PE?
> +
> + VFIO_EEH_PE_GET_STATE - retrieve PE's state: frozen or normal state.
> +
> + VFIO_EEH_PE_RESET - do PE reset, which is one of the major steps for
> + error recovery.
> +
> + VFIO_EEH_PE_CONFIGURE - configure the PCI bridges after PE reset. It's
> + one of the major steps for error recovery.
>
> The code flow from the example above should be slightly changed:
>
> @@ -346,6 +363,75 @@ The code flow from the example above should be slightly changed:
> ioctl(container, VFIO_IOMMU_MAP_DMA, &dma_map);
> .....
>
> +Based on the initial example we have, the following piece of code could be
> +a reference for EEH setup and error handling:
> +
> + struct vfio_eeh_pe_set_option option = { .argsz = sizeof(option) };
> + struct vfio_eeh_pe_get_addr addr = { .argsz = sizeof(addr) };
> + struct vfio_eeh_pe_get_state state = { .argsz = sizeof(state) };
> + struct vfio_eeh_pe_reset reset = { .argsz = sizeof(reset) };
> + struct vfio_eeh_pe_configure config = { .argsz = sizeof(config) };
> +
> + ....
> +
> + /* Get a file descriptor for the device */
> + device = ioctl(group, VFIO_GROUP_GET_DEVICE_FD, "0000:06:0d.0");
> +
> + /* Enable the EEH functionality on the device */
> + option.option = EEH_OPT_ENABLE;
> + ioctl(device, VFIO_EEH_PE_SET_OPTION, &option);
> +
> + /* Retrieve the PE address; creating and maintaining the PE is up to you */
> + addr.option = EEH_OPT_GET_PE_ADDR;
> + ioctl(device, VFIO_EEH_PE_GET_ADDR, &addr);
> +
> + /* Assure EEH is supported on the PE and make PE functional */
> + ioctl(device, VFIO_EEH_PE_GET_STATE, &state);
> +
> + /* Save device's state. pci_save_state() would be good enough
> + * as an example.
> + */
> +
> + /* Test and setup the device */
> + ioctl(device, VFIO_DEVICE_GET_INFO, &device_info);
> +
> + ....
> +
> + /* When 0xFF is returned from reading PCI config space or IO BARs
> + * of the PCI device, check the PE state to see if it has been
> + * frozen.
> + */
> + ioctl(device, VFIO_EEH_PE_GET_STATE, &state);
> +
> + /* Wait for pending PCI transactions to complete and don't
> + * produce any more PCI traffic from/to the affected PE until
> + * recovery is finished.
> + */
> +
> + /* Enable IO for the affected PE and collect logs. Usually, the
> + * standard part of the PCI config space and the AER registers are
> + * dumped as logs for further analysis.
> + */
> + option.option = EEH_OPT_THAW_MMIO;
> + ioctl(device, VFIO_EEH_PE_SET_OPTION, &option);
> +
> + /* Issue PE reset */
> + reset.option = EEH_RESET_HOT;
> + ioctl(device, VFIO_EEH_PE_RESET, &reset);
> +
> + /* Configure the PCI bridges for the affected PE */
> + ioctl(device, VFIO_EEH_PE_CONFIGURE, NULL);
> +
> + /* Restore the state we saved at initialization time. pci_restore_state()
> + * is good enough as an example.
> + */
> +
> + /* Hopefully, the error has been recovered successfully. Now you can
> + * resume PCI traffic to/from the affected PE.
> + */
> +
> + ....
> +
> -------------------------------------------------------------------------------
>
> [1] VFIO was originally an acronym for "Virtual Function I/O" in its
> diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
> index 34a2d83..dd5f1cf 100644
> --- a/arch/powerpc/include/asm/eeh.h
> +++ b/arch/powerpc/include/asm/eeh.h
> @@ -191,6 +191,11 @@ enum {
> #define EEH_OPT_ENABLE 1 /* EEH enable */
> #define EEH_OPT_THAW_MMIO 2 /* MMIO enable */
> #define EEH_OPT_THAW_DMA 3 /* DMA enable */
> +#define EEH_OPT_GET_PE_ADDR 0 /* Get PE addr */
> +#define EEH_OPT_GET_PE_MODE 1 /* Get PE mode */
> +#define EEH_PE_MODE_NONE 0 /* Invalid mode */
> +#define EEH_PE_MODE_NOT_SHARED 1 /* Not shared */
> +#define EEH_PE_MODE_SHARED 2 /* Shared mode */
> #define EEH_STATE_UNAVAILABLE (1 << 0) /* State unavailable */
> #define EEH_STATE_NOT_SUPPORT (1 << 1) /* EEH not supported */
> #define EEH_STATE_RESET_ACTIVE (1 << 2) /* Active reset */
> @@ -198,6 +203,11 @@ enum {
> #define EEH_STATE_DMA_ACTIVE (1 << 4) /* Active DMA */
> #define EEH_STATE_MMIO_ENABLED (1 << 5) /* MMIO enabled */
> #define EEH_STATE_DMA_ENABLED (1 << 6) /* DMA enabled */
> +#define EEH_PE_STATE_NORMAL 0 /* Normal state */
> +#define EEH_PE_STATE_RESET 1 /* PE reset */
> +#define EEH_PE_STATE_STOPPED_IO_DMA 2 /* Stopped */
> +#define EEH_PE_STATE_STOPPED_DMA 4 /* Stopped DMA */
> +#define EEH_PE_STATE_UNAVAIL 5 /* Unavailable */
> #define EEH_RESET_DEACTIVATE 0 /* Deactivate the PE reset */
> #define EEH_RESET_HOT 1 /* Hot reset */
> #define EEH_RESET_FUNDAMENTAL 3 /* Fundamental reset */
> @@ -305,6 +315,13 @@ void eeh_add_device_late(struct pci_dev *);
> void eeh_add_device_tree_late(struct pci_bus *);
> void eeh_add_sysfs_files(struct pci_bus *);
> void eeh_remove_device(struct pci_dev *);
> +int eeh_dev_open(struct pci_dev *pdev);
> +void eeh_dev_release(struct pci_dev *pdev);
> +int eeh_pe_set_option(struct pci_dev *pdev, int option);
> +int eeh_pe_get_addr(struct pci_dev *pdev, int option);
> +int eeh_pe_get_state(struct pci_dev *pdev);
> +int eeh_pe_reset(struct pci_dev *pdev, int option);
> +int eeh_pe_configure(struct pci_dev *pdev);
>
> /**
> * EEH_POSSIBLE_ERROR() -- test for possible MMIO failure.
> diff --git a/arch/powerpc/kernel/eeh.c b/arch/powerpc/kernel/eeh.c
> index 9c6b899..b90a474 100644
> --- a/arch/powerpc/kernel/eeh.c
> +++ b/arch/powerpc/kernel/eeh.c
> @@ -108,6 +108,9 @@ struct eeh_ops *eeh_ops = NULL;
> /* Lock to avoid races due to multiple reports of an error */
> DEFINE_RAW_SPINLOCK(confirm_error_lock);
>
> +/* Lock to protect passed flags */
> +static DEFINE_MUTEX(eeh_dev_mutex);
> +
> /* Buffer for reporting pci register dumps. Its here in BSS, and
> * not dynamically alloced, so that it ends up in RMO where RTAS
> * can access it.
> @@ -1098,6 +1101,324 @@ void eeh_remove_device(struct pci_dev *dev)
> edev->mode &= ~EEH_DEV_SYSFS;
> }
>
> +/**
> + * eeh_dev_open - Mark EEH device and PE as passed through
> + * @pdev: PCI device
> + *
> + * Mark the indicated EEH device and PE as passed through.
> + * As a result, EEH errors detected on the PE won't be
> + * reported. The owner of the device will be responsible for
> + * detection and recovery.
> + */
> +int eeh_dev_open(struct pci_dev *pdev)
> +{
> + struct eeh_dev *edev;
> +
> + mutex_lock(&eeh_dev_mutex);
> +
> + /* No PCI device ? */
> + if (!pdev) {
> + mutex_unlock(&eeh_dev_mutex);
> + return -ENODEV;
> + }
> +
> + /* No EEH device ? */
> + edev = pci_dev_to_eeh_dev(pdev);
> + if (!edev || !edev->pe) {
> + mutex_unlock(&eeh_dev_mutex);
> + return -ENODEV;
> + }
> +
> + eeh_dev_set_passed(edev, true);
> + eeh_pe_set_passed(edev->pe, true);
> + mutex_unlock(&eeh_dev_mutex);
> +
> + return 0;
> +}
> +EXPORT_SYMBOL_GPL(eeh_dev_open);
> +
> +/**
> + * eeh_dev_release - Reclaim the ownership of EEH device
> + * @pdev: PCI device
> + *
> + * Reclaim ownership of EEH device, potentially the corresponding
> + * PE. As a result, EEH errors detected on the PE will be
> + * reported and handled as usual.
> + */
> +void eeh_dev_release(struct pci_dev *pdev)
> +{
> + bool release_pe = true;
> + struct eeh_pe *pe = NULL;
> + struct eeh_dev *tmp, *edev;
> +
> + mutex_lock(&eeh_dev_mutex);
> +
> + /* No PCI device ? */
> + if (!pdev) {
> + mutex_unlock(&eeh_dev_mutex);
> + return;
> + }
> +
> + /* No EEH device ? */
> + edev = pci_dev_to_eeh_dev(pdev);
> + if (!edev || !eeh_dev_passed(edev) ||
> + !edev->pe || !eeh_pe_passed(edev->pe)) {
> + mutex_unlock(&eeh_dev_mutex);
> + return;
> + }
> +
> + /* Release device */
> + pe = edev->pe;
> + eeh_dev_set_passed(edev, false);
> +
> + /* Release PE */
> + eeh_pe_for_each_dev(pe, edev, tmp) {
> + if (eeh_dev_passed(edev)) {
> + release_pe = false;
> + break;
> + }
> + }
> +
> + if (release_pe)
> + eeh_pe_set_passed(pe, false);
> +
> + mutex_unlock(&eeh_dev_mutex);
> +}
> +EXPORT_SYMBOL(eeh_dev_release);
> +
> +static int eeh_dev_check(struct pci_dev *pdev,
> + struct eeh_dev **pedev,
> + struct eeh_pe **ppe)
> +{
> + struct eeh_dev *edev;
> +
> + /* No device ? */
> + if (!pdev)
> + return -ENODEV;
> +
> + edev = pci_dev_to_eeh_dev(pdev);
> + if (!edev || !eeh_dev_passed(edev) ||
> + !edev->pe || !eeh_pe_passed(edev->pe))
> + return -ENODEV;
> +
> + if (pedev)
> + *pedev = edev;
> + if (ppe)
> + *ppe = edev->pe;
> +
> + return 0;
> +}
> +
> +/**
> + * eeh_pe_set_option - Set options for the indicated PE
> + * @pdev: PCI device
> + * @option: requested option
> + *
> + * The routine is called to enable or disable EEH functionality
> + * on the indicated PE, to enable IO or DMA for the frozen PE.
> + */
> +int eeh_pe_set_option(struct pci_dev *pdev, int option)
> +{
> + struct eeh_dev *edev;
> + struct eeh_pe *pe;
> + int ret = 0;
> +
> + /* Device existing ? */
> + ret = eeh_dev_check(pdev, &edev, &pe);
> + if (ret)
> + return ret;
> +
> + switch (option) {
> + case EEH_OPT_DISABLE:
> + case EEH_OPT_ENABLE:
This deserves a comment
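Just a sketch of what I'm after (assuming the intent really is that
enable/disable are no-ops because ownership is already handled via
eeh_dev_open()/eeh_dev_release()):

    case EEH_OPT_DISABLE:
    case EEH_OPT_ENABLE:
        /*
         * EEH gets enabled/disabled implicitly when the device is
         * passed through (eeh_dev_open()/eeh_dev_release()), so
         * there's nothing left to do here - just report success.
         */
        break;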
> + break;
> + case EEH_OPT_THAW_MMIO:
> + case EEH_OPT_THAW_DMA:
> + if (!eeh_ops || !eeh_ops->set_option) {
> + ret = -ENOENT;
> + break;
> + }
> +
> + ret = eeh_ops->set_option(pe, option);
> + break;
> + default:
> + pr_debug("%s: Option %d out of range (%d, %d)\n",
> + __func__, option, EEH_OPT_DISABLE, EEH_OPT_THAW_DMA);
> + ret = -EINVAL;
> + }
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(eeh_pe_set_option);
> +
> +/**
> + * eeh_pe_get_addr - Retrieve the PE address or sharing mode
> + * @pdev: PCI device
> + * @option: option
> + *
> + * Retrieve the PE address or sharing mode.
> + */
> +int eeh_pe_get_addr(struct pci_dev *pdev, int option)
> +{
> + struct pci_bus *bus;
> + struct eeh_dev *edev;
> + struct eeh_pe *pe;
> + int ret = 0;
> +
> + /* Device existing ? */
> + ret = eeh_dev_check(pdev, &edev, &pe);
> + if (ret)
Probably safer to write this as ret < 0. Positive return values are
success now.
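I.e. something like this (untested, just to illustrate):

    /* Device existing ? */
    ret = eeh_dev_check(pdev, &edev, &pe);
    if (ret < 0)
        return ret;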
> + return ret;
> +
> + switch (option) {
> + case EEH_OPT_GET_PE_ADDR:
> + bus = eeh_pe_bus_get(pe);
> + if (!bus) {
> + ret = -ENODEV;
> + break;
> + }
> +
> + /* PE address has format "00BBSS00" */
> + ret = bus->number << 16;
> + break;
> + case EEH_OPT_GET_PE_MODE:
> + /* Wa always have shared PE */
Wa?
> + ret = EEH_PE_MODE_SHARED;
> + break;
> + default:
> + pr_debug("%s: option %d out of range (0, 1)\n",
> + __func__, option);
> + ret = -EINVAL;
> + }
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(eeh_pe_get_addr);
> +
> +/**
> + * eeh_pe_get_state - Retrieve PE's state
> + * @pdev: PCI device
> + *
> + * Retrieve the PE's state, which includes 3 aspects: enabled
> + * DMA, enabled IO and asserted reset.
> + */
> +int eeh_pe_get_state(struct pci_dev *pdev)
> +{
> + struct eeh_dev *edev;
> + struct eeh_pe *pe;
> + int result, ret = 0;
> +
> + /* Device existing ? */
> + ret = eeh_dev_check(pdev, &edev, &pe);
> + if (ret)
> + return ret;
> +
> + if (!eeh_ops || !eeh_ops->get_state)
> + return -ENOENT;
> +
> + result = eeh_ops->get_state(pe, NULL);
> + if (!(result & EEH_STATE_RESET_ACTIVE) &&
> + (result & EEH_STATE_DMA_ENABLED) &&
> + (result & EEH_STATE_MMIO_ENABLED))
> + ret = EEH_PE_STATE_NORMAL;
> + else if (result & EEH_STATE_RESET_ACTIVE)
> + ret = EEH_PE_STATE_RESET;
> + else if (!(result & EEH_STATE_RESET_ACTIVE) &&
> + !(result & EEH_STATE_DMA_ENABLED) &&
> + !(result & EEH_STATE_MMIO_ENABLED))
> + ret = EEH_PE_STATE_STOPPED_IO_DMA;
> + else if (!(result & EEH_STATE_RESET_ACTIVE) &&
> + (result & EEH_STATE_DMA_ENABLED) &&
> + !(result & EEH_STATE_MMIO_ENABLED))
> + ret = EEH_PE_STATE_STOPPED_DMA;
> + else
> + ret = EEH_PE_STATE_UNAVAIL;
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(eeh_pe_get_state);
> +
> +/**
> + * eeh_pe_reset - Issue PE reset according to specified type
> + * @pdev: PCI device
> + * @option: reset type
> + *
> + * The routine is called to reset the specified PE with the
> + * indicated type, either fundamental reset or hot reset.
> + * PE reset is the most important part for error recovery.
> + */
> +int eeh_pe_reset(struct pci_dev *pdev, int option)
> +{
> + struct eeh_dev *edev;
> + struct eeh_pe *pe;
> + int ret = 0;
> +
> + /* Device existing ? */
> + ret = eeh_dev_check(pdev, &edev, &pe);
> + if (ret)
> + return ret;
> +
> + if (!eeh_ops || !eeh_ops->set_option || !eeh_ops->reset)
> + return -ENOENT;
> +
> + switch (option) {
> + case EEH_RESET_DEACTIVATE:
> + ret = eeh_ops->reset(pe, option);
> + if (ret)
> + break;
> +
> + /*
> + * The PE is still in frozen state and we need clear
to
> + * that. It's good to clear frozen state after deassert
> + * to avoid messy IO access during reset, which might
> + * cause recursive frozen PE.
> + */
> + ret = eeh_ops->set_option(pe, EEH_OPT_THAW_MMIO);
> + if (!ret)
> + ret = eeh_ops->set_option(pe, EEH_OPT_THAW_DMA);
> + if (!ret)
> + eeh_pe_state_clear(pe, EEH_PE_ISOLATED);
> + break;
> + case EEH_RESET_HOT:
> + case EEH_RESET_FUNDAMENTAL:
> + ret = eeh_ops->reset(pe, option);
> + break;
> + default:
> + pr_debug("%s: Unsupported option %d\n",
> + __func__, option);
> + ret = -EINVAL;
> + }
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(eeh_pe_reset);
> +
> +/**
> + * eeh_pe_configure - Configure PCI bridges after PE reset
> + * @pdev: PCI device
> + *
> + * The routine is called to restore the PCI config space for
> + * those PCI devices, especially PCI bridges affected by PE
> + * reset issued previously.
So would it make sense to combine this with the reset call?
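Something along these lines maybe (pure sketch, untested; assuming the
bridges always want reconfiguring once the reset is deasserted):

    case EEH_RESET_DEACTIVATE:
        ret = eeh_ops->reset(pe, option);
        if (ret)
            break;

        ret = eeh_ops->set_option(pe, EEH_OPT_THAW_MMIO);
        if (!ret)
            ret = eeh_ops->set_option(pe, EEH_OPT_THAW_DMA);
        if (!ret) {
            eeh_pe_state_clear(pe, EEH_PE_ISOLATED);
            /* Restore config space of the affected devices right away */
            eeh_pe_restore_bars(pe);
        }
        break;

In which case VFIO_EEH_PE_CONFIGURE could probably go away completely.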
> + */
> +int eeh_pe_configure(struct pci_dev *pdev)
> +{
> + struct eeh_dev *edev;
> + struct eeh_pe *pe;
> + int ret = 0;
> +
> + /* Device existing ? */
> + ret = eeh_dev_check(pdev, &edev, &pe);
> + if (ret)
> + return ret;
> +
> + /* Restore config space for the affected devices */
> + eeh_pe_restore_bars(pe);
> +
> + return ret;
> +}
> +EXPORT_SYMBOL_GPL(eeh_pe_configure);
> +
> static int proc_eeh_show(struct seq_file *m, void *v)
> {
> if (!eeh_enabled()) {
> diff --git a/drivers/vfio/pci/vfio_pci.c b/drivers/vfio/pci/vfio_pci.c
> index 7ba0424..301ac18 100644
> --- a/drivers/vfio/pci/vfio_pci.c
> +++ b/drivers/vfio/pci/vfio_pci.c
> @@ -25,6 +25,9 @@
> #include <linux/types.h>
> #include <linux/uaccess.h>
> #include <linux/vfio.h>
> +#ifdef CONFIG_EEH
> +#include <asm/eeh.h>
> +#endif
>
> #include "vfio_pci_private.h"
>
> @@ -152,32 +155,57 @@ static void vfio_pci_disable(struct vfio_pci_device *vdev)
> pci_restore_state(pdev);
> }
>
> +static void vfio_eeh_pci_release(struct pci_dev *pdev)
> +{
> +#ifdef CONFIG_EEH
> + eeh_dev_release(pdev);
> +#endif
> +}
> +
> static void vfio_pci_release(void *device_data)
> {
> struct vfio_pci_device *vdev = device_data;
>
> - if (atomic_dec_and_test(&vdev->refcnt))
> + if (atomic_dec_and_test(&vdev->refcnt)) {
> + vfio_eeh_pci_release(vdev->pdev);
> vfio_pci_disable(vdev);
> + }
>
> module_put(THIS_MODULE);
> }
>
> +static int vfio_eeh_pci_open(struct pci_dev *pdev)
> +{
> + int ret = 0;
> +
> +#ifdef CONFIG_EEH
> + ret = eeh_dev_open(pdev);
> +#endif
> + return ret;
> +}
> +
> static int vfio_pci_open(void *device_data)
> {
> struct vfio_pci_device *vdev = device_data;
> + int ret;
>
> if (!try_module_get(THIS_MODULE))
> return -ENODEV;
>
> if (atomic_inc_return(&vdev->refcnt) == 1) {
> - int ret = vfio_pci_enable(vdev);
> - if (ret) {
> - module_put(THIS_MODULE);
> - return ret;
> - }
> + ret = vfio_pci_enable(vdev);
> + if (ret)
> + goto error;
> +
> + ret = vfio_eeh_pci_open(vdev->pdev);
> + if (ret)
> + goto error;
> }
>
> return 0;
> +error:
> + module_put(THIS_MODULE);
> + return ret;
> }
>
> static int vfio_pci_get_irq_count(struct vfio_pci_device *vdev, int irq_type)
> @@ -321,6 +349,91 @@ static int vfio_pci_for_each_slot_or_bus(struct pci_dev *pdev,
> return walk.ret;
> }
>
> +static int vfio_eeh_pci_ioctl(struct pci_dev *pdev,
> + unsigned int cmd,
> + unsigned long arg)
> +{
> + unsigned long minsz;
> + int ret = 0;
> +
> +#ifdef CONFIG_EEH
> + switch (cmd) {
> + case VFIO_EEH_PE_SET_OPTION: {
> + struct vfio_eeh_pe_set_option option;
> +
> + minsz = offsetofend(struct vfio_eeh_pe_set_option, option);
> + if (copy_from_user(&option, (void __user *)arg, minsz))
> + return -EFAULT;
> + if (option.argsz < minsz)
> + return -EINVAL;
> +
> + ret = eeh_pe_set_option(pdev, option.option);
> + break;
> + }
> + case VFIO_EEH_PE_GET_ADDR: {
> + struct vfio_eeh_pe_get_addr addr;
> +
> + minsz = offsetofend(struct vfio_eeh_pe_get_addr, info);
> + if (copy_from_user(&addr, (void __user *)arg, minsz))
> + return -EFAULT;
> + if (addr.argsz < minsz)
> + return -EINVAL;
> +
> + ret = eeh_pe_get_addr(pdev, addr.option);
> + if (ret >= 0) {
> + addr.info = ret;
> + if (copy_to_user((void __user *)arg, &addr, minsz))
> + return -EFAULT;
> + ret = 0;
> + }
> +
> + break;
> + }
> + case VFIO_EEH_PE_GET_STATE: {
> + struct vfio_eeh_pe_get_state state;
> +
> + minsz = offsetofend(struct vfio_eeh_pe_get_state, state);
> + if (copy_from_user(&state, (void __user *)arg, minsz))
> + return -EFAULT;
> + if (state.argsz < minsz)
> + return -EINVAL;
> +
> + ret = eeh_pe_get_state(pdev);
> + if (ret >= 0) {
> + state.state = ret;
> + if (copy_to_user((void __user *)arg, &state, minsz))
> + return -EFAULT;
> + ret = 0;
> + }
> + break;
> + }
> + case VFIO_EEH_PE_RESET: {
> + struct vfio_eeh_pe_reset reset;
> +
> + minsz = offsetofend(struct vfio_eeh_pe_reset, option);
> + if (copy_from_user(&reset, (void __user *)arg, minsz))
> + return -EFAULT;
> + if (reset.argsz < minsz)
> + return -EINVAL;
> +
> + ret = eeh_pe_reset(pdev, reset.option);
> + break;
> + }
> + case VFIO_EEH_PE_CONFIGURE:
> + ret = eeh_pe_configure(pdev);
> + break;
> + default:
> + ret = -EINVAL;
> + pr_debug("%s: Cannot handle command %d\n",
> + __func__, cmd);
> + }
> +#else
> + ret = -ENOENT;
> +#endif
> +
> + return ret;
> +}
> +
> static long vfio_pci_ioctl(void *device_data,
> unsigned int cmd, unsigned long arg)
> {
> @@ -682,6 +795,12 @@ hot_reset_release:
>
> kfree(groups);
> return ret;
> + } else if (cmd == VFIO_EEH_PE_SET_OPTION ||
> + cmd == VFIO_EEH_PE_GET_ADDR ||
> + cmd == VFIO_EEH_PE_GET_STATE ||
> + cmd == VFIO_EEH_PE_RESET ||
> + cmd == VFIO_EEH_PE_CONFIGURE) {
> + return vfio_eeh_pci_ioctl(vdev->pdev, cmd, arg);
> }
>
> return -ENOTTY;
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index cb9023d..ef55682 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -455,6 +455,59 @@ struct vfio_iommu_spapr_tce_info {
>
> #define VFIO_IOMMU_SPAPR_TCE_GET_INFO _IO(VFIO_TYPE, VFIO_BASE + 12)
>
> +/*
> + * EEH functionality can be enabled or disabled on one specific device.
> + * Also, the DMA or IO frozen state can be removed from the frozen PE
> + * if required.
> + */
> +struct vfio_eeh_pe_set_option {
> + __u32 argsz;
What is this argsz thing? Is this your way of maintaining backwards
compatibility when we introduce new fields? A new field will change the
ioctl number, so I don't think that makes a lot of sense :).
Just make the ioctl have a u32 as incoming argument. No fancy structs,
no complicated code.
The same applies for a number of structs below.
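For example (untested sketch, reusing the helpers from this patch), the
kernel side could shrink to

    case VFIO_EEH_PE_SET_OPTION:
        return eeh_pe_set_option(pdev, (u32)arg);

and user space would just do

    ioctl(device, VFIO_EEH_PE_SET_OPTION, EEH_OPT_ENABLE);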
> + __u32 option;
> +};
> +
> +#define VFIO_EEH_PE_SET_OPTION _IO(VFIO_TYPE, VFIO_BASE + 21)
> +
> +/*
> + * Each EEH PE should have a unique address to be identified. The command
> + * helps to retrieve the address and the sharing mode of the PE.
> + */
> +struct vfio_eeh_pe_get_addr {
> + __u32 argsz;
> + __u32 option;
> + __u32 info;
Any particular reason you need the info field? Can't the return value of
the ioctl hold this? Then you only have a single u32 argument left to
the ioctl again.
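I.e. user space could do something like (sketch):

    addr = ioctl(device, VFIO_EEH_PE_GET_ADDR, EEH_OPT_GET_PE_ADDR);
    if (addr < 0)
        return addr;    /* error */

with the kernel simply returning whatever eeh_pe_get_addr() gives back.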
Alex
> +};
> +
> +#define VFIO_EEH_PE_GET_ADDR _IO(VFIO_TYPE, VFIO_BASE + 22)
> +
> +/*
> + * An EEH PE might have been frozen because of PCI errors. Also, it might
> + * be experiencing a reset for error recovery. The following command helps
> + * to get the state.
> + */
> +struct vfio_eeh_pe_get_state {
> + __u32 argsz;
> + __u32 state;
> +};
> +
> +#define VFIO_EEH_PE_GET_STATE _IO(VFIO_TYPE, VFIO_BASE + 23)
> +
> +/*
> + * Reset is the major step to recover a problematic PE. The following
> + * command helps with that.
> + */
> +struct vfio_eeh_pe_reset {
> + __u32 argsz;
> + __u32 option;
> +};
> +
> +#define VFIO_EEH_PE_RESET _IO(VFIO_TYPE, VFIO_BASE + 24)
> +
> +/*
> + * One of the steps for recovery after PE reset is to configure the
> + * PCI bridges affected by the PE reset.
> + */
> +#define VFIO_EEH_PE_CONFIGURE _IO(VFIO_TYPE, VFIO_BASE + 25)
> +
> /* ***************************************************************** */
>
> #endif /* _UAPIVFIO_H */