[PATCH v6 2/3] drivers/vfio: EEH support for VFIO PCI device

Alex Williamson alex.williamson at redhat.com
Sat May 24 00:49:19 EST 2014


On Fri, 2014-05-23 at 14:30 +0200, Alexander Graf wrote:
> On 23.05.14 13:58, Gavin Shan wrote:
> > On Fri, May 23, 2014 at 08:52:23AM +0200, Alexander Graf wrote:
> >>
> >>> Am 23.05.2014 um 05:23 schrieb Alex Williamson <alex.williamson at redhat.com>:
> >>>
> >>>> On Fri, 2014-05-23 at 10:37 +1000, Gavin Shan wrote:
> >>>>> On Fri, May 23, 2014 at 10:17:30AM +1000, Gavin Shan wrote:
> >>>>>> On Thu, May 22, 2014 at 11:55:29AM +0200, Alexander Graf wrote:
> >>>>>> On 22.05.14 10:23, Gavin Shan wrote:
> >>>> .../...
> >>>>
> >>>>>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> >>>>>>> index cb9023d..ef55682 100644
> >>>>>>> --- a/include/uapi/linux/vfio.h
> >>>>>>> +++ b/include/uapi/linux/vfio.h
> >>>>>>> @@ -455,6 +455,59 @@ struct vfio_iommu_spapr_tce_info {
> >>>>>>> #define VFIO_IOMMU_SPAPR_TCE_GET_INFO    _IO(VFIO_TYPE, VFIO_BASE + 12)
> >>>>>>> +/*
> >>>>>>> + * EEH functionality can be enabled or disabled on one specific device.
> >>>>>>> + * Also, the DMA or IO frozen state can be removed from the frozen PE
> >>>>>>> + * if required.
> >>>>>>> + */
> >>>>>>> +struct vfio_eeh_pe_set_option {
> >>>>>>> +    __u32 argsz;
> >>>>>> What is this argsz thing? Is this your way of maintaining backwards
> >>>>>> compatibility when we introduce new fields? A new field will change
> >>>>>> the ioctl number, so I don't think that makes a lot of sense :).
> >>>>>>
> >>>>>> Just make the ioctl have a u32 as incoming argument. No fancy
> >>>>>> structs, no complicated code.
> >>>>>>
> >>>>>> The same applies for a number of structs below.
> >>>>> ok. Will do in next revision.
> >>>> Rechecked include/uapi/linux/vfio.h, the data struct for each ioctl command
> >>>> always has "argsz". I guess it was used as checker by Alex.W. Do you really
> >>>> want remove "argsz" ?
> >>>
> >>> IIRC, this was actually a suggestion incorporated from David Gibson, but
> >>> using _IO with an argsz and flags field we can maintain compatibility
> >>> without bumping the ioctl number.  It really only makes sense if we have
> >>> a flags field so we can identify what additional information is being
> >>> provided.  Flags can be used as a bitmap of trailing structures or as
> >>> revision if we want a set of trailing structures that may change over
> >>> time.  Unless you can come up with a good argument against it that would
> >>> prevent us inventing a new ioctl as soon as we need a minor tweak, I'd
> >>> prefer to keep it.  As I noted in a previous comment, the one ioctl we
> >>> have for reset that doesn't take any options is likely going to be the
> >>> first ioctl that we need to entirely replace.  If we don't keep argsz,
> >>> it seems like we probably need a flags field and reserved structures.
> >>>
> >>>>>>> +    __u32 option;
> >>>>>>> +};
> >>>>>>> +
> >>>>>>> +#define VFIO_EEH_PE_SET_OPTION        _IO(VFIO_TYPE, VFIO_BASE + 21)
> >>>>>>> +
> >>>>>>> +/*
> >>>>>>> + * Each EEH PE should have unique address to be identified. The command
> >>>>>>> + * helps to retrieve the address and the sharing mode of the PE.
> >>>>>>> + */
> >>>>>>> +struct vfio_eeh_pe_get_addr {
> >>>>>>> +    __u32 argsz;
> >>>>>>> +    __u32 option;
> >>>>>>> +    __u32 info;
> >>>>>> Any particular reason you need the info field? Can't the return value
> >>>>>> of the ioctl hold this? Then you only have a single u32 argument left
> >>>>>> to the ioctl again.
> >>>>> ok. Will do in next revision.
> >>>> If we eventually remove "argsz" and let ioctl() return value to hold
> >>>> information (or negative number for errors), we don't need any data
> >>>> struct because the 3rd parameter of ioctl() would be used as input
> >>>> and I only need one input parameter. Do you want see this ?
> >>>>
> >>>> Hopefully, Alex.W saw this and hasn't objections :)
> >>> I'm not sure why we're pushing for the minimal data set to pass to an
> >>> ioctl.  Seems like a recipe for dead, useless ioctls.  Thanks,
> >>>
> >> The ioctl number includes sizeof(payload). So if a new parameter gets added, that would be a different ioctl number.
> >>
> >> If you want to maintain backwards compatibility ioctl number wise in the kernel, you'll have to have a "flags" field to indicate whether new data is available and a "pad" field, prefarably in a union, that ensures the size of the struct doesn't change.
> >>
> >> I'm not sure it's really necessary here to have identical ioctl numbers if we add parameters, since we can always just define a new ioctl with a bigger payload that can then become the default handler and a shim backwards compatible handler with the old number.
> >>
> >> But if you think it is important, let's do it for real, not just halfway.
> >>
> > So I need add additional field "flags" ?
> 
> This is Alex' call on how he prefers the interface to look like. The 
> struct size that you pass into an ioctl is part of the ioctl number,

Only when using _IOR/W/RW, the size is not part of the ioctl number for
_IO, which is why VFIO uses _IO for interfaces with argsz + flags.

>  so 
> putting that into an argsz field only makes sense when you do ugly 
> things like
> 
> struct foo *x;
> uint32_t *y;
> 
> x = malloc(sizeof(*x) + sizeof(*y));
> y = (void*)&x[1];
> 
> ioctl(..., x);
> 
> But I'd prefer not to have such nasty code really. It breaks assumptions 
> on *anything* that wraps ioctls, such as strace or qemu's linux-user 
> emulation.

If there's a reason to use something other than _IO+argsz+flags, we can
discuss it, but this is not it.

> >   Also, I need keep the return value from
> > ioctl() less or equal to 0 ? :-)
> 
> Usually return values are
> 
>    < 0 means error
>    == 0 means success
>    > 0 means return payload

Agree, >0 only makes sense if the return value of the ioctl can be
expressed as a positive int, ex. VFIO_EEH_ERROR_COUNT (if such a thing
was needed).  Thanks,

Alex




More information about the Linuxppc-dev mailing list