[PATCH v6 2/3] drivers/vfio: EEH support for VFIO PCI device

Alexander Graf agraf at suse.de
Fri May 23 22:30:28 EST 2014

On 23.05.14 13:58, Gavin Shan wrote:
> On Fri, May 23, 2014 at 08:52:23AM +0200, Alexander Graf wrote:
>>> Am 23.05.2014 um 05:23 schrieb Alex Williamson <alex.williamson at redhat.com>:
>>>> On Fri, 2014-05-23 at 10:37 +1000, Gavin Shan wrote:
>>>>> On Fri, May 23, 2014 at 10:17:30AM +1000, Gavin Shan wrote:
>>>>>> On Thu, May 22, 2014 at 11:55:29AM +0200, Alexander Graf wrote:
>>>>>> On 22.05.14 10:23, Gavin Shan wrote:
>>>> .../...
>>>>>>> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>>>>>>> index cb9023d..ef55682 100644
>>>>>>> --- a/include/uapi/linux/vfio.h
>>>>>>> +++ b/include/uapi/linux/vfio.h
>>>>>>> @@ -455,6 +455,59 @@ struct vfio_iommu_spapr_tce_info {
>>>>>>> +/*
>>>>>>> + * EEH functionality can be enabled or disabled on one specific device.
>>>>>>> + * Also, the DMA or IO frozen state can be removed from the frozen PE
>>>>>>> + * if required.
>>>>>>> + */
>>>>>>> +struct vfio_eeh_pe_set_option {
>>>>>>> +    __u32 argsz;
>>>>>> What is this argsz thing? Is this your way of maintaining backwards
>>>>>> compatibility when we introduce new fields? A new field will change
>>>>>> the ioctl number, so I don't think that makes a lot of sense :).
>>>>>> Just make the ioctl have a u32 as incoming argument. No fancy
>>>>>> structs, no complicated code.
>>>>>> The same applies for a number of structs below.
>>>>> ok. Will do in next revision.
>>>> Rechecked include/uapi/linux/vfio.h, the data struct for each ioctl command
>>>> always has "argsz". I guess it was used as checker by Alex.W. Do you really
>>>> want remove "argsz" ?
>>> IIRC, this was actually a suggestion incorporated from David Gibson, but
>>> using _IO with an argsz and flags field we can maintain compatibility
>>> without bumping the ioctl number.  It really only makes sense if we have
>>> a flags field so we can identify what additional information is being
>>> provided.  Flags can be used as a bitmap of trailing structures or as
>>> revision if we want a set of trailing structures that may change over
>>> time.  Unless you can come up with a good argument against it that would
>>> prevent us inventing a new ioctl as soon as we need a minor tweak, I'd
>>> prefer to keep it.  As I noted in a previous comment, the one ioctl we
>>> have for reset that doesn't take any options is likely going to be the
>>> first ioctl that we need to entirely replace.  If we don't keep argsz,
>>> it seems like we probably need a flags field and reserved structures.
>>>>>>> +    __u32 option;
>>>>>>> +};
>>>>>>> +
>>>>>>> +#define VFIO_EEH_PE_SET_OPTION        _IO(VFIO_TYPE, VFIO_BASE + 21)
>>>>>>> +
>>>>>>> +/*
>>>>>>> + * Each EEH PE should have unique address to be identified. The command
>>>>>>> + * helps to retrieve the address and the sharing mode of the PE.
>>>>>>> + */
>>>>>>> +struct vfio_eeh_pe_get_addr {
>>>>>>> +    __u32 argsz;
>>>>>>> +    __u32 option;
>>>>>>> +    __u32 info;
>>>>>> Any particular reason you need the info field? Can't the return value
>>>>>> of the ioctl hold this? Then you only have a single u32 argument left
>>>>>> to the ioctl again.
>>>>> ok. Will do in next revision.
>>>> If we eventually remove "argsz" and let ioctl() return value to hold
>>>> information (or negative number for errors), we don't need any data
>>>> struct because the 3rd parameter of ioctl() would be used as input
>>>> and I only need one input parameter. Do you want see this ?
>>>> Hopefully, Alex.W saw this and hasn't objections :)
>>> I'm not sure why we're pushing for the minimal data set to pass to an
>>> ioctl.  Seems like a recipe for dead, useless ioctls.  Thanks,
>> The ioctl number includes sizeof(payload). So if a new parameter gets added, that would be a different ioctl number.
>> If you want to maintain backwards compatibility ioctl number wise in the kernel, you'll have to have a "flags" field to indicate whether new data is available and a "pad" field, prefarably in a union, that ensures the size of the struct doesn't change.
>> I'm not sure it's really necessary here to have identical ioctl numbers if we add parameters, since we can always just define a new ioctl with a bigger payload that can then become the default handler and a shim backwards compatible handler with the old number.
>> But if you think it is important, let's do it for real, not just halfway.
> So I need add additional field "flags" ?

This is Alex' call on how he prefers the interface to look like. The 
struct size that you pass into an ioctl is part of the ioctl number, so 
putting that into an argsz field only makes sense when you do ugly 
things like

struct foo *x;
uint32_t *y;

x = malloc(sizeof(*x) + sizeof(*y));
y = (void*)&x[1];

ioctl(..., x);

But I'd prefer not to have such nasty code really. It breaks assumptions 
on *anything* that wraps ioctls, such as strace or qemu's linux-user 

>   Also, I need keep the return value from
> ioctl() less or equal to 0 ? :-)

Usually return values are

   < 0 means error
   == 0 means success
   > 0 means return payload


More information about the Linuxppc-dev mailing list