[RFC PATCH 17/17] KVM: PPC: Add an ioctl for userspace to select which platform to emulate

Alexander Graf agraf at suse.de
Sun Jul 3 18:34:33 EST 2011


On 03.07.2011, at 10:15, Avi Kivity wrote:

> On 06/30/2011 07:33 PM, Alexander Graf wrote:
>> On 30.06.2011, at 18:00, Avi Kivity<avi at redhat.com>  wrote:
>> 
>> >  On 06/30/2011 06:22 PM, Alexander Graf wrote:
>> >>>  Regarding that.  There's another option - the ioctl code embeds the structure size.  So if we extend the ioctl parsing to pad up (or truncate down) from the user's size to our size, and similarly in the other direction, we can get away from this ugliness.
>> >>>
>> >>>  Some years ago I posted a generic helper that did this (and also kmalloc'ed and kfree'd the data itself), but it wasn't received favourably.  Maybe I should try again (and we can possibly use it in kvm even if it is rejected for general use, though that's against our principles of pushing all generic infrastructure to the wider kernel).
>> >>
>> >>
>> >>  That does sound interesting, but requires a lot more thought to be put into the actual code, as we basically need to read out the feature bitmap, then provide a minimum size for the chosen features and then decide if they fit in.
>> >
>> >
>> >  Why? just put the things you want in the structure.
>> >
>> >  old userspace ->  new kernel: we auto-zero the parts userspace left out, and zero means old behaviour, so everthing works
>> >  new userspace ->  old kernel: truncate.  Userspace shouldn't have used any new features (KVM_CAP), and we can -EINVAL if the truncated section contains a nonzero bit.
>> 
>> Yup, which requires knowledge in the code on what actually fits :). Logic we don't have today.
> 
> I don't follow.  What knowledge is required?  Please give an example.

Sure. Let's take an easy example Currently we have for get_pvinfo:

        case KVM_PPC_GET_PVINFO: {
                struct kvm_ppc_pvinfo pvinfo;
                memset(&pvinfo, 0, sizeof(pvinfo));
                r = kvm_vm_ioctl_get_pvinfo(&pvinfo);
                if (copy_to_user(argp, &pvinfo, sizeof(pvinfo))) {
                        r = -EFAULT;
                        goto out;
                }

                break;
        }

        /* for KVM_PPC_GET_PVINFO */
        struct kvm_ppc_pvinfo {
                /* out */
                __u32 flags;
                __u32 hcall[4];
                __u8  pad[108];
        };

The padding would not be there with your idea. An updated version could look like this:

        /* for KVM_PPC_GET_PVINFO */
        struct kvm_ppc_pvinfo {
                /* out */
                __u32 flags;
                __u32 hcall[4];
                __u64 features;  /* only there with PVINFO_FLAGS_FEATURES */
        };

Now, your idea was to not use copy_from/to_user directly, but instead some wrapper that could pad with zeros on read or truncate on write. So instead we would essentially get:

        int kvm_vm_ioctl_get_pvinfo(struct kvm_ppc_pvinfo *pvinfo, int *required_size)
        {
                [...]
		if (pvinfo_flags & PVINFO_FLAGS_FEATURES) {
                        *required_size = 16;
                } else {
                        *required_size = 8;
                }
                [...]
        }

        case KVM_PPC_GET_PVINFO: {
                struct kvm_ppc_pvinfo pvinfo;
                int required_size = 0;
                memset(&pvinfo, 0, sizeof(pvinfo));
                r = kvm_vm_ioctl_get_pvinfo(&pvinfo, &required_size);
                if (copy_to_user(argp, &pvinfo, required_size) {
                        r = -EFAULT;
                        goto out;
                }

                break;
        }

Otherwise we might write over data the user expected. And that logic that tells to copy_to_user how much data it actually takes to put all the information in is not there today and would have to be added. You can even verify that required_size with the ioctl passed size to make 100% sure user space is sane, but I'd claim that a feature bitmap is plenty of information to ensure that we're not doing something stupid.


Alex



More information about the Linuxppc-dev mailing list