[PATCH kernel 14/15] vfio/spapr_tce: Export container API for external users

Alexey Kardashevskiy aik at ozlabs.ru
Wed Aug 10 15:37:17 AEST 2016


On 09/08/16 22:16, Alex Williamson wrote:
> On Tue, 9 Aug 2016 15:19:39 +1000
> Alexey Kardashevskiy <aik at ozlabs.ru> wrote:
> 
>> On 09/08/16 02:43, Alex Williamson wrote:
>>> On Wed,  3 Aug 2016 18:40:55 +1000
>>> Alexey Kardashevskiy <aik at ozlabs.ru> wrote:
>>>   
>>>> This exports helpers which are needed to keep a VFIO container in
>>>> memory while there are external users such as KVM.
>>>>
>>>> Signed-off-by: Alexey Kardashevskiy <aik at ozlabs.ru>
>>>> ---
>>>>  drivers/vfio/vfio.c                 | 30 ++++++++++++++++++++++++++++++
>>>>  drivers/vfio/vfio_iommu_spapr_tce.c | 16 +++++++++++++++-
>>>>  include/linux/vfio.h                |  6 ++++++
>>>>  3 files changed, 51 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
>>>> index d1d70e0..baf6a9c 100644
>>>> --- a/drivers/vfio/vfio.c
>>>> +++ b/drivers/vfio/vfio.c
>>>> @@ -1729,6 +1729,36 @@ long vfio_external_check_extension(struct vfio_group *group, unsigned long arg)
>>>>  EXPORT_SYMBOL_GPL(vfio_external_check_extension);
>>>>  
>>>>  /**
>>>> + * External user API for containers, exported by symbols to be linked
>>>> + * dynamically.
>>>> + *
>>>> + */
>>>> +struct vfio_container *vfio_container_get_ext(struct file *filep)
>>>> +{
>>>> +	struct vfio_container *container = filep->private_data;
>>>> +
>>>> +	if (filep->f_op != &vfio_fops)
>>>> +		return ERR_PTR(-EINVAL);
>>>> +
>>>> +	vfio_container_get(container);
>>>> +
>>>> +	return container;
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(vfio_container_get_ext);
>>>> +
>>>> +void vfio_container_put_ext(struct vfio_container *container)
>>>> +{
>>>> +	vfio_container_put(container);
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(vfio_container_put_ext);
>>>> +
>>>> +void *vfio_container_get_iommu_data_ext(struct vfio_container *container)
>>>> +{
>>>> +	return container->iommu_data;
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(vfio_container_get_iommu_data_ext);
>>>> +
>>>> +/**
>>>>   * Sub-module support
>>>>   */
>>>>  /*
>>>> diff --git a/drivers/vfio/vfio_iommu_spapr_tce.c b/drivers/vfio/vfio_iommu_spapr_tce.c
>>>> index 3594ad3..fceea3d 100644
>>>> --- a/drivers/vfio/vfio_iommu_spapr_tce.c
>>>> +++ b/drivers/vfio/vfio_iommu_spapr_tce.c
>>>> @@ -1331,6 +1331,21 @@ const struct vfio_iommu_driver_ops tce_iommu_driver_ops = {
>>>>  	.detach_group	= tce_iommu_detach_group,
>>>>  };
>>>>  
>>>> +struct iommu_table *vfio_container_spapr_tce_table_get_ext(void *iommu_data,
>>>> +		u64 offset)
>>>> +{
>>>> +	struct tce_container *container = iommu_data;
>>>> +	struct iommu_table *tbl = NULL;
>>>> +
>>>> +	if (tce_iommu_find_table(container, offset, &tbl) < 0)
>>>> +		return NULL;
>>>> +
>>>> +	iommu_table_get(tbl);
>>>> +
>>>> +	return tbl;
>>>> +}
>>>> +EXPORT_SYMBOL_GPL(vfio_container_spapr_tce_table_get_ext);
>>>> +
>>>>  static int __init tce_iommu_init(void)
>>>>  {
>>>>  	return vfio_register_iommu_driver(&tce_iommu_driver_ops);
>>>> @@ -1348,4 +1363,3 @@ MODULE_VERSION(DRIVER_VERSION);
>>>>  MODULE_LICENSE("GPL v2");
>>>>  MODULE_AUTHOR(DRIVER_AUTHOR);
>>>>  MODULE_DESCRIPTION(DRIVER_DESC);
>>>> -
>>>> diff --git a/include/linux/vfio.h b/include/linux/vfio.h
>>>> index 0ecae0b..1c2138a 100644
>>>> --- a/include/linux/vfio.h
>>>> +++ b/include/linux/vfio.h
>>>> @@ -91,6 +91,12 @@ extern void vfio_group_put_external_user(struct vfio_group *group);
>>>>  extern int vfio_external_user_iommu_id(struct vfio_group *group);
>>>>  extern long vfio_external_check_extension(struct vfio_group *group,
>>>>  					  unsigned long arg);
>>>> +extern struct vfio_container *vfio_container_get_ext(struct file *filep);
>>>> +extern void vfio_container_put_ext(struct vfio_container *container);
>>>> +extern void *vfio_container_get_iommu_data_ext(
>>>> +		struct vfio_container *container);
>>>> +extern struct iommu_table *vfio_container_spapr_tce_table_get_ext(
>>>> +		void *iommu_data, u64 offset);
>>>>  
>>>>  /*
>>>>   * Sub-module helpers  
>>>
>>>
>>> I think you need to take a closer look of the lifecycle of a container,
>>> having a reference means the container itself won't go away, but only
>>> having a group set within that container holds the actual IOMMU
>>> references.  container->iommu_data is going to be NULL once the
>>> groups are lost.  Thanks,  
>>
>>
>> Container owns the iommu tables and this is what I care about here, groups
>> attached or not - this is handled separately via IOMMU group list in a
>> specific iommu_table struct, these groups get detached from iommu_table
>> when they are removed from a container.
> 
> The container doesn't own anything, the container is privileged by the
> groups being attached to it.  When groups are closed, they detach from
> the container and once the container group list is empty the iommu
> backend is released and iommu_data is NULL.  A container reference
> doesn't give you what you're looking for.  It implies nothing about the
> iommu backend.


Well, the backend is a part of the container, and since the backend owns the
tables, the container owns them too.

The problem I am trying to solve here is determining when KVM may release
the iommu_table objects.

The "set" ioctl() on the KVM-spapr-tce-table (or on KVM itself, it does not
really matter) makes a link between the KVM-spapr-tce-table and the
container, and KVM can start using the tables (taking references on them).

First I tried adding an "unset" ioctl to the KVM-spapr-tce-table and calling
it from region_del(); this works if QEMU removes a window. However, if QEMU
removes a vfio-pci device, region_del() is not called and KVM does not get
notified that it can release the iommu_table's, because the
KVM-spapr-tce-table remains alive and does not get destroyed (it is still
used by emulated devices or other containers).

So it was suggested that we could do such an "unset" somehow later: for
example, on every "set" I could check whether some of the currently attached
containers are no longer in use - and this is where being able to tell that
there is no backend helps - KVM remembers a container pointer and can check
it via vfio_container_get_iommu_data_ext().

The other option would be changing vfio_container_get_ext() to take a
callback+opaque pair which the container would call when it destroys
iommu_data. This looks more intrusive, and it is not obvious how to get it
right: the container would have to keep track of all registered external
users, and vfio_container_put_ext() would have to pass the same
callback+opaque pair to unregister the exact external user.


Or I could store the container's file* in KVM. Then iommu_data would never
be released until the KVM-spapr-tce-table is destroyed.

Recreating the KVM-spapr-tce-table on every vfio-pci hot unplug (closing its
fd would "unset" the container from the KVM-spapr-tce-table) is not an
option either, as there may still be devices using this KVM-spapr-tce-table.

What obvious and nice solution am I missing here? Thanks.


-- 
Alexey