[PATCH kernel] powerpc/iommu: Add iommu_ops to report capabilities and allow blocking domains

Alexey Kardashevskiy aik at ozlabs.ru
Fri Jul 8 15:00:03 AEST 2022



On 7/8/22 01:10, Jason Gunthorpe wrote:
> On Thu, Jul 07, 2022 at 11:55:52PM +1000, Alexey Kardashevskiy wrote:
>> Historically PPC64 managed to avoid using iommu_ops. The VFIO driver
>> uses a SPAPR TCE sub-driver and all iommu_ops uses were kept in
>> the Type1 VFIO driver. Recent development though has added a coherency
>> capability check to the generic part of VFIO and essentially disabled
>> VFIO on PPC64; the similar story about iommu_group_dma_owner_claimed().
>>
>> This adds an iommu_ops stub which reports support for cache
>> coherency. Because bus_set_iommu() triggers IOMMU probing of PCI devices,
>> this provides minimum code for the probing to not crash.
>>
>> Because now we have to set iommu_ops to the system (bus_set_iommu() or
>> iommu_device_register()), this requires the POWERNV PCI setup to happen
>> after bus_register(&pci_bus_type) which is postcore_initcall
>> TODO: check if it still works, read sha1, for more details:
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5537fcb319d016ce387
>>
>> Because setting the ops triggers probing, this does not work well with
>> iommu_group_add_device(), hence the move to iommu_probe_device().
>>
>> Because iommu_probe_device() does not take the group (which is why
>> we had the helper in the first place), this adds
>> pci_controller_ops::device_group.
>>
>> So, basically there is one iommu_device per PHB and devices are added to
>> groups indirectly via series of calls inside the IOMMU code.
>>
>> pSeries is out of scope here (a minor fix needed for barely supported
>> platform in regard to VFIO).
>>
>> The previous discussion is here:
>> https://patchwork.ozlabs.org/project/kvm-ppc/patch/20220701061751.1955857-1-aik@ozlabs.ru/
> 
> I think this is basically OK, for what it is. It looks like there is
> more some-day opportunity to make use of the core infrastructure though.
> 
>> does it make sense to have this many callbacks, or
>> the generic IOMMU code can safely operate without some
>> (given I add some more checks for !NULL)? thanks,
> 
> I wouldn't worry about it..
> 
>> @@ -1156,7 +1158,10 @@ int iommu_add_device(struct iommu_table_group *table_group, struct device *dev)
>>   	pr_debug("%s: Adding %s to iommu group %d\n",
>>   		 __func__, dev_name(dev),  iommu_group_id(table_group->group));
>>   
>> -	return iommu_group_add_device(table_group->group, dev);
>> +	ret = iommu_probe_device(dev);
>> +	dev_info(dev, "probed with %d\n", ret);
> 
> For another day, but it seems a bit strange to call iommu_probe_device() like this?
> Shouldn't one of the existing call sites cover this? The one in
> of_iommu.c perhaps?


It looks to me that of_iommu.c expects the iommu setup to happen before 
linux starts as linux looks for #iommu-cells or iommu-map properties in 
the device tree. The powernv firmware (aka skiboot) does not do this and 
it is linux which manages iommu groups.


>> +static bool spapr_tce_iommu_is_attach_deferred(struct device *dev)
>> +{
>> +       return false;
>> +}
> 
> I think you can NULL this op:
> 
> static bool iommu_is_attach_deferred(struct device *dev)
> {
> 	const struct iommu_ops *ops = dev_iommu_ops(dev);
> 
> 	if (ops->is_attach_deferred)
> 		return ops->is_attach_deferred(dev);
> 
> 	return false;
> }
> 
>> +static struct iommu_group *spapr_tce_iommu_device_group(struct device *dev)
>> +{
>> +	struct pci_controller *hose;
>> +	struct pci_dev *pdev;
>> +
>> +	/* Weirdly iommu_device_register() assigns the same ops to all buses */
>> +	if (!dev_is_pci(dev))
>> +		return ERR_PTR(-EPERM);
>> +
>> +	pdev = to_pci_dev(dev);
>> +	hose = pdev->bus->sysdata;
>> +
>> +	if (!hose->controller_ops.device_group)
>> +		return ERR_PTR(-ENOENT);
>> +
>> +	return hose->controller_ops.device_group(hose, pdev);
>> +}
> 
> Is this missing a refcount get on the group?
> 
>> +
>> +static int spapr_tce_iommu_attach_dev(struct iommu_domain *dom,
>> +				      struct device *dev)
>> +{
>> +	return 0;
>> +}
> 
> It is important when this returns that the iommu translation is all
> emptied. There should be no left over translations from the DMA API at
> this point. I have no idea how power works in this regard, but it
> should be explained why this is safe in a comment at a minimum.
>
 > It will turn into a security problem to allow kernel mappings to leak
 > past this point.
 >

I've added for v2 checking for no valid mappings for a device (or, more 
precisely, in the associated iommu_group), this domain does not need 
checking, right?

In general, is "domain" something from hardware or it is a software 
concept? Thanks,


> Jason

-- 
Alexey


More information about the Linuxppc-dev mailing list