[PATCH v13 06/10] iommu: Add a page fault handler
Jean-Philippe Brucker
jean-philippe at linaro.org
Tue Mar 23 21:50:03 AEDT 2021
On Tue, Mar 02, 2021 at 03:59:57PM -0800, Jacob Pan wrote:
> Hi Jean-Philippe,
>
> A few comments from the p.o.v of converting VT-d to this framework. Mostly
> about potential optimization. I think VT-d SVA code will be able to use this
> work.
> +Ashok provided many insight.
>
> FWIW,
> Reviewed-by:Jacob Pan <jacob.jun.pan at linux.intel.com>
Thanks!
> On Tue, 2 Mar 2021 10:26:42 +0100, Jean-Philippe Brucker
> <jean-philippe at linaro.org> wrote:
> > +int iommu_queue_iopf(struct iommu_fault *fault, void *cookie)
> > +{
> > + int ret;
> > + struct iopf_group *group;
> > + struct iopf_fault *iopf, *next;
> > + struct iopf_device_param *iopf_param;
> > +
> > + struct device *dev = cookie;
> > + struct dev_iommu *param = dev->iommu;
> > +
> > + lockdep_assert_held(¶m->lock);
> > +
> > + if (fault->type != IOMMU_FAULT_PAGE_REQ)
> > + /* Not a recoverable page fault */
> > + return -EOPNOTSUPP;
> > +
> > + /*
> > + * As long as we're holding param->lock, the queue can't be
> > unlinked
> > + * from the device and therefore cannot disappear.
> > + */
> > + iopf_param = param->iopf_param;
> > + if (!iopf_param)
> > + return -ENODEV;
> > +
> > + if (!(fault->prm.flags & IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE)) {
> > + iopf = kzalloc(sizeof(*iopf), GFP_KERNEL);
> > + if (!iopf)
> > + return -ENOMEM;
> > +
> > + iopf->fault = *fault;
> > +
> > + /* Non-last request of a group. Postpone until the last
> > one */
> Would be nice to have an option here to allow non-deferred handle_mm_fault.
> Some devices may have a large group.
>
> FYI, VT-d also needs to send page response before the last one (LPIG).
> "A Page Group Response Descriptor is issued by software in response to a
> page request with data or a page request (with or without data) that
> indicated that it was the last request in a group."
>
> But I think we deal with that when we convert. Perhaps just treat the
> request with data as LPIG.
Sure that seems fine. Do you mean that the vt-d driver will set the
IOMMU_FAULT_PAGE_REQUEST_LAST_PAGE flag for PR-with-data? Otherwise we
could introduce a new flag. I prefer making it explicit rather than having
IOPF consumers infer that a response is needed when seeing
IOMMU_FAULT_PAGE_REQUEST_PRIV_DATA set, because private_data wouldn't be
reusable by other architectures.
> Also adding a trace event would be nice, similar to CPU page fault.
Agreed, the tracepoints in my development tree (along with the lower-level
page_request/response tracepoints) have been indispensable for debugging
hardware and SVA applications
> > + list_add(&iopf->list, &iopf_param->partial);
> > +
> > + return 0;
> > + }
> > +
> > + group = kzalloc(sizeof(*group), GFP_KERNEL);
> > + if (!group) {
> > + /*
> > + * The caller will send a response to the hardware. But
> > we do
> > + * need to clean up before leaving, otherwise partial
> > faults
> > + * will be stuck.
> > + */
> > + ret = -ENOMEM;
> > + goto cleanup_partial;
> > + }
> > +
> > + group->dev = dev;
> > + group->last_fault.fault = *fault;
> > + INIT_LIST_HEAD(&group->faults);
> > + list_add(&group->last_fault.list, &group->faults);
> > + INIT_WORK(&group->work, iopf_handle_group);
> > +
> > + /* See if we have partial faults for this group */
> > + list_for_each_entry_safe(iopf, next, &iopf_param->partial, list)
> > {
> > + if (iopf->fault.prm.grpid == fault->prm.grpid)
> Just curious, the iopf handler is registered per arm_smmu_master dev. Is
> the namespace of group ID also within an arm_smmu_master?
Yes for both PCIe PRI and SMMU stall, the namespace is within one device
(one RequesterID or StreamID)
> Can one arm_smmu_master support multiple devices?
No, it's a single device
>
> For VT-d, group ID is per PCI device.
>
> > + /* Insert *before* the last fault */
> > + list_move(&iopf->list, &group->faults);
> > + }
> > +
> This is fine with devices supports small number of outstanding PRQs.
> VT-d is setting the limit to 32. But the incoming DSA device will support
> 512.
>
> So if we pre-sort IOPF by group ID and put them in a per group list, would
> it be faster? I mean once the LPIG comes in, we just need to search the
> list of groups instead of all partial faults. I am not against what is done
> here, just exploring optimization.
Yes I think that's worth exploring, keeping the groups in a rb_tree or something
like that, for easy access and update. Note that I don't have a system
with non-LPIG faults, so I can't test any of this at the moment
Thanks,
Jean
More information about the Linux-accelerators
mailing list