[PATCH v9 03/10] iommu: Separate IOMMU_DEV_FEAT_IOPF from IOMMU_DEV_FEAT_SVA

Wed Jan 20 12:57:42 AEDT 2021

Hi Jean,

On 1/19/21 6:16 PM, Jean-Philippe Brucker wrote:
> On Mon, Jan 18, 2021 at 06:54:28AM +0000, Tian, Kevin wrote:
>>> From: Lu Baolu <baolu.lu at linux.intel.com>
>>> Sent: Saturday, January 16, 2021 11:54 AM
>>>
>>> Hi Jean,
>>>
>>> On 2021/1/15 0:41, Jean-Philippe Brucker wrote:
>>>> I guess detailing what's needed for nested IOPF can help the discussion,
>>>> although I haven't seen any concrete plan about implementing it, and it
>>>> still seems a couple of years away. There are two important steps with
>>>> nested IOPF:
>>>>
>>>> (1) Figuring out whether a fault comes from L1 or L2. A SMMU stall event
>>>>       comes with this information, but a PRI page request doesn't. The
>>> IOMMU
>>>>       driver has to first translate the IOVA to a GPA, injecting the fault
>>>>       into the guest if this translation fails by using the usual
>>>>       iommu_report_device_fault().
>>
>> The IOMMU driver can walk the page tables to find out the level information.
>> If the walk terminates at the 1st level, inject to the guest. Otherwise fix the
>> mm fault at 2nd level. It's not efficient compared to hardware-provided info,
>> but it's doable and actual overhead needs to be measured (optimization exists
>> e.g. having fault client to hint no 2nd level fault expected when registering fault
>> handler in pinned case).
>>
>>>>
>>>> (2) Translating the faulting GPA to a HVA that can be fed to
>>>>       handle_mm_fault(). That requires help from KVM, so another interface -
>>>>       either KVM registering GPA->HVA translation tables or IOMMU driver
>>>>       querying each translation. Either way it should be reusable by device
>>>>       drivers that implement IOPF themselves.
>>
>> Or just leave to the fault client (say VFIO here) to figure it out. VFIO has the
>> information about GPA->HPA and can then call handle_mm_fault to fix the
>> received fault. The IOMMU driver just exports an interface for the device drivers
>> which implement IOPF themselves to report a fault which is then handled by
>> the IOMMU core by reusing the same faulting path.
>>
>>>>
>>>> (1) could be enabled with iommu_dev_enable_feature(). (2) requires a
>>> more
>>>> complex interface. (2) alone might also be desirable - demand-paging for
>>>> level 2 only, no SVA for level 1.
>>
>> Yes, this is what we want to point out. A general FEAT_IOPF implies more than
>> what this patch intended to address.
>>
>>>>
>>>> Anyway, back to this patch. What I'm trying to convey is "can the IOMMU
>>>> receive incoming I/O page faults for this device and, when SVA is enabled,
>>>> feed them to the mm subsystem?  Enable that or return an error." I'm stuck
>>>> on the name. IOPF alone is too vague. Not IOPF_L1 as Kevin noted, since L1
>>>> is also used in virtualization. IOPF_BIND and IOPF_SVA could also mean (2)
>>>> above. IOMMU_DEV_FEAT_IOPF_FLAT?
>>>>
>>>> That leaves space for the nested extensions. (1) above could be
>>>> IOMMU_FEAT_IOPF_NESTED, and (2) requires some new interfacing with
>>> KVM (or
>>>> just an external fault handler) and could be used with either IOPF_FLAT or
>>>> IOPF_NESTED. We can figure out the details later. What do you think?
>>>
>>> I agree that we can define IOPF_ for current usage and leave space for
>>> future extensions.
>>>
>>> IOPF_FLAT represents IOPF on first-level translation, currently first
>>> level translation could be used in below cases.
>>>
>>> 1) FL w/ internal Page Table: Kernel IOVA;
>>> 2) FL w/ external Page Table: VFIO passthrough;
>>> 3) FL w/ shared CPU page table: SVA
>>>
>>> We don't need to support IOPF for case 1). Let's put it aside.
>>>
>>> IOPF handling of 2) and 3) are different. Do we need to define different
>>> names to distinguish these two cases?
>>>
>>
>> Defining feature names according to various use cases does not sound a
>> clean way. In an ideal way we should have just a general FEAT_IOPF since
>> the hardware (at least VT-d) does support fault in either 1st-level, 2nd-
>> level or nested configurations. We are entering this trouble just because
>> there is difficulty for the software evolving to enable full hardware cap
>> in one batch. My last proposal was sort of keeping FEAT_IOPF as a general
>> capability for whether delivering fault through the IOMMU or the ad-hoc
>> device, and then having a separate interface for whether IOPF reporting
>> is available under a specific configuration. The former is about the path
>> between the IOMMU and the device, while the latter is about the interface
>> between the IOMMU driver and its faulting client.
>>
>> The reporting capability can be checked when the fault client is registering
>> its fault handler, and at this time the IOMMU driver knows how the related
>> mapping is configured (1st, 2nd, or nested) and whether fault reporting is
>> supported in such configuration. We may introduce IOPF_REPORT_FLAT and
>> IOPF_REPORT_NESTED respectively. while IOPF_REPORT_FLAT detection is
>> straightforward (2 and 3 can be differentiated internally based on configured
>> level), IOPF_REPORT_NESTED needs additional info to indicate which level is
>> concerned since the vendor driver may not support fault reporting in both
>> levels or the fault client may be interested in only one level (e.g. with 2nd
>> level pinned).
> 
> I agree with this plan (provided I understood it correctly this time):
> have IOMMU_DEV_FEAT_IOPF describing the IOPF interface between device and
> IOMMU. Enabling it on its own doesn't do anything visible to the driver,
> it just probes for capabilities and enables PRI if necessary. For host
> SVA, since there is no additional communication between IOMMU and device
> driver, enabling IOMMU_DEV_FEAT_SVA in addition to IOPF is sufficient.
> Then when implementing nested we'll extend iommu_register_fault_handler()
> with flags and parameters. That will also enable advanced dispatching (1).
> 
> Will it be necessary to enable FEAT_IOPF when doing VFIO passthrough
> (injecting to the guest or handling it with external page tables)?
> I think that would be better. Currently a device driver registering a
> fault handler doesn't know if it will get recoverable page faults or only
> unrecoverable ones.
> 
> So I don't think this patch needs any change. Baolu, are you ok with
> keeping this and patch 4?

It sounds good to me. Keep FEAT_IOPF as the IOMMU capability of
generating I/O page fault and differentiate different I/O page faults by
extending the fault handler register interface.

> 
> Thanks,
> Jean
> 

Best regards,
baolu