[PATCH 15/31] KVM: PPC: Book3S HV: XIVE: Fix mapping of passthrough interrupts

Cédric Le Goater clg at kaod.org
Thu May 20 22:09:41 AEST 2021

On 5/15/21 12:40 PM, Marc Zyngier wrote:
> On Fri, 14 May 2021 21:51:51 +0100,
> Thomas Gleixner <tglx at linutronix.de> wrote:
>> On Fri, Apr 30 2021 at 10:03, Cédric Le Goater wrote:
>> CC: +Marc
> Thanks Thomas.
>>> PCI MSI interrupt numbers are now mapped in a PCI-MSI domain but the
>>> underlying calls handling the passthrough of the interrupt in the
>>> guest need a number in the XIVE IRQ domain.
>>> Use the IRQ data mapped in the XIVE IRQ domain and not the one in the
>>> PCI-MSI domain.
>>> Exporting irq_get_default_host() might not be the best solution.
>>> Cc: Thomas Gleixner <tglx at linutronix.de>
>>> Cc: Paul Mackerras <paulus at ozlabs.org>
>>> Signed-off-by: Cédric Le Goater <clg at kaod.org>
>>> ---
>>>  arch/powerpc/kvm/book3s_xive.c | 3 ++-
>>>  kernel/irq/irqdomain.c         | 1 +
>>>  2 files changed, 3 insertions(+), 1 deletion(-)
>>> diff --git a/arch/powerpc/kvm/book3s_xive.c b/arch/powerpc/kvm/book3s_xive.c
>>> index 3a7da42bed57..81b9f4fc3978 100644
>>> --- a/arch/powerpc/kvm/book3s_xive.c
>>> +++ b/arch/powerpc/kvm/book3s_xive.c
>>> @@ -861,7 +861,8 @@ int kvmppc_xive_set_mapped(struct kvm *kvm, unsigned long guest_irq,
>>>  	struct kvmppc_xive *xive = kvm->arch.xive;
>>>  	struct kvmppc_xive_src_block *sb;
>>>  	struct kvmppc_xive_irq_state *state;
>>> -	struct irq_data *host_data = irq_get_irq_data(host_irq);
>>> +	struct irq_data *host_data =
>>> +		irq_domain_get_irq_data(irq_get_default_host(), host_irq);
>>>  	unsigned int hw_irq = (unsigned int)irqd_to_hwirq(host_data);
>>>  	u16 idx;
>>>  	u8 prio;
>>> diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c
>>> index d10ab1d689d5..8a073d1ce611 100644
>>> --- a/kernel/irq/irqdomain.c
>>> +++ b/kernel/irq/irqdomain.c
>>> @@ -481,6 +481,7 @@ struct irq_domain *irq_get_default_host(void)
>>>  {
>>>  	return irq_default_domain;
>>>  }
>>> +EXPORT_SYMBOL_GPL(irq_get_default_host);
>>>  static void irq_domain_clear_mapping(struct irq_domain *domain,
>>>  				     irq_hw_number_t hwirq)
> Is there any reason why we should add more users of the "default host"
> fallback? I would really hope that new code would actually track their
> irqdomain in a more fine-grained way, specially when using the
> hierarchical MSi setup, which seems to be the goal of this series.
> Don't you have enough topology information that you can make use of to
> correctly assign a domain identifier (of_node or otherwise)?

PHB have a node ID and this is taken into account by the MSI domains.
However, one thing PPC (pSeries and PowerNV) lacks is an interrupt
controller node per chip which makes the IRQ domain hierarchy a bit

It will be difficult to change the pseries platform (VM) since the
PAPR architecture only specifies a single interrupt domain for the
whole machine. The PowerNV platform is designed in a similar way
(because the pseries platform preexisted) and the OPAL firmware hides
the interrupt controllers of each chip behind a single node. The
underlying topology is encoded in HW interrupt numbers. This is a bit
unfortunate since some PowerNV Linux drivers need that information.
Rewriting a new interrupt controller driver in OPAL would be a lot of
work and it won't happen any time soon. But it's feasible.

All that to say that we have a default IRQ domain on these platforms
and not one  IRQ domain per node/chip.

Also, there are two types of interrupt models to consider: the older
XICS (for P8/P7 processors) and the newer XIVE (for P9/P10).

Regarding MSI passthrough, the XIVE side is simpler (I can't believe I
am saying that, XIVE is anything but simple) and I think we can rework
kvmppc_xive_set_mapped() and xive_irq_set_vcpu_affinity() to remove
the IRQ domain bypass. 

XICS adds optimizations for passthrough done in real mode:

 e3c13e56a471 ("KVM: PPC: Book3S HV: Handle passthrough interrupts in guest")
 5d375199ea96 ("KVM: PPC: Book3S HV: Set server for passed-through interrupts")

That's a ~10% bandwidth improvements on CX5 adapters, it's good to
have but they are much more complex to rework. I took some time to
look for a solution for these because of the use of irq_to_desc() and
the use of the host IRQ in the XICS domain which are ugly but nothing
comes to mind yet.

For the time being, I think these changes bypassing the IRQ domains
are fine. I need some more time to mature an alternative.



More information about the Linuxppc-dev mailing list