[PATCH 07/31] powerpc/xive: Fix xive_irq_set_affinity for MSI

Cédric Le Goater clg at kaod.org
Fri May 21 03:25:06 AEST 2021


On 5/14/21 10:48 PM, Thomas Gleixner wrote:
> On Fri, Apr 30 2021 at 10:03, Cédric Le Goater wrote:
>> The MSI affinity is automanaged and it can be set before starting the
>> associated IRQ.
>>
>> ( Should we simply remove the irqd_is_started() test ? )
> 
> If the hardware can handle it properly.
> 
> But see:
> 
>   cffb717ceb8e ("powerpc/xive: Ensure active irqd when setting affinity")

Thanks for digging. That's a patch from the early days of XIVE support. 

> which introduced that condition. It mutters something about migration of
> shutdown interrupts:
> 
>        [  123.053037264,3] XIVE[ IC 00  ] ISN 2 lead to invalid IVE !

The XIVE driver in OPAL is complaining.

Linux is trying to configure the target of HW IRQ number 2 but OPAL refuses
because it's invalid. The first 16 are reserved (like on Linux).

So it's another problem. 2 could be a value from an "interrupts" property,
giving the INTx number assigned to a PCI device or an OPAL event IRQ 
number leaked into the XIVE domain. Given the low Linux IRQ number that 
might be the latter. 

>        [   77.885859] xive: Error -6 reconfiguring irq 17
>        [   77.885862] IRQ17: set affinity failed(-6).
> 
> Not that I can decode that :)

A device name would help but you have guessed most of it ;)

> 
> Non-managed interrupts have the sequence:
> 
>       startup()
>       set_affinity()
> 
> which is historical and an earlier attempt to flip it caused havoc in
> some places.
> 
> With managed we needed to make sure that the affinity is set correctly
> right at start. So it needs to be done the other way round and it turned
> out that for MSI this works.
> 
> I have no idea, whether that might make the above issue reappear or
> not. If so, then we need some extra state to make it work.
> 
> The root cause which triggered the problem got fixed, so there should be
> no issue _if_ this was specifically related to that CPU unplug case.

I would vote for this option. I will simply remove the irqd_is_started() 
test which looks bogus and do some extra tests on all platforms.

Thanks,

C.


 
>> diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
>> index 96737938e8e3..3485baf9ec8c 100644
>> --- a/arch/powerpc/sysdev/xive/common.c
>> +++ b/arch/powerpc/sysdev/xive/common.c
>> @@ -710,7 +710,7 @@ static int xive_irq_set_affinity(struct irq_data *d,
>>  		return -EINVAL;
>>  
>>  	/* Don't do anything if the interrupt isn't started */
>> -	if (!irqd_is_started(d))
>> +	if (!irqd_is_started(d) && !irqd_affinity_is_managed(d))
>>  		return IRQ_SET_MASK_OK;
>>  
>>  	/*
> 
> Thanks,
> 
>         tglx
> 



More information about the Linuxppc-dev mailing list