[Skiboot-stable] [PATCH] xive: fix return value of opal_xive_allocate_irq()

Cédric Le Goater clg at kaod.org
Fri Sep 6 16:27:01 AEST 2019


On 06/09/2019 04:12, Oliver O'Halloran wrote:
> On Fri, Sep 6, 2019 at 1:59 AM Cédric Le Goater <clg at kaod.org> wrote:
>>
>> When the maximum number of interrupts per chip is reached,
>> xive_try_allocate_irq() returns an internal XIVE error:
>> XIVE_ALLOC_NO_SPACE. But its value 0xffffffff is interpreted as a
>> positive value by its caller opal_xive_allocate_irq() and not as an
>> error.
>>
>> opal_xive_allocate_irq() returns this value to Linux which also
>> considers 0xffffffff as a valid interrupt number and tries to get the
>> interrupt characteritics using opal_xive_get_irq_info(). This OPAL
>> calls finally fails leading to all sort of errors on the host which is
>> not prepared for such a scenario. Code impacted are the IPI setup and
>> the both XIVE KVM devices.
> 
> How do you replicate this bug? 

Not easily. Today, you need to try to run as many VMs as you can after 
having fixed other errors on the way, which is what Greg is up to.

Theoretically, a test would be to allocate as many IRQ as possible and
check that limits are correctly handled. I suppose we could do that in 
a runtime unit test (when skiboot loads). 

> It'd be nice if we had some way to
> exercise some of the XIVE API in op-test and this bug seems like a
> good place to start.

It would require kernel support and userspace support. 

OpenCAPI has a way to allocate IPI IRQs in kernel but it's hand made 
and its needs a rewrite. Then, on top of that, we would need a driver 
of some sort to allocate IRQs from user space.

>> Fix by returning OPAL_RESOURCE from xive_try_allocate_irq() which is
>> consistent with the other errors returned by this routine. This fixes
>> the behavior in opal_xive_allocate_irq() and in Linux.
>>
>> A workaround could be introduced in Linux to consider 0xffffffff as a
>> OPAL_RESOURCE value. This assumption is valid with the current XIVE
>> IRQ number encoding.
> 
> This is probably going to be needed anyway to support old firmware.

Yes. If we can fix the other errors to allocate more VMs.

> 
>> Reported-by: Greg Kurz <groug at kaod.org>
>> Signed-off-by: Cédric Le Goater <clg at kaod.org>
> 
> Needs a Fixes: tag. Also can you also add Cc:
> skiboot-stable at lists.ozlabs.org so Vasant picks it up for the stable
> branches.

Fixes: 07946e68f47a ("xive: Add interrupt allocator")

Thanks,

C.

>> ---
>>  hw/xive.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/hw/xive.c b/hw/xive.c
>> index 76b41a9ee95f..5dc66598ee12 100644
>> --- a/hw/xive.c
>> +++ b/hw/xive.c
>> @@ -5059,7 +5059,7 @@ static int64_t xive_try_allocate_irq(struct xive *x)
>>         idx = bitmap_find_zero_bit(*x->ipi_alloc_map, base_idx, max_count);
>>         if (idx < 0) {
>>                 unlock(&x->lock);
>> -               return XIVE_ALLOC_NO_SPACE;
>> +               return OPAL_RESOURCE;
>>         }
>>         bitmap_set_bit(*x->ipi_alloc_map, idx);
>>         girq = x->int_base + idx;
>> --
>> 2.21.0
>>



More information about the Skiboot-stable mailing list