[PATCH] powerpc/xive: Do not skip CPU-less nodes when creating the IPIs

Cédric Le Goater clg at kaod.org
Mon Aug 2 22:59:01 AEST 2021


On 8/2/21 8:37 AM, Michael Ellerman wrote:
> Cédric Le Goater <clg at kaod.org> writes:
>> On PowerVM, CPU-less nodes can be populated with hot-plugged CPUs at
>> runtime. Today, the IPI is not created for such nodes, and hot-plugged
>> CPUs use a bogus IPI, which leads to soft lockups.
>>
>> We could create the node IPI on demand but it is a bit complex because
>> this code would be called under bringup_up() and some IRQ locking is
>> being done. The simplest solution is to create the IPIs for all nodes
>> at startup.
>>
>> Fixes: 7dcc37b3eff9 ("powerpc/xive: Map one IPI interrupt per node")
>> Cc: stable at vger.kernel.org # v5.13
>> Reported-by: Geetika Moolchandani <Geetika.Moolchandani1 at ibm.com>
>> Cc: Srikar Dronamraju <srikar at linux.vnet.ibm.com>
>> Signed-off-by: Cédric Le Goater <clg at kaod.org>
>> ---
>>
>> This patch breaks old versions of irqbalance (<= v1.4). Possible nodes
>> are collected from /sys/devices/system/node/ but CPU-less nodes are
>> not listed there. When interrupts are scanned, the link representing
>> the node structure is NULL and segfault occurs.
> 
> Breaking userspace is usually frowned upon, even if it is irqbalance.
>
> If CPU-less nodes appeared in /sys/devices/system/node would that fix
> it? Could we do that or is that not possible for other reasons?
> 
>> Version 1.7 seems immune. 
> 
> Which was released in August 2020.
> 
> Looks like some distros still ship 1.6, I take it you're not sure if
> that is broken or not.

I did a bisect on irqbalance and the "bad" commit was introduced between 
version 1.7 and version 1.8 :

  commit 31dea01f3a47 ("Also fetch node info for non-PCI devices") 
  https://github.com/Irqbalance/irqbalance/commit/31dea01f3a47aa6374560638486879e5129f9c94

which was backported on RHEL 8 in RPM irqbalance-1.4.0-6.el8.

Any distro using irqbalance <= 1.7 without the patch above is fine. 

Since irqbalance handled cleanly irqs referencing offline nodes before 
this patch, I am inclined to think that the irqbalance fix is incomplete. 
Unfortunately, the commit log lacks some context on the non-PCI devices. 

Thanks,

C. 


More information about the Linuxppc-dev mailing list