[PATCH] powerpc/xive: Do not skip CPU-less nodes when creating the IPIs

Cédric Le Goater clg at kaod.org
Fri Aug 6 21:50:43 AEST 2021


On 6/29/21 3:15 PM, Cédric Le Goater wrote:
> On PowerVM, CPU-less nodes can be populated with hot-plugged CPUs at
> runtime. Today, the IPI is not created for such nodes, and hot-plugged
> CPUs use a bogus IPI, which leads to soft lockups.
> 
> We could create the node IPI on demand but it is a bit complex because
> this code would be called under bringup_up() and some IRQ locking is
> being done. The simplest solution is to create the IPIs for all nodes
> at startup.
> 
> Fixes: 7dcc37b3eff9 ("powerpc/xive: Map one IPI interrupt per node")
> Cc: stable at vger.kernel.org # v5.13
> Reported-by: Geetika Moolchandani <Geetika.Moolchandani1 at ibm.com>
> Cc: Srikar Dronamraju <srikar at linux.vnet.ibm.com>
> Signed-off-by: Cédric Le Goater <clg at kaod.org>
> ---
> 
> This patch breaks old versions of irqbalance (<= v1.4). Possible nodes
> are collected from /sys/devices/system/node/ but CPU-less nodes are
> not listed there. When interrupts are scanned, the link representing
> the node structure is NULL and a segfault occurs.

This is an irqbalance regression due to:

  https://github.com/Irqbalance/irqbalance/pull/172

I will report it by filing an issue.
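
The failure mode described in the note above can be sketched in user space
as follows. This is only an illustration, not irqbalance code: struct
node_obj, known_nodes, find_node() and account_irq() are hypothetical names,
and the node IDs are made up. It shows a lookup that returns NULL for a node
that was never discovered, and the check that avoids dereferencing it.

```c
#include <stddef.h>

/* Hypothetical stand-in for a balancer's per-node object. */
struct node_obj {
	int id;
	unsigned int irq_count;
};

/*
 * Nodes discovered at startup. In this made-up setup, node 2 is
 * CPU-less and was therefore never added to the list.
 */
static struct node_obj known_nodes[] = { { 0, 0 }, { 1, 0 } };

/* Return the object for a node id, or NULL if the node is unknown. */
static struct node_obj *find_node(int id)
{
	size_t i;

	for (i = 0; i < sizeof(known_nodes) / sizeof(known_nodes[0]); i++)
		if (known_nodes[i].id == id)
			return &known_nodes[i];
	return NULL;
}

/*
 * Account one interrupt to a node. Returns 0 on success and -1 if the
 * node is unknown. Without the NULL check, an interrupt attached to an
 * undiscovered node would dereference a NULL pointer.
 */
static int account_irq(int node_id)
{
	struct node_obj *n = find_node(node_id);

	if (n == NULL)
		return -1;
	n->irq_count++;
	return 0;
}
```

Calling account_irq(2) in this sketch returns -1 instead of crashing, while
account_irq(0) increments the counter of node 0.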

Anyhow, there is a better approach, which is to allocate IPIs for all
nodes at boot time and do the mapping on demand. Removing the mapping
on last use seems more complex, though.
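
A rough user-space model of that idea, to make the split between the two
steps concrete. This is a sketch under assumptions and not the v2 patch:
struct ipi_desc, MAX_NODES, ipi_alloc_all() and ipi_map_on_demand() are
hypothetical names, the HW interrupt numbers are invented, and no locking
is modelled.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_NODES 4	/* hypothetical number of possible nodes */

/* Hypothetical per-node descriptor, not the kernel's xive_ipi_desc. */
struct ipi_desc {
	unsigned int hw_irq;	/* allocated at boot for every node */
	bool mapped;		/* set when the mapping is first needed */
};

static struct ipi_desc ipis[MAX_NODES];

/* Boot time: allocate an IPI for every possible node, CPU-less or not. */
static void ipi_alloc_all(void)
{
	size_t node;

	for (node = 0; node < MAX_NODES; node++) {
		ipis[node].hw_irq = 0x100 + (unsigned int)node; /* made up */
		ipis[node].mapped = false;
	}
}

/*
 * CPU bring-up: create the mapping on first use only. Returns true if
 * this call created the mapping, false if the node is out of range or
 * the mapping already exists.
 */
static bool ipi_map_on_demand(size_t node)
{
	if (node >= MAX_NODES || ipis[node].mapped)
		return false;
	ipis[node].mapped = true;
	return true;
}
```

In this model, ipi_alloc_all() runs once at startup, and
ipi_map_on_demand(node) would be called when the first CPU of a node comes
online. Later calls for the same node are no-ops.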

I will send a v2 after some tests.  

Thanks,

C.

> Version 1.7 seems immune. 
> 
> ---
>  arch/powerpc/sysdev/xive/common.c | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/arch/powerpc/sysdev/xive/common.c b/arch/powerpc/sysdev/xive/common.c
> index f3b16ed48b05..5d2c58dba57e 100644
> --- a/arch/powerpc/sysdev/xive/common.c
> +++ b/arch/powerpc/sysdev/xive/common.c
> @@ -1143,10 +1143,6 @@ static int __init xive_request_ipi(void)
>  		struct xive_ipi_desc *xid = &xive_ipis[node];
>  		struct xive_ipi_alloc_info info = { node };
>  
> -		/* Skip nodes without CPUs */
> -		if (cpumask_empty(cpumask_of_node(node)))
> -			continue;
> -
>  		/*
>  		 * Map one IPI interrupt per node for all cpus of that node.
>  		 * Since the HW interrupt number doesn't have any meaning,
> 


