[PATCH 1/1] powerpc/smp: Set numa node before updating mask

Srikar Dronamraju srikar at linux.vnet.ibm.com
Fri Apr 9 20:14:09 AEST 2021


Hey Nathan,

>
> Regardless of your change: at boot time, this set of calls to
> set_numa_node() and set_numa_mem() is redundant, right? Because
> smp_prepare_cpus() has:
>
> 	for_each_possible_cpu(cpu) {
> 		...
> 		if (cpu_present(cpu)) {
> 			set_cpu_numa_node(cpu, numa_cpu_lookup_table[cpu]);
> 			set_cpu_numa_mem(cpu,
> 				local_memory_node(numa_cpu_lookup_table[cpu]));
> 		}
>
> I would rather that, when onlining a CPU that happens to have been
> dynamically added after boot, we enter start_secondary() with conditions
> equivalent to those at boot time. Or as close to that as is practical.
>
> So I'd suggest that pseries_add_processor() be made to update
> these things when the CPUs are marked present, before onlining them.

I did try updating at pseries_add_processor when things were being marked as
present. But looks like the zonelists may not be updated at that time.
I saw couple of crashes in local_memory_node() when dplar adding CPUs.
(Appending the patch that causes these crash to this mail for your reference)

[  293.669901] Kernel attempted to read user page (1508) - exploit attempt? (uid: 0)                                                                                                                     [1309/17174]
[  293.669925] BUG: Kernel NULL pointer dereference on read at 0x00001508
[  293.669935] Faulting instruction address: 0xc000000000484ae0
[  293.669947] Oops: Kernel access of bad area, sig: 11 [#1]
[  293.669956] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
[  293.669969] Modules linked in: nft_counter nft_compat rpadlpar_io rpaphp
mptcp_diag xsk_diag tcp_diag udp_diag raw_diag inet_diag unix_diag
af_packet_diag netlink_diag bonding tls nft_fib_inet nft_fib_ipv4 nft_ fib_ipv6
nft_fib nft_reject_inet nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_ct
nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 rfkill ip_set
nf_tables nfnetlink dm_multipath pseries_rng xts vmx_c rypto uio_pdrv_genirq
uio binfmt_misc ip_tables xfs libcrc32c sd_mod t10_pi sg ibmvscsi ibmveth
scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod fuse
[  293.670101] CPU: 17 PID: 3183 Comm: drmgr Not tainted 5.12.0-rc5-master+ #2
[  293.670113] NIP:  c000000000484ae0 LR: c00000000010a548 CTR: 00000000006dd130
[  293.670121] REGS: c0000025a9997160 TRAP: 0300   Not tainted  (5.12.0-rc5-master+)
[  293.670131] MSR:  8000000000009033 <SF,EE,ME,IR,DR,RI,LE>  CR: 48008422  XER: 00000008
[  293.670155] CFAR: c00000000010a544 DAR: 0000000000001508 DSISR: 40000000 IRQMASK: 0
[  293.670155] GPR00: c00000000010a548 c0000025a9997400 c000000001f55500 0000000000000000
[  293.670155] GPR04: c000000001a3c598 0000000000000006 0000000000000027 c0000035873b8018
[  293.670155] GPR08: 0000000000000023 c0000000020464f8 00000035894d0000 c000003584c8ffe8
[  293.670155] GPR12: 0000000028008424 c0000035bf22ce80 0000000000000000 0000000000000000
[  293.670155] GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[  293.670155] GPR20: 0000000000000000 0000000000000000 0000000000000000 c000000001fbc160
[  293.670155] GPR24: 0000000000000002 0000000000000200 c000000001fc04c8 0000000000000001
[  293.670155] GPR28: c0000000010c6fc8 c000000001fc08a0 c000002f8fb99e60 0000000000000040
[  293.670263] NIP [c000000000484ae0] local_memory_node+0x20/0x80
[  293.670281] LR [c00000000010a548] pseries_add_processor+0x368/0x3b0
[  293.670292] Call Trace:
[  293.670297] [c0000025a9997400] [c00000000010a524] pseries_add_processor+0x344/0x3b0 (unreliable)
[  293.670311] [c0000025a99976c0] [c00000000010a790] pseries_smp_notifier+0x200/0x2a0
[  293.670322] [c0000025a9997780] [c000000000197288] blocking_notifier_call_chain+0xa8/0x110
[  293.670335] [c0000025a99977d0] [c000000000b27010] of_attach_node+0xc0/0x110
[  293.670347] [c0000025a9997830] [c0000000001032a0] dlpar_attach_node+0x30/0x70
[  293.670358] [c0000025a99978a0] [c000000000109ac0] dlpar_cpu_add+0x1d0/0x510
[  293.670368] [c0000025a9997980] [c00000000010a898] dlpar_cpu+0x68/0x6e0
[  293.670377] [c0000025a9997a80] [c0000000001036b8] handle_dlpar_errorlog+0xf8/0x190
[  293.670388] [c0000025a9997af0] [c000000000103928] dlpar_store+0x178/0x4a0
[  293.670396] [c0000025a9997bb0] [c0000000007df050] kobj_attr_store+0x30/0x50
[  293.670408] [c0000025a9997bd0] [c00000000062f0b0] sysfs_kf_write+0x70/0xb0
[  293.670421] [c0000025a9997c10] [c00000000062d4e0] kernfs_fop_write_iter+0x1d0/0x280
[  293.670432] [c0000025a9997c60] [c00000000051673c] new_sync_write+0x14c/0x1d0
[  293.670445] [c0000025a9997d00] [c000000000519df4] vfs_write+0x264/0x380
[  293.670455] [c0000025a9997d60] [c00000000051a0ec] ksys_write+0x7c/0x140
[  293.670464] [c0000025a9997db0] [c000000000032af0] system_call_exception+0x150/0x290
[  293.670475] [c0000025a9997e10] [c00000000000d45c] system_call_common+0xec/0x278
[  293.670486] --- interrupt: c00 at 0x20000025bd74

That leaves us with just 2 options for now.
1. Update numa_mem later and only update numa_node here.
- Over a longer period of time, this would be more confusing since we
may lose track of why we are splitting the set of numa_node and numa_mem.

or
2. Use my earlier patch.

My choice would be to go with my earlier patch.
Please do let me know your thoughts on the same.

-- 
Thanks and Regards
Srikar Dronamraju

diff --git a/arch/powerpc/include/asm/topology.h b/arch/powerpc/include/asm/topology.h
index 3beeb030cd78..1cdd83703f93 100644
--- a/arch/powerpc/include/asm/topology.h
+++ b/arch/powerpc/include/asm/topology.h
@@ -44,6 +44,7 @@ extern void __init dump_numa_cpu_topology(void);
 
 extern int sysfs_add_device_to_node(struct device *dev, int nid);
 extern void sysfs_remove_device_from_node(struct device *dev, int nid);
+extern int numa_setup_cpu(unsigned long cpu);
 
 static inline void update_numa_cpu_lookup_table(unsigned int cpu, int node)
 {
@@ -81,6 +82,11 @@ static inline void sysfs_remove_device_from_node(struct device *dev,
 {
 }
 
+static int numa_setup_cpu(unsigned long cpu)
+{
+	return first_online_node;
+}
+
 static inline void update_numa_cpu_lookup_table(unsigned int cpu, int node) {}
 
 static inline int cpu_distance(__be32 *cpu1_assoc, __be32 *cpu2_assoc)
diff --git a/arch/powerpc/kernel/smp.c b/arch/powerpc/kernel/smp.c
index 5a4d59a1070d..0d0c71ba4672 100644
--- a/arch/powerpc/kernel/smp.c
+++ b/arch/powerpc/kernel/smp.c
@@ -1539,9 +1539,6 @@ void start_secondary(void *unused)
 			shared_caches = true;
 	}
 
-	set_numa_node(numa_cpu_lookup_table[cpu]);
-	set_numa_mem(local_memory_node(numa_cpu_lookup_table[cpu]));
-
 	smp_wmb();
 	notify_cpu_starting(cpu);
 	set_cpu_online(cpu, true);
diff --git a/arch/powerpc/mm/numa.c b/arch/powerpc/mm/numa.c
index f2bf98bdcea2..11914a28db67 100644
--- a/arch/powerpc/mm/numa.c
+++ b/arch/powerpc/mm/numa.c
@@ -501,7 +501,7 @@ static int vphn_get_nid(long unused)
  * Figure out to which domain a cpu belongs and stick it there.
  * Return the id of the domain used.
  */
-static int numa_setup_cpu(unsigned long lcpu)
+int numa_setup_cpu(unsigned long lcpu)
 {
 	struct device_node *cpu;
 	int fcpu = cpu_first_thread_sibling(lcpu);
diff --git a/arch/powerpc/platforms/pseries/hotplug-cpu.c b/arch/powerpc/platforms/pseries/hotplug-cpu.c
index 12cbffd3c2e3..311fbe916d5b 100644
--- a/arch/powerpc/platforms/pseries/hotplug-cpu.c
+++ b/arch/powerpc/platforms/pseries/hotplug-cpu.c
@@ -198,9 +198,14 @@ static int pseries_add_processor(struct device_node *np)
 	}
 
 	for_each_cpu(cpu, tmp) {
+		int nid;
+
 		BUG_ON(cpu_present(cpu));
 		set_cpu_present(cpu, true);
 		set_hard_smp_processor_id(cpu, be32_to_cpu(*intserv++));
+		nid = numa_setup_cpu(cpu);
+		set_cpu_numa_node(cpu, nid);
+		set_cpu_numa_mem(cpu, local_memory_node(nid));
 	}
 	err = 0;
 out_unlock:



More information about the Linuxppc-dev mailing list