[PATCH v2 3/4] powerpc/numa: Early request for home node associativity
Nathan Lynch
nathanl at linux.ibm.com
Fri Sep 6 06:04:00 AEST 2019
Hi Srikar,
Srikar Dronamraju <srikar at linux.vnet.ibm.com> writes:
> Currently the kernel detects if it's running on a shared LPAR platform
> and requests home node associativity before the scheduler sched_domains
> are set up. However, between the time NUMA setup is initialized and the
> request for home node associativity, the workqueue initializes its
> per-node cpumask. The per-node workqueue possible cpumask may turn
> invalid after the home node associativity update, resulting in weird
> situations like the workqueue possible cpumask being a subset of the
> workqueue online cpumask.
>
> This can be fixed by requesting home node associativity earlier, just
> before NUMA setup. However, at NUMA setup time the kernel may not be in
> a position to detect whether it's running on a shared LPAR platform, so
> request home node associativity and, if the request fails, fall back
> on the device tree property.
>
> While here, fix a problem where of_node_put could be called even when
> of_get_cpu_node was not successful.
of_node_put() handles NULL arguments, so this should not be necessary.
> +static int vphn_get_nid(unsigned long cpu, bool get_hwid)
[...]
> +static int numa_setup_cpu(unsigned long lcpu, bool get_hwid)
[...]
> @@ -528,7 +561,7 @@ static int ppc_numa_cpu_prepare(unsigned int cpu)
> {
> int nid;
>
> - nid = numa_setup_cpu(cpu);
> + nid = numa_setup_cpu(cpu, true);
> verify_cpu_node_mapping(cpu, nid);
> return 0;
> }
> @@ -875,7 +908,7 @@ void __init mem_topology_setup(void)
> reset_numa_cpu_lookup_table();
>
> for_each_present_cpu(cpu)
> - numa_setup_cpu(cpu);
> + numa_setup_cpu(cpu, false);
> }
I'm open to other points of view here, but I would prefer two separate
functions, something like vphn_get_nid() for runtime and
vphn_get_nid_early() (which could be __init) for boot-time
initialization. Propagating a somewhat unexpressive boolean flag through
two levels of function calls in this code is unappealing...
Regardless, I have an annoying question :-) Isn't it possible that,
while Linux is calling vphn_get_nid() for each logical cpu in sequence,
the platform could change a virtual processor's node assignment,
potentially causing sibling threads to get different node assignments
and producing an incoherent topology (which then leads to sched domain
assertions etc)?
If so, I think more care is needed: the algorithm should make the vphn
call only once per cpu node, so that sibling threads always end up with
the same answer.