[PATCH v5 3/4] arm64: topology: Support SMT control on ACPI based system
Pierre Gondois
pierre.gondois at arm.com
Thu Aug 29 22:46:09 AEST 2024
Hello Yicong,
On 8/29/24 09:40, Yicong Yang wrote:
> Hi Pierre,
>
> On 2024/8/27 23:40, Pierre Gondois wrote:
>> Hello Yicong,
>> IIUC we are looking for the maximum number of threads a CPU can have
>> in the platform. But is is actually possible to have a platform with
>> CPUs not having the same number of threads ?
>>
>
> I was thinking of the heterogenous platforms, for example part of the
> cores have SMT and others don't (I don't have any though, but there
> should be some such platforms for arm64).
>
>> For instance kernel/cpu.c::cpu_smt_num_threads_valid() will check
>> that the number of threads is either 1 or cpu_smt_max_threads (as
>> CONFIG_SMT_NUM_THREADS_DYNAMIC is not enabled for arm64). However
>> a (hypothetical) platform with:
>> - CPU0: 2 threads
>> - CPU1: 4 threads
>> should not be able to set the number of threads to 2, as
>> 1 < 2 < cpu_smt_max_threads (cf. cpu_smt_num_threads_valid()).
>>
>
> It's a bit more complex. If we enable/disable the SMT using on/off control
> then we won't have this problem. We'll online all the CPUs on enabling and
> offline CPUs which is not primary thread and don't care about the thread
> number of each core.
>
> Control using thread number is introduced by CONFIG_SMT_NUM_THREADS_DYNAMIC
> and only enabled on powerpc. I think this interface is not enough to handle
> the hypothetical we assumed, since it assumes the homogenous platform that
> all the CPUs have the same thread number. But this should be fine for
> the platforms with SMT(with same thread number) and non-SMT cores, since it do
> indicates the real thread number of the SMT cores.
>
>> So if there is an assumption that all the CPUs have the same number of
>> threads, it should be sufficient to count the number of threads for one
>> CPU right ?
>>
>
> Naturally and conveniently I may think use of the threads number of CPU0 (boot
> cpu) in such a solution. But this will be wrong if CPU0 is non-SMT on a heterogenous
> platform :(
>
>> In the other (unlikely) case (i.e. CPUs can have various number of threads),
>> I think we should either:
>> - use your method and check that all the CPUs have the same number of threads
>> - get the number of threads for one CPU and check that all the CPUs are
>> identical using the ACPI_PPTT_ACPI_IDENTICAL tag
>
> I think this won't be simpler since we still need to iterate all the CPUs to see
> if they have the same hetero_id (checked how we're using this ACPI tag in
> arm_acpi_register_pmu_device()). We could already know if they're identical by
> comparing the thread number and do the update if necessary.
>
>> - have a per-cpu cpu_smt_max_threads value ?
>
> This should be unnecessary in currently stage if there's no platforms
> with several kind cores have different thread number like in your assumption
> and enable CONFIG_SMT_NUM_THREADS_DYNAMIC on such platforms. Otherwise using
> a global cpu_smt_max_threads to enable the SMT control should be enough.
> Does it make sense?
Yes, I agree with all the things you said:
- the current smt/control interface cannot handle asymmetric SMT platforms
- I am not aware if such platform exist so far
I think it would still be good to check all the CPUs have the same number of
threads. I tried to enable the SMT_NUM_THREADS_DYNAMIC Kconfig with the
patch attached (and to check CPUs have the same number of threads). Feel free
to take what you like/ignore what is inside the attached patch, or let me know
if I should submit a part in a separate patchset,
----------------------------
[RFC] arm64: topology: Enable CONFIG_SMT_NUM_THREADS_DYNAMIC
- On arm64 ACPI systems, change the thread_id assignment to have
increasing values starting from 0. This is already the case for DT
based systems. Doing so allows to uniformly identify the n-th
thread of a given CPU.
- Check that all CPUs have the same number of threads (for DT/ACPI)
- Enable CONFIG_SMT_NUM_THREADS_DYNAMIC
On a Tx2, with 256 CPUs, threads siblings being 0,32,64,96
for socket0 and 128 + (0,32,64,96) for socket1:
$ cd /sys/devices/system/cpu/smt/
$ cat ../online
0-255
$ echo 2 > control
$ cat ../online
0-63,128-191
$ echo 3 > control
$ cat ../online
0-95,128-223
$ echo on > control
$ cat ../online
0-255
diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index bd3bc2f5e0ec..1d8521483065 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -239,6 +239,7 @@ config ARM64
select HAVE_GENERIC_VDSO
select HOTPLUG_CORE_SYNC_DEAD if HOTPLUG_CPU
select HOTPLUG_SMT if (SMP && HOTPLUG_CPU)
+ select SMT_NUM_THREADS_DYNAMIC if HOTPLUG_SMT
select IRQ_DOMAIN
select IRQ_FORCED_THREADING
select KASAN_VMALLOC if KASAN
diff --git a/arch/arm64/include/asm/topology.h b/arch/arm64/include/asm/topology.h
index 0f6ef432fb84..7dd211f81687 100644
--- a/arch/arm64/include/asm/topology.h
+++ b/arch/arm64/include/asm/topology.h
@@ -39,6 +39,14 @@ void update_freq_counters_refs(void);
#define arch_scale_hw_pressure topology_get_hw_pressure
#define arch_update_hw_pressure topology_update_hw_pressure
+#ifdef CONFIG_SMT_NUM_THREADS_DYNAMIC
+#include <linux/cpu_smt.h>
+static inline bool topology_smt_thread_allowed(unsigned int cpu)
+{
+ return topology_thread_id(cpu) < cpu_smt_num_threads;
+}
+#endif
+
#include <asm-generic/topology.h>
#endif /* _ASM_ARM_TOPOLOGY_H */
diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
index f72e1e55b05e..a83babe19972 100644
--- a/arch/arm64/kernel/topology.c
+++ b/arch/arm64/kernel/topology.c
@@ -47,7 +47,9 @@ int __init parse_acpi_topology(void)
{
int thread_num, max_smt_thread_num = 1;
struct xarray core_threads;
+ bool have_thread = false;
int cpu, topology_id;
+ unsigned long i;
void *entry;
if (acpi_disabled)
@@ -61,6 +63,8 @@ int __init parse_acpi_topology(void)
return topology_id;
if (acpi_cpu_is_threaded(cpu)) {
+ have_thread = true;
+
cpu_topology[cpu].thread_id = topology_id;
topology_id = find_acpi_cpu_topology(cpu, 1);
cpu_topology[cpu].core_id = topology_id;
@@ -69,9 +73,10 @@ int __init parse_acpi_topology(void)
if (!entry) {
xa_store(&core_threads, topology_id,
xa_mk_value(1), GFP_KERNEL);
+ cpu_topology[cpu].thread_id = 0;
} else {
thread_num = xa_to_value(entry);
- thread_num++;
+ cpu_topology[cpu].thread_id = thread_num++;
xa_store(&core_threads, topology_id,
xa_mk_value(thread_num), GFP_KERNEL);
@@ -86,8 +91,17 @@ int __init parse_acpi_topology(void)
cpu_topology[cpu].cluster_id = topology_id;
topology_id = find_acpi_cpu_topology_package(cpu);
cpu_topology[cpu].package_id = topology_id;
+
+ pr_debug("CPU%u: package=0x%x cluster=0x%x core=0x%x thread=0x%x\n",
+ cpu, cpu_topology[cpu].package_id, cpu_topology[cpu].cluster_id,
+ cpu_topology[cpu].core_id, cpu_topology[cpu].thread_id);
}
+ if (have_thread)
+ xa_for_each(&core_threads, i, entry)
+ if (xa_to_value(entry) != max_smt_thread_num)
+ pr_warn("Heterogeneous SMT topology not handled");
+
cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
xa_destroy(&core_threads);
diff --git a/drivers/base/arch_topology.c b/drivers/base/arch_topology.c
index 95513abd664f..20d7f5b72ddd 100644
--- a/drivers/base/arch_topology.c
+++ b/drivers/base/arch_topology.c
@@ -532,13 +532,15 @@ static int __init get_cpu_for_node(struct device_node *node)
return cpu;
}
-static void __init update_smt_num_threads(unsigned int num_threads)
+static void __init update_smt_num_threads(int num_threads)
{
- static unsigned int max_smt_thread_num = 1;
+ static int max_smt_thread_num = -1;
- if (num_threads > max_smt_thread_num) {
+ if (max_smt_thread_num < 0) {
max_smt_thread_num = num_threads;
cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
+ } else if (num_threads != max_smt_thread_num) {
+ pr_warn("Heterogeneous SMT topology not handled");
}
}
diff --git a/include/linux/arch_topology.h b/include/linux/arch_topology.h
index b721f360d759..afdfdc64a0a1 100644
--- a/include/linux/arch_topology.h
+++ b/include/linux/arch_topology.h
@@ -87,6 +87,7 @@ extern struct cpu_topology cpu_topology[NR_CPUS];
#define topology_physical_package_id(cpu) (cpu_topology[cpu].package_id)
#define topology_cluster_id(cpu) (cpu_topology[cpu].cluster_id)
#define topology_core_id(cpu) (cpu_topology[cpu].core_id)
+#define topology_thread_id(cpu) (cpu_topology[cpu].thread_id)
#define topology_core_cpumask(cpu) (&cpu_topology[cpu].core_sibling)
#define topology_sibling_cpumask(cpu) (&cpu_topology[cpu].thread_sibling)
#define topology_cluster_cpumask(cpu) (&cpu_topology[cpu].cluster_sibling)
----------------------------
Regards,
Pierre
>
> Thanks,
> Yicong
>
>>
>> Same comment for the DT patch. If there is an assumption that all CPUs have
>> the same number of threads, then update_smt_num_threads() could only be called
>> once I suppose,
>>
>> Regards,
>> Pierre
>>
>>
>> On 8/6/24 10:53, Yicong Yang wrote:
>>> From: Yicong Yang <yangyicong at hisilicon.com>
>>>
>>> For ACPI we'll build the topology from PPTT and we cannot directly
>>> get the SMT number of each core. Instead using a temporary xarray
>>> to record the SMT number of each core when building the topology
>>> and we can know the largest SMT number in the system. Then we can
>>> enable the support of SMT control.
>>>
>>> Signed-off-by: Yicong Yang <yangyicong at hisilicon.com>
>>> ---
>>> arch/arm64/kernel/topology.c | 24 ++++++++++++++++++++++++
>>> 1 file changed, 24 insertions(+)
>>>
>>> diff --git a/arch/arm64/kernel/topology.c b/arch/arm64/kernel/topology.c
>>> index 1a2c72f3e7f8..f72e1e55b05e 100644
>>> --- a/arch/arm64/kernel/topology.c
>>> +++ b/arch/arm64/kernel/topology.c
>>> @@ -15,8 +15,10 @@
>>> #include <linux/arch_topology.h>
>>> #include <linux/cacheinfo.h>
>>> #include <linux/cpufreq.h>
>>> +#include <linux/cpu_smt.h>
>>> #include <linux/init.h>
>>> #include <linux/percpu.h>
>>> +#include <linux/xarray.h>
>>> #include <asm/cpu.h>
>>> #include <asm/cputype.h>
>>> @@ -43,11 +45,16 @@ static bool __init acpi_cpu_is_threaded(int cpu)
>>> */
>>> int __init parse_acpi_topology(void)
>>> {
>>> + int thread_num, max_smt_thread_num = 1;
>>> + struct xarray core_threads;
>>> int cpu, topology_id;
>>> + void *entry;
>>> if (acpi_disabled)
>>> return 0;
>>> + xa_init(&core_threads);
>>> +
>>> for_each_possible_cpu(cpu) {
>>> topology_id = find_acpi_cpu_topology(cpu, 0);
>>> if (topology_id < 0)
>>> @@ -57,6 +64,20 @@ int __init parse_acpi_topology(void)
>>> cpu_topology[cpu].thread_id = topology_id;
>>> topology_id = find_acpi_cpu_topology(cpu, 1);
>>> cpu_topology[cpu].core_id = topology_id;
>>> +
>>> + entry = xa_load(&core_threads, topology_id);
>>> + if (!entry) {
>>> + xa_store(&core_threads, topology_id,
>>> + xa_mk_value(1), GFP_KERNEL);
>>> + } else {
>>> + thread_num = xa_to_value(entry);
>>> + thread_num++;
>>> + xa_store(&core_threads, topology_id,
>>> + xa_mk_value(thread_num), GFP_KERNEL);
>>> +
>>> + if (thread_num > max_smt_thread_num)
>>> + max_smt_thread_num = thread_num;
>>> + }
>>> } else {
>>> cpu_topology[cpu].thread_id = -1;
>>> cpu_topology[cpu].core_id = topology_id;
>>> @@ -67,6 +88,9 @@ int __init parse_acpi_topology(void)
>>> cpu_topology[cpu].package_id = topology_id;
>>> }
>>> + cpu_smt_set_num_threads(max_smt_thread_num, max_smt_thread_num);
>>> +
>>> + xa_destroy(&core_threads);
>>> return 0;
>>> }
>>> #endif
>>
>> .
-------------- next part --------------
A non-text attachment was scrubbed...
Name: 0001-RFC-arm64-topology-Enable-CONFIG_SMT_NUM_THREADS_DYN.patch
Type: text/x-patch
Size: 5575 bytes
Desc: not available
URL: <http://lists.ozlabs.org/pipermail/linuxppc-dev/attachments/20240829/c7cda9e0/attachment.bin>
More information about the Linuxppc-dev
mailing list