From ntl at pobox.com Sun Oct 5 10:14:12 2008 From: ntl at pobox.com (Nathan Lynch) Date: Sat, 4 Oct 2008 18:14:12 -0500 Subject: [libtopology-dev] sysfs cpu core_id attribute useless? Message-ID: <20081004231412.GA1401@localdomain> Hi libtopology-dev! Project home page: http://libtopology.ozlabs.org (nothing there yet really) Git repo: git://ozlabs.org/srv/projects/libtopology/libtopology.git Big thanks to all those behind ozlabs.org for project and mailing list hosting. Hugh Blemings and Martin Schwenke got everything up and running -- thanks guys! For the list's inaugural message I'll describe an issue I'm having with the topology information in sysfs. Here's how libtopology builds a context: for each cpu in sysfs (/sys/devices/system/cpu/cpu*) instantiate a "thread" object read "core_id" attribute if this core id has not been seen already instantiate a core object link thread to its core read "package_id" attribute if this package id has not been seen already instantiate a package object link up thread, core, and package The problem I'm encountering is that value of core_id can be '0' when the kernel doesn't actually know the core id -- '0' is the "fallback" value the kernel provides in this case (for some reason, package_id's fallback value is -1, which is easier to accommodate). But '0' is a perfectly valid core id, so it looks to the library like all logical CPUs are on a single core... with the NUMA support I added recently the library blows up because it looks like we have the same core present in multiple nodes. I'm thinking about working around this by ignoring core_id and instead using a hash of the contents of each logical CPU's thread_siblings attribute to uniquely identify cores. On systems where I've seen the problem described above, thread_siblings is accurate. Using a hash table will likely bloat the library's memory footprint a bit so I'd appreciate any suggestions on alternative solutions. Any takers? From ntl at pobox.com Sun Oct 5 17:43:45 2008 From: ntl at pobox.com (Nathan Lynch) Date: Sun, 5 Oct 2008 01:43:45 -0500 Subject: [libtopology-dev] sysfs cpu core_id attribute useless? In-Reply-To: <20081004231412.GA1401@localdomain> References: <20081004231412.GA1401@localdomain> Message-ID: <20081005064345.GB1401@localdomain> Nathan Lynch wrote: > > I'm thinking about working around this by ignoring core_id and instead > using a hash of the contents of each logical CPU's thread_siblings > attribute to uniquely identify cores. On systems where I've seen the > problem described above, thread_siblings is accurate. Here's what I whipped up this evening, but I'll sit on it for a little while. An unforeseen bonus is that this actually gets accurate package/socket information on ppc64 with a 2.6.27-rc8 kernel (where core_siblings is exported but physical_package_id is always -1). commit 126281f7f8fec3c17fc00ff2621ca4b38a603d94 Author: Nathan Lynch Date: Sat Oct 4 23:39:27 2008 -0500 identify cores and packages via thread_siblings and core_siblings On machines where the kernel is not able to discover core and package information, the /sys/devices/system/cpu/cpu*/core_id are all '0' and /sys/devices/system/cpu/cpu*/package_id are all '-1'. The latter was actually handled fine by sysfs_cpu_package_id(), but there's no sane way to handle the former -- '0' is a perfectly valid core identifier and it's beyond me why this was chosen as a fallback value. An example of a machine where this problem occurs is an ia64 IBM 8855-2RZ with 2 NUMA nodes and 4 processors per node (plain SMP, no multi-core or multi-threading). libtopology blows up on this machine because it's as if individual cores and packages are spanning nodes. So abandon core_id and package_id since we know at least the first is unreliable. Instead, use hash tables keyed on the thread_siblings and core_siblings bitmaps to keep track of which cores and packages have been seen already. Update the one_core_smt4 test to have thread_siblings attributes. Add a new fake_core_id testcase which was used to recreate the failure. --- lib/topology.c | 293 +++++++++++++++++++++++++++++++-------------- tests/fake_core_id.c | 34 +++++ tests/sysfs/fake_core_id | 5 + tests/sysfs/one_core_smt4 | 4 + 4 files changed, 245 insertions(+), 91 deletions(-) diff --git a/lib/topology.c b/lib/topology.c index 2cedec1..b5ead0d 100644 --- a/lib/topology.c +++ b/lib/topology.c @@ -22,6 +22,7 @@ #include #include #include +#include #include #include #include @@ -47,6 +48,12 @@ struct list_elem { struct list_elem *next; }; +struct htable { + size_t count; + size_t maxsize; + struct hsearch_data tbl; +}; + struct topo_ent { struct topology_context *context; cpu_set_t *cpumask; @@ -88,6 +95,7 @@ struct topo_package { struct topo_node *node; struct list_elem sibling; struct list_anchor cores; + char *hash_key; }; struct topo_core { @@ -96,6 +104,7 @@ struct topo_core { struct topo_ent ent; struct list_elem sibling; struct list_anchor threads; + char *hash_key; }; struct topo_cpu { @@ -122,6 +131,11 @@ struct topology_context { struct list_anchor cores; struct list_anchor packages; struct list_anchor nodes; + struct htable core_htable; + struct htable package_htable; /* temporary hash tables for + * enumerating cores and + * packages during + * initialization */ }; static void list_init(struct list_anchor *anchor) @@ -163,6 +177,69 @@ static struct topo_ent *list_to_ent(struct list_elem *elem) return container_of(elem, struct topo_ent, list); } +static int htable_init(struct htable *htab, size_t maxsize) +{ + int rc; + + memset(htab, 0, sizeof(*htab)); + + rc = hcreate_r(maxsize, &htab->tbl); + if (rc == 0) + return 1; + + htab->maxsize = maxsize; + return 0; +} + +static int htable_insert(struct htable *htab, const char *key, void *data) +{ + ENTRY entry; + ENTRY *ret; + int rc; + + assert(htab->count < htab->maxsize); + + entry.key = (char *)key; + entry.data = data; + + rc = hsearch_r(entry, ENTER, &ret, &htab->tbl); + if (rc == 0) + goto err; + + htab->count++; + return 0; +err: + return 1; +} + +static void *htable_lookup(struct htable *htab, const char *key) +{ + ENTRY entry; + ENTRY *ret; + + if (!key) + return NULL; + + entry.key = (char *)key; + entry.data = NULL; + + hsearch_r(entry, FIND, &ret, &htab->tbl); + if (!ret) + goto err; + + return ret->data; +err: + return NULL; +} + +static void htable_release(struct htable *htab) +{ + if (htab->maxsize == 0) + return; + hdestroy_r(&htab->tbl); + memset(htab, 0, sizeof(*htab)); +} + static void topo_ent_cleanup(struct topo_ent *ent) { cpu_set_t *cpumask; @@ -347,12 +424,18 @@ static void cpu_release(struct topo_ent *ent) static void core_release(struct topo_ent *ent) { - free(to_core(ent)); + struct topo_core *core = to_core(ent); + + free(core->hash_key); + free(core); } static void package_release(struct topo_ent *ent) { - free(to_package(ent)); + struct topo_package *pkg = to_package(ent); + + free(pkg->hash_key); + free(pkg); } static void node_release(struct topo_ent *ent) @@ -517,35 +600,6 @@ static void link_cpu_to_node(struct topo_cpu *cpu, struct topo_node *node) topo_ent_cpumask_set(&node->ent, cpu->id); } -/* stopgap until we hash cores as they are discovered */ -static struct topo_core *lookup_core_by_id(struct topology_context *ctx, int id) -{ - struct list_elem *elem; - elem = list_first(&ctx->cores); - while (elem) { - struct topo_ent *ent = list_to_ent(elem); - struct topo_core *core = to_core(ent); - if (core->id == id) - return core; - elem = list_next(elem); - } - return NULL; -} - -static struct topo_package *lookup_package_by_id(struct topology_context *ctx, int id) -{ - struct list_elem *elem; - elem = list_first(&ctx->packages); - while (elem) { - struct topo_ent *ent = list_to_ent(elem); - struct topo_package *package = to_package(ent); - if (package->id == id) - return package; - elem = list_next(elem); - } - return NULL; -} - /* Return true if: * - /sys/devices/system/cpu/cpu$cpu/online exists and has value 1, or * - /sys/devices/system/cpu/cpu$cpu/online does not exist (e.g. x86 boot cpu, @@ -691,53 +745,10 @@ size_t topology_sizeof_cpu_set_t(topology_context_t _ctx) return ctx->cpu_set_t_size; } -/* cpu must be online */ -static int sysfs_cpu_core_id(const char *sysfs, int cpu) +static size_t context_max_cpus(struct topology_context *ctx) { - char path[PATH_MAX]; - int core_id = cpu; - char *buf; - - snprintf(path, sizeof(path), - "%s/devices/system/cpu/cpu%d/topology/core_id", - sysfs, cpu); - - buf = slurp_text_file(path); - if (!buf) /* assume no core information, each cpu is its own core */ - goto out; - - if (sscanf(buf, "%d", &core_id) != 1) - core_id = cpu; /* this shouldn't happen... */ - -out: - free(buf); - return core_id; -} - -static int sysfs_cpu_package_id(const char *sysfs, int cpu, int core) -{ - char path[PATH_MAX]; - int package_id = core; - char *buf; - - snprintf(path, sizeof(path), - "%s/devices/system/cpu/cpu%d/topology/physical_package_id", - sysfs, cpu); - - buf = slurp_text_file(path); - if (!buf) /* each cpu is its own package */ - goto out; - - if (sscanf(buf, "%d", &package_id) != 1) - package_id = core; /* this shouldn't happen... */ - - /* Some kernels return -1 when no package info is available */ - if (package_id == -1) - package_id = core; - -out: - free(buf); - return package_id; + assert(ctx->cpu_set_t_size != 0); + return ctx->cpu_set_t_size * 8; } static int sysfs_count_caches(const char *sysfs, int cpu) @@ -918,14 +929,117 @@ err: return 1; } +static char *sysfs_cpu_thread_siblings(const char *sysfs, int cpu_id) +{ + char path[PATH_MAX]; + char *buf; + int rc; + + snprintf(path, sizeof(path), + "%s/devices/system/cpu/cpu%d/topology/thread_siblings", + sysfs, cpu_id); + + buf = slurp_text_file(path); + if (buf) + return buf; + + rc = asprintf(&buf, "%d", cpu_id); + if (rc == -1) + buf = NULL; + + return buf; +} + +static char *sysfs_cpu_core_siblings(const char *sysfs, int cpu_id) +{ + char path[PATH_MAX]; + char *buf; + + snprintf(path, sizeof(path), + "%s/devices/system/cpu/cpu%d/topology/core_siblings", + sysfs, cpu_id); + + buf = slurp_text_file(path); + if (buf) + return buf; + + /* thread_siblings must be a subset of core_siblings so assume + * one core per package */ + return sysfs_cpu_thread_siblings(sysfs, cpu_id); +} + +static struct topo_core *core_lookup_or_create(struct topo_cpu *cpu) +{ + struct topology_context *ctx; + struct topo_core *core; + char *siblings; + + ctx = cpu->ent.context; + + siblings = sysfs_cpu_thread_siblings(ctx->sysfs_root, cpu->id); + if (!siblings) + goto err; + + core = htable_lookup(&ctx->core_htable, siblings); + if (core) { + free(siblings); + goto found; + } + + core = new_core(ctx, cpu->id); + if (!core) + goto err; + + if (htable_insert(&ctx->core_htable, siblings, core)) + goto err; + + core->hash_key = siblings; +found: + return core; +err: + free(siblings); + return NULL; +} + +static struct topo_package *package_lookup_or_create(struct topo_cpu *cpu) +{ + struct topology_context *ctx; + struct topo_package *package; + char *siblings; + + ctx = cpu->ent.context; + + siblings = sysfs_cpu_core_siblings(ctx->sysfs_root, cpu->id); + if (!siblings) + goto err; + + package = htable_lookup(&ctx->package_htable, siblings); + if (package) { + free(siblings); + goto found; + } + + package = new_package(ctx, cpu->id); + if (!package) + goto err; + + if (htable_insert(&ctx->package_htable, siblings, package)) + goto err; + + package->hash_key = siblings; +found: + return package; +err: + free(siblings); + return NULL; +} + static int do_one_cpu(struct topo_node *node, int cpu_id) { struct topology_context *ctx; struct topo_package *pkg; struct topo_core *core; struct topo_cpu *cpu; - int core_id; - int pkg_id; ctx = node->ent.context; @@ -933,23 +1047,13 @@ static int do_one_cpu(struct topo_node *node, int cpu_id) if (!cpu) goto err; - core_id = sysfs_cpu_core_id(ctx->sysfs_root, cpu_id); - - /* O(n*n)-ish behavior here... */ - core = lookup_core_by_id(ctx, core_id); - if (!core) - core = new_core(ctx, core_id); - + core = core_lookup_or_create(cpu); if (!core) goto err; link_cpu_to_core(cpu, core); - pkg_id = sysfs_cpu_package_id(ctx->sysfs_root, cpu_id, core_id); - pkg = lookup_package_by_id(ctx, pkg_id); - if (!pkg) - pkg = new_package(ctx, pkg_id); - + pkg = package_lookup_or_create(cpu); if (!pkg) goto err; @@ -1063,6 +1167,8 @@ static void context_release(struct topology_context *ctx) { if (!ctx) return; + htable_release(&ctx->core_htable); + htable_release(&ctx->package_htable); list_topo_ent_cleanup(&ctx->threads); list_topo_ent_cleanup(&ctx->cores); list_topo_ent_cleanup(&ctx->packages); @@ -1085,6 +1191,11 @@ static struct topology_context *new_context(void) if (probe_cpumask_size(ctx)) goto err; + if (htable_init(&ctx->core_htable, context_max_cpus(ctx))) + goto err; + if (htable_init(&ctx->package_htable, context_max_cpus(ctx))) + goto err; + list_init(&ctx->threads); list_init(&ctx->cores); list_init(&ctx->packages); diff --git a/tests/fake_core_id.c b/tests/fake_core_id.c new file mode 100644 index 0000000..10d8164 --- /dev/null +++ b/tests/fake_core_id.c @@ -0,0 +1,34 @@ +/* + * libtopology - a library for discovering Linux system topology + * + * Copyright 2008 Nathan Lynch, IBM Corporation + * + * This library is free software; you can redistribute it and/or + * modify it under the terms of the GNU Lesser General Public License + * version 2.1 as published by the Free Software Foundation. + * + * This library is distributed in the hope that it will be useful, but + * WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU + * Lesser General Public License for more details. + * + * You should have received a copy of the GNU Lesser General Public + * License along with this library; if not, write to the Free Software + * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA + * 02110-1301 USA + */ + +#define _GNU_SOURCE +#include +#include +#include + +#define NAME "fake core id" +#include "boilerplate-begin.h" +{ + fail_on(topology_nr_nodes(tctx) != 2); + fail_on(topology_nr_packages(tctx) != 2); + fail_on(topology_nr_cores(tctx) != 2); + fail_on(topology_nr_threads(tctx) != 2); +} +#include "boilerplate-end.h" diff --git a/tests/sysfs/fake_core_id b/tests/sysfs/fake_core_id new file mode 100644 index 0000000..8fa014d --- /dev/null +++ b/tests/sysfs/fake_core_id @@ -0,0 +1,5 @@ +# simulate a system that uses 0 as the default core id +/devices/system/cpu/cpu0/topology/core_id:0 +/devices/system/cpu/cpu1/topology/core_id:0 +/devices/system/node/node0/cpu0/topology/core_id:0 +/devices/system/node/node1/cpu1/topology/core_id:0 diff --git a/tests/sysfs/one_core_smt4 b/tests/sysfs/one_core_smt4 index f9afc70..249ee61 100644 --- a/tests/sysfs/one_core_smt4 +++ b/tests/sysfs/one_core_smt4 @@ -1,8 +1,12 @@ /devices/system/cpu/cpu0/ /devices/system/cpu/cpu0/topology/core_id:0 +/devices/system/cpu/cpu0/topology/thread_siblings:f /devices/system/cpu/cpu1/ /devices/system/cpu/cpu1/topology/core_id:0 +/devices/system/cpu/cpu1/topology/thread_siblings:f /devices/system/cpu/cpu2/ /devices/system/cpu/cpu2/topology/core_id:0 +/devices/system/cpu/cpu2/topology/thread_siblings:f /devices/system/cpu/cpu3/ /devices/system/cpu/cpu3/topology/core_id:0 +/devices/system/cpu/cpu3/topology/thread_siblings:f From ntl at pobox.com Fri Oct 17 17:35:23 2008 From: ntl at pobox.com (Nathan Lynch) Date: Fri, 17 Oct 2008 01:35:23 -0500 Subject: [libtopology-dev] libtopology v0.3 Message-ID: <20081017063523.GC6830@localdomain> Hi- I've created a v0.3 tag and pushed this out as the first release: http://libtopology.ozlabs.org/releases/libtopology-0.3.tar.gz This contains a few basic capabilities: * Support for enumerating/traversing NUMA nodes, processor packages/sockets, cores, and hardware threads; support for discovering these objects' associated logical CPUs. * Support for querying processor caches and discovering their type, size, level, and associated logical CPUs. Full exploitation of these capabilities depends on the information the kernel makes available in sysfs. I've tested on i386, x86_64, ia64, and powerpc[1] with a variety of distributions and kernels (going back to 2.6.18 or so). Hugh Blemings implemented support for generating documentation via doxygen ("make doc"). Thanks Hugh! >From this point on, I don't intend to make any ABI/API-breaking changes without incrementing the major version number, and barring major design mistakes this won't happen any time soon. The only potential ABI-breaker on the horizon is hotplug notification support, but I'm optimistic that it won't come to that. If you haven't looked at the source in the last couple of weeks, the user-visible API is now much slimmer (and more maintainable IMO); please see include/topology.h. Planned features for future releases include: * Reporting cache line size and associativity * Cell SPU support [2] * Reporting NUMA node memory and distance information * Reporting of processor/node affinity for additional devices * Processor (and memory?) hotplug event notification Do speak up if you'd like anything added to this list. [1] powerpc needs a 2.6.27 kernel for accurate enumeration of sockets and cores; obtaining cache info via libtopology on powerpc does not work on current kernels because of the lack of a "shared_cpu_map" attribute in the sysfs cache hierarchies. [2] It would be nice if someone could send me the output of "grep -r . /sys/devices/system/spu" from a Cell system as I don't have easy access to a working one right now. This would enable me to write a testcase. From mikey at neuling.org Sat Oct 18 16:51:18 2008 From: mikey at neuling.org (Michael Neuling) Date: Sat, 18 Oct 2008 16:51:18 +1100 Subject: [libtopology-dev] libtopology v0.3 In-Reply-To: <20081017063523.GC6830@localdomain> References: <20081017063523.GC6830@localdomain> Message-ID: <17290.1224309078@neuling.org> > [2] It would be nice if someone could send me the output of > "grep -r . /sys/devices/system/spu" from a Cell system as I don't have > easy access to a working one right now. This would enable me to write > a testcase. I'm not sure this is what you really need but... [root at tuebingen ~]# grep -r . /sys/devices/system/spu [root at tuebingen ~]# find /sys/devices/system/spu /sys/devices/system/spu /sys/devices/system/spu/spu15 /sys/devices/system/spu/spu14 /sys/devices/system/spu/spu13 /sys/devices/system/spu/spu12 /sys/devices/system/spu/spu11 /sys/devices/system/spu/spu10 /sys/devices/system/spu/spu9 /sys/devices/system/spu/spu8 /sys/devices/system/spu/spu7 /sys/devices/system/spu/spu6 /sys/devices/system/spu/spu5 /sys/devices/system/spu/spu4 /sys/devices/system/spu/spu3 /sys/devices/system/spu/spu2 /sys/devices/system/spu/spu1 /sys/devices/system/spu/spu0 [root at tuebingen ~]# uname -a Linux tuebingen.ozlabs.ibm.com 2.6.20-CBE #1 SMP Wed Mar 21 10:24:39 CET 2007 ppc64 ppc64 ppc64 GNU/Linux [root at tuebingen ~]# This is a Cell blade. Mikey From ntl at pobox.com Sun Oct 19 15:40:12 2008 From: ntl at pobox.com (Nathan Lynch) Date: Sat, 18 Oct 2008 23:40:12 -0500 Subject: [libtopology-dev] libtopology v0.3 In-Reply-To: <17290.1224309078@neuling.org> References: <20081017063523.GC6830@localdomain> <17290.1224309078@neuling.org> Message-ID: <20081019044012.GD6830@localdomain> Michael Neuling wrote: > > [2] It would be nice if someone could send me the output of > > "grep -r . /sys/devices/system/spu" from a Cell system as I don't have > > easy access to a working one right now. This would enable me to write > > a testcase. > > I'm not sure this is what you really need but... > > [root at tuebingen ~]# grep -r . /sys/devices/system/spu > [root at tuebingen ~]# find /sys/devices/system/spu > /sys/devices/system/spu > .... Thanks. I should have been more explicit that I was trying to find out how SPU affinity to processors/nodes is expressed in sysfs. I hadn't counted on the spu sydevs having no attributes at all. Turns out that the node directories have symlinks to local SPUs. e.g. /sys/devices/system/node/node0 -> ../../spu/spu0 That should be good enough.