NUMA memory block size

Olof Johansson olof at austin.ibm.com
Sat Apr 3 16:50:13 EST 2004


On Fri, 2 Apr 2004, Dave Hansen wrote:

> On Fri, 2004-04-02 at 19:20, Olof Johansson wrote:
> > If there are no significant problems caused by this I'd like to feed this
> > upstream soon.
>
> I don't see it doing much besides making that array 16 times bigger.
> It's right at 1MB now.   I guess it will compress down, but that's an
> awful lot of uniform initialized variables to be sitting in an
> executable image.  Should we leave it uninitialized at compile time,
> and run through the whole thing once we're booted up?

Yes, that's a big increase in wasted memory. Note that unless DEBUG_NUMA
is defined, it's initialized to 0 so it'll go in the BSS anyway. Still, it
needs a new solution since MAX_MEMORY will increase in the not so distant
future too.

A couple of questions pop up:

1. Why do we use a full int for node ID? It's quite unlikely that we will
have 2 billion nodes anytime soon. Current limit is 16. :-) Switching to a
char instead of int might be worth it.

2. An lmb_alloc() approach has the benefit of only allocating as much table
as we actually have physical memory in the system, so the table grows in
proportion to how much memory the machine has. With 4-byte entries, a 512KB
table for a 2TB machine isn't too bad. On a 128GB system, the size will be
the same as before (32KB).

I'll take it as a later todo to look at a better data structure for this,
to avoid wasting too much space (but keep lookups fast).

> Might want to change the 256MB comment :)

DOH, yes.

New patch below.

-Olof

===== include/asm-ppc64/mmzone.h 1.18 vs edited =====
--- 1.18/include/asm-ppc64/mmzone.h	Fri Mar 12 21:18:15 2004
+++ edited/include/asm-ppc64/mmzone.h	Sat Apr  3 00:29:44 2004
@@ -19,13 +19,13 @@
  */

 extern int numa_cpu_lookup_table[];
-extern int numa_memory_lookup_table[];
+extern int *numa_memory_lookup_table;
 extern cpumask_t numa_cpumask_lookup_table[];
 extern int nr_cpus_in_node[];

 #define MAX_MEMORY (1UL << 41)
-/* 256MB regions */
-#define MEMORY_INCREMENT_SHIFT 28
+/* 16MB regions */
+#define MEMORY_INCREMENT_SHIFT 24
 #define MEMORY_INCREMENT (1UL << MEMORY_INCREMENT_SHIFT)

 /* NUMA debugging, will not work on a DLPAR machine */
===== arch/ppc64/mm/numa.c 1.30 vs edited =====
--- 1.30/arch/ppc64/mm/numa.c	Sat Mar 20 18:59:12 2004
+++ edited/arch/ppc64/mm/numa.c	Sat Apr  3 00:31:46 2004
@@ -16,6 +16,7 @@
 #include <linux/module.h>
 #include <asm/lmb.h>
 #include <asm/machdep.h>
+#include <asm/abs_addr.h>

 #if 1
 #define dbg(args...) udbg_printf(args)
@@ -31,9 +32,7 @@

 int numa_cpu_lookup_table[NR_CPUS] = { [ 0 ... (NR_CPUS - 1)] =
 	ARRAY_INITIALISER};
-int numa_memory_lookup_table[MAX_MEMORY >> MEMORY_INCREMENT_SHIFT] =
-	{ [ 0 ... ((MAX_MEMORY >> MEMORY_INCREMENT_SHIFT) - 1)] =
-	ARRAY_INITIALISER};
+int *numa_memory_lookup_table;
 cpumask_t numa_cpumask_lookup_table[MAX_NUMNODES];
 int nr_cpus_in_node[MAX_NUMNODES] = { [0 ... (MAX_NUMNODES -1)] = 0};

@@ -65,12 +64,20 @@
 	int *memory_associativity;
 	int depth;
 	int max_domain = 0;
+	long entries = lmb_end_of_DRAM() >> MEMORY_INCREMENT_SHIFT;
+	long i;

 	if (strstr(saved_command_line, "numa=off")) {
 		printk(KERN_WARNING "NUMA disabled by user\n");
 		return -1;
 	}

+	numa_memory_lookup_table =
+		(int *)abs_to_virt(lmb_alloc(entries * sizeof(int), 1));
+
+	for (i = 0; i < entries ; i++)
+		numa_memory_lookup_table[i] = ARRAY_INITIALISER;
+
 	cpu = of_find_node_by_type(NULL, "cpu");
 	if (!cpu)
 		goto err;
@@ -243,6 +250,14 @@
 	       top_of_ram, total_ram);
 	printk(KERN_INFO "Memory hole size: %ldMB\n",
 	       (top_of_ram - total_ram) >> 20);
+
+	if (!numa_memory_lookup_table) {
+		long entries = top_of_ram >> MEMORY_INCREMENT_SHIFT;
+		numa_memory_lookup_table =
+			(int *)abs_to_virt(lmb_alloc(entries * sizeof(int), 1));
+		for (i = 0; i < entries ; i++)
+			numa_memory_lookup_table[i] = ARRAY_INITIALISER;
+	}

 	for (i = 0; i < NR_CPUS; i++)
 		map_cpu_to_node(i, 0);

