[PATCH 0/7] [RFC] Sizing zones and holes in an architecture independent manner V2
Mel Gorman
mel at skynet.ie
Thu Apr 13 20:24:32 EST 2006
On Thu, 13 Apr 2006, Andi Kleen wrote:
> On Thursday 13 April 2006 02:22, Mel Gorman wrote:
>
>> I experimented with the idea of all architectures sharing the struct
>> node_active_region rather than storing the information twice. It got very
>> messy, particularly for x86 because it needs to store more than nid,
>> start_pfn and end_pfn for a range of page frames (see node_memory_chunk_s
>> in arch/i386/kernel/srat.c). Worse, some architecture-specific code
>> remembers the ranges of active memory as addresses and others as pfn's. In
>> the end, I was not too worried about having the information in two places,
>> because the active ranges are kept in __initdata and gets freed.
>
> The problem is not memory consumption but complexity of code/data structures.
The architecture-independent code is simpler than i386's SRAT messing,
about the same complexity as ppc's dealings with LMB (in fact, much of the
code is lifted from ppc) and comparable in complexity to what IA64 does.
For x86_64, there is less architecture-specific code that has to be
understood.
> Keeping information in two places is usually a good cue that something
> is wrong. This code is also fragile and hard to test.
>
At minimum, it requires a boot test - not that massive a burden. For a more
active check, compare the values of the zones before and after the patches.
To test architectures that register PFNs in unexpected ways, and for which
I don't have a test machine (like IA64), I wrote the attached test program.
It was simply a case of
1. A few #defines to pretend it's compiled in-kernel
2. Cut and paste from the architecture-independent code in mem_init.c to
the driver program
3. Pass in sample input from main() and see what pops out
It caught a number of simple bugs (including one this morning) without
having to even boot a machine. The same type of testing is hard with the
architecture-specific code. This is sample output of the driver program
handling PFN ranges supplied by IA64:
mel at joshua:~/tmp$ gcc driver_test.c -o driver_test && ./driver_test | grep -v "active with no"
Stage 1: Registering active ranges
add_active_range(0, 0, 4096): New
add_active_range(0, 0, 131072): Merging forward
add_active_range(0, 0, 131072): Merging backwards
add_active_range(0, 393216, 523264): New
add_active_range(0, 393216, 523264): Merging backwards
add_active_range(0, 393216, 524288): Merging forward
add_active_range(0, 393216, 524288): Merging backwards
Stage 2: Calculating zone sizes and holes
Hole found zone 1 index 1: 131072 -> 393216
Stage 3: Dumping zone sizes and holes
zone_size[0][0] = 131020 zone_holes[0][0] = 0
zone_size[0][1] = 393268 zone_holes[0][1] = 262144
Stage 4: Printing present pages
On node 0, 262144 pages
zone 0 present_pages = 131020
zone 1 present_pages = 131124
So, testing is not that hard.
How is the code fragile? Even *if* it is fragile, it only has to be fixed
once to benefit any architecture using the code path.
>> I'll admit that for x86_64, the entire code path for initialisation (i.e.
>> architecture specific and architecture independent paths) is now more
>> complex. The architecture independent code needed to be able to handle
>> every variety of node layout which is overkill for x86_64. Nevertheless,
>> without size_zones(), I thought the architecture-specific code for x86_64
>> memory initialisation was a bit easier to read. With
>> architecture-independent zone size and hole calculation, you only have to
>> understand the relevant code once, not once for each architecture.
>
>
> I think i386 SRAT NUMA should be just removed at some point - it never
> worked all that well and is quite complicated.
Assuming you mean the code, are these patches not a readable replacement?
> That leaves IA64, x86-64
> and ppc64. I suspect keeping the code there near their low level
> data structures is better.
>
For PPC64, the architecture-independent representation *is* the only copy
(which is why 128 arch-specific LOC were deleted for ppc including a
struct init_node_data array of nids, start_pfns and end_pfns). IA64's low
level representation uses addresses, not pfns, so having only one copy
would be a very invasive patch which is not a good idea without a test
box.
In many cases, the low-level representation is similar across
architectures. The representation I use consists of the common elements all
the architectures need - nid, start_pfn, end_pfn.
>>> I have my doubts that is really a improvement over the old state.
>>>
>>
>> For x86_64 in isolation or the entire set of patches?
>
> For x86-64/i386. I haven't read the other architectures.
>
ok.
>>> I think it would be better if you just defined some simple "library functions"
>>> that can be called from the architecture specific code instead of adding
>>> all this new high level code.
>>>
>>
>> What sort of library functions would you recommend? x86_64 uses
>> add_active_range() and free_area_init_nodes() from this patchset which
>> seemed fairly straight-forward.
>
> e.g. a generic size_zones(). Possibly some others.
>
For a generic size_zones(), one would need an architecture independent way
to pass in active page frame ranges and what node they are on. So we end
up with code very similar to what I've posted.
--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
-------------- next part --------------
A non-text attachment was scrubbed...
Name: driver_test.c
Type: text/x-csrc
Size: 9413 bytes
Desc: driver_test.c
URL: <http://lists.ozlabs.org/pipermail/linuxppc-dev/attachments/20060413/9c49f69f/attachment.c>