[ccan] AVR, hashtable and general containers...
Rusty Russell
rusty at rustcorp.com.au
Sat Oct 23 13:18:12 EST 2010
On Fri, 22 Oct 2010 05:19:09 pm Joseph Adams wrote:
> Google searching "avl tree implementation" yields implementations
> cluttered with commentary, redundant functions for the sake of
> pedagogy (e.g. separate RotateLeft and RotateRight functions), and/or
> syntax that doesn't work with a C compiler. My goal was to write a
> sorted container suitable for low-memory environments (like a TI-89
> calculator), and while I was at it, an AVL implementation that's
> simple and works.
That's my favorite feature of your avl module :)
> On Thu, Oct 21, 2010 at 8:17 PM, Rusty Russell <rusty at rustcorp.com.au> wrote:
> > One difference between our interfaces is that you chose separate keys
> > and values, and I chose to assume a single object which contains both. In
> > my experience that's the norm; you have an object you want to put into the
> > hash. If that's true, then in the other case you could always make a
> > key/value object.
>
> It sort of depends on what the user needs. The way some programming
> languages deal with this (such as C++ and Haskell) is by having two
> different libraries for each case. For just storing keys, it's
> usually called a "set"; for storing keys and values, it's usually
> called a "map".
Pulling me out of my C-thinking rut :) Let's use that nomenclature.
> I chose to implement the AVL module as a "map" because I think it's
> easier to use a map as a set than it is to use a set as a map. For
> instance, a map from strings to values can be created with
> avl_new((AvlCompare) strcmp), as opposed to having to wrap the
> string/value pair in a struct and write a compare function as well.
This has made me think very hard. Thank you.
Firstly, for avl, an extra pointer (as needed for a map) might not matter,
but for a hashtable it doubles the size.
The accepted wisdom in linked lists in C is to embed rather than dissociate
the list element. This avoids allocations and (sometimes) maintains type
safety. So let's apply a congruent argument here.
If our objects know they are in a container, then we should not only prefer
sets to maps, but the AVL struct should be placed into the object. (Linear
hash has the advantage that there is no per-object struct).
The counter-argument is that if you want hashes you know where to find them,
and that this should be a higher-level module. Since linear hash does so
well at the set model, avl should explicitly be aimed at the "I really want
a generic map".
> I originally wasn't even going to
> have AvlIter, instead having a function called avl_walk that takes a
> callback function. However, I quickly came to find that using
> avl_walk was really ugly (though slightly less ugly if you use GNU C's
> nested functions), so I implemented a simple iterator system so I
> could write avl_foreach and avl_foreach_reverse macros.
That matches my experience, though I have a preference for open-coded
control structures rather than loops where possible, eg:
for (iter = avl_iter_fwd(avl); iter.node; avl_next(avl, &iter))
This only works if names are short enough to make a one-liner; you can see
the tricks I used here.
> Perhaps modules named after algorithms (e.g. avl, btree) rather than
> purpose (e.g. hash) should be more concerned with presentation of the
> algorithm for people to learn from rather than features and uniformity
> (though they should still be well-tested!).
The simplicity of avl is definitely attractive. I'm thinking now
that we should avoid gratuitous differences but not aim for identical APIs
at all.
On the question of higher-level interfaces, I started implementing a "poor
man's template" for avl and hashtable; basically a mega-macro
AVL_DEFINE(prefix, ktype, dtype, cmpfn) which declared typesafe wrappers
like so:
struct avl_##prefix;
static inline struct avl_##prefix *avl_##prefix##_new(void)
{
// Check type of cmpfn is correct...
(void)sizeof((cmpfn)((const ktype *)NULL, (const ktype *)NULL));
return (struct avl_##prefix *)avl_new(cmpfn);
}
...
If such a thing becomes the norm, I think it should be in a separate header
(perhaps not a separate module) because it's ugly, even if the usage is nicer.
> For a map or set meant to be used the same way C++ programmers use
> std::map and std::set, perhaps those CCAN modules should be called
> "map" and "set" and be filled with all the bells and whistles. The
> ideal map/set would probably be a balanced binary tree based on
> red-black trees and would feature insertion/deletion-safe iterators
> like I described above. Of course, just copying the AVL code and
> adding bells/whistles would be fine too, but I won't have time to do
> that for a while.
Do we really want to go there? I think Judy trees would be optimal for
the map case (though benchmarking it vs. bitstealing hash is on my TODO),
though I don't want to take other's packaged and supported libraries and
drag them into CCAN.
Thanks,
Rusty.
More information about the ccan
mailing list