[ccan] AVR, hashtable and general containers...

Fri Oct 22 17:49:09 EST 2010

Google searching "avl tree implementation" yields implementations
cluttered with commentary, redundant functions for the sake of
pedagogy (e.g. separate RotateLeft and RotateRight functions), and/or
syntax that doesn't work with a C compiler.  My goal was to write a
sorted container suitable for low-memory environments (like a TI-89
calculator), and while I was at it, an AVL implementation that's
simple and works.

On Thu, Oct 21, 2010 at 8:17 PM, Rusty Russell <rusty at rustcorp.com.au> wrote:
> One difference between our interfaces is that you chose separate keys
> and values, and I chose to assume a single object which contains both.  In
> my experience that's the norm; you have an object you want to put into the
> hash.  If that's true, then in the other case you could always make a
> key/value object.

It sort of depends on what the user needs.  The way some programming
languages deal with this (such as C++ and Haskell) is by having two
different libraries for each case.  For just storing keys, it's
usually called a "set"; for storing keys and values, it's usually
called a "map".

I chose to implement the AVL module as a "map" because I think it's
easier to use a map as a set than it is to use a set as a map.  For
instance, a map from strings to values can be created with
avl_new((AvlCompare) strcmp), as opposed to having to wrap the
string/value pair in a struct and write a compare function as well.

> After some thought, I like the iterator object approach you used.  Should
> find return an iterator?  Should delete take one?  Or should we have two
> deletes: a "delete this object" (ie. find by pointer) and "delete by key"
> (ie. lookup, then delete)?  Assuming lookup is efficient, the latter is
> probably enough.

My goal was to make the AVL module simple, so I specifically stayed
away from these complexities (note that I actually did what you're
suggesting in the btree module).  I originally wasn't even going to
have AvlIter, instead having a function called avl_walk that takes a
callback function.  However, I quickly came to find that using
avl_walk was really ugly (though slightly less ugly if you use GNU C's
nested functions), so I implemented a simple iterator system so I
could write avl_foreach and avl_foreach_reverse macros.

Note that the iterators in the AVL module are not safe to use after
insertions/deletions, nor can you start walking the other direction
simply by flipping the "direction" field of the iterator.  However, if
a parent field were added to AVL nodes, it would be possible and
simple to have insertion/deletion-safe iterators.  Again, my goal was
simplicity, so I chose not to do it that way.

> I think if we make some decisions for these two, we can get future containers
> to follow suit.

Perhaps modules named after algorithms (e.g. avl, btree) rather than
purpose (e.g. hash) should be more concerned with presentation of the
algorithm for people to learn from rather than features and uniformity
(though they should still be well-tested!).

For a map or set meant to be used the same way C++ programmers use
std::map and std::set, perhaps those CCAN modules should be called
"map" and "set" and be filled with all the bells and whistles.  The
ideal map/set would probably be a balanced binary tree based on
red-black trees and would feature insertion/deletion-safe iterators
like I described above.  Of course, just copying the AVL code and
adding bells/whistles would be fine too, but I won't have time to do
that for a while.

Joey Adams