[Prophesy] Thoughts on lineage and derivation

Sun Jun 2 16:23:53 EST 2002

I've got a few thoughts on database structure that I've been meaning to jot 
down, so here goes.

First, I'd like to start using the term 'version' where previously I've been
saying 'node'.  It's a lot more descriptive of what we really have in the 
database.  I'll just say 'node' when I'm talking about graph theory in 
general.

Speaking of graph theory, I realized that what we have in the database isn't 
a tree of versions at all; it's an arbitrary connected graph.  It's quite
a stretch to find strict trees in the real world of code development
- cross-pollination makes short work of that misapprehension.  The only thing
that makes it look somewhat like a tree is genealogy, and see the above
remark on cross-pollination.  We could say at least that it's an acyclic
graph because time only goes in one direction, but even that gets confused 
sometimes.  Just try importing some old code and see if time always goes 
forward or not.

So let's design everything without presuming any strict graph structure, tree
or otherwise.
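
As a rough illustration of what "no presumed structure" could mean in
practice, here is a minimal Python sketch.  The class and method names are
invented for the example and are not Prophesy's actual schema.  Derivation is
stored as plain directed edges, a version may have several parents, and the
traversal tolerates cycles instead of assuming there are none.

```python
class VersionGraph:
    """Versions as nodes, derivation as directed edges.

    Hypothetical sketch: no tree or acyclic structure is assumed.
    """

    def __init__(self):
        self.parents = {}  # version -> set of versions it was derived from

    def derive(self, parent, child):
        """Record that `child` was derived (at least in part) from `parent`."""
        self.parents.setdefault(child, set()).add(parent)
        self.parents.setdefault(parent, set())

    def ancestors(self, version):
        """Every version reachable backwards from `version`.

        The `seen` set stops the walk from looping if the graph has a cycle.
        """
        seen = set()
        stack = [version]
        while stack:
            for parent in self.parents.get(stack.pop(), ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


g = VersionGraph()
g.derive("v1", "v2")
g.derive("v1", "v3")
g.derive("v2", "v4")  # v4 has two parents: cross-pollination
g.derive("v3", "v4")
g.derive("v4", "v1")  # imported old code: time appears to run backwards
print(sorted(g.ancestors("v4")))
```

Because of the last edge, v4 ends up among its own ancestors, which is
exactly the kind of confusion the design has to survive.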

One thing that does impose a little order on the situation is that there is
only one order in which changes are applied to the database.  That's a simple
matter of incrementing a change number every time a change is applied.  We 
won't rely on that for much more than auditing and reporting though, since 
it's too restrictive.
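
A minimal sketch of such a change counter, in Python, might look like the
following.  `ChangeLog` and its fields are hypothetical names used only for
illustration.  The point is that the change number orders the log for
auditing, while the target of each change can be any version at all.

```python
import itertools


class ChangeLog:
    """Hypothetical audit log: one global, ever-increasing change number."""

    def __init__(self):
        self._numbers = itertools.count(1)
        self.entries = []  # (change_number, target_version, description)

    def apply(self, target_version, description):
        """Record a change and return the change number it was given."""
        number = next(self._numbers)
        self.entries.append((number, target_version, description))
        return number


log = ChangeLog()
log.apply("v7", "fix typo")
log.apply("v2", "backport fix to an older version")
for entry in log.entries:
    print(entry)
```

Note that the second change targets an older version than the first, so the
change numbers say nothing about how the versions relate to each other.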

Just for fun, we'll allow changes to be applied to any version in the graph,
and yes, that can create various sorts of inconsistencies, but instead of
denying that such things can happen, we'll just record the fact that those 
inconsistencies exist in the database, and somebody can attempt to clean them 
up later.  We do not necessarily have to forget about the good old consistent 
version at the affected point in the database, and arguably we should never
forget an 'interior' version anyway.  (An 'interior'
version is one from which at least one later version was derived.)
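
To make the idea concrete, here is a small Python sketch.  It assumes one
possible reading of the above: a change never overwrites its target but
produces a new version derived from it, and if the target was already
interior we simply note that its earlier descendants have not seen the
change.  The function names and the shape of the inconsistency record are
assumptions made for the example.

```python
def interior_versions(derived_from):
    """Versions from which at least one other version was derived.

    `derived_from` maps each version to the set of versions it came from.
    """
    interior = set()
    for parents in derived_from.values():
        interior.update(parents)
    return interior


def apply_change(derived_from, inconsistencies, target, new_version):
    """Derive `new_version` from `target`, keeping `target` itself intact."""
    was_interior = target in interior_versions(derived_from)
    derived_from.setdefault(new_version, set()).add(target)
    if was_interior:
        # Versions derived from `target` earlier lack this change.
        # Record the fact; somebody can clean it up later.
        inconsistencies.append((target, new_version))
    return new_version


derived_from = {"v2": {"v1"}}
inconsistencies = []
apply_change(derived_from, inconsistencies, "v1", "v1a")  # v1 is interior
apply_change(derived_from, inconsistencies, "v2", "v3")   # v2 was a leaf
print(inconsistencies)
```

Only the change to v1 is flagged, because v2 had already been derived from
it.  The old v1 is still in the database either way.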

For that matter, it's a mistake to think of derivation along a single line, 
or even a single tree.  In fact, there are many objects that make up each 
version, and any of them can show lineage and be derived from, not even 
necessarily in the same version.  So lineage and inheritance are a lot more 
complex than they seem at first glance.

What's going to save us from getting confused are the object ids.  For any 
given object, typically a single source file, we will be able to trace exact 
lineage and derivations from it, and those will form a strict tree.  (Um, 
unless we allow objects to be made up of other objects, which I think we do.)
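
Tracing lineage for a single object could be sketched in Python like this.
The sketch assumes each object state is identified by an (object id, version)
pair and has at most one recorded parent state, which is what makes the
result a strict tree.  Composite objects would break that assumption.  The
names and the sample ids are made up for the example.

```python
def lineage(parent_of, start):
    """Walk from one object state back to its earliest known ancestor.

    `parent_of` maps (object_id, version) to the state it was derived from.
    A state with no entry is treated as a root.
    """
    chain = [start]
    while True:
        parent = parent_of.get(chain[-1])
        if parent is None or parent in chain:  # root reached, or bad data
            return chain
        chain.append(parent)


parent_of = {
    (17, "v2"): (17, "v1"),
    (17, "v3"): (17, "v2"),
    (42, "v3"): (17, "v2"),  # object 42 was derived from object 17
}
print(lineage(parent_of, (42, "v3")))
```

The walk for object 42 crosses into object 17's history, so the lineage of
an object is not confined to a single object id or a single version.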

Notice how using an object id as a handle for a file object neatly answers 
the question of how to handle renames.  The name (complete with path) is just 
an attribute of the file object, and can change from version to version, just 
as the file text can.
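
Here is a minimal Python sketch of that idea.  The record layout, the object
id 17 and the paths are all invented for illustration.  The object id is the
key, and the path is just another field that may differ between versions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileState:
    """Hypothetical record: the attributes of one file object in one version."""
    path: str
    text: str


def path_history(states, object_id):
    """List (version, path) for one object, in the order it was recorded."""
    return [(version, state.path)
            for (oid, version), state in states.items()
            if oid == object_id]


states = {
    (17, "v1"): FileState("src/main.c", "int main() { return 0; }\n"),
    (17, "v2"): FileState("src/prophesy.c", "int main() { return 0; }\n"),
}
print(path_history(states, 17))
```

The rename from v1 to v2 shows up as nothing more than a changed `path`
attribute on the same object id.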

-- 
Daniel