[Prophesy] Thoughts on lineage and derivation
Daniel Phillips
phillips at bonn-fries.net
Sun Jun 2 16:23:53 EST 2002
I've got a few thoughts on database structure that I've been meaning to jot
down, so here goes.
First, I'd like to start using the term 'version' where previously I've been
saying 'node'. It's a lot more descriptive of what we really have in the
database. I'll just say 'node' when I'm talking about graph theory in
general.
Speaking of graph theory, I realized that what we have in the database isn't
a tree of versions at all, it's an arbitrary connected graph. It's pretty
much of a stretch to find strict trees in the real world of code development
- cross-pollination makes short work of that misapprehension. The only thing
that makes it look somewhat like a tree is genealogy, and see the above
remark on cross-pollination. We could say at least that it's a non-cyclic
graph because time only goes in one direction, but even that gets confused
sometimes. Just try importing some old code and see if time always goes
forward or not.
So let's design everything based on no presumption of strict graph structure.
One thing that does impose a little order on the situation is that there is
only one order in which changes are applied to the database. That's a simple
matter of incrementing a change number every time a change is applied. We
won't rely on that for much more than auditing and reporting though, since
it's too restrictive.
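
To make the change-number idea concrete, here is a rough Python sketch. The
class and field names (ChangeLog, apply, entries) are made up for
illustration and are not part of any actual Prophesy code. It only shows that
the counter orders changes by when they were applied, regardless of which
version they target.

```python
import itertools


class ChangeLog:
    """Hypothetical append-only log of applied changes (illustrative only)."""

    def __init__(self):
        self._next_number = itertools.count(1)  # change numbers start at 1
        self.entries = []                       # (number, target_version, note)

    def apply(self, target_version, note):
        """Record a change and return the change number it was given."""
        number = next(self._next_number)
        self.entries.append((number, target_version, note))
        return number


log = ChangeLog()
first = log.apply("v2", "fix typo")
second = log.apply("v1", "import some old code")  # older target, later number
```

Note that the second change targets an older version but still gets the
higher number, which is why the number is good for auditing and not much else.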
Just for fun, we'll allow changes to be applied to any version in the graph,
and yes, that can create various sorts of inconsistencies, but instead of
denying that such things can happen, we'll just record the fact that those
inconsistencies exist in the database, and somebody can attempt to clean them
up later. We do not necessarily have to forget about the good old consistent
version at the affected point in the database, and arguably we should never
forget an 'interior' version anyway. (An 'interior'
version is one from which at least one later version was derived.)
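
Here is one way this could look, as a minimal Python sketch. Everything in it
(VersionGraph, derive, apply_change, the inconsistencies list) is a
hypothetical name chosen for the example, and keeping the old version
alongside the changed one is just one possible reading of the idea above.

```python
from collections import defaultdict


class VersionGraph:
    """Hypothetical version store with no assumption of tree structure."""

    def __init__(self):
        self.parents = defaultdict(set)   # version -> versions it derives from
        self.children = defaultdict(set)  # version -> versions derived from it
        self.inconsistencies = []         # recorded for later cleanup

    def derive(self, new_version, *sources):
        """Record that new_version was derived from one or more sources."""
        for src in sources:
            self.parents[new_version].add(src)
            self.children[src].add(new_version)

    def is_interior(self, version):
        """True if at least one later version was derived from this one."""
        return bool(self.children.get(version))

    def apply_change(self, version, new_version):
        """Apply a change to any version, interior or not.

        The old version is kept. Versions already derived from it are
        flagged as possibly inconsistent with the result, not rejected.
        """
        stale = sorted(self.children.get(version, ()))
        self.derive(new_version, version)
        for child in stale:
            self.inconsistencies.append((new_version, child))
        return stale


g = VersionGraph()
g.derive("b", "a")
g.derive("c", "a", "b")     # cross-pollination: two sources
g.apply_change("a", "a2")   # change an interior version after the fact
```

After this, "a" is still in the graph, and the two versions that were derived
from it are listed in g.inconsistencies for somebody to look at later.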
For that matter, it's a mistake to think of derivation along a single line,
or even a single tree. In fact, there are many objects that make up each
version, and any of them can show lineage and be derived from, not even
necessarily in the same version. So lineage and inheritance are a lot more
complex than they seem at first glance.
What's going to save us from getting confused are the object ids. For any
given object, typically a single source file, we will be able to trace exact
lineage and derivations from it, and those will form a strict tree. (Um,
unless we allow objects to be made up of other objects, which I think we do.)
Notice how using an object id as a handle for a file object neatly answers
the question of how to handle renames. The name (complete with path) is just
an attribute of the file object, and can change from version to version, just
as the file text can.
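
A small Python sketch of the object id idea, assuming a simple per-version
table keyed by object id. The names FileState, versions and path_history,
and the sample paths and ids, are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileState:
    """State of one file object in one version (hypothetical layout)."""
    object_id: int  # stable handle, the same in every version
    path: str       # just an attribute, may change between versions
    text: str       # also just an attribute


# version name -> {object_id: FileState}
versions = {
    "v1": {7: FileState(7, "src/main.c", "int main() {}")},
    "v2": {7: FileState(7, "src/prophesy.c", "int main() { return 0; }")},
}


def path_history(object_id, order):
    """Return (version, path) pairs for one object across the given versions."""
    return [
        (name, versions[name][object_id].path)
        for name in order
        if object_id in versions[name]
    ]


history = path_history(7, ["v1", "v2"])
```

The rename from src/main.c to src/prophesy.c needs no special handling here:
object 7 is the same object in both versions, and only its path attribute
differs.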
--
Daniel