[Prophesy] Database Structure and the Transport System

Thu Jun 13 01:43:16 EST 2002

As I mentioned earlier, I am working towards the goal of implementing a basic 
source tree transport system, which implements the following two operations:

  tag - give a name to the current version of the source tree
  goto - transform the current source tree to a previously tagged version

I have a theory that if I do this very accurately and efficiently, first it 
will be immediately useful for certain purposes such as preparing patch sets 
and storing multiple source tree versions in a single tree, and second, it 
can be extended naturally to become an elegant and useful distributed source 
code manager.

The immediate need is to define a database structure well suited to capturing 
incremental changes to a source tree and supporting the above transport 
system operations.  To help understand my thinking, it's useful to consider 
the following points:

 1) Not all the source is recorded in the database proper.  Some, or most
    of the text exists only as normal files in the source tree.  Only
    enough information is recorded in the database to support the transport
    mechanism.  Though this does add a little complexity to the system, it 
    means that the database can be considerably smaller in common 
    situations, and putting a source tree under management is a much
    faster operation than if all the source text had to be compressed and
    loaded into it.  (As an extension, the on-disk source could be
    redundantly encoded in the database in order to provide protection 
    against the possibility that the on-disk source could be changed
    while Prophesy isn't watching.)

 2) Transforms are unidirectional.  This is simply to save space - we
    don't have to record the text that a transform deletes from the
    on-disk version, only added text.  However, we can easily compute the 
    inverse of a transform given the input text, or portions of it.  At
    the time the transport system transforms the on-disk text into some
    other version, those portions of the input text needed to compute the 
    inverse transformation - needed so that the transport system can
    restore the on-disk text to its original form - can be stored in the 
    database.

Prophesy Database Structure
---------------------------

So far, I've identified the following essential database entities:

   - file
   - directory
   - version
   - transform

where 'directory' is a kind of file.  Each of these objects will have an 
internally-generated, permanent id.  The object id, especially for files and 
directories, is tantalizingly similar to a file inode, and it's tempting to 
use the underlying filesystem's actual inode number for the id, except that 
we are allowed to "copy -a" the whole source tree structure, and the inode 
numbers would change in the process.  Drat.  This means that Prophesy 
and the underlying filesystem are going to be doing a lot of lookups in 
parallel: the filesystem looks up a file by path and name, yielding an inode, 
then advises Prophesy that the file is to be altered.  Prophesy then has to 
look up the file by path and name, yielding an object id.  Oh well, we will 
ensure the latter operation is efficient.

The main (or perhaps only) function of directory objects is to support lookup 
of objects by name.  Each file or directory object has a name and a directory 
id, this pair being unique in a version.  A file object can be known by more 
than one name/directory pair, that  is, it can be hardlinked.

Tree Structure of Versions
--------------------------

The ability to return to some previous version of the source text and modify 
it means that the version structure is a tree, that is, any version can be 
forked.  The version table represents this tree in the form of a flat, 
relational table.

   Version table:

      version id, parent version, tag

That is, each version knows its parent.  If we want to know all the children 
of some version then we can query the database for all versions with the 
given parent.  We can cache the result if we want, to support repeated 
queries of this form efficiently.

Primary Data Representation
---------------------------

All primary data in Prophesy is represented by the combination of the current 
on-disk source text and changes relative to the current source text.  These 
changes take the form of three tables, as described below, and a journal 
table, to which additions to any of the three change tables are logged.

   journal table:

       journal id, comment, author, timestamp

The sole purpose of the journal table is tracking; the transport system does 
not make reference to it.  While the journal could in theory be used to wind 
the whole database back to any historical state, the transport mechanism 
provides a more powerful and efficient way of doing that.

Each change to file text between two versions results in an addition to the 
text change table.

   text change table:

      object id, input version, output version, journal id, 
      transform, untransform

The forward transform is not stored until needed, since it can be generated 
from the current text and the untransform, and so would contain only 
redundant text.  The forward transform must be generated the first time the 
current version moves downstream of the output version, that is, towards the 
root.  Optionally, the forward transform can be stored redundantly, to 
protect against the possibility that the on-disk tree could be changed 
without knowledge of Prophesy, or to allow an entire repository to be copied 
by copying just a single database file.

Each file or directory object in Prophesy database has a unique object id, 
which is used to track the object as it evolves from version to version.

   object create table:

      object id, input version, output version, journal id, name

An object delete is exactly an object create with the input and output 
versions reversed.  In other words, a create going from version A to version 
B implies a delete going from version B to version A.  For any delete, the 
object text must be stored, however for a create that would be redundant 
since the current text is on disk.  In fact, this is the same consideration 
apply as for text changes, and is handled the same way.  That is, when a 
non-empty file is deleted, Prophesy enters both a reverse create and a text 
change consisting of a single text remove (reverse add) operation into the 
database.

When an object is deleted, its object id is not reused (possibly excepting 
cases where an object is created and deleted within the same version, or an 
entire version is discarded) since it continues to exist in other versions 
within the same repository.  For the time being, a 32 bit object id should be 
sufficient.

Handling hard links correctly is expected to be problematic, however 
representing them is not a problem.  A hard link is simply a create for an 
object that already exists.

>From version to version, file and directory objects may be moved from any 
place in the source tree to any other.  Each such move results in an entry in 
the object move table.  As with filesystems, object rename is treated as a 
move.

   object move table:

      object id, input version, output version, journal id,
      input name, output name

Here, and everywhere else names are used, a 'name' is a pair: atom, directory.
Using atom ids rather than literal text in the move and create records means 
that these records consist only of fixed-size fields, which is friendly to 
database optimization.  Atoms also provide a measure of compression, since 
the atom table is shared by all versions.

Current State Cache
-------------------

Other that the version table, all primary objects in the database represent 
differences rather than current state.  A current state for any version can 
always be constructed by applying all transforms, create/deletes and moves 
encountered on the path from an old current version to a new current version. 
However, filename lookups need to be efficient, and so a hash table mapping 
all current names (atom, dir pair) to object ids is maintained incrementally.

The list of all current objects is easily and efficiently generated by taking 
the union of all object creates on the path from the root to the current 
version, less all object deletes.  This is rarely needed, so it is not 
maintained incrementally.

Name Lookup
-----------

Name lookup by full path is needed each time Prophesy intercepts and 
processes a change to a file, and that could add up to a lot of lookups. For 
example, a global edit might be performed, or a whole set of files untarred 
into a subdirectory, or a directory deleted.  It is desirable that typical 
file operations not be slowed noticeably by putting a source tree under 
management.

Therefore, to optimize directory lookups, an additional hash table is 
maintained, which maps hashes of full directory paths to directory objects.  
This avoids the need to iterate through each section of a directory path to 
perform a lookup.  When a directory name is changed, hashes of subdirectories 
need to be invalidated, and this is the only case where Prophesy needs to 
know the subdirectory tree of a given directory.  To optimize this, a 
directory table for the current version is maintained incrementally:

   Directory table:

      directory id, parent directory id

which forms a tree, since multiple directory ids can have the same parent 
directory id.

Epilogue
--------

On the 'well begun is half done' principle, this post constitutes my last 
major effort before turning to preparations for the Ottawa Linux Symposium 
and kernel summit.  In other words, I won't be implementing any of this for 
about a month.  This should provide adequate time for the ideas to mature.  
Of course I'll respond to any critical comment, or elaborate on any points I 
glossed over too quickly.

-- 
Daniel