[Prophesy] Repository format
Daniel Phillips
phillips at google.com
Sat Nov 19 08:25:15 EST 2005
Here is a cut and paste from the (updated) source text, now called untridge.c.
The repository format is evolving, but it would seem, in a monotonic forward
direction. After mulling over how to add versioned text to the existing
versioned namespace, I basically reinvented the concept of weave, but with
twists, see below. This is in process of implementation.
The notion of multiple births appeared recently, mainly as a way to avoid
allowing multiple parents in the version graph, which would have a fairly
horrible impact on version generation efficiency.
Regards,
Daniel
/*
* My terminology:
* node -> a unique object in the repository store. The low order bit of a
* node id specifies dict or text and the high order bits are the node
* number.
*
* dict node -> a versioned directory object (a weave-like thingy)
* text node -> a versioned file object (also a weave-like thingy)
*
* A repository is a tree of dict nodes with text nodes at the leaves. Each
* node (dict or text) carries its own, complete history. In other words,
* each node is a versioned object.
*
* The repository is much like a traditional filesystem. The correspondence is:
* text node -> file
* dict node -> directory
* node number -> inode number
*
* Dict node format:
* A dict node is an unordered collection of dict entries. Each entry has a
* node id, a version in which the entry was created and N versions in which it
* died. Note the similarity to the text node format below. An entry can die
* by being unlinked (delete or move) or by being renamed.
*
* Text node format:
* A text node is a sequence of text fragments. Each text fragment points
* to a region of the base text, which is initially the exact file text. As
* new fragments are inserted, new fragment text is appended to the base
* text. Each fragment gives the version at which the fragment was first
* inserted and the N (0 or more) versions where the fragment was deleted.
* In effect, the birth and N deaths specify a subset of the repository
* version graph. We generate the plaintext for a given version by
* determining, for each fragment, whether the text fragment is live in the
* version we are generating. If so, we append the fragment text to the
* output text and go on to the next fragment. To see if a given fragment
* is live (included in) a given version, we check that the fragment birth
* and none of its deaths are ancestors of the current version.
*
* Improved text fragment format:
* In effect, each text fragment carries its own, complete history. But if we
* don't do something special, then when we merge we will have to create a new
* fragment at the merge point because the merge source is not an ancestor of
* the merge result. It then becomes messy to match up the old and new,
* identical fragments to learn that a given fragment actually came from a
* merge source, not the merge point. In traditional weave format this
* problem is solved by allowing multiple version parents, implicitly
* including all changes from the forked version(s) in the merge result.
* Unwanted text in the resulting "mash-up" is then deleted. I don't want to
* use this approach because I do not know any easy way to write an O(1)
* ancestry test if multiple parents are allowed. Furthermore, there could
* be many parents of a merge point, or the merge may be a cherry-pick,
* either of which could require many deletions. Instead I allow a fragment
* to have multiple births. When we merge, we add a new birth at the merge
* point. The first birth is the real birth of the fragment, other births
* are there to compute fragment liveness efficiently. The liveness test is
* now: any of the fragment births and none of its deaths are ancestors of
* the current version.
*
* Future things:
* - It would be easy to generalize the versioned tree to multiple roots, i.e.,
* multiple directory trees possibly sharing files via hardlink or symlink.
* - versioned node attributes
* - for either dict or text nodes
* - fixed size and variable sized are separate cases
* - standard attributes (e.g., permissions, text) are separate case
* from user defined
* - Symlinks
* - just specially flagged text nodes
*/
More information about the Prophesy
mailing list