From phillips at google.com Sat Nov 19 08:25:15 2005 From: phillips at google.com (Daniel Phillips) Date: Fri, 18 Nov 2005 13:25:15 -0800 Subject: [Prophesy] Repository format Message-ID: <437E46BB.3060102@google.com> Here is a cut and paste from the (updated) source text, now called untridge.c. The repository format is evolving, but it would seem, in a monotonic forward direction. After mulling over how to add versioned text to the existing versioned namespace, I basically reinvented the concept of weave, but with twists, see below. This is in process of implementation. The notion of multiple births appeared recently, mainly as a way to avoid allowing multiple parents in the version graph, which would have a fairly horrible impact on version generation efficiency. Regards, Daniel /* * My terminology: * node -> a unique object in the repository store. The low order bit of a * node id specifies dict or text and the high order bits are the node * number. * * dict node -> a versioned directory object (a weave-like thingy) * text node -> a versioned file object (also a weave-like thingy) * * A repository is a tree of dict nodes with text nodes at the leaves. Each * node (dict or text) carries its own, complete history. In other words, * each node is a versioned object. * * The repository is much like a traditional filesystem. The correspondence is: * text node -> file * dict node -> directory * node number -> inode number * * Dict node format: * A dict node is an unordered collection of dict entries. Each entry has a * node id, a version in which the entry was created and N versions in which it * died. Note the similarity to the text node format below. An entry can die * by being unlinked (delete or move) or by being renamed. * * Text node format: * A text node is a sequence of text fragments. Each text fragment points * to a region of the base text, which is initially the exact file text. As * new fragments are inserted, new fragment text is appended to the base * text. Each fragment gives the version at which the fragment was first * inserted and the N (0 or more) versions where the fragment was deleted. * In effect, the birth and N deaths specify a subset of the repository * version graph. We generate the plaintext for a given version by * determining, for each fragment, whether the text fragment is live in the * version we are generating. If so, we append the fragment text to the * output text and go on to the next fragment. To see if a given fragment * is live (included in) a given version, we check that the fragment birth * and none of its deaths are ancestors of the current version. * * Improved text fragment format: * In effect, each text fragment carries its own, complete history. But if we * don't do something special, then when we merge we will have to create a new * fragment at the merge point because the merge source is not an ancestor of * the merge result. It then becomes messy to match up the old and new, * identical fragments to learn that a given fragment actually came from a * merge source, not the merge point. In traditional weave format this * problem is solved by allowing multiple version parents, implicitly * including all changes from the forked version(s) in the merge result. * Unwanted text in the resulting "mash-up" is then deleted. I don't want to * use this approach because I do not know any easy way to write an O(1) * ancestry test if multiple parents are allowed. Furthermore, there could * be many parents of a merge point, or the merge may be a cherry-pick, * either of which could require many deletions. Instead I allow a fragment * to have multiple births. When we merge, we add a new birth at the merge * point. The first birth is the real birth of the fragment, other births * are there to compute fragment liveness efficiently. The liveness test is * now: any of the fragment births and none of its deaths are ancestors of * the current version. * * Future things: * - It would be easy to generalize the versioned tree to multiple roots, i.e., * multiple directory trees possibly sharing files via hardlink or symlink. * - versioned node attributes * - for either dict or text nodes * - fixed size and variable sized are separate cases * - standard attributes (e.g., permissions, text) are separate case * from user defined * - Symlinks * - just specially flagged text nodes */ From phillips at google.com Sat Nov 19 08:57:15 2005 From: phillips at google.com (Daniel Phillips) Date: Fri, 18 Nov 2005 13:57:15 -0800 Subject: [Prophesy] Prophesy mailing list moved Message-ID: <437E4E3B.9080307@google.com> Hi all, The Prophesy mailing list has moved from samba.org to a better suited home at ozlabs.org. The dark age of version control being officially over, this list is now public. I will publicize some working code pretty soon and hopefully get some help verifying it. On the development front, I have begun another spurt of work on the code base. It seems to be headed towards the point of doing something interesting and/or useful within a finite amount of time, which is encouraging. I have also managed to get out there and survey the engine details of some of some other projects, including Subversion, Bazaar-ng, Git and Mecurial. This convinced me that I do indeed have some new material to throw into the pot, and helped me fill in a lot of useful and sometimes not so obvious background material. I now know what my main goal is here: scalability. This means both scalability of the repository, that is, millions of revisions across hundreds of thousands of files, and scalability of the repository user, that is, how easy is this to use? How pleasant? How well does it do the job? How fast can you work? Regards, Daniel