From rasmus at jaquet.dk  Wed May  1 16:45:26 2002
From: rasmus at jaquet.dk (Rasmus Andersen)
Date: Wed, 1 May 2002 08:45:26 +0200
Subject: [Prophesy] General questions
Message-ID: <20020501084526.B14505@jaquet.dk>

Hi.

Thanks for the invite and and thanks for letting me in.

I have a number of questions that I'll list below in no apparent
order. The list archives were not that stuffed, so I guess that
some of the initial discussions happened prior to or outside this 
list?

Anyway, the list:

1) This project would seem to be a reaction to the BK thread on
   lk and its goal (I guess) would be to get Linus off BK. So,
   are we as a group aware/familiar with the features of BK that
   Linus, Garzik, Riel etc, like and want? If so, could somebody
   list them for me?

2) In his 'Answers from 39000 ft' mail, Daniel states that prophesy
   will manage all files in a tree. Is that convenient? If I want
   prophesy to manage my kernel tree, I surely dont want it to
   manage all the gunk created during a compile?

3) Are anybody on this list familiar with SCM? More to the point,
   could anybody here give me a list of SCM related links/infor-
   mation? Weave vs. diff+patch(xdelta)? How to merge sanely/
   automagically?

4) One of the features from 1) would be the distributed nature
   of BK, I guess? Are there any thoughts on how to handle this?

5) Daniel's 39K mail didn't mention changesets, the ability to
   group changes to files. I guess we are going to have this?
   And changesets would be on a delta-commit basis? (These may
   be too concrete for now, answer at will :)

6) Tom Lord's 'arch' have been mentioned as an alternative for
   BK. Are we aware of how he handles some of the above questions?
   (All these aware/familiar questions are really disguised
   versions of 'I dont know. Please tell me about X if you know.')

7) Why is this a closed list?


Regards,
  Rasmus


From phillips at bonn-fries.net  Fri May  3 13:45:14 2002
From: phillips at bonn-fries.net (Daniel Phillips)
Date: Fri, 3 May 2002 05:45:14 +0200
Subject: [Prophesy] General questions
In-Reply-To: <20020501084526.B14505@jaquet.dk>
References: <20020501084526.B14505@jaquet.dk>
Message-ID: <E173U0R-0002FE-00@starship>

On Wednesday 01 May 2002 08:45, Rasmus Andersen wrote:
> Hi.
> 
> Thanks for the invite and and thanks for letting me in.
> 
> I have a number of questions that I'll list below in no apparent
> order. The list archives were not that stuffed, so I guess that
> some of the initial discussions happened prior to or outside this 
> list?
> 
> Anyway, the list:
> 
> 1) This project would seem to be a reaction to the BK thread on
>    lk and its goal (I guess) would be to get Linus off BK. So,
>    are we as a group aware/familiar with the features of BK that
>    Linus, Garzik, Riel etc, like and want? If so, could somebody
>    list them for me?

The best source of that information is the 'patch penguin' thread,
where Linus talks about starting with it, and then talks soon after
about what he wants in it.

I'd see it being useful to other people way before being attractive
enough to Linus to break his BK habit.  I've noticed that I personally
am spending far more time fiddling around creating and maintaining
patch sets than I should, so... if I invested that time in getting
some tools together instead of messing with the patch it would be a
win already.

The *immediate* purpose of this is to provide a repository that the
patchbot can operate and that Linus can pull from, and which is not
Bitkeeper.

> 2) In his 'Answers from 39000 ft' mail, Daniel states that prophesy
>    will manage all files in a tree. Is that convenient? If I want
>    prophesy to manage my kernel tree, I surely dont want it to
>    manage all the gunk created during a compile?

Oh no, I only meant source files.  We need a way to know what is
source and what is not.  The new kbuild makes that much easier, by
taking all the generated files out of the tree.  This still leaves
questions about configuration-generated symlinks, which should not
fool the SCM into thinking there are more files than there really 
are.

> 3) Are anybody on this list familiar with SCM? More to the point,
>    could anybody here give me a list of SCM related links/infor-
>    mation? Weave vs. diff+patch(xdelta)? How to merge sanely/
>    automagically?

I'm not 100% clear on 'weave' yet myself.  Soon will be though:

  http://www.perforce.com/perforce/life.html

> 4) One of the features from 1) would be the distributed nature
>    of BK, I guess? Are there any thoughts on how to handle this?

I thought I'd first think about having it work very well, locally.
We don't need it to be distributed for either of the first two
applications, that is, preparing patch sets and acting as a
repository for the patchbot.  Or, another way of putting that is,
Larry already provides us a way for it to be distributed, through
his pull.  And no, I haven't thought at all about how technically
difficult it will be to support a BK pull yet.  There might even
be legal questions of whether Larry's patents allow us to support
a BK pull, and if so... then I think we'll suddenly find a lot
more developers on the project, so that possibility doesn't worry
me.

> 5) Daniel's 39K mail didn't mention changesets, the ability to
>    group changes to files. I guess we are going to have this?
>    And changesets would be on a delta-commit basis? (These may
>    be too concrete for now, answer at will :)

Yes.  I'm trying to avoid steering too close to Larry's terminology
just for now, until I understand how standard it is.  For now, a
'delta' to me, is the difference between any two source trees, and
deltas can be partitioned into things called... I don't know,
subdeltas or something, the relationship being, and subdeltas can
also be partitioned.  Not only that, but the partitioning of
subdeltas is fluid, and can be changed using database queries, such
as 'select all the files that satisfy this logical expression, and
the delta will be partitioned into the part that affects those
files and the part that doesn't'.  Or logical tests could be applied
at the line level, and so on.  We want to really leverage the fact
that we can put a full-blown sql database into the mix, that's
something the proprietary side would have a lot of trouble doing.

> 6) Tom Lord's 'arch' have been mentioned as an alternative for
>    BK. Are we aware of how he handles some of the above questions?
>    (All these aware/familiar questions are really disguised
>    versions of 'I dont know. Please tell me about X if you know.')

No, and I swear I will try it this weekend!

> 7) Why is this a closed list?

It was the decision of the gentleman who set it up, or perhaps it was
an accident.  Considering the recent flamewar, I don't think it's
wrong to keep it closed for the time being.

-- 
Daniel


From rasmus at jaquet.dk  Sat May  4 06:15:28 2002
From: rasmus at jaquet.dk (Rasmus Andersen)
Date: Fri, 3 May 2002 22:15:28 +0200
Subject: [Prophesy] General questions
In-Reply-To: <E173U0R-0002FE-00@starship>
References: <20020501084526.B14505@jaquet.dk> <E173U0R-0002FE-00@starship>
Message-ID: <20020503201528.GE1893@jaquet.dk>

(I'm just back from business travel and is quite bombed, so this
is even terser than usual.)

On Fri, May 03, 2002 at 05:45:14AM +0200, Daniel Phillips wrote:
> > 1) This project would seem to be a reaction to the BK thread on
> >    lk and its goal (I guess) would be to get Linus off BK. So,
> >    are we as a group aware/familiar with the features of BK that
> >    Linus, Garzik, Riel etc, like and want? If so, could somebody
> >    list them for me?
> 
> The best source of that information is the 'patch penguin' thread,
> where Linus talks about starting with it, and then talks soon after
> about what he wants in it.

I had to delete most of that thread in order to keep my server
from bursting into flames due to the flame density.

> I'd see it being useful to other people way before being attractive
> enough to Linus to break his BK habit.  I've noticed that I personally
> am spending far more time fiddling around creating and maintaining
> patch sets than I should, so... if I invested that time in getting
> some tools together instead of messing with the patch it would be a
> win already.

Perhaps you could explain a bit more here? This seems like something
we could work into an advantage.

> 
> The *immediate* purpose of this is to provide a repository that the
> patchbot can operate and that Linus can pull from, and which is not
> Bitkeeper.

If immediacy is a goal, then there is working code out there
already... But you know that.

> > 4) One of the features from 1) would be the distributed nature
> >    of BK, I guess? Are there any thoughts on how to handle this?
> 
> I thought I'd first think about having it work very well, locally.
> We don't need it to be distributed for either of the first two
> applications, that is, preparing patch sets and acting as a
> repository for the patchbot.  Or, another way of putting that is,
> Larry already provides us a way for it to be distributed, through
> his pull.  And no, I haven't thought at all about how technically
> difficult it will be to support a BK pull yet.  There might even
> be legal questions of whether Larry's patents allow us to support
> a BK pull, and if so... then I think we'll suddenly find a lot
> more developers on the project, so that possibility doesn't worry
> me.

One problem is that making something distributed sucks if the
initial design dont't allow for it.

Rasmus


From phillips at bonn-fries.net  Sat May  4 06:42:27 2002
From: phillips at bonn-fries.net (Daniel Phillips)
Date: Fri, 3 May 2002 22:42:27 +0200
Subject: [Prophesy] General questions
In-Reply-To: <20020503201528.GE1893@jaquet.dk>
References: <20020501084526.B14505@jaquet.dk> <E173U0R-0002FE-00@starship> <20020503201528.GE1893@jaquet.dk>
Message-ID: <E173jsq-0003bL-00@starship>

On Friday 03 May 2002 22:15, Rasmus Andersen wrote:
> (I'm just back from business travel and is quite bombed, so this
> is even terser than usual.)

Terse is good.

> > I'd see it being useful to other people way before being attractive
> > enough to Linus to break his BK habit.  I've noticed that I personally
> > am spending far more time fiddling around creating and maintaining
> > patch sets than I should, so... if I invested that time in getting
> > some tools together instead of messing with the patch it would be a
> > win already.
> 
> Perhaps you could explain a bit more here? This seems like something
> we could work into an advantage.

Yes.  I'd like to somehow turn a 'patch' into an object managed by the
SCM system.  So you'd carry not just multiple tree versions, but multiple
patches forward, the way real live developers do.  Since developers do it,
it must be possible to automate.  But every SCM I've looked at so far
takes a view of the whole tree, a history of commits to it, and a tree of
forks.  That is somehow not the whole story  There is, in reality, much
more structure to parallel development than that.

> > The *immediate* purpose of this is to provide a repository that the
> > patchbot can operate and that Linus can pull from, and which is not
> > Bitkeeper.
> 
> If immediacy is a goal, then there is working code out there
> already... But you know that.

Right, and ignoring it would be silly.  I'm working with BitKeeper now,
Arch and Subversion are on my list, I'm looking at some of the power
features of CVS, and looking at the literature.  But mainly, I'm thinking
about things from first principles as is my habit.

More on that later.

> > > 4) One of the features from 1) would be the distributed nature
> > >    of BK, I guess? Are there any thoughts on how to handle this?
> > 
> > I thought I'd first think about having it work very well, locally.
> > We don't need it to be distributed for either of the first two
> > applications, that is, preparing patch sets and acting as a
> > repository for the patchbot.  Or, another way of putting that is,
> > Larry already provides us a way for it to be distributed, through
> > his pull.  And no, I haven't thought at all about how technically
> > difficult it will be to support a BK pull yet.  There might even
> > be legal questions of whether Larry's patents allow us to support
> > a BK pull, and if so... then I think we'll suddenly find a lot
> > more developers on the project, so that possibility doesn't worry
> > me.
> 
> One problem is that making something distributed sucks if the
> initial design dont't allow for it.

No question about that.  Now, I've done just enough BitKeeping to know
that, even though it's designed for distributed operation from the
beginning, it still kind of sucks at it.  For one thing, if somebody is
cloning, the whole repository is locked for the duration - any push to
the repository has to wait.  So much for elegance.

My strategy is: first propose the functionality, then see how it can be
distributed, without making any compromise to the local functionality.
Doing the whole design around distributed operation and then having to
apologize for nonintuitive behavior on the user side sucks too.

Another thing about BitKeeper: you have to do 'bk edit' before you edit,
otherwise, if you just chmod +w the file and edit away, it gets screwed
up, with no intelligible error messages.  Stupid.  Also, bk co and bk ci
are just stupid wastes of time, in my opinion.  The number one design
rule as far as I'm concerned is: you can edit your repository just like
a normal source tree.  It looks and acts just like a normal source tree.
The SCM takes care of the details for you.

Now... how to make that work.  First problem, how can the SCM tell the
difference between generated files and source files?  Should we just
give it a list of files to ignore, as for patch?  Is there a more
elegant way, perhaps in conjunction with the new kbuild code?  (And then,
how tied to the kernel is kbuild, and do we care about managing source
besides the kernel?)

As far as knowing what files the user is editing, and when to update the
SCM's db, we will use Linux's dnotify mechanism for that.  This part of
the design is under control I think, I'll provide a writeup in due course.

-- 
Daniel


From rasmus at jaquet.dk  Sat May  4 18:13:24 2002
From: rasmus at jaquet.dk (Rasmus Andersen)
Date: Sat, 4 May 2002 10:13:24 +0200
Subject: [Prophesy] General questions
In-Reply-To: <E173jsq-0003bL-00@starship>
References: <20020501084526.B14505@jaquet.dk> <E173U0R-0002FE-00@starship> <20020503201528.GE1893@jaquet.dk> <E173jsq-0003bL-00@starship>
Message-ID: <20020504081324.GA1893@jaquet.dk>

On Fri, May 03, 2002 at 10:42:27PM +0200, Daniel Phillips wrote:
> > Perhaps you could explain a bit more here? This seems like something
> > we could work into an advantage.
> 
> Yes.  I'd like to somehow turn a 'patch' into an object managed by the
> SCM system.  So you'd carry not just multiple tree versions, but multiple
> patches forward, the way real live developers do.  Since developers do it,
> it must be possible to automate.  But every SCM I've looked at so far
> takes a view of the whole tree, a history of commits to it, and a tree of
> forks.  That is somehow not the whole story  There is, in reality, much
> more structure to parallel development than that.

I'm not quite following you here. I thought BK allowed you to have a
number of patches/changes in a tree, update the tree (from Linus)
and have the changes carried forward?

Or are you talking about carrying a selected patch set forward across 
multiple tree versions?

Or am I just not getting you? :)

> > If immediacy is a goal, then there is working code out there
> > already... But you know that.
> 
> Right, and ignoring it would be silly.  I'm working with BitKeeper now,
> Arch and Subversion are on my list, I'm looking at some of the power
> features of CVS, and looking at the literature.  But mainly, I'm thinking
> about things from first principles as is my habit.

If you find literature, do share. I'll do my best to get acquainted
with some of these tools as well.

[distributed discussion]
> My strategy is: first propose the functionality, then see how it can be
> distributed, without making any compromise to the local functionality.
> Doing the whole design around distributed operation and then having to
> apologize for nonintuitive behavior on the user side sucks too.

Agreed on both points.

> Another thing about BitKeeper: you have to do 'bk edit' before you edit,
> otherwise, if you just chmod +w the file and edit away, it gets screwed
> up, with no intelligible error messages.  Stupid.  Also, bk co and bk ci
> are just stupid wastes of time, in my opinion.  The number one design
> rule as far as I'm concerned is: you can edit your repository just like
> a normal source tree.  It looks and acts just like a normal source tree.
> The SCM takes care of the details for you.

Agreed on the design rule. 

> 
> Now... how to make that work.  First problem, how can the SCM tell the
> difference between generated files and source files?  Should we just
> give it a list of files to ignore, as for patch?  Is there a more
> elegant way, perhaps in conjunction with the new kbuild code?  (And then,
> how tied to the kernel is kbuild, and do we care about managing source
> besides the kernel?)

While we may make our lives a bit easier coupling the SCM tightly to
the kernel, lets not pretend that we magically get a well-controlled
build environment. E.g., kbuild 2.5 lets you build objects in the
source tree and somewhere somebody actually have a good reason for
doing that. My point: If we make crass assumptions like that, we
will get flamed.

But we could easily go for the 'normal' SCM angle of attack: Have
the user say 'these files' or 'this directory'. We could the also
offer 'all files in this dir for ever'.

> 
> As far as knowing what files the user is editing, and when to update the
> SCM's db, we will use Linux's dnotify mechanism for that.  This part of
> the design is under control I think, I'll provide a writeup in due course.

Yes, dnotify is an elegant way to notice this.

Rasmus


From phillips at bonn-fries.net  Sun May  5 00:37:45 2002
From: phillips at bonn-fries.net (Daniel Phillips)
Date: Sat, 4 May 2002 16:37:45 +0200
Subject: [Prophesy] General questions
In-Reply-To: <20020504081324.GA1893@jaquet.dk>
References: <20020501084526.B14505@jaquet.dk> <E173jsq-0003bL-00@starship> <20020504081324.GA1893@jaquet.dk>
Message-ID: <E1740fR-0003pP-00@starship>

On Saturday 04 May 2002 10:13, Rasmus Andersen wrote:
> On Fri, May 03, 2002 at 10:42:27PM +0200, Daniel Phillips wrote:
> > ...I'd like to somehow turn a 'patch' into an object managed by the
> > SCM system.  So you'd carry not just multiple tree versions, but multiple
> > patches forward, the way real live developers do.  Since developers do it,
> > it must be possible to automate.  But every SCM I've looked at so far
> > takes a view of the whole tree, a history of commits to it, and a tree of
> > forks.  That is somehow not the whole story  There is, in reality, much
> > more structure to parallel development than that.
> 
> I'm not quite following you here. I thought BK allowed you to have a
> number of patches/changes in a tree, update the tree (from Linus)
> and have the changes carried forward?
>
> Or are you talking about carrying a selected patch set forward across 
> multiple tree versions?

The second, and carrying multiple, possibly conflicting patches forward.
Logically, this is quite difficult, but in practice we do it all the time.
So right now I'm casting around for ways of looking at the problem.

> > > If immediacy is a goal, then there is working code out there
> > > already... But you know that.
> > 
> > Right, and ignoring it would be silly.  I'm working with BitKeeper now,
> > Arch and Subversion are on my list, I'm looking at some of the power
> > features of CVS, and looking at the literature.  But mainly, I'm thinking
> > about things from first principles as is my habit.
> 
> If you find literature, do share. I'll do my best to get acquainted
> with some of these tools as well.

Here's a random link I found earlier:

   http://citeseer.nj.nec.com/context/175867/0

> > Now... how to make that work.  First problem, how can the SCM tell the
> > difference between generated files and source files?  Should we just
> > give it a list of files to ignore, as for patch?  Is there a more
> > elegant way, perhaps in conjunction with the new kbuild code?  (And then,
> > how tied to the kernel is kbuild, and do we care about managing source
> > besides the kernel?)
> 
> While we may make our lives a bit easier coupling the SCM tightly to
> the kernel, lets not pretend that we magically get a well-controlled
> build environment. E.g., kbuild 2.5 lets you build objects in the
> source tree and somewhere somebody actually have a good reason for
> doing that. My point: If we make crass assumptions like that, we
> will get flamed.
> 
> But we could easily go for the 'normal' SCM angle of attack: Have
> the user say 'these files' or 'this directory'. We could the also
> offer 'all files in this dir for ever'.

There's another, more automatic way: start with a clean tree - at the time
the SCM begins to manage it, *all* files are under management (unless told
otherwise, and there would be a list of standard exceptions such as hidden
files).  Then, every file you create, for example, with an editor or cp,
will also be managed by the SCM.  The SCM will always know when a build is
in progress (somehow), and it will not manage files created during the
build process.

If you decide to start managing a source tree that's already been built
from, then some explicit 'import' step, specifying the files to be managed,
or alternatively, those not to be managed is going to be necessary.  It
would be nice to be able to express this partitioning as an arbitrary
logical relation, to be handled by the database, it's another way to
leverage the fact that we have a full database available.

Any ideas on how we can know automatically that a build is in progress?

> > As far as knowing what files the user is editing, and when to update the
> > SCM's db, we will use Linux's dnotify mechanism for that.  This part of
> > the design is under control I think, I'll provide a writeup in due course.
> 
> Yes, dnotify is an elegant way to notice this.

It's going to need some improving though, in order to be reliable.

-- 
Daniel


From rasmus at jaquet.dk  Sun May  5 00:49:35 2002
From: rasmus at jaquet.dk (Rasmus Andersen)
Date: Sat, 4 May 2002 16:49:35 +0200
Subject: [Prophesy] General questions
In-Reply-To: <E1740fR-0003pP-00@starship>
References: <20020501084526.B14505@jaquet.dk> <E173jsq-0003bL-00@starship> <20020504081324.GA1893@jaquet.dk> <E1740fR-0003pP-00@starship>
Message-ID: <20020504144935.GG1893@jaquet.dk>

On Sat, May 04, 2002 at 04:37:45PM +0200, Daniel Phillips wrote:
> > Or are you talking about carrying a selected patch set forward across 
> > multiple tree versions?
> 
> The second, and carrying multiple, possibly conflicting patches forward.
> Logically, this is quite difficult, but in practice we do it all the time.
> So right now I'm casting around for ways of looking at the problem.

(I'm spelling this out in order to be sure I get your meaning.)
So we are talking about the situation where you have three trees,
all based on a linus one, called 'Daniel's pagethingies', 'rmap
refinements' and 'aa cleanup'. All three touch mm/page stuff
heavily and the rmap and aa ones are locally refined by you.

When Linus updates his tree, you would like to be able to update
the base tree and have all three local ones updated as well? And
does BK not do this for you? (Apparently not, but I fail to see
why not.)

> Any ideas on how we can know automatically that a build is in progress?

There is probably some kbuild (2.5) only files we could do heuristics
on but that would couple us thightly to that version of kbuild...

If we do this as a kernel-only SCM, we could also start out with a
dontdiff filter already in place.

> > Yes, dnotify is an elegant way to notice this.
> 
> It's going to need some improving though, in order to be reliable.

? Do elaborate.

Rasmus


From phillips at bonn-fries.net  Sun May  5 02:59:32 2002
From: phillips at bonn-fries.net (Daniel Phillips)
Date: Sat, 4 May 2002 18:59:32 +0200
Subject: [Prophesy] General questions
In-Reply-To: <20020504144935.GG1893@jaquet.dk>
References: <20020501084526.B14505@jaquet.dk> <E1740fR-0003pP-00@starship> <20020504144935.GG1893@jaquet.dk>
Message-ID: <E1742se-0003pu-00@starship>

On Saturday 04 May 2002 16:49, Rasmus Andersen wrote:
> On Sat, May 04, 2002 at 04:37:45PM +0200, Daniel Phillips wrote:
> > > Or are you talking about carrying a selected patch set forward across 
> > > multiple tree versions?
> > 
> > The second, and carrying multiple, possibly conflicting patches forward.
> > Logically, this is quite difficult, but in practice we do it all the time.
> > So right now I'm casting around for ways of looking at the problem.
> 
> (I'm spelling this out in order to be sure I get your meaning.)
> So we are talking about the situation where you have three trees,
> all based on a linus one, called 'Daniel's pagethingies', 'rmap
> refinements' and 'aa cleanup'. All three touch mm/page stuff
> heavily and the rmap and aa ones are locally refined by you.
> 
> When Linus updates his tree, you would like to be able to update
> the base tree and have all three local ones updated as well? And
> does BK not do this for you? (Apparently not, but I fail to see
> why not.)

I don't know exactly what happens when you pull from Linus's tree and you 
have incompatible changes in yours.  Please feel free to educate me here, as 
I am not an experienced BitKeeper user.

A major feature of the BitKeeper model that I don't like is the tree model.  
In fact, independent developers don't maintain their trees according to a 
strict heirarchy descending from a common parent.  It's more like a general
net, with developers exchanging bits and pieces with each other in an 
arbitrary graph.  Another way of looking at this is, two different developers 
should be able to download a Linux tarball from kernel.org, each make their 
own changes, then later decide they want to exchange certain changes with 
each other.  With Bitkeeper you'd have to mess around to make that happen - 
on or the other of the developers would have to clone the repository of the 
other and drop back to the patch way of doing things, to manually import 
their changes.  This is BS, we want such exchanges to be entirely natural.

Now, what I'm thinking about here has something to do with the traditional
LISP distinction between EQUAL and EQ, the latter being true when we 
establish that two things really are the *same* object, and don't just have
the same values.  In the LISP case, EQ basically means the addresses of the
objects are the same.  So we want our source tree to be made up of objects,
and each object will have an 'address', which I will call an 'id', which is
perhaps derived from the developer's email address, the date the source tree
first came under management, and an object sequence number (the latter
generated by a counter kept in the root of the source tree).  Objects have 
generations, each generation being a delta.  A 'default' object holds all 
thes ource in the tree that is not included in any other object.  Any state 
of the source tree can therefore be represented as a set of (object, 
generation) tuples.

When two developers wish to match up their trees and exchange object deltas 
(aka patch set, aka change set) in a precise way, the first thing we want to 
do is establish a correspondence between objects in the two trees.  This will 
be a list of tuples of the form (id, id), which establishes that the 
respective objects are EQ.

There are a variety of strategies we could use to build the correspondence.  
We don't have to stick to just one strategy or require it to be entirely 
automatic.  For example, we could consult an already-under-management version 
of our source tree, and import all the object IDs from it that correspond to 
objects that are found to be exactly EQUAL.  We can also do the equality 
tests, somewhat less efficiently (because we also have to partition the 
source into objects) between our two trees.  Or we can simply interpret each 
file as one object, for the purpose of building the correspondence, though we 
would still want to respect the way that the owners of the two trees have 
already partitioned their source into objects.

Next question: what is an object?  Can objects contain objects?  I would like 
an 'change' to be an object, that is, I would like to be able to see how a 
patch evolves, just as we can see how the underlying source evolves as well.

Bitkeeper question: once we apply a changeset to a source base, does the 
Bitkeeper database continue to maintain the identity of the changset as we 
carry the source base through subsequent reversions?  I.e., will Bitkeeper 
let us talk about the 'htree' patch, and let the thing evolve along with the 
source, so that we could pull out the 2.4.17 version of htree, or the 2.4.18 
version, etc?

-- 
Daniel


From rasmus at jaquet.dk  Mon May  6 18:39:03 2002
From: rasmus at jaquet.dk (Rasmus Andersen)
Date: Mon, 6 May 2002 10:39:03 +0200
Subject: [Prophesy] General questions
In-Reply-To: <E1742se-0003pu-00@starship>; from phillips@bonn-fries.net on Sat, May 04, 2002 at 06:59:32PM +0200
References: <20020501084526.B14505@jaquet.dk> <E1740fR-0003pP-00@starship> <20020504144935.GG1893@jaquet.dk> <E1742se-0003pu-00@starship>
Message-ID: <20020506103903.C13935@jaquet.dk>

On Sat, May 04, 2002 at 06:59:32PM +0200, Daniel Phillips wrote:
> I don't know exactly what happens when you pull from Linus's tree and you 
> have incompatible changes in yours.  Please feel free to educate me here, as 
> I am not an experienced BitKeeper user.

I'll have to try this. I think that my current third-hard impressions
are too vague for this.

> 
> A major feature of the BitKeeper model that I don't like is the tree model.  
> In fact, independent developers don't maintain their trees according to a 
> strict heirarchy descending from a common parent.  It's more like a general
> net, with developers exchanging bits and pieces with each other in an 
> arbitrary graph.  Another way of looking at this is, two different developers 
> should be able to download a Linux tarball from kernel.org, each make their 
> own changes, then later decide they want to exchange certain changes with 
> each other.  With Bitkeeper you'd have to mess around to make that happen - 
> on or the other of the developers would have to clone the repository of the 
> other and drop back to the patch way of doing things, to manually import 
> their changes.  This is BS, we want such exchanges to be entirely natural.

The BK equivalent of two different developers downloading source from
kernel.org would be cloning local repositories from linus', no? Then
it seems like the above reduces to BK interfacing with the outside
(non-BK) world, which is then problematic? Or am I missing something?

AFAIK, the strict 'tree' view of the revision history is due to the
fundamental distributed view of the process here, with all trees
being seens (designwise) as a replicas of a distributed filesystem.

> Now, what I'm thinking about here has something to do with the traditional
> LISP distinction between EQUAL and EQ, the latter being true when we 
> establish that two things really are the *same* object, and don't just have
> the same values.  In the LISP case, EQ basically means the addresses of the
> objects are the same.  So we want our source tree to be made up of objects,
> and each object will have an 'address', which I will call an 'id', which is
> perhaps derived from the developer's email address, the date the source tree
> first came under management, and an object sequence number (the latter
> generated by a counter kept in the root of the source tree).  Objects have 
> generations, each generation being a delta.  A 'default' object holds all 
> thes ource in the tree that is not included in any other object.  Any state 
> of the source tree can therefore be represented as a set of (object, 
> generation) tuples.

I agree on this object view, even though we should call them closures,
in proper LISP terminology :) Warning, this is about it for my LISP
knowledge.

[More object think]

I agree on this. I'll have to think a bit about this in order to
wrap my brain around it.

> Next question: what is an object?  Can objects contain objects?  I would like 
> an 'change' to be an object, that is, I would like to be able to see how a 
> patch evolves, just as we can see how the underlying source evolves as well.

I think objects can contain objects. That way merges can be objects too,
containing the objects merged as its revision history.

> 
> Bitkeeper question: once we apply a changeset to a source base, does the 
> Bitkeeper database continue to maintain the identity of the changset as we 
> carry the source base through subsequent reversions?  I.e., will Bitkeeper 
> let us talk about the 'htree' patch, and let the thing evolve along with the 
> source, so that we could pull out the 2.4.17 version of htree, or the 2.4.18 
> version, etc?

I dont think so but it is a BS guess. One problem would be later
changes modifying htree code, making extraction of the htree patch
difficult.

Another question is your described effort in managing change sets
for your patches. Could you decribe this a bit so we could see if
we could envision something to make that easier?

Rasmus


From rasmus at jaquet.dk  Tue May  7 04:00:36 2002
From: rasmus at jaquet.dk (Rasmus Andersen)
Date: Mon, 6 May 2002 20:00:36 +0200
Subject: [Prophesy] General questions
In-Reply-To: <20020506103903.C13935@jaquet.dk>
References: <20020501084526.B14505@jaquet.dk> <E1740fR-0003pP-00@starship> <20020504144935.GG1893@jaquet.dk> <E1742se-0003pu-00@starship> <20020506103903.C13935@jaquet.dk>
Message-ID: <20020506180035.GA1669@jaquet.dk>

On Mon, May 06, 2002 at 10:39:03AM +0200, Rasmus Andersen wrote:
> I'll have to try this. I think that my current third-hard impressions
> are too vague for this.

That would be 'third-hand', of course.

> > Next question: what is an object?  Can objects contain objects?  I would like 
> > an 'change' to be an object, that is, I would like to be able to see how a 
> > patch evolves, just as we can see how the underlying source evolves as well.
> 
> I think objects can contain objects. That way merges can be objects too,
> containing the objects merged as its revision history.

That was perhaps a bit too short: I was thinking of a change/patch as an
object. A patchset would then be a collection of objects, still with
some of the properties of the basic object (comments, generations), and
a merge would then be a new patchset with the aggregated/resolved
constituting patchsets, comments etc.

> > Bitkeeper question: once we apply a changeset to a source base, does the 
> > Bitkeeper database continue to maintain the identity of the changset as we 
> > carry the source base through subsequent reversions?  I.e., will Bitkeeper 
> > let us talk about the 'htree' patch, and let the thing evolve along with the 
> > source, so that we could pull out the 2.4.17 version of htree, or the 2.4.18 
> > version, etc?
> 
> I dont think so but it is a BS guess. One problem would be later
> changes modifying htree code, making extraction of the htree patch
> difficult.

Another clarification: The first sentence above goes for the BK
question. The rest is more general; how to keep conflicting
patches seperate?

> 
> Another question is your described effort in managing change sets
> for your patches. Could you decribe this a bit so we could see if
> we could envision something to make that easier?

Rasmus


From phillips at bonn-fries.net  Wed May 29 03:05:03 2002
From: phillips at bonn-fries.net (Daniel Phillips)
Date: Tue, 28 May 2002 19:05:03 +0200
Subject: [Prophesy] Postgres and Python
Message-ID: <E17CkPA-0004dF-00@starship>

Hi all (especially Rasmus),

I have not forgotten about our design project here, in case you were 
wondering.  In fact, in light of recent developments, specifically, the 
threat of further enroachment of commercialism on core open source projects 
leads me to believe more than ever that we must follow through on what we set 
out to do.  (And I will remark here, that my attitude is pro-commerce in the 
sense that core open source projects are a commons on which even commercial 
users rely.  It is in the interest of commerce, as well as lovers of 
intellectual freedom, to protect our core projects.)

So today's topic is Postgres.  We must leverage our advantages, and having 
free use of a full SQL database that we can, if we need to, customize in any 
way we want, is one of them.  So I have been working with postgres and 
python, to see how well they hold up together.  The answer is 'very well'.

There are a number of packages that provide Python access to Postgres.  All 
these packages implement a standard database interface class called the
"Python Database API Specification":

   http://www.python.org/topics/database/DatabaseAPI-2.0.html

Basically, this lets you pass SQL query strings to a database, provides 
convenient methods for retrieving the results, and miscellaneous functions 
for controlling such things as commit/rollback.  The particular 
implementation that worked for me is a package called "psycopg".  For Debian 
users:

   apt-get install python2.2-psycopg.

Here's a sample session:

su postgres         # normally the postgres superuser, can create other users
createdb mydb       # somewhere to start
psql                # make sure everything worked
\q                  # out of here
python
>>> import psycopg
>>> db=psycopg.connect("dbname=mydb")
>>> cursor = db.cursor
>>> db.cursor().__methods__
>>> cursor.execute("CREATE table foo(bar int, zot date)")
>>> cursor.execute("INSERT INTO foo VALUES (123, '1/2/2002')")
>>> cursor.execute("INSERT INTO foo VALUES (456, '2/4/2002')")
>>> cursor.execute("select * from foo")
>>> data=cursor.fetchone()
>>> print data
>>> print cursor.fetchall()
>>>

(caveat: I haven't actually tried this example to make sure it works)

I found psycopg (psycho pig?) very nice to work with.  As far as complaints 
go, there is good support for result retrieval, but no support for data 
insertion - it seems, you just make up SQL strings containing the data and 
submit them.  This needs to be strengthened.  Good thing we have the source, 
right?

Note that the Python db interface does not tie you to SQL, however, all the 
SQL strings you have to write in order to get anything done certainly do tie 
you.  So, in my opinion this all has to be abstracted more, and every 
application should start off by doing that.  Oh well, the fact remains that 
this is an excellent place to get started, and Python with this package is a 
far more capable interactive interface to a db than, for instance, psql is, 
or a graphical database shell would be.

-- 
Daniel


From phillips at bonn-fries.net  Thu May 30 08:44:20 2002
From: phillips at bonn-fries.net (Daniel Phillips)
Date: Thu, 30 May 2002 00:44:20 +0200
Subject: [Prophesy] String transformations
Message-ID: <E17DCB3-0006pb-00@starship>

I'm still not sure that it's a good idea to build a whole SCM from scratch, 
however, *if* I was going to do that I'd start with some good basic 
operations, combined with some notion of how they fit in with the grand 
scheme.

As far as the grand scheme goes, the idea is that we will maintain a database 
of differences between predecessor versions of files.  We will also maintain 
information about how the files were created, deleted and moved around in the 
source tree, but that's not the subject of today's post.  Instead, I'll focus 
on how the differences between files are to be maintained.  And I'm going to 
get way more detailed than perhaps is justified at this stage, just because I 
feel that way today.  In other words, watch out: bit bashing alert.

I'm looking at each file as a binary string.  To get a precessor version of a 
file, we retrieve some kind of transformation information from the database 
and apply it to the string consisting of the current contents of the file, 
yielding the immediate precessor of the file.  Of course, every SCM does this 
in some way, nonetheless, I feel compelled to invent yet another way of doing 
it.

My philosophy is that these string transformations are exact, and so there 
exist far more efficient and general means of encoding and applying the 
needed transformations than, for example, diff and patch.  Not to mention 
simpler.  So let me jump straight into my detailed design.

We can see the process of transforming of any binary string into any other as 
a sequence of simple operations carried out on an input and output string.  
Working from left to right in the input string, the following three 
operations are sufficient:

   text - append literal text to the destination string
   copy - append bytes from the source to destination string
   skip - skip over bytes in the source string

Additionally, we may wish to optimize the case where a transformation simply 
moves text from one place to another:

   move - append text from an arbitrary location in the input string

To see why the move operation is desirable, consider that without it, moving 
a block of text from one place to another in a file results in the block of 
text needing to be recorded in the database, whereas using the move 
operation, that same block of text can be obtained from the current version 
of the file.  This represents a significant space savings.  Presumeably, the 
transformation strings are to be compressed when stored in the database, but 
since the original version of the file is not encoded in the database, it is 
not possible to rely on compression to avoid encoding the moved block of text 
literally.  My goal, therefore, is to design the transformation encoding in a 
way that works well together with compression.

Each of the above operations takes a count parameter, specifying how many 
bytes to append to the output string.  Additionally, the 'text' operation 
takes 'count' bytes of literal text, and the 'move' operation takes an 
parameter specifying where the text to be copied is located in the input 
string.

Now I need to encode these operations in a way that is hopefully compact and 
easy to evaluate.  I observed that in each case I have a count and an 
operation, so I would like to encode that in a single number.  I also have to 
worry about the range of that number: I'd like to encode most of the 
command/count values in a single byte.  To accomplish this, I introduce an 
additional primitive operation, which supplies some high order bits for the 
count of a following operation:

   high - supply additional high order count bits for a following operation

Since I am a miser with memory, I decided to allocate only two bits for the 
command encoding:

  0 = text
  1 = copy
  2 = skip
  3 = high

Expressed as macros, we have:

   text(n)
   copy(n)
   skip(n)
   high(n)

which generates an operation code by shifting the operation number into the 
high order two bits of parameter 'n', which is thus limited to six bits.  The 
high operation can appear several times in a row, each time an additional six 
bits, with the most significant bits appearing first.

The move operation cannot be accomodated in this scheme, and needs some 
different encoding.  That's ok, the move operation is different anyway in 
that it needs two numeric parameters.  Fortunately, there are a number of 
possible operations that are no-ops when the numeric parameter is zero, and 
these are thus available for use as escape codes.  My plan is to encode the 
move operation - which is only an optimization - as a triple:

  copy(0), copy(position), copy(count)

That's enough on 'move' for now.  I did not implement it, but I did 
accomodate it in the design.

Additionally, text(0) indicates the end of the transformation sequence.  That 
leaves two escape codes, skip(0) and high(0), for future expansion, the 
latter being available only when it appears in leading position, since 
high(0) may well appear in the less significant bytes of a large numeric 
parameter.

Leaving out 'move', we end up with a simple implementation:

void transform(uchar *ops, uchar *src, uchar *dst)
{
	unsigned c, count = 0;

	while ((c = *ops++)) {
		count = (count << 6) | (c & 0x3f);
		switch (c >> 6) {
		case textop:
			memcpy(dst, ops, count);
			dst +=count;
			ops += count;
			count = 0;
			break;
		case copyop:
			memcpy(dst, src, count);
			dst +=count;
		case skipop:
			src +=count;
			count = 0;
			break;
		}
	}
}

The 'ops' string controls the transformation of 'src' into 'dst'.  For 
example, the ops string:

   copy(2), skip(4), text(6), "foobar", copy(5), 0

Transforms the input string:

   "I love lucy"

into the output string

   "I foobar lucy"

(Note that we could easily express the above in terms of stream IO 
operations, since all operations are sequential.  However, it's doubtful 
whether there is any need to do that on modern machines, and in any event, 
the move operation would present something of a problem.)

Given an operation string, we can compute the length of both the input and 
output strings, as follows:

struct transinfo {int in; int out;} transcheck(uchar *ops)
{
	unsigned c, count = 0, ilen = 0, olen = 0;

	while ((c = *ops++)) {
		count = (count << 6) | (c & 0x3f);
		switch (c >> 6) {
		case textop:
			olen +=count;
			ops += count;
			count = 0;
			break;
		case copyop:
			olen +=count;
		case skipop:
			ilen +=count;
			count = 0;
			break;
		}
	}
	return (struct transinfo) {ilen, olen};
}

This function ought to take the length of the operation string as a parameter 
as well, and ensure that the termination of the sequence occurs at exactly 
that length.

Given an input string and a transformation string, we can compute the inverse 
transformation string that converts the resulting output string back to the 
input string.  This interesting exercise is left to the reader ;-)

Demonstration code attached.

-- 
Daniel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: transform.c
Type: text/x-c
Size: 1384 bytes
Desc: not available
URL: <http://lists.ozlabs.org/pipermail/prophesy/attachments/20020530/a437a35b/attachment.bin>

From phillips at bonn-fries.net  Fri May 31 11:50:39 2002
From: phillips at bonn-fries.net (Daniel Phillips)
Date: Fri, 31 May 2002 03:50:39 +0200
Subject: [Prophesy] Binary data and Postgres/SQL
Message-ID: <E17DbYu-0007jF-00@starship>

Today I'm thinking a little about how best to interface the scm to the 
database, particularly how to get the binary transforms[1] in and out.  It 
seems to me the sql COPY command does what we need:

  http://postgresql.org/users-lounge/docs/7.2/postgres/sql-copy.html

I don't like the idea at all of forming such data into ascii strings, as part 
of an INSERT command.  So the proposed strategy is the place the data in a 
temporary file and issue a COPY command.  A ramfs mount will do nicely for 
this.

We can use this technique to extend Python's database interface to handle 
inserts in a way similar to retrieves, if we want, and for databases that 
can't handle such a thing, the interface can fall back to forming the ascii 
INSERT string, as you must do now anyway.  Of course, improving the Python 
interface isn't our immediate concern, it's just nice to know that we can.

In my opinion, the INSERT command should only ever be used for data that is 
ascii by nature, such as parameters read from a config file or input by a 
user.

[1] See yesterday.

-- 
Daniel


From phillips at bonn-fries.net  Fri May 31 18:00:02 2002
From: phillips at bonn-fries.net (Daniel Phillips)
Date: Fri, 31 May 2002 10:00:02 +0200
Subject: [Prophesy] Improved string transformation
In-Reply-To: <E17DCB3-0006pb-00@starship>
References: <E17DCB3-0006pb-00@starship>
Message-ID: <E17DhKM-0007rM-00@starship>

Today I added support for the 'move' operation to the string transforma, 
roughly doubling the size of the state network, leading me to reflect on how 
much a slight irregularity in an encoding scheme can bloat up an 
implementation.  Oh well, it's still reasonably tight and efficient, and it 
is not going to grow more any time in the near future, except to add more 
error checking in the transinfo function.

The idea is that transform itself will have little or no error checking.  We 
will always have run transinfo in the operation string sometime before we run 
the transform.  If a transform is to be stored in the database, we will also 
store the lengths of the input and output strings, as calculated by transinfo 
and checked against the known lengths.  Yes, this is micro-optimizing, but I 
like to keep the low level things light and tight, it makes me feel better.

Notice how the three primitive operations skip, text and copy map onto the 
diff codes '+', '-' and ' '.  This is no accident, these are in fact the same 
thing, just more loosely expressed, in human-readable form.  Which leads to 
the observation that we can start generating transform strings without a 
whole lot of effort by converting diff files.  This is indeed something we 
want to do, even after we have code for generating transform strings directly.

On the general theme of using the power tools available, I'm thinking about 
generating a Bison parser to parse diff files into transforms, and doing the 
job properly.

Now that move is done, the transform engine is pretty much complete.  Move
wasn't too hard to implement, but it will be a little tricky to generate code 
for.  That's ok, the transform generator can stick with the three simple 
operations as long as it likes, since the move operation is nothing more than 
a space-saving optimization.  This is on the theme of forward compatibility.  
In this case we can upgrade the transform generator at any time, and new 
databases can will be hancled by early versions of the system.  This is a 
nice result when you can get it.

The attached code demonstrates the move operation in action.

--
Daniel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: transform.c
Type: text/x-c
Size: 2248 bytes
Desc: not available
URL: <http://lists.ozlabs.org/pipermail/prophesy/attachments/20020531/df960684/attachment.bin>

From rasmus at jaquet.dk  Fri May 31 18:28:06 2002
From: rasmus at jaquet.dk (Rasmus Andersen)
Date: Fri, 31 May 2002 10:28:06 +0200
Subject: [Prophesy] Re: Improved string transformation
In-Reply-To: <E17DhKM-0007rM-00@starship>; from phillips@bonn-fries.net on Fri, May 31, 2002 at 10:00:02AM +0200
References: <E17DCB3-0006pb-00@starship> <E17DhKM-0007rM-00@starship>
Message-ID: <20020531102806.B3135@jaquet.dk>

On Fri, May 31, 2002 at 10:00:02AM +0200, Daniel Phillips wrote:
> Today I added support for the 'move' operation to the string transforma, 
> roughly doubling the size of the state network, leading me to reflect on how 
> much a slight irregularity in an encoding scheme can bloat up an 
> implementation.  Oh well, it's still reasonably tight and efficient, and it 
> is not going to grow more any time in the near future, except to add more 
> error checking in the transinfo function.

Hi Daniel.

Trying to reply to you is like hitting a moving target :) This is a
shoret one; I'm otherwise occupied. BTW: I'm on the prophesy list;
no need to cc me seperately.

It seems to me that we (you) are attacking some of the lower parts of
an SCM before looking at the higher level ones. I dont think that
would lead you current efforts to be wasted but sometimes I feel
comfortable having thought roughly about things before doing them.
Of course, that often also leads me to sit on my hands all day.

Anyway, some higher level concers I can list off the top of my
head would be:

o branches
o merges
o distribution
o providing usable change overviews and groupings based on dnotify 
  recorded changes


This is a terse list and, as I tried to imply, it is probably
independent of what you are doing now.

If nothing else comes of this mail, take it as a reassurance
that somebody out here is actually reading your mails :)

Regards,
  Rasmus


From phillips at bonn-fries.net  Fri May 31 20:05:26 2002
From: phillips at bonn-fries.net (Daniel Phillips)
Date: Fri, 31 May 2002 12:05:26 +0200
Subject: [Prophesy] Re: Improved string transformation
In-Reply-To: <20020531102806.B3135@jaquet.dk>
References: <E17DCB3-0006pb-00@starship> <E17DhKM-0007rM-00@starship> <20020531102806.B3135@jaquet.dk>
Message-ID: <E17DjHi-0007sN-00@starship>

On Friday 31 May 2002 10:28, Rasmus Andersen wrote:
> On Fri, May 31, 2002 at 10:00:02AM +0200, Daniel Phillips wrote:
> > Today I added support for the 'move' operation to the string transforma, 
> > roughly doubling the size of the state network, leading me to reflect on how 
> > much a slight irregularity in an encoding scheme can bloat up an 
> > implementation.  Oh well, it's still reasonably tight and efficient, and it 
> > is not going to grow more any time in the near future, except to add more 
> > error checking in the transinfo function.
> 
> Hi Daniel.
> 
> Trying to reply to you is like hitting a moving target :) This is a
> shoret one; I'm otherwise occupied. BTW: I'm on the prophesy list;
> no need to cc me seperately.
> 
> It seems to me that we (you) are attacking some of the lower parts of
> an SCM before looking at the higher level ones.

Oh yes, very much so.  I like to do that, just to help get my mind wrapped 
around the problem.  It's especially nice when you can see a little part of 
the problem that breaks out and doesn't depend a lot on the high level 
design.  It's like memcpy, you don't have to think too much about the details 
of the applications are going to use it, just make it go as fast as possible 
and have as simple a form as possible.

> I dont think that
> would lead you current efforts to be wasted but sometimes I feel
> comfortable having thought roughly about things before doing them.

You gotta speak up ;-)

> Of course, that often also leads me to sit on my hands all day.

Judging by your previous work, I'd say that's a slight exaggeration.

> Anyway, some higher level concers I can list off the top of my
> head would be:
> 
> o branches
> o merges
> o distribution
> o providing usable change overviews and groupings based on dnotify 
>   recorded changes
> 
> This is a terse list and, as I tried to imply, it is probably
> independent of what you are doing now.

Indeed.  I'm mainly thinking about one are that's very important to me, 
personally, and isn't on your list, and that is: editing.  I have this idea 
that the fact you're using a SCM should be nearly totally transparent.  You 
just edit your files and the SCM takes can of making sure that nothing is 
every forgotten.  There are still a few more pieces of that puzzle to put in 
place, but as soon as it gets there, we *already have something useful*.

That said, let me do a little musing on the points you mentioned, which are 
also very important.  I'd like to try to see all of the points you mentioned 
as high level database problems and get a some primitives in place to help us 
think about what we can do at the high level.  So - soon we will have nice 
fast transforms, and we already have the idea that the transforms are applied 
backwards, starting from the current version on disk.  Since the transforms 
are fast, we can get lazy and apply an awful lot of them to do certain 
things, i.e., to make old versions of the code materialize quickly, for 
further editing, or for comparison against other versions.

I'm just going to throw some of my random thoughts on the table.  Don't 
assume any of the following is correct, it's just a starting point for 
discussion.

> o branches

First we should think about tree nodes.  A tree node is any place that we 
have set a checkpoint, that is, a tree state that we can restore.  Between 
any two nodes - and that includes nodes on different branches - we have or 
can compute a delta.  In general, we will use the structure of the tree to 
compute the delta.  A branch is something defined by the user.  It's 
simply a name that gets carried from node to node, as the user sets 
checkpoints, along with an incrementing generation number.  Sometimes the 
tree will fork, starting a new branch (duh).

At one of the tree nodes we will find the current copy of the source code, 
that is, the copy on disk.  It does not have to be at the end of a branch, 
it can be anywhere.  That's why it's important to be able to invert 
transformations quickly.  (Hey, when is somebody going to rise to my 
challenge of stating the algorithm for generating an inverse transform?)  For 
safety's sake we probably want to leave 'cached' copies of tips of branches 
somewhere on disk or in the database, so that we don't have to completely 
rely on the transform machinery to get us from node to node and back again.  
A related idea is that we sometimes want to have two nodes of the tree 
expressed on disk at the same time.

There are, of course, many different ways of expressing the delta between two 
nodes in terms of particular transformations.  This fact is what makes all of 
this interesting.

> o merges

A merge is the process of applying a subset of transformations that express 
the delta between a pair of nodes to some other node.  We need to apply some 
extra constraints to the transformations involved, for example, instead of 
just skipping text, we normally will want to ensure that the text skipped in 
the original node of the delta is the same as the text skipped in the target 
node.  In order to accomplish that, we may need to alter the transformations 
in various ways.  As with good old patch, we will sometimes need to refer to 
context to decide how to change the transformations.

Merging is related to branching in much the same way that integration is 
related to differentiation.  We may want to borrow some of the same 
techniques that are used for automatic symbolic integration.

And there is a very short - too short - treatment of merging.

> o distribution

I'm having a few glimmers of ideas about what to do there.  I think I 
mentioned one already - each node will be partitioned into regions, each of 
which will have an id which is unique in the universe - more or less 
(incorporating email addresses in the ids makes this come true for practical 
purposes).  Anyway, essentially, we want to do a merge between two branches 
in two separate repositories, so I think what we want to do is first create, 
in the destination repository, clones of the two nodes in the source 
repository that generated the set of transformations we have decided we want 
to send.  Then a normal merge is done in the destination repository.  Easy?  
Original for sure.

Now, the thing about those cross-repository clones is that we don't want to 
actually send the whole tree.   We want only to send the objects that the 
destination repository doesn't already have, and this is where the universal 
object ids come in.  We are not going to necessarily rely on common parentage 
to establish the equivalence of two objects - we will sometimes compare the 
objects, and decide that they are actually the same object.  To speed this up 
over a remote link, we can just compare hashes of objects.  The result of 
such objects deemed to be equivalent is that we will set up a mapping between 
to the two repositories that expresses the equivalence of objects, and we 
will also allow either of the repositories to rename any particular object so 
that it is exactly equivalent.  We can call this process 'melding'.  (Hey, 
time to start writing patents.  Well anyway, let me extract a promise right 
now, that by staying on this list you are making a promise to me to respect 
the confidence of this work until we release it publicly.  We do not want 
certain purveyors of close source software going off and writing patents on 
the work we're doing.  Given all that has happened recently, very little 
would surprise me any more.)

> o providing usable change overviews and groupings based on dnotify 
>   recorded changes

I'm now thinking that dnotify is the wrong model, and what we really want to 
to mount a magic filesystem over the mount point of the directory we want to 
manage.  The magic filesystem will trap all the file changes and call the the 
scm, then call the real filesystem.  Very simple.

As far as change overviews go, I think I'm a long way from even thinking 
about that.  A lot more of the basic ideas have to be in place first.  Having 
a full database around that we can do arbitrary queries on should help quite 
a lot.

If you have some specific ideas, don't be shy...

> If nothing else comes of this mail, take it as a reassurance
> that somebody out here is actually reading your mails :)

Oh, I know you are.  Everybody gets busy.  I wasn't even sure I was going to 
go through with this, but now I think I am - I can see something that already 
works, and in a cool and unique way - not too far off.

-- 
Daniel


From rasmus at jaquet.dk  Fri May 31 21:36:00 2002
From: rasmus at jaquet.dk (Rasmus Andersen)
Date: Fri, 31 May 2002 13:36:00 +0200
Subject: [Prophesy] CM site
Message-ID: <20020531133600.B4082@jaquet.dk>

I haven't had the time to look closer, but this site _might_
have something interesting in their 'papers' section.

http://www.cmtoday.com/yp/configuration_management.html


Rasmus


From rasmus at jaquet.dk  Fri May 31 21:48:28 2002
From: rasmus at jaquet.dk (Rasmus Andersen)
Date: Fri, 31 May 2002 13:48:28 +0200
Subject: [Prophesy] CM site
In-Reply-To: <20020531133600.B4082@jaquet.dk>; from rasmus@jaquet.dk on Fri, May 31, 2002 at 01:36:00PM +0200
References: <20020531133600.B4082@jaquet.dk>
Message-ID: <20020531134828.C4082@jaquet.dk>

On Fri, May 31, 2002 at 01:36:00PM +0200, Rasmus Andersen wrote:
> I haven't had the time to look closer, but this site _might_
> have something interesting in their 'papers' section.
> 
> http://www.cmtoday.com/yp/configuration_management.html

But looking closer, there is not. Sorry for going off to
fast.

Rasmus


From rasmus at jaquet.dk  Fri May 31 21:59:41 2002
From: rasmus at jaquet.dk (Rasmus Andersen)
Date: Fri, 31 May 2002 13:59:41 +0200
Subject: [Prophesy] Re: Improved string transformation
In-Reply-To: <E17DjHi-0007sN-00@starship>; from phillips@bonn-fries.net on Fri, May 31, 2002 at 12:05:26PM +0200
References: <E17DCB3-0006pb-00@starship> <E17DhKM-0007rM-00@starship> <20020531102806.B3135@jaquet.dk> <E17DjHi-0007sN-00@starship>
Message-ID: <20020531135941.D4082@jaquet.dk>

On Fri, May 31, 2002 at 12:05:26PM +0200, Daniel Phillips wrote:
> I'm just going to throw some of my random thoughts on the table.  Don't 
> assume any of the following is correct, it's just a starting point for 
> discussion.

Mine is going to be equally random, just much shorter :) I'll try to 
think a bit more about this later.

> > o providing usable change overviews and groupings based on dnotify 
> >   recorded changes
> 
> I'm now thinking that dnotify is the wrong model, and what we really want to 
> to mount a magic filesystem over the mount point of the directory we want to 
> manage.  The magic filesystem will trap all the file changes and call the the 
> scm, then call the real filesystem.  Very simple.
> 
> As far as change overviews go, I think I'm a long way from even thinking 
> about that.  A lot more of the basic ideas have to be in place first.  Having 
> a full database around that we can do arbitrary queries on should help quite 
> a lot.

Like with dnotify, I think that the grouping and manageability of
changes coming through a magic FS is going to suffer. And I think
that this is one of the cardinal weak points in CVS, and thusly
one where we should aim for being strong.

But I have no good ideas on how to handle this and still get
transparency.

Rasmus