[Prophesy] comments so far

Sat Jun 15 05:03:00 EST 2002

On Friday 14 June 2002 20:06, Martin Pool wrote:
> I agree with you about the usefulness of editor undo chains.  Under
> emacs, I have kept-new-versions set to about 10, and I regularly use
> C-u C-x C-s to do "keep backup version" and diff-backup.  All very
> nice and useful.
> 
> A filesystem that kept all versions would allow you to do this in a
> program-neutral way, although I think that's not so important now that
> almost all the GNU tools understand foo.c.~1~ backups.
> 
> However, it has the same problem that the results are largely lacking
> semantics.  For example, looking back through the history of all
> modifications to a directory, it seems impossible to tell which
> versions of the source will actually compile correctly, and which were
> intermediate versions that don't work.

That we can solve by integrating with the build tool a little.  Every 
successful build marks a milestone in the Prophesy journal (not the same as a 
version).
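
For illustration only, here is a rough Python sketch of such a build hook. It runs the build command and appends a milestone record to the journal only if the build exits successfully. The journal path and its format (one JSON object per line) are made up for this example and are not a settled Prophesy format. A milestone is just a journal entry, not a version.

```python
import json
import subprocess
import sys
import time
from pathlib import Path


def build_with_milestone(cmd, journal_path):
    """Run a build command; on success, append a milestone to the journal.

    The journal layout (one JSON object per line) is a hypothetical
    placeholder, not Prophesy's actual on-disk format.
    """
    result = subprocess.run(cmd)
    if result.returncode == 0:
        entry = {"type": "milestone", "time": time.time(), "command": cmd}
        journal = Path(journal_path)
        journal.parent.mkdir(parents=True, exist_ok=True)
        with journal.open("a") as f:
            f.write(json.dumps(entry) + "\n")
    # Failed builds leave the journal untouched, so "which states compiled?"
    # becomes a question of which milestones exist.
    return result.returncode
```

You would call something like `build_with_milestone(["make"], ".prophesy/journal")` in place of a bare `make`. The `.prophesy/journal` location is hypothetical.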

> If a program commits
> early-and-often to CVS (say), but at least runs the test suite first,
> then you have in general some guarantee about the internal consistency
> of any committed version.  (It would be even better if CVS versions
> were module-wide, like in Subversion.)

Could you elaborate on this module-wide property?  I must have missed it 
while examining Subversion.

> A magic filesystem is "mere mechanism".  I don't think you should be
> spending so much time on it until you have a good design for the
> version-control system built on top.

I totally disagree.  I don't think you can build a tower on a bed of jello.  
The infrastructure is mere mechanism in the same sense that the operating 
system is mere mechanism: it defines what you can and can't do with the 
machine.

> If it turns out that the design "on top" is no better than CVS, then
> nobody will bother -- people who want neat features will use Bk (or a
> free clone), and more conservative people will use CVS.
> 
> You've said that you need to be able to cope without the filesystem --
> why not first implement the version without it, and then put it in as
> a nicety later?

Oh, absolutely.  I've already said as much, earlier in the archives.

> The same functions can be adequately (perhaps not quite as well)
> achieved using editor undo, editor backups, or tux2fs.  

Now wait, let's not confuse these things.  The magic filesystem only does one 
thing: it sends overwritten text to a userspace daemon to be added to the 
change database.  Well, it also reports creates, deletes and truncates, but 
that's it.
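
To show how little the filesystem has to say, here is a hypothetical Python sketch of the messages and the receiving daemon. The event names and fields are my own illustration, not a defined protocol. The point is that the old bytes of an overwrite are enough to undo it later.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical event kinds the filesystem layer would report.
KINDS = {"overwrite", "create", "delete", "truncate"}


@dataclass
class FsEvent:
    kind: str              # one of KINDS
    path: str
    offset: int = 0        # where the overwrite or truncate happened
    old_data: bytes = b""  # the text that was overwritten or cut off


class ChangeDaemon:
    """Userspace side: receives events in order and appends them to a log."""

    def __init__(self):
        self.log: List[FsEvent] = []

    def receive(self, event: FsEvent) -> None:
        if event.kind not in KINDS:
            raise ValueError(f"unknown event kind: {event.kind}")
        self.log.append(event)  # arrival order gives a global ordering

    def undo_overwrite(self, event: FsEvent, current: bytes) -> bytes:
        """Put the overwritten bytes back into the file's current contents.

        Assumes the write replaced exactly len(old_data) bytes in place;
        writes that extend the file would need a truncate record as well.
        """
        end = event.offset + len(event.old_data)
        return current[:event.offset] + event.old_data + current[end:]
```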

> If the design can sensibly handle many small revisions then it would
> be easy to have a program called by the editor on save that commits to
> it.  If the design can't handle a huge number of revisions in a
> sensible way, then it doesn't matter how they get generated.

The current plan is to launch the editor from Python, with Python saving 
the file contents before the editor runs.  This is just for testing.
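
Roughly, the test arrangement could look like the following sketch. It snapshots the file, runs the editor, and diffs the result. The function name is hypothetical, and `difflib` stands in for whatever diff engine Prophesy ends up using.

```python
import difflib
import subprocess
from pathlib import Path


def edit_and_capture(editor_cmd, path):
    """Snapshot a file, run the editor on it, and return a unified diff of
    what the editing session changed.

    editor_cmd is a list such as ["vi"]; the file name is appended to it.
    """
    p = Path(path)
    before = p.read_text() if p.exists() else ""
    subprocess.run(editor_cmd + [str(p)], check=True)  # blocks until the editor exits
    after = p.read_text() if p.exists() else ""
    return "".join(difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile=f"{p} (before)", tofile=f"{p} (after)"))
```

For example, `edit_and_capture(["vi"], "foo.c")` returns the session's changes as a unified diff string.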

> > I believe you're right, so long as SCM systems stay as clumsy as they are.  
> > If the archive system was actually easy and transparent to use, then 
> > programmers would use it as a tool for themselves, as a means of tracking 
> > multiple projects they're involved in, and trying out experiments.  In much 
> > the same way as we now rely on the undo chain in an editor - I do that, don't 
> > you?  That is, I rely on the editor's undo chain to back me out of failed 
> > experiments.  It gets to the point where I'm reluctant to shut down the 
> > machine because of all the state saved in the editor's undo chains.  Now, 
> > that's a system that works, but it's got glaring imperfections, beyond the 
> > fact that the state disappears when the editor shuts down.  The editors also 
> > don't know about each other, and they are incapable of maintaining undo 
> > chains across different files, let alone projects.
> 
> This is the perfect example of why semantic information is necessary.
> Pressing C-_ repeatedly until it looks about right is error-prone and
> labour intensive -- more than anything else, this limits the
> usefulness of editor undo.  For fixing small mistakes it's good, but
> for backing out of hour-long experiments it seems useless to me.  I
> don't want to say "undo edit" a hundred times; I want to say "back up
> to before I started working on this feature".

Right, unless you forgot to put down any kind of marker before you started
the session.  We can put down various kinds of markers in the journal to
help you be lazy here, including timestamps.  Furthermore, we can maintain
global undo/redo not as a single chain, but as a tree, like a version tree
which only gets pruned when you are absolutely sure you don't want to undo
any more.
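
A minimal Python sketch of undo as a tree rather than a chain follows. All names here are illustrative. Recording a change after an undo starts a new branch instead of throwing away the redo history. Markers (a timestamp, "start feature X") let you jump back without pressing undo a hundred times.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity comparison; nodes reference each other
class UndoNode:
    """One state in the undo tree; children are alternative futures."""
    change: Optional[str]              # description of the change that led here
    parent: Optional["UndoNode"] = None
    children: List["UndoNode"] = field(default_factory=list)
    marker: Optional[str] = None       # e.g. a timestamp or "start feature X"


class UndoTree:
    def __init__(self):
        self.root = UndoNode(change=None)
        self.current = self.root

    def record(self, change):
        """Record a new change. Unlike a linear undo chain, any existing
        redo branch is kept; the new change becomes a sibling branch."""
        node = UndoNode(change=change, parent=self.current)
        self.current.children.append(node)
        self.current = node
        return node

    def undo(self):
        if self.current.parent is not None:
            self.current = self.current.parent
        return self.current

    def redo(self, branch=-1):
        """Move forward, by default along the most recently created branch."""
        if self.current.children:
            self.current = self.current.children[branch]
        return self.current

    def mark(self, label):
        self.current.marker = label

    def back_to(self, label):
        """Walk up to the nearest ancestor (or self) carrying this marker."""
        node = self.current
        while node is not None and node.marker != label:
            node = node.parent
        if node is None:
            raise KeyError(label)
        self.current = node
        return node
```

Pruning would then be an explicit operation that removes a child branch, done only when you are sure you won't want it.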

> Ideally, I can have several trees around.  (Disk is cheap.)  Instead of
> rolling back, just toss that directory tree on the floor so I can find
> it later if I want to see what it was that I tried.

I don't know about you, but I often end up with trees sitting around and
I haven't got a clue what's in them and why they're there.  I always keep
a clean version of the tree around just for this reason: so I can diff
the mysterious tree and find out what's in it.  Prophesy should automate
this, and in addition, should hold some helpful metadata such as nicely
chosen version tags.
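
The manual "diff against a clean tree" step is easy to automate. Here is a sketch using only the Python standard library. It compares file contents byte for byte (`shallow=False`) rather than trusting size and mtime. Attaching version tags or other metadata is left out.

```python
import filecmp
import os


def diff_trees(clean, mystery):
    """Compare a mystery tree against a clean reference tree.

    Returns (added, removed, changed) as sorted lists of relative paths.
    """
    def files(root):
        found = set()
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                full = os.path.join(dirpath, name)
                found.add(os.path.relpath(full, root))
        return found

    clean_files, mystery_files = files(clean), files(mystery)
    added = sorted(mystery_files - clean_files)
    removed = sorted(clean_files - mystery_files)
    changed = sorted(
        rel for rel in clean_files & mystery_files
        if not filecmp.cmp(os.path.join(clean, rel),
                           os.path.join(mystery, rel), shallow=False))
    return added, removed, changed
```

Running `difflib.unified_diff` over each file in `changed` would then show what is actually in the mysterious tree.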

> > Granted, the SCM is also a tool for communication, but much good work has 
> > already been done there.  I think the distributed side of things is well 
> > known and under control,
> 
> I think current SCMs are not nearly as good as they should be.  Bk is
> the only decent distributed one, which is why it's doing so well.  

BitKeeper is very strong on the maintainer side, not so strong on the
submitter side.  This makes sense, as it was pitched to maintainers, and
in fact, that's where the big bottlenecks were.  I'm interested in doing
a better job on the developer side, which seems like virgin territory to
me.  I mean, how often do you hear the word 'usability' in connection
with source code management?

> > but today's crop of scm's still suck as development tools.  So
> > that's where I'm concentrating.
> 
> Do you mean they're not very helpful for the individual developer?
> What kind of thing?

There is too much fiddling with commands.  Every time you want to edit
a file you have to remember to check it out, and if you happen to be
thinking about the actual problem you were trying to solve when the
need arose, chances are the thought will vanish while you go through
the mechanics of checking out the file.  There are other rough spots
too, such as BitKeeper's insistence on adding an extra level to the
top of your tree.  I also find all those SCCS files peppered through
my source tree an ugly blemish.  Putting a tree under management is an
unnecessarily complex project, and you have to submit to a strip
search.  CVS I won't even get into; nobody uses it locally, and you
know why.

> > This was addressed earlier in an earlier post.  In the current version, every 
> > change to each file is recorded (and in order, giving you global undo, 
> > including undeletes) but when you close the version, the stacked changes are 
> > collapsed into a single layer of changes for the version.  To put it another 
> > way, the system journals individual changes, but (unless you tell it 
> > otherwise) only for the current version.
> 
> I disagree with this too :-)  
> 
> SCM shouldn't ever throw away information; it should only selectively
> roll it up for display.  Once you've captured a diff it should be kept
> forever.  Seeing the order in which edits within a version were made
> might possibly be helpful in the future.  

Sure, your edits can all be written to the journal, and that could
even be the default.  The journal is not the same as the version tree;
in the version tree we want to record only fully collapsed diffs
between versions.
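
To make the journal/version-tree distinction concrete, here is a simplified Python sketch. It assumes the journal for one file is just a list of successive snapshots, although the real journal records overwritten text. The version tree keeps only the diff between the first and last states. Edits that were later undone vanish from that diff but remain in the journal.

```python
import difflib


def collapse(journal):
    """Collapse a journal of successive snapshots of one file into a single
    diff between the first and last snapshot.

    Intermediate states, including edits that were later undone, do not
    appear in the result. The snapshot-list journal is a simplification.
    """
    first, last = journal[0], journal[-1]
    return list(difflib.unified_diff(
        first.splitlines(keepends=True),
        last.splitlines(keepends=True),
        fromfile="version N", tofile="version N+1"))
```

For example, if you replace a line and then put it back, `collapse` produces an empty diff for that line, while the journal still shows both edits.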

> For example, consider the case in which a version consists of me
> taking a patch from somebody, and then fiddling things a bit to make
> it merge properly.  From one point of view, those changes have to go
> together, since both are necessary to make the program compile again.
> On the other hand, it would be nice to be able to see the original
> diff separately.

I think what we're going to do is compress the diff and store it when 
you receive it, then make a journal entry when you apply it.  Your 
fiddles are the difference between the version with the diff applied 
and your fiddled version.  It's not necessary to record all your 
detailed edits to find the fiddles, though yes, it would be nice to be 
able to fall back to that in murky situations.
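
A rough Python sketch of that bookkeeping follows. The function names are hypothetical, and `zlib` is an arbitrary choice of compressor. The received patch is stored compressed and unchanged. The fiddles come from a single diff between the patched and the final text, with no keystroke-level journal needed.

```python
import difflib
import zlib


def udiff(a, b, a_name, b_name):
    """Unified diff between two strings, returned as one string."""
    return "".join(difflib.unified_diff(
        a.splitlines(keepends=True), b.splitlines(keepends=True),
        fromfile=a_name, tofile=b_name))


def store_received_patch(patch_text):
    """Compress a received diff for storage, exactly as it arrived."""
    return zlib.compress(patch_text.encode("utf-8"))


def load_patch(blob):
    return zlib.decompress(blob).decode("utf-8")


def fiddles(patched, final):
    """Local fix-ups: the difference between the file as patched and the
    file after hand-editing it to make the patch merge properly."""
    return udiff(patched, final, "patched", "fiddled")
```

The original patch and the fiddles can then be shown separately or together, which covers the "see the original diff on its own" case.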

> The more I think about it, the more I think some kind of recursive
> nesting of versions makes sense.  Bk has this, but it enforces a
> two-level model of changesets, which consist of deltas (which are more
> or less diffs.)  But I can imagine a higher-level changeset containing
> several others, particularly if they're ported or accepted from
> somebody else.

I've talked previously about 'regions', which are distinct parts
that together make up a larger diff.  It would make sense to nest such
things, and it might be possible to track regions as they evolve
through versions.  On the other hand, I don't see any obvious way to
nest versions themselves.

> > Check-in and check-out are things the computer can figure out for
> > itself.
> 
> How?

Prophesy knows you checked out a file, because you edited it.  Prophesy
knows you checked it in because you closed a version.
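
In other words, the bookkeeping is trivial once the filesystem reports edits. Here is a hypothetical Python sketch, where `on_edit` would be driven by the filesystem notifications rather than by a command the user types.

```python
class ImplicitCheckout:
    """Hypothetical sketch: no explicit checkout/checkin commands.

    A file counts as checked out as soon as it is edited; closing the
    version checks in everything that was touched.
    """

    def __init__(self):
        self.checked_out = set()
        self.versions = []  # each entry: sorted list of files in that version

    def on_edit(self, path):
        # Editing is the checkout.
        self.checked_out.add(path)

    def close_version(self):
        # Closing the version is the checkin.
        files = sorted(self.checked_out)
        self.versions.append(files)
        self.checked_out.clear()
        return files
```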

> How is the computer meant to know what I was thinking when I made a
> change?  That's what future readers of the code really want to know.
> It might even be *more* important than the change itself -- this is
> why ChangeLogs can work in the absence of any other SCM.  I find it's
> actually good discipline for the programmer too -- it helps them
> concentrate on doing only one thing at a time.
> 
> > Risk... I don't see it.  If anything, the risk of a programmer forgetting or 
> > misapplying a command is greater.  I know, I did it myself once :-)
> 
> Kernel crashes, down filesystems, etc.

Journalling filesystem...

> If ClearCase is down, you can't do *anything*.  If your CVS server is
> down, you can at least edit and compile locally, and diff against old
> versions.  

I suppose you missed the part where all repositories are local, and your
source tree is just a normal source tree with a database of diffs hidden
in the root.

> > As for complexity, I don't really see that.  Difficult, yes, because so far 
> > nobody has provided a suitable framework on Linux for stacking local 
> > filesystems.
> 
> I agree that would be useful.  I just think you have a filesystem-hacker
> hammer and are trying to apply it to a SCM thumb.

I think when you see where I'm going with it you will say 'aha'.

-- 
Daniel