[Prophesy] comments so far

Martin Pool mbp at samba.org
Sat Jun 15 04:06:03 EST 2002


I agree with you about the usefulness of editor undo chains.  Under
emacs, I have kept-new-versions set to about 10, and I regularly use
C-u C-x C-s to do "keep backup version" and diff-backup.  All very
nice and useful.

A filesystem that kept all versions would allow you to do this in a
program-neutral way, although I think that's not so important now that
almost all the GNU tools understand foo.c.~1~ backups.
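To illustrate, the foo.c.~N~ convention is trivial to work with programmatically; here's a rough Python sketch (the helper name is my own invention, the only thing assumed is the numbered-backup naming itself):

```python
# Sketch: walking GNU-style numbered backups (foo.c.~1~, foo.c.~2~, ...)
# to find the newest one, much as M-x diff-backup does in Emacs.
import glob
import re
import subprocess

def latest_backup(path):
    """Return the highest-numbered path.~N~ backup for `path`, or None."""
    pattern = re.compile(re.escape(path) + r"\.~(\d+)~$")
    backups = []
    for name in glob.glob(path + ".~*~"):
        m = pattern.match(name)
        if m:
            backups.append((int(m.group(1)), name))
    # Compare numerically, so foo.c.~10~ beats foo.c.~2~.
    return max(backups)[1] if backups else None

# Usage: diff the working file against its newest backup, e.g.
#   subprocess.run(["diff", "-u", latest_backup("foo.c"), "foo.c"])
```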

However, it has the same problem that the results are largely lacking
semantics.  For example, looking back through the history of all
modifications to a directory, it seems impossible to tell which
versions of the source will actually compile correctly, and which were
intermediate versions that don't work.  If a programmer commits
early-and-often to CVS (say), but at least runs the test suite first,
then in general you have some guarantee about the internal consistency
of any committed version.  (It would be even better if CVS versions
were module-wide, as in Subversion.)

A magic filesystem is "mere mechanism".  I don't think you should be
spending so much time on it until you have a good design for the
version-control system built on top.

If it turns out that the design "on top" is no better than CVS, then
nobody will bother -- people who want neat features will use Bk (or a
free clone), and more conservative people will use CVS.

You've said that you need to be able to cope without the filesystem --
why not first implement the version without it, and then put it in as
a nicety later?

The same functions can be achieved adequately (though perhaps not
quite as well) using editor undo, editor backups, or tux2fs.

If the design can sensibly handle many small revisions then it would
be easy to have a program called by the editor on save that commits to
it.  If the design can't handle a huge number of revisions in a
sensible way, then it doesn't matter how they get generated.
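To make that concrete, here's a rough sketch of the sort of hook I mean (the "scm commit --micro" command line is invented for illustration, not a real tool):

```python
# Hypothetical sketch: a tiny program an editor could call on every save,
# so each saved state becomes one micro-revision that the SCM can later
# roll up into a user-visible version.  The "scm" command is made up.
import subprocess
import sys

def save_hook_command(path, message=None):
    # Build the SCM invocation for one editor save.
    if message is None:
        message = "editor autosave of " + path
    return ["scm", "commit", "--micro", "-m", message, path]

if __name__ == "__main__" and len(sys.argv) > 1:
    # Invoked by the editor as: save-hook <just-saved-file>
    subprocess.run(save_hook_command(sys.argv[1]), check=True)
```

The editor needs to know nothing about the SCM beyond "run this program on save", which is exactly the kind of loose coupling that makes many small revisions cheap to generate.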

> I believe you're right, so long as SCM systems stay as clumsy as they are.  
> If the archive system was actually easy and transparent to use, then 
> programmers would use it as a tool for themselves, as a means of tracking 
> multiple projects they're involved in, and trying out experiments.  In much 
> the same way as we now rely on the undo chain in an editor - I do that, don't 
> you?  That is, I rely on the editor's undo chain to back me out of failed 
> experiments.  It gets to the point where I'm reluctant to shut down the 
> machine because of all the state saved in the editor's undo chains.  Now, 
> that's a system that works, but it's got glaring imperfections, beyond the 
> fact that the state disappears when the editor shuts down.  The editors also 
> don't know about each other, and they are incapable of maintaining undo 
> chains across different files, let alone projects.

This is the perfect example of why semantic information is necessary.
Pressing C-_ repeatedly until it looks about right is error-prone and
labour-intensive -- more than anything else, this limits the
usefulness of editor undo.  For fixing small mistakes it's good, but
for backing out of hour-long experiments it seems useless to me.  I
don't want to say "undo edit" a hundred times; I want to say "back up
to before I started working on this feature".

Ideally, I can have several trees around.  (Disk is cheap.)  Instead of
rolling back, just toss that directory tree on the floor so I can find
it later if I want to see what it was that I tried.

> Granted, the SCM is also a tool for communication, but much good work has 
> already been done there.  I think the distributed side of things is well 
> known and under control,

I think current SCMs are not nearly as good as they should be.  Bk is
the only decent distributed one, which is why it's doing so well.  

> but today's crop of scm's still suck as development tools.  So
> that's where I'm concentrating.

Do you mean they're not very helpful for the individual developer?
What kind of thing?

> This was addressed earlier in an earlier post.  In the current version, every 
> change to each file is recorded (and in order, giving you global undo, 
> including undeletes) but when you close the version, the stacked changes are 
> collapsed into a single layer of changes for the version.  To put it another 
> way, the system journals individual changes, but (unless you tell it 
> otherwise) only for the current version.

I disagree with this too :-)  

SCM shouldn't ever throw away information; it should only selectively
roll it up for display.  Once you've captured a diff it should be kept
forever.  Seeing the order in which edits within a version were made
might well be helpful in the future.

For example, consider the case in which a version consists of me
taking a patch from somebody, and then fiddling things a bit to make
it merge properly.  From one point of view, those changes have to go
together, since both are necessary to make the program compile again.
On the other hand, it would be nice to be able to see the original
diff separately.

The more I think about it, the more I think some kind of recursive
nesting of versions makes sense.  Bk has this, but it enforces a
two-level model of changesets, which consist of deltas (which are more
or less diffs).  But I can imagine a higher-level changeset containing
several others, particularly if they're ported or accepted from
somebody else.
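To sketch what I mean (class and method names are my own invention, not anything Bk or Prophesy actually provides):

```python
# Sketch of recursive changeset nesting: a changeset is either a leaf
# delta (roughly a diff) or a container of other changesets, so a patch
# accepted from somebody else keeps its internal structure while the
# merge fixups sit alongside it in an enclosing changeset.
class Delta:
    def __init__(self, description, diff_text):
        self.description = description
        self.diff_text = diff_text

    def flatten(self):
        # A leaf contributes just itself.
        return [self]

class Changeset:
    def __init__(self, description, children):
        self.description = description
        self.children = children  # Deltas and/or nested Changesets

    def flatten(self):
        # "Roll up for display": a flat list of leaf deltas, while the
        # nested structure (original patch vs. merge fixes) is retained.
        out = []
        for child in self.children:
            out.extend(child.flatten())
        return out

# Example: a contributed patch plus local merge fixes, grouped as one
# higher-level changeset.
merge = Changeset("merge contributed patch", [
    Changeset("original patch from contributor", [
        Delta("add feature", "...diff..."),
    ]),
    Delta("fix up for local tree", "...diff..."),
])
```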

> > I think it's a
> > bad idea -- although I of course respect you for trying it -- because
> > I think the benefits compared to regular commands don't justify the
> > added complexity and risk.
> 
> Somebody from Apple said it well: "you should never have to tell the computer 
> something it already knows".

Right, but you shouldn't be afraid to tell the computer things that
are pragmatically necessary.

Somewhat off-topic comparison: directory and file names are not really
necessary, because you can always search by content.  But in practice,
with a few exceptions, systems that rely on content search alone have
turned out to be hard to use.

> Check-in and check-out are things the computer can figure out for
> itself.

How?

How is the computer meant to know what I was thinking when I made a
change?  That's what future readers of the code really want to know.
It might even be *more* important than the change itself -- this is
why ChangeLogs can work in the absence of any other SCM.  I find it's
actually good discipline for the programmer too -- it helps them
concentrate on doing only one thing at a time.

> Risk... I don't see it.  If anything, the risk of a programmer forgetting or 
> misapplying a command is greater.  I know, I did it myself once :-)

Kernel crashes, down filesystems, etc.

If ClearCase is down, you can't do *anything*.  If your CVS server is
down, you can at least edit and compile locally, and diff against old
versions.  

> As for complexity, I don't really see that.  Difficult, yes, because so far 
> nobody has provided a suitable framework on Linux for stacking local 
> filesystems.

I agree that would be useful.  I just think you have a filesystem-hacker
hammer and are trying to apply it to an SCM thumb.

-- 
Martin 
