[Prophesy] comments so far

Martin Pool mbp at samba.org
Fri Jun 14 10:28:54 EST 2002


These are mostly just ideas I've had in my mind about SCM; some of
them disagree with (what I've heard of) prophesy.  Of course, you can
do whatever you want.  So take them or leave them.

I think the hard thing about defining an SCM system is defining just
what SCM *means*.

As far as I can tell, you seem to be implementing a versioning
filesystem, which lets you tag and revisit points in history.  That's
very nice, but I don't think that is really the heart of the problem.

I believe that SCM systems, like programming languages, are primarily
tools for communication between programmers -- the pragmatics of
controlling the machine are secondary.  (Included is the case of a
programmer communicating with themselves over time.)

Hooking at the filesystem level is good for capturing all changes, but
I think the changes captured that way are too fine-grained to be
meaningful.  I think it's a bad idea -- although I of course respect
you for trying it -- because I think the benefits compared to regular
commands don't justify the added complexity and risk.

There's a hierarchy:

  release notes for a new version -- many end-users will read these;
    they'll include references to bugs fixed

  list of patches accepted -- every developer probably wants to read
    this

  list of small changes within a patch -- many programmers probably
    want to read this

  diff for an actual patch -- probably don't need to read it unless 
    I'm actually working in the area

Perhaps there are some other levels, but you get the idea.  I think
the recursive nature is very important.  The key job of the SCM system
is to help programmers manage the history of development of the
project.
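To make the recursive idea concrete, here is a minimal sketch in
Python.  The names (Change, describe) and the sample data are made up
for illustration; they are not taken from prophesy or any other tool.
Each node has a summary and may contain smaller changes, and a reader
picks how deep to go.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Change:
    """One node in the history: a release, a patch, or a small change."""
    summary: str
    children: List["Change"] = field(default_factory=list)
    diff: str = ""  # only the smallest changes need to carry a real diff


def describe(change: Change, max_depth: int, depth: int = 0) -> List[str]:
    """Return indented summary lines, descending at most max_depth levels."""
    lines = ["  " * depth + change.summary]
    if depth < max_depth:
        for child in change.children:
            lines.extend(describe(child, max_depth, depth + 1))
    return lines


# Hypothetical sample history: a release holding two patches.
release = Change("release 1.1", [
    Change("patch: fix option parsing", [
        Change("reject empty option names", diff="(diff text)"),
        Change("add regression test", diff="(diff text)"),
    ]),
    Change("patch: update manual"),
])
```

An end-user might read describe(release, 0), a developer
describe(release, 1), and somebody working in the area would go all
the way down to the diffs.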

Just keeping a GNU-style ChangeLog can be pretty useful even without
SCM.

Autogenerating a NEWS file by pulling out top-level comments would be
great, because it's one of the most useful tools to a user or
satellite developer.
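A rough sketch of how that could work, assuming the tool can hand over
the commit comments of the accepted patches grouped by release.  The
function name make_news and the output layout are my own invention,
not an existing format.

```python
from typing import Dict, List


def make_news(releases: Dict[str, List[str]]) -> str:
    """Build NEWS text from the top-level line of each patch comment."""
    sections = []
    for version, comments in releases.items():
        # Keep only the first line of each comment: the top-level summary.
        items = ["  * " + c.strip().splitlines()[0]
                 for c in comments if c.strip()]
        sections.append("\n".join(["NEWS for " + version, ""] + items))
    return "\n\n".join(sections) + "\n"
```

Called with {"1.1": ["Fix option parsing\n\nlong details", "Update
manual"]}, this keeps just the two summary lines under a "NEWS for
1.1" heading and drops the long details.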

Offline operation is crucial.  Most projects don't have everybody on a
LAN.  Open source is inherently distributed.  The time spent waiting
on the network will drastically outweigh anything you can do with a
database, etc., on the server.

Arch makes every download of the product a potential working
directory.  I don't think it's necessary to keep the entire history in
every tarball, but it is perhaps good to keep references that tie the
files to their place in history.
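One way such references might look, as a sketch only: a small manifest
shipped in the tarball that names the revision and records a digest
for each file.  The manifest layout, the function names and the choice
of SHA-256 are all assumptions for illustration; this is not how Arch
does it.

```python
import hashlib
from typing import Dict, List


def make_manifest(revision: str, files: Dict[str, bytes]) -> str:
    """Record the revision name and a content digest for every file."""
    lines = ["revision " + revision]
    for path in sorted(files):
        digest = hashlib.sha256(files[path]).hexdigest()
        lines.append(digest + "  " + path)
    return "\n".join(lines) + "\n"


def modified_files(manifest: str, files: Dict[str, bytes]) -> List[str]:
    """List paths that are missing or no longer match the manifest."""
    changed = []
    for line in manifest.splitlines()[1:]:  # skip the revision line
        digest, path = line.split("  ", 1)
        if path not in files or hashlib.sha256(files[path]).hexdigest() != digest:
            changed.append(path)
    return changed
```

With this, anybody who unpacks a tarball and starts hacking can later
say which revision they started from and which files they touched,
without carrying the whole history around.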

It would, by extension, be nice to allow all downloads to happen over
http/ftp, and all submissions to happen by mail to a maintainer.  The
program should not require any intelligence in the protocol.

People shouldn't need permission to start hacking on a project, and to
keep versions locally.  They just need permission to commit to the
master site.

Diffs have this nice property of being intelligible to humans and
programs.  Keep them.  Make minimal changes to handle chmod, mv, etc.
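As a sketch of what "minimal changes" could mean: keep the ordinary
unified diff and put one extra header line in front of it for each
metadata operation.  The "rename" and "chmod" header lines here are
hypothetical, made up for this example, and plain patch(1) would not
understand them; the diff body comes from Python's standard difflib.

```python
import difflib
from typing import List, Optional


def make_patch(old_path: str, new_path: str, old_text: str, new_text: str,
               new_mode: Optional[int] = None) -> List[str]:
    """Return patch lines: optional metadata headers, then a unified diff."""
    lines = []
    if old_path != new_path:
        # Hypothetical header recording an mv.
        lines.append("rename " + old_path + " " + new_path)
    if new_mode is not None:
        # Hypothetical header recording a chmod, mode written in octal.
        lines.append("chmod " + format(new_mode, "o") + " " + new_path)
    lines.extend(difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(),
        fromfile=old_path, tofile=new_path, lineterm=""))
    return lines
```

The result is still readable by eye, and a tool that doesn't know
about the extra headers can at least show the content change.  Paths
containing spaces would need quoting, which this sketch ignores.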

All other things being equal, files should be directly human-readable.
Use diffs.  Perhaps make ChangeLogs, or something similar, part of the
metadata.  (On the other hand, being readable might encourage editing
by hand, which would be bad.)

Writing new filesystems, diff formats, network protocols, etc. is just
screwing around.  The heart of the problem is to get a good model for
*how to do SCM*.  You can implement (v1) using existing tools;
optimize later if it turns out that your model is correct.

Similarly, don't waste time writing GUIs; use emacs, xxdiff, dirdiff,
etc.  Write one later if the model proves correct.

If I were starting from scratch, I would consider a typical open source
project:

 - email is key

 - people mail around patches; perhaps they get revised; eventually
   they get applied

 - the NEWS file says "applied patch for foofeature from
   jhacker at dot.com"

Projects sometimes split off files or subdirectories into other
projects; perhaps they diverge slightly.  It would be nice to handle
this.

For rsync and other projects, I keep patches that look good, but that
I have not yet really accepted, in patches/ in CVS.  An SCM system
that managed this would be nice.  I think it's a promising model, not
a hack.
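A very small sketch of what "managing this" might involve: each
submitted patch gets a recorded state, and the maintainer's decision
moves it along.  The class, the function names and the three states
are assumptions for illustration, not a description of my CVS setup.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QueuedPatch:
    name: str
    submitter: str
    state: str = "pending"  # hypothetical states: pending, accepted, rejected


def decide(queue: List[QueuedPatch], name: str, state: str) -> None:
    """Record the maintainer's decision on the named patch."""
    if state not in ("accepted", "rejected"):
        raise ValueError("unknown state: " + state)
    for patch in queue:
        if patch.name == name:
            patch.state = state
            return
    raise KeyError(name)


def pending(queue: List[QueuedPatch]) -> List[str]:
    """Names of patches still waiting for a decision."""
    return [p.name for p in queue if p.state == "pending"]
```

The point is only that "looks good, not yet accepted" becomes a state
the tool knows about, rather than a convention about a directory name.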

Disk is cheap.  Keep everything.

Networks are getting broader, but latency is not going to go away.

Do it in <4000 lines.  Lions-book Unix was 10kloc, and look how many
good ideas they had in there.

-- 
Martin 


