[Prophesy] comments so far
Daniel Phillips
phillips at bonn-fries.net
Sat Jun 15 00:09:55 EST 2002
On Friday 14 June 2002 02:28, Martin Pool wrote:
> These are mostly just ideas I've had in my mind about SCM; some of
> them disagree with (what I've heard of) prophesy. Of course, you can
> do whatever you want. So take them or leave them.
>
> I think the hard thing about defining a SCM system is defining just
> what SCM *means*.
>
> As far as I can tell, you seem to be implementing a versioning
> filesystem, which lets you tag and revisit points in history. That's
> very nice, but I don't think that is really the heart of the problem.
It's the heart of a tool that (hopefully) lets you get at the heart of the
problem.
> I believe that SCM systems, like programming languages, are primarily
> tools for communication between programmers -- the pragmatics of
> controlling the machine are secondary. (Included is the case of a
> programmer communicating with themselves over time.)
I believe you're right, so long as SCM systems stay as clumsy as they are.
If the archive system was actually easy and transparent to use, then
programmers would use it as a tool for themselves, as a means of tracking
multiple projects they're involved in, and trying out experiments. In much
the same way as we now rely on the undo chain in an editor - I do that, don't
you? That is, I rely on the editor's undo chain to back me out of failed
experiments. It gets to the point where I'm reluctant to shut down the
machine because of all the state saved in the editor's undo chains. Now,
that's a system that works, but it's got glaring imperfections, beyond the
fact that the state disappears when the editor shuts down. The editors also
don't know about each other, and they are incapable of maintaining undo
chains across different files, let alone projects.
Granted, the SCM is also a tool for communication, but much good work has
already been done there. I think the distributed side of things is well
known and under control, but today's crop of scm's still suck as development
tools. So that's where I'm concentrating.
> Hooking at the filesystem level is good for capturing all changes, but
> I think they are very fine-grained and not meaningful.
This was addressed in an earlier post. In the current version, every
change to each file is recorded (and in order, giving you global undo,
including undeletes) but when you close the version, the stacked changes are
collapsed into a single layer of changes for the version. To put it another
way, the system journals individual changes, but (unless you tell it
otherwise) only for the current version.
> I think it's a
> bad idea -- although I of course respect you for trying it -- because
> I think the benefits compared to regular commands don't justify the
> added complexity and risk.
Somebody from Apple said it well: "you should never have to tell the computer
something it already knows". Check-in and check-out are things the computer
can figure out for itself.
Risk... I don't see it. If anything, the risk of a programmer forgetting or
misapplying a command is greater. I know, I did it myself once :-)
As for complexity, I don't really see that. Difficult, yes, because so far
nobody has provided a suitable framework on Linux for stacking local
filesystems. Anyway, I don't intend to tackle the problem of exporting the
vfs to user space in its full generality, but rather, just enough to provide
the functionality I want. If that provides a good base to work from towards
a fully general system, then that's a bonus.
Finally, I don't have to depend on the magic filesystem effort being
successful, since the fallback is just to go to the traditional way of doing
things, with explicit commands (a file checkout has the immediate effect of
loading the current contents of the file into the database). However, that's
way too dull for me and would fall well short of what I'd expect from a 21st
century design.
I've only thought in general terms about how to implement the magic
filesystem so far, however, now is the time to get down to specifics. As a
design rule, I'll try to work within existing kernel mechanisms, but if those
mechanisms prove inadequate, I won't be shy about changing them. In the end,
if somebody comes up with a better way of doing the same thing, that's great,
but right now the main concern is functionality and reliability. Other
essential design parameters are:
- Overhead imposed by the magic filesystem is insignificant
- No performance impact at all outside the scope of the magic filesystem
- No security compromise
- No new denial-of-service (DoS) opportunities
- No new races
When the magic filesystem is mounted, it gets a new superblock and knows
about the superblock of the underlying system. We want to pass most vfs
events straight through to the underlying filesystem, except for open, write,
mmap and close (note that the vfs only passes the final file close event to
the filesystem, and this isn't good enough).
A pass-through write would work as follows:
- inodes of the magic filesystem are exactly the inodes of the
underlying filesystem, except for having an i_sb that points at
a magic_superblock in place of the underlying filesystem's native
superblock (does this work??)
- vfs calls magic_file->f_dentry->d_inode->i_fop->write(magic_file, ...)
- this magic_file_write keeps the native superblock in a private field
of the magic superblock:
magic_file->f_dentry->d_inode->i_sb->private.real_sb
- magic_file_write allocates a temporary buffer, invokes the native
filesystem's ->read to read the to-be-overwritten data into it,
writes that data into the userspace daemon's pipe, and releases the
temporary buffer (there has to be a more direct way of doing this!)
- magic_file_write then calls the underlying filesystem's ->write,
with its native... (inode??, no, it points at magic_sb, recursion!!)
could we temporarily reset the sb?? yikes. Too bad generic_file_write
takes a file instead of an inode.
Other considerations:
- Modify dnotify to allow events on files, not just directories
- File open is overridden to attach notify events to file open and file
close, if the file was opened r/w; these events are directed at the
user space daemon
- File write is overridden in magic_file_operations->write, to read the
current contents of the file in the overwritten region into a pipe. If
the pipe is full the writing process blocks until the userspace daemon
empties it.
> There's a hierarchy:
>
> release notes for a new version -- many end-users will read these;
> they'll include references to bugs fixed
>
> list of patches accepted -- every developer probably wants to read
> this
>
> list of small changes within a patch -- many programmers probably
> want to read this
>
> diff for an actual patch -- probably don't need to read it unless
> I'm actually working in the area
>
> Perhaps there are some other levels, but you get the idea. I think
> the recursive nature is very important. The key job of the SCM system
> is to help programmers manage the history of development of the
> project.
>
> Just keeping a GNU-style ChangeLog can be pretty useful even without
> SCM.
>
> Autogenerating a NEWS file by pulling out top-level comments would be
> great, because it's one of the most useful tools to a user or
> satellite developer.
>
> Offline operation is crucial. Most projects don't have everybody on a
> LAN. Open source is inherently distributed. Time costs here will
> drastically outweigh anything you can do with a database, etc, on the
> server.
>
> Arch makes every download of the product a potential working
> directory. I don't think it's necessary to keep the entire history in
> every tarball, but it is perhaps good to keep references that tie the
> files to their place in history.
>
> It would, by extension, be nice to allow all downloads to happen over
> http/ftp, and all submissions to happen by mail to a maintainer. The
> program should not require any intelligence in the protocol.
>
> People shouldn't need permission to start hacking on a project, and to
> keep versions locally. They just need permission to commit to the
> master site.
>
> diffs have this nice property of being intelligible to humans and
> programs. Keep them. Make minimal changes to handle chmod, mv, etc.
>
> All other things being equal, files should be directly human-readable.
> Use diffs. Perhaps make ChangeLogs, or something similar, part of the
> metadata. (On the other hand, being readable might encourage editing
> by hand, which would be bad.)
>
> Writing new filesystems, diff formats, network protocols, etc is just
> screwing around. The heart of the problem is to get a good model for
> *how to do SCM*. You can implement (v1) using existing tools;
> optimize later if it turns out that your model is correct.
>
> Similarly, don't waste time writing GUIs; use emacs, xxdiff, dirdiff,
> etc. Write one later if it proves correct.
>
> If I was starting from scratch, I would consider a typical open source
> project:
>
> - email is key
>
> - people mail around patches; perhaps they get revised; eventually
> they get applied
>
> - the NEWS file says "applied patch for foofeature from
> jhacker at dot.com"
>
> Projects sometimes split off files or subdirectories into other
> projects; perhaps they diverge slightly. It would be nice to handle
> this.
>
> For rsync and other projects, I keep patches that I have not yet
> really accepted but that look good in CVS in patches/. A SCM system
> that managed this would be nice. I think it's a promising model, not
> a hack.
>
> Disk is cheap. Keep everything.
>
> Networks are getting broader, but latency is not going to go away.
>
> Do it in <4000lines. Lions-book Unix was 10kloc, and look how many
> good ideas they had in there.
>
> --
> Martin
> _______________________________________________
> Prophesy mailing list
> Prophesy at auug.org.au
> http://www.auug.org.au/mailman/listinfo/prophesy
>
>
--
Daniel