[Prophesy] comments so far
Daniel Phillips
phillips at bonn-fries.net
Sat Jun 15 00:09:55 EST 2002
On Friday 14 June 2002 02:28, Martin Pool wrote:
> These are mostly just ideas I've had in my mind about SCM; some of
> them disagree with (what I've heard of) prophesy. Of course, you can
> do whatever you want. So take them or leave them.
>
> I think the hard thing about defining a SCM system is defining just
> what SCM *means*.
>
> As far as I can tell, you seem to be implementing a versioning
> filesystem, which lets you tag and revisit points in history. That's
> very nice, but I don't think that is really the heart of the problem.
It's the heart of a tool that (hopefully) lets you get at the heart of the
problem.
> I believe that SCM systems, like programming languages, are primarily
> tools for communication between programmers -- the pragmatics of
> controlling the machine are secondary. (Included is the case of a
> programmer communicating with themselves over time.)
I believe you're right, so long as SCM systems stay as clumsy as they are.
If the archive system was actually easy and transparent to use, then
programmers would use it as a tool for themselves, as a means of tracking
multiple projects they're involved in, and trying out experiments. In much
the same way as we now rely on the undo chain in an editor - I do that, don't
you? That is, I rely on the editor's undo chain to back me out of failed
experiments. It gets to the point where I'm reluctant to shut down the
machine because of all the state saved in the editor's undo chains. Now,
that's a system that works, but it's got glaring imperfections, beyond the
fact that the state disappears when the editor shuts down. The editors also
don't know about each other, and they are incapable of maintaining undo
chains across different files, let alone projects.
Granted, the SCM is also a tool for communication, but much good work has
already been done there. I think the distributed side of things is well
known and under control, but today's crop of scm's still suck as development
tools. So that's where I'm concentrating.
> Hooking at the filesystem level is good for capturing all changes, but
> I think they are very fine-grained and not meaningful.
This was addressed in an earlier post. In the current version, every
change to each file is recorded (and in order, giving you global undo,
including undeletes) but when you close the version, the stacked changes are
collapsed into a single layer of changes for the version. To put it another
way, the system journals individual changes, but (unless you tell it
otherwise) only for the current version.
> I think it's a
> bad idea -- although I of course respect you for trying it -- because
> I think the benefits compared to regular commands don't justify the
> added complexity and risk.
Somebody from Apple said it well: "you should never have to tell the computer
something it already knows". Check-in and check-out are things the computer
can figure out for itself.
Risk... I don't see it. If anything, the risk of a programmer forgetting or
misapplying a command is greater. I know, I did it myself once :-)
As for complexity, I don't really see that. Difficult, yes, because so far
nobody has provided a suitable framework on Linux for stacking local
filesystems. Anyway, I don't intend to tackle the problem of exporting the
vfs to user space in its full generality, but rather, just enough to provide
the functionality I want. If that provides a good base to work from towards
a fully general system, then that's a bonus.
Finally, I don't have to depend on the magic filesystem effort being
successful, since the fallback is just to go to the traditional way of doing
things, with explicit commands (a file checkout has the immediate effect of
loading the current contents of the file into the database). However, that's
way too dull for me and would fall well short of what I'd expect from a 21st
century design.
I've only thought in general terms about how to implement the magic
filesystem so far, however, now is the time to get down to specifics. As a
design rule, I'll try to work within existing kernel mechanisms, but if those
mechanisms prove inadequate, I won't be shy about changing them. In the end,
if somebody comes up with a better way of doing the same thing, that's great,
but right now the main concern is functionality and reliability. Other
essential design parameters are:
- Overhead imposed by the magic filesystem is insignificant
- No performance impact at all outside the scope of the magic filesystem
- No security compromise
- No new denial-of-service (DoS) opportunities
- No new races
When the magic filesystem is mounted, it gets a new superblock and knows
about the superblock of the underlying system. We want to pass most vfs
events straight through to the underlying filesystem, except for open, write,
mmap and close (note that the vfs only passes the final file close event to
the filesystem, and this isn't good enough).
A pass-through write would work as follows:
- inodes of the magic filesystem are exactly the inodes of the
underlying filesystem, except for having an i_sb that points at
a magic_superblock in place of the underlying filesystem's native
superblock (does this work??)
- vfs calls magic_file->f_dentry->d_inode->i_fop->write(magic_file, ...)
- this magic_file_write keeps the native superblock in a private field
of the magic superblock:
magic_file->f_dentry->d_inode->i_sb->private.real_sb
- magic_file_write allocates a temporary buffer, invokes the native
filesystem's ->read to read the to-be-overwritten data into it,
writes that data into the userspace daemon's pipe, and releases the
temporary buffer (there has to be a more direct way of doing this!)
- magic_file_write then calls the underlying filesystem's ->write,
with its native... (inode??, no, it points at magic_sb, recursion!!)
could we temporarily reset the sb?? yikes. Too bad generic_file_write
takes a file instead of an inode.
Other considerations:
- Modify dnotify to allow events on files, not just directories
- File open is overridden to attach notify events to file open and file
close, if the file was opened r/w; these events are directed at the
user space daemon
- File write is overridden in magic_file_operations->write, to read the
current contents of the file in the overwritten region into a pipe. If
the pipe is full the writing process blocks until the userspace daemon
empties it.
> There's a hierarchy:
>
> release notes for a new version -- many end-users will read these;
> they'll include references to bugs fixed
>
> list of patches accepted -- every developer probably wants to read
> this
>
> list of small changes within a patch -- many programmers probably
> want to read this
>
> diff for an actual patch -- probably don't need to read it unless
> I'm actually working in the area
>
> Perhaps there are some other levels, but you get the idea. I think
> the recursive nature is very important. The key job of the SCM system
> is to help programmers manage the history of development of the
> project.
>
> Just keeping a GNU-style ChangeLog can be pretty useful even without
> SCM.
>
> Autogenerating a NEWS file by pulling out top-level comments would be
> great, because it's one of the most useful tools to a user or
> satellite developer.
>
> Offline operation is crucial. Most projects don't have everybody on a
> LAN. Open source is inherently distributed. Time costs here will
> drastically outweigh anything you can do with a database, etc, on the
> server.
>
> Arch makes every download of the product a potential working
> directory. I don't think it's necessary to keep the entire history in
> every tarball, but it is perhaps good to keep references that tie the
> files to their place in history.
>
> It would, by extension, be nice to allow all downloads to happen over
> http/ftp, and all submissions to happen by mail to a maintainer. The
> program should not require any intelligence in the protocol.
>
> People shouldn't need permission to start hacking on a project, and to
> keep versions locally. They just need permission to commit to the
> master site.
>
> diffs have this nice property of being intelligible to humans and
> programs. Keep them. Make minimal changes to handle chmod, mv, etc.
>
> All other things being equal, files should be directly human-readable.
> Use diffs. Perhaps make ChangeLogs, or something similar, part of the
> metadata. (On the other hand, being readable might encourage editing
> by hand, which would be bad.)
>
> Writing new filesystems, diff formats, network protocols, etc is just
> screwing around. The heart of the problem is to get a good model for
> *how to do SCM*. You can implement (v1) using existing tools;
> optimize later if it turns out that your model is correct.
>
> Similarly, don't waste time writing GUIs; use emacs, xxdiff, dirdiff,
> etc. Write one later if it proves correct.
>
> If I was starting from scratch, I would consider a typical open source
> project:
>
> - email is key
>
> - people mail around patches; perhaps they get revised; eventually
> they get applied
>
> - the NEWS file says "applied patch for foofeature from
> jhacker at dot.com"
>
> Projects sometimes split off files or subdirectories into other
> projects; perhaps they diverge slightly. It would be nice to handle
> this.
>
> For rsync and other projects, I keep patches that I have not yet
> really accepted but that look good in CVS in patches/. A SCM system
> that managed this would be nice. I think it's a promising model, not
> a hack.
>
> Disk is cheap. Keep everything.
>
> Networks are getting broader, but latency is not going to go away.
>
> Do it in <4000lines. Lions-book Unix was 10kloc, and look how many
> good ideas they had in there.
>
> --
> Martin
> _______________________________________________
> Prophesy mailing list
> Prophesy at auug.org.au
> http://www.auug.org.au/mailman/listinfo/prophesy
>
>
--
Daniel