[Prophesy] comments so far

Martin Pool mbp at samba.org
Wed Jun 19 12:53:41 EST 2002


If you want to design a userspace filesystem hook, that's fine; if you
want to design an SCM system, that's fine too (and more interesting to
me personally).  If you think that an SCM system ought to be built on
top of kernel dnotify hooks, then I really have to take issue with you.

In summary:

 [1] this turns out to be a real weak point in the biggest known
     implementation of the design, ClearCase

 [2] on general principle, things shouldn't be in the kernel unless
     they need to be

 [3] you're not tackling the real problem

[1]

I was looking at a ClearCase installation at a large company earlier
on today.  Everybody's views (~= working directories) are kept on this
machine under /view.  Fine.

  cd /view/
  ls -l 

Hangs.  Foo.  

  strace ls -l

shows it looping indefinitely on getdents() (or something like that).
Pressing TAB in bash produces the same effect -- sometimes you have to
kill bash and log in again.  Very amusing.  You don't realize how
often you use this until you work on a machine without bash, or on a
machine where pressing tab is likely to hang your shell.

Anyhow, so I get a view name from somebody else, type it in carefully,
and can see things inside.  It is noticeably slower than it ought to
be, considering the machine it's stored on (modern PIII or something)
-- listing a directory takes a fair fraction of a second.  

Of course ClearCase is famous for having enormous hardware
requirements, exceeding the cost of a developer's desktop hardware.
This is no accident, but rather an essential implication of the
design: every file IO, even just creating a short-lived temporary
file, has to go to userspace, potentially across the network, into a
daemon, and potentially into a database.  A large fraction of IO on a
working directory will have nothing to do with SCM: it will be, e.g.,
compilation to a test copy.  It's dumb to impose that cost on
operations that get no benefit from it.

But it's basically all there, and seems to work well.  It seems like
ClearCase has some nice features.  One popular one is that there are
good X11 and W32 GUIs for all operations.  It would be good if free
systems had that, but it's really more or less independent of the
underlying architecture.

Later on we noticed that one of the build scripts was having trouble
removing a temporary directory.  Eventually it turned out that a file
in a /tmp subdirectory was causing unlink() to return ENOENT, even
though the file could be listed, stat'd, and even moved.  I suspect
ClearCase had somehow corrupted the machine's dcache or something to
cause this behaviour.  The machine was in other respects pretty
standard.  Presumably rebooting will "fix" it.
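
As a minimal sketch of the symptom (the path here is made up), the
check looks something like this: stat() on the file succeeds, yet
unlink() on the very same path comes back with ENOENT.

  # Sketch of the observed anomaly; the path is hypothetical.
  import errno, os

  path = "/tmp/build-tmp/stale-file"   # hypothetical stuck file
  print(os.stat(path))                 # succeeds: the file is listable
  try:
      os.unlink(path)
  except OSError as exc:
      if exc.errno == errno.ENOENT:
          print("unlink() says ENOENT even though stat() just succeeded")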

So at this point I say:

 - "bloody proprietary kernel modules"
 - "bloody unnecessary kenrel modules"

(Insert epithet of choice in locales other than en_AU)

Now, of course, all software has bugs, and I guess Rational will
either eventually fix this, or explain how it's misconfigured on this
machine, or at any rate be interested to see the report which will be
passed to them.  

I don't expect software not to have bugs, but I do think if there are
simple design decisions that you can make early on that will reduce
the likelihood or severity of bugs, you should do so unless there is a
strong counterargument.

You can make an argument about open source being less buggy (or not)
or Rational being dumb (or not), but I don't think either claim is
clearly true.  At any rate, ClearCase is more mature than Prophesy is
likely to be any time soon.

I've seen bugs in BK; typically they can be resolved by using one of
BK's commands to preen a repository or remove leftover locks.  It
hasn't ever caused random other bad things to happen on unrelated
parts of my machine and I wouldn't expect it to.

[2]

I think the weight of OS design experience is behind me in saying that
things should not be in the kernel unless there is some security,
performance, or functionality reason why they have to be there.  I
realize you only want to put hooks into the kernel, not the whole
thing, but ClearCase does that too, and the issues still apply.

I don't see anything about SCM that can't be adequately done purely in
userspace.  In as much as Daniel is designing a system he wants other
people to work on and use, I think the obligation is on him to
demonstrate that a kernel dependency is necessary.  This is particularly
so given [1], that putting it in the kernel has turned out to be a
problem in the past.  I don't think that justification is impossible,
but I'm a long way from being convinced.

I can see a few possible justifications, but I don't think any of them
stand up:

 "it's transparent"

   That's bogus; a CVS working directory and a ClearCase view are both
   trivially transparent in that you can read and edit files using
   normal tools, but you need to know magic commands or syntax to
   actually do anything.

 "it avoids having nasty CVS dirs lying around"

   It's slightly tidier, but it turns out not to be a real problem.
   If it bugged you, you could have just one in the top level, or make
   it a dot file.

 "you can auto-detect rename/add/delete"

   Handling renames is important, but automatically doing it is
   somewhat less so.  There are several other systems possibly as
   good:

    - magic tokens embedded in the file (arch)
    - detecting similar file text (bk)
    - explicit notification (pre or post)
    - ...

  These don't happen often enough that handling them needs to be
  completely transparent.  "bk mv foo bar" is not significantly harder;
  learning to type it is trivial by comparison to learning the overall
  system.
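
  To make the similarity option concrete, here is a toy sketch (my own
  illustration, not BitKeeper's actual algorithm): pair each deleted
  path with the added path whose text is most similar, and call a good
  enough match a rename.

    # Toy rename detection by content similarity -- illustration only,
    # not bk's real algorithm.  deleted/added map path -> file text for
    # files gone/new since the last commit.
    import difflib

    def detect_renames(deleted, added, threshold=0.6):
        """Return a list of (old_path, new_path) rename guesses."""
        renames = []
        remaining = dict(added)
        for old_path, old_text in deleted.items():
            best, best_score = None, threshold
            for new_path, new_text in remaining.items():
                score = difflib.SequenceMatcher(None, old_text,
                                                new_text).ratio()
                if score > best_score:
                    best, best_score = new_path, score
            if best is not None:
                renames.append((old_path, best))
                del remaining[best]   # each added file matches at most once
        return renames

  Anything below the threshold just stays a plain delete plus add.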

 "you can keep intermediate changes"

  Well, that's nice.  But given that you're going to throw them away
  anyhow, I don't see how it's any better than editor backups or a
  filesystem with history.  I guess I don't see it as essentially part
  of SCM -- it's related but not the same.

  Given a tiny command that's run on each save or build, you can do
  this from userspace anyhow.
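
  As a sketch of that "tiny command" (the checkpoint name and the
  .checkpoints/ directory are my own inventions, not any real tool):
  run it on each save or build and it squirrels the named files away
  into a timestamped snapshot directory, entirely from userspace.

    # checkpoint.py -- hypothetical snapshot-on-save helper.  Copies the
    # named files (relative paths) into .checkpoints/<timestamp>/.
    import os, shutil, sys, time

    def checkpoint(paths):
        stamp = time.strftime("%Y%m%d-%H%M%S")
        dest_root = os.path.join(".checkpoints", stamp)
        for path in paths:
            dest = os.path.join(dest_root, path)
            os.makedirs(os.path.dirname(dest), exist_ok=True)
            shutil.copy2(path, dest)   # copy2 keeps timestamps and modes
        return dest_root

    if __name__ == "__main__":
        print(checkpoint(sys.argv[1:]))

  Hook it into your editor's save command or the top of your makefile
  and you get cheap, throwaway intermediate history without touching
  the kernel.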

People have tried keeping source in databases before (Zope, VisualAge,
various Smalltalks), but in general programmers seem to prefer
relatively little magic in their source directories.  Even MSVC++
keeps plain files on disk.  Having plain files opens up opportunities;
magic databases close them off.

[3] 

SCM is a hard problem to define; SCM software more or less maps 1:1
with the author's view of how software development is done or ought to
be done.  The challenge is to think about SCM differently, or more
clearly, than has been done before.  

The Subversion people have already thought about this more than I
have.  My overall impression is that they want to be a "good enough"
replacement for CVS that fills its more gaping holes, which is a good
goal.

If you're going to write a new system rather than hack on (say)
Subversion, then it seems to me that you ought to aim to be better
than any existing design on at least one important point.

I know people here are talking about that, but I think it needs a lot
more work before writing code.

I think it's far more important than worrying about kernel hooks.

Problems that you ought to be thinking about, in my not-very-humble
opinion:

 * Do you want to support disconnected operation?  That sounds like a
   good idea, even when the systems are not really "disconnected" but
   just on a modem in another continent.  It definitely makes your job
   harder and more interesting: trivially, when you commit, the
   version number you generate must be local and not universally
   authoritative (cf bk's "keys"; a toy key sketch follows this
   list).  There are several levels, from
   merely being able to edit while disconnected (cvs) to making
   patches but not sending (diff and mail) to basically everything
   (bk). 

 * Can you have "threads" of development, where several changes are
   aimed at fixing the same thing, but they're not committed to a
   separate branch?

 * Is this meant for people working in an open source / internet way,
   or in a small-office way?  Or do you aim to handle both?  They seem
   pretty different: at one extreme, people just mail around patches;
   at the other, people just all work in the same directory.

   A lot of the literature about "Configuration Management" (capital
   C, M) is written from a military or enormous-project point of view,
   which is pretty different from that of open source hackers, and not
   necessarily better for all problems.

 * It seems obvious that you want some way of building logical changes
   that span multiple files.  Really?  Does it make sense to have two
   distinct changes to the same file inside this? 

 * Can changesets be nested?

 * How do you represent accepting a patch from somebody, without
   losing that patch's internal structure?

 * If you make a mistake in a commit message, can you go back and
   change it?  In many systems you can't, because that would be
   "rewriting history".  It seems useful though, in some cases, and
   you can solve it by introducing a meta-history concept.

 * How do you make all this comprehensible?  Can you explain it in a
   single page to a novice user, and leave the complicated stuff til
   later?  Will they get bitten if they try to work with just a simple
   understanding?

 * Subdirectories often spin off as child projects (tdb from samba),
   or they might merge in (experimental architectures joining Linux).
   Can that be supported in some way better than just copying a
   snapshot of the files across?  Do you want to?

 * What does it mean to support the "reviewer" role?

 * How do you handle repeated bidirectional merges between parallel
   streams of development?

 * Do you want to tackle the "star-merge" problem handled by arch,
   where you work out the order of applying multiple patches that is
   least likely to cause conflicts?

 * Does the system need to do anything to help with merging beyond
   just running something equivalent to diff3 and letting you resolve
   conflicts by hand?

 * Some object files are really hard/slow to produce and so it kind of
   makes sense to keep them in vc, although they don't really belong.
   (e.g. files requiring a special toolchain; autoconf output)  Can
   you keep them as second-class citizens to avoid conflicts, etc.?

 * Sometimes people want to e.g. check in binaries of released
   versions, so that they can be exactly restored even if the compiler
   changes later.  What do you think of that?

 * Can the SCM play a role in communicating at appropriate levels of
   detail to various audiences?  (Users, potential users, managers,
   developers, core team, satellite developers, distribution
   maintainers, release engineers, ...)

 * What happens when you're in the middle of changing something and
   you notice a little bug?  You want to fix the bug, but also keep
   that fix separate from your main commit.  Under CVS, you might get
   a second checkout, fix it there, and merge, but that's slow and a
   lot of trouble, so people mostly don't bother.  It would be nice if
   they could.

 * What about developers who are trusted to commit to one branch, but
   not to HEAD?

 * Lots more questions.
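
On the disconnected-operation point above, here is a toy sketch of one
way to mint commit identifiers that are locally generated but still
globally unique; the user|host|time|digest layout is my own
assumption, loosely in the spirit of bk's keys rather than their real
format.

  # Toy locally-generated commit key: unique without any central
  # authority handing out version numbers.  Format is an assumption.
  import hashlib, os, socket, time

  def make_commit_key(changeset):
      who = "%s@%s" % (os.environ.get("USER", "unknown"),
                       socket.gethostname())
      stamp = time.strftime("%Y%m%d%H%M%SZ", time.gmtime())
      digest = hashlib.sha1(changeset).hexdigest()[:12]
      return "|".join((who, stamp, digest))

  print(make_commit_key(b"example changeset contents"))

Two developers on different continents can both commit at the same
moment and the keys still won't collide; deciding how those commits
get ordered into a public history then becomes a merge-time question
rather than a numbering question.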

This is long enough already, you get the idea.
--
Martin



