[Prophesy] comments so far
Daniel Phillips
phillips at bonn-fries.net
Sat Jun 15 04:18:54 EST 2002
On Friday 14 June 2002 16:09, Daniel Phillips wrote:
> On Friday 14 June 2002 02:28, Martin Pool wrote:
Well, I didn't intend to send the previous post until after having worked out
more of the magic filesystem issues, however... the implication is that files
under management of the magic filesystem have to have two inodes, one
belonging to the magic filesystem and one belonging to the native filesystem.
I'm putting down much of this awkwardness to what I'm increasingly seeing as
misdesign of the vfs, but cleaning that up is not the immediate project.
I'll return to the question of the magic filesystem later.
OK, now the first thing I should say is that I agree with all the features
you list below, and what I'm going to do now is speculate about how the
current design can support each of them, or what needs to be done to support
them.
> > There's a hierarchy:
> >
> > release notes for a new version -- many end-users will read these;
> > they'll include references to bugs fixed
So the database needs to know what's a release note. This is version
metadata, since a release is always a version. The question is, do we want
to define metadata structure at the database table level, or do we want to
just put all version metadata together in a single 'version metadata' record
per version and parse it out with xml or some such?
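To make the second option concrete, here is a minimal Python sketch of a single per-version metadata record that is parsed on demand. The element names and the sample record are invented for illustration only; nothing here is Prophesy's actual format.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-version metadata record; the layout is an assumption.
RECORD = """<version id="2.4.19">
  <release-note>Fixes the foo oops on SMP.</release-note>
  <release-note>Adds bar driver.</release-note>
  <comment>internal cleanup</comment>
</version>"""


def release_notes(record_xml):
    """Return the text of every release-note element in one version record."""
    root = ET.fromstring(record_xml)
    return [note.text.strip() for note in root.findall("release-note")]


if __name__ == "__main__":
    for note in release_notes(RECORD):
        print("*", note)
```

The trade-off is the usual one: a single blob per version needs no schema changes when a new kind of metadata appears, but the database cannot query inside it without parsing.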
> > list of patches accepted -- every developer probably wants to read
> > this
Meaning the system has to know what the patch is, when accepted, into what
version, and so on. What I'd like to do if possible is to carry forward
patches as objects from version to version, so that the scm user can apply a
patch to version 2.4.16 and remove it, perhaps after it's mutated a little,
from version 2.4.19. For now, the most practical way to do this is just to keep
the patch verbatim in the database (along with the who/when/etc information)
and let the user figure out what has to be done to revert it later. Hmm,
yes, that's easy, and I strongly suspect it's what you want.
The list of patches applied to a particular version is actually very
important. Without it, you don't know what to revert. I've often felt the
lack of this kind of information.
Anyway, this feature is what BitKeeper would call 'import patch', except that
Prophesy is going to remember more about the imported patch than BitKeeper
does, will keep the patch in its database, and will let you revert it without
having to find the original copy on disk.
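A rough sketch of what "keep the patch verbatim plus who/when, and know which patches went into which version" could look like. It uses Python's sqlite3 module as a stand-in for the real database, and the table and function names are assumptions made up for this example.

```python
import sqlite3


def open_db():
    """Create an in-memory database with hypothetical patch tables."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE patch (id INTEGER PRIMARY KEY, author TEXT,"
        " received TEXT, body TEXT)"
    )
    db.execute(
        "CREATE TABLE applied (version TEXT,"
        " patch_id INTEGER REFERENCES patch(id))"
    )
    return db


def import_patch(db, version, author, received, body):
    """Store the patch verbatim and record that it was applied to version."""
    cur = db.execute(
        "INSERT INTO patch (author, received, body) VALUES (?, ?, ?)",
        (author, received, body),
    )
    db.execute(
        "INSERT INTO applied (version, patch_id) VALUES (?, ?)",
        (version, cur.lastrowid),
    )
    return cur.lastrowid


def patches_applied(db, version):
    """List (id, author) for every patch recorded against version."""
    rows = db.execute(
        "SELECT p.id, p.author FROM patch p JOIN applied a"
        " ON a.patch_id = p.id WHERE a.version = ? ORDER BY p.id",
        (version,),
    )
    return rows.fetchall()


def patch_body(db, patch_id):
    """Fetch the original patch text, e.g. to revert it later."""
    row = db.execute("SELECT body FROM patch WHERE id = ?", (patch_id,)).fetchone()
    return row[0] if row else None
```

Because the body is stored untouched, reverting does not depend on the original file still being on disk; working out how to revert a patch that has since mutated is left to the user, as described above.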
> > list of small changes within a patch -- many programmers probably
> > want to read this
Right, so when Prophesy parses out the patch (we don't need to use patch to
do this any more, because of the parser I wrote) it will save the patch
header as metadata, assuming it's a description. The Prophesy user can edit
this and mark it up so that it can generate a nice-looking listing of patch
details (realistically, nobody ever edits these details, but it's nice to
know you could).
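For illustration, splitting a patch into its leading description and the diff proper can be sketched in a few lines of Python. This is not the parser mentioned above, only a simplified stand-in: it assumes the description ends at the first line starting with "diff ", "--- " or "Index: ", and the function name is made up.

```python
def split_patch(text):
    """Split patch text into (description header, diff body).

    Assumption: the header is everything before the first line that
    starts with 'diff ', '--- ' or 'Index: '.
    """
    lines = text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        if line.startswith(("diff ", "--- ", "Index: ")):
            return "".join(lines[:i]).strip(), "".join(lines[i:])
    # No diff found: treat the whole text as description.
    return text.strip(), ""


if __name__ == "__main__":
    sample = "Fix foo\n\nLonger text\n--- a/x\n+++ b/x\n@@ -1 +1 @@\n-a\n+b\n"
    header, diff = split_patch(sample)
    print(header)
```

The header returned here is what would be saved as editable metadata; the diff body is kept separately.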
> > diff for an actual patch -- probably don't need to read it unless
> > I'm actually working in the area
Right, since the actual diff is compressed into the database, the web
interface could pull it up for you.
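As a small illustration of storing the diff compressed and pulling it back up on request, here is a sketch using Python's zlib module. The choice of zlib and the function names are assumptions for this example, not a statement about what Prophesy uses.

```python
import zlib


def pack_diff(diff_bytes):
    """Compress the raw diff before it is written to the database."""
    return zlib.compress(diff_bytes)


def unpack_diff(blob):
    """Recover the original diff, byte for byte, for display."""
    return zlib.decompress(blob)


if __name__ == "__main__":
    raw = b"--- a/x\n+++ b/x\n@@ -1 +1 @@\n-a\n+b\n" * 50
    blob = pack_diff(raw)
    print(len(raw), "bytes stored as", len(blob))
```

Working on bytes rather than text avoids guessing the encoding of a patch received by mail.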
> > Perhaps there are some other levels, but you get the idea. I think
> > the recursive nature is very important. The key job of the SCM system
> > is to help programmers manage the history of development of the
> > project.
> >
> > Just keeping a GNU-style ChangeLog can be pretty useful even without
> > SCM.
> >
> > Autogenerating a NEWS file by pulling out top-level comments would be
> > great, because it's one of the most useful tools to a user or
> > satellite developer.
Yes, here you'd have to convince your submitters to mark up their patches, or
you'd have to do it yourself. Taking the email subject line by default would
be a good start.
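Taking the subject line as the default summary is easy to sketch with Python's standard email module. Stripping leading bracketed tags such as "[PATCH]" is an assumption about what submitters send, and the function name is hypothetical.

```python
import re
from email import message_from_string


def default_summary(raw_mail):
    """Return the mail's Subject with leading [tags] removed."""
    msg = message_from_string(raw_mail)
    subject = (msg.get("Subject") or "").strip()
    # Drop one or more leading bracketed tags, e.g. "[PATCH] [2.4] ".
    return re.sub(r"^(\[[^\]]*\]\s*)+", "", subject)


if __name__ == "__main__":
    mail = "From: jhacker at dot.com\nSubject: [PATCH] fix foofeature\n\nbody\n"
    print(default_summary(mail))
```

A submitter who marks up the patch properly would override this default.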
> > Offline operation is crucial. Most projects don't have everybody on a
> > LAN. Open source is inherently distributed. Time costs here will
> > drastically outweigh anything you can do with a database, etc, on the
> > server.
The database is installed and runs locally. Operation is offline by default.
> > Arch makes every download of the product a potential working
> > directory. I don't think it's necessary to keep the entire history in
> > every tarball, but it is perhaps good to keep references that tie the
> > files to their place in history.
That's right, for every repository there's a working directory. The
repository database lives in the root of the working directory. By the way,
Prophesy is not so rude as to force an additional top level directory on top
of the normal top directory as BitKeeper and other systems do.
> > It would, by extension, be nice to allow all downloads to happen over
> > http/ftp,
As with Subversion, distributed access will be provided in the form of an
Apache module. Providing an ftp view as well would be very nice.
> > and all submissions to happen by mail to a maintainer. The
> > program should not require any intelligence in the protocol.
Right. We want to integrate Rasmus's patchbot work.
> > People shouldn't need permission to start hacking on a project, and to
> > keep versions locally. They just need permission to commit to the
> > master site.
True, and permission to transmit to the remote site is an entirely different
thing, and should be easier to get than permission to commit to the remote
site.
By the way, there will not be any 'master' site, only remote sites, i.e.,
Prophesy is peer-to-peer.
> > diffs have this nice property of being intelligible to humans and
> > programs. Keep them. Make minimal changes to handle chmod, mv, etc.
Right, keep the ability to parse them and generate them, but don't use them
internally, they're inappropriate for that. Except that Prophesy will
archive the diff in its original form, as received. I suppose that for
symmetry we should allow diffs to be sent to be archived as well, complete
with descriptive comments etc.
> > All other things being equal, files should be directly human-readable.
> > Use diffs. Perhaps make ChangeLogs, or something similar, part of the
> > metadata. (On the other hand, being readable might encourage editing
> > by hand, which would be bad.)
Using diffs internally in the database is out of the question. They're just
not an appropriate currency for the kinds of manipulations Prophesy has to do.
> > Writing new filesystems, diff formats, network protocols, etc is just
> > screwing around.
I agree about the network protocols, but not about the filesystem magic and
the internal storage format. Particularly in regards to the latter, look at
the research that's been done. There's a reason for it: archive size and
efficiency of common operations are very real problems. Not to mention
accuracy and power. These things depend very much on the solidity of the
foundation on which the superstructure stands.
> > The heart of the problem is to get a good model for
> > *how to do SCM*. You can implement (v1) using existing tools;
> > optimize later if it turns out that your model is correct.
Well actually, by parsing diffs to get the transforms that's exactly what I'm
doing. (And it turns out that doing a proper binary diff isn't that hard.)
Python, postgresql, glade, etc., are all 'existing tools'. What other
existing tools would you suggest? Not patch. It's much easier and faster to
apply database deltas with the already-implemented transform mechanism.
Later, when we get to merging, patch or a patch-like thing will be needed,
and then we'll probably start with patch and move to something faster/more
powerful/more reliable later.
> > Similarly, don't waste time writing GUIs; use emacs, xxdiff, dirdiff,
> > etc. Write one later if it proves correct.
Agreed there. However, once the basic transport mechanism is in place, a
GUI will follow very shortly afterwards, to show the version tree.
> > If I was starting from scratch, I would consider a typical open source
> > project:
> >
> > - email is key
> >
> > - people mail around patches; perhaps they get revised; eventually
> > they get applied
> >
> > - the NEWS file says "applied patch for foofeature from
> > jhacker at dot.com"
Yes indeed, we can and will automate that.
> > Projects sometimes split off files or subdirectories into other
> > projects; perhaps they diverge slightly. It would be nice to handle
> > this.
Yes, a source tree should be able to inherit files from another project, and
Prophesy should treat these files as descending from the same object. Each
file object can have its own evolutionary tree, and these trees are not the
same or restricted at all by the version tree or project boundaries.
Furthermore, we should be able to recognize that one object is identical to
another in a remote tree, or had a common ancestor. This touches on the
subject of universal object ids, which I mentioned earlier in the archives,
and I have not forgotten about it. First things first, though.
> > For rsync and other projects, I keep patches that I have not yet
> > really accepted but that look good in CVS in patches/. A SCM system
> > that managed this would be nice. I think it's a promising model, not
> > a hack.
> >
> > Disk is cheap. Keep everything.
But keep it as compactly as you can. It's not that cheap. I have 7 gig of
source on my laptop and several times that on my server. Most of that
consists of kernel trees, all slightly different versions, or different
projects in them. That's just silly.
> > Networks are getting broader, but latency is not going to go away.
> >
> > Do it in <4000lines. Lions-book Unix was 10kloc, and look how many
> > good ideas they had in there.
I suppose the first useful version will be about that size (4K lines).
--
Daniel