From rasmus at jaquet.dk Wed May 1 16:45:26 2002 From: rasmus at jaquet.dk (Rasmus Andersen) Date: Wed, 1 May 2002 08:45:26 +0200 Subject: [Prophesy] General questions Message-ID: <20020501084526.B14505@jaquet.dk> Hi. Thanks for the invite and thanks for letting me in. I have a number of questions that I'll list below in no particular order. The list archives were not that stuffed, so I guess that some of the initial discussions happened prior to or outside this list? Anyway, the list: 1) This project would seem to be a reaction to the BK thread on lk and its goal (I guess) would be to get Linus off BK. So, are we as a group aware/familiar with the features of BK that Linus, Garzik, Riel, etc., like and want? If so, could somebody list them for me? 2) In his 'Answers from 39000 ft' mail, Daniel states that prophesy will manage all files in a tree. Is that convenient? If I want prophesy to manage my kernel tree, I surely don't want it to manage all the gunk created during a compile? 3) Is anybody on this list familiar with SCM? More to the point, could anybody here give me a list of SCM-related links/information? Weave vs. diff+patch(xdelta)? How to merge sanely/automagically? 4) One of the features from 1) would be the distributed nature of BK, I guess? Are there any thoughts on how to handle this? 5) Daniel's 39K mail didn't mention changesets, the ability to group changes to files. I guess we are going to have this? And changesets would be on a delta-commit basis? (These may be too concrete for now, answer at will :) 6) Tom Lord's 'arch' has been mentioned as an alternative to BK. Are we aware of how he handles some of the above questions? (All these aware/familiar questions are really disguised versions of 'I don't know. Please tell me about X if you know.') 7) Why is this a closed list? 
Regards, Rasmus From phillips at bonn-fries.net Fri May 3 13:45:14 2002 From: phillips at bonn-fries.net (Daniel Phillips) Date: Fri, 3 May 2002 05:45:14 +0200 Subject: [Prophesy] General questions In-Reply-To: <20020501084526.B14505@jaquet.dk> References: <20020501084526.B14505@jaquet.dk> Message-ID: On Wednesday 01 May 2002 08:45, Rasmus Andersen wrote: > Hi. > > Thanks for the invite and thanks for letting me in. > > I have a number of questions that I'll list below in no particular > order. The list archives were not that stuffed, so I guess that > some of the initial discussions happened prior to or outside this > list? > > Anyway, the list: > > 1) This project would seem to be a reaction to the BK thread on > lk and its goal (I guess) would be to get Linus off BK. So, > are we as a group aware/familiar with the features of BK that > Linus, Garzik, Riel, etc., like and want? If so, could somebody > list them for me? The best source of that information is the 'patch penguin' thread, where Linus talks about starting with it, and then talks soon after about what he wants in it. I'd see it being useful to other people way before being attractive enough to Linus to break his BK habit. I've noticed that I personally am spending far more time fiddling around creating and maintaining patch sets than I should, so... if I invested that time in getting some tools together instead of messing with the patch it would be a win already. The *immediate* purpose of this is to provide a repository that the patchbot can operate and that Linus can pull from, and which is not Bitkeeper. > 2) In his 'Answers from 39000 ft' mail, Daniel states that prophesy > will manage all files in a tree. Is that convenient? If I want > prophesy to manage my kernel tree, I surely don't want it to > manage all the gunk created during a compile? Oh no, I only meant source files. We need a way to know what is source and what is not. 
The new kbuild makes that much easier, by taking all the generated files out of the tree. This still leaves questions about configuration-generated symlinks, which should not fool the SCM into thinking there are more files than there really are. > 3) Is anybody on this list familiar with SCM? More to the point, > could anybody here give me a list of SCM-related links/ > information? Weave vs. diff+patch(xdelta)? How to merge sanely/ > automagically? I'm not 100% clear on 'weave' yet myself. Soon will be though: http://www.perforce.com/perforce/life.html > 4) One of the features from 1) would be the distributed nature > of BK, I guess? Are there any thoughts on how to handle this? I thought I'd first think about having it work very well, locally. We don't need it to be distributed for either of the first two applications, that is, preparing patch sets and acting as a repository for the patchbot. Or, another way of putting that is, Larry already provides us a way for it to be distributed, through his pull. And no, I haven't thought at all about how technically difficult it will be to support a BK pull yet. There might even be legal questions of whether Larry's patents allow us to support a BK pull, and if so... then I think we'll suddenly find a lot more developers on the project, so that possibility doesn't worry me. > 5) Daniel's 39K mail didn't mention changesets, the ability to > group changes to files. I guess we are going to have this? > And changesets would be on a delta-commit basis? (These may > be too concrete for now, answer at will :) Yes. I'm trying to avoid steering too close to Larry's terminology just for now, until I understand how standard it is. For now, a 'delta', to me, is the difference between any two source trees, and deltas can be partitioned into things called... I don't know, subdeltas or something, and subdeltas can also be partitioned. 
Not only that, but the partitioning of subdeltas is fluid, and can be changed using database queries, such as 'select all the files that satisfy this logical expression, and the delta will be partitioned into the part that affects those files and the part that doesn't'. Or logical tests could be applied at the line level, and so on. We want to really leverage the fact that we can put a full-blown SQL database into the mix; that's something the proprietary side would have a lot of trouble doing. > 6) Tom Lord's 'arch' has been mentioned as an alternative to > BK. Are we aware of how he handles some of the above questions? > (All these aware/familiar questions are really disguised > versions of 'I don't know. Please tell me about X if you know.') No, and I swear I will try it this weekend! > 7) Why is this a closed list? It was the decision of the gentleman who set it up, or perhaps it was an accident. Considering the recent flamewar, I don't think it's wrong to keep it closed for the time being. -- Daniel From rasmus at jaquet.dk Sat May 4 06:15:28 2002 From: rasmus at jaquet.dk (Rasmus Andersen) Date: Fri, 3 May 2002 22:15:28 +0200 Subject: [Prophesy] General questions In-Reply-To: References: <20020501084526.B14505@jaquet.dk> Message-ID: <20020503201528.GE1893@jaquet.dk> (I'm just back from business travel and am quite bombed, so this is even terser than usual.) On Fri, May 03, 2002 at 05:45:14AM +0200, Daniel Phillips wrote: > > 1) This project would seem to be a reaction to the BK thread on > > lk and its goal (I guess) would be to get Linus off BK. So, > > are we as a group aware/familiar with the features of BK that > > Linus, Garzik, Riel, etc., like and want? If so, could somebody > > list them for me? > > The best source of that information is the 'patch penguin' thread, > where Linus talks about starting with it, and then talks soon after > about what he wants in it. 
I had to delete most of that thread in order to keep my server from bursting into flames due to the flame density. > I'd see it being useful to other people way before being attractive > enough to Linus to break his BK habit. I've noticed that I personally > am spending far more time fiddling around creating and maintaining > patch sets than I should, so... if I invested that time in getting > some tools together instead of messing with the patch it would be a > win already. Perhaps you could explain a bit more here? This seems like something we could work into an advantage. > > The *immediate* purpose of this is to provide a repository that the > patchbot can operate and that Linus can pull from, and which is not > Bitkeeper. If immediacy is a goal, then there is working code out there already... But you know that. > > 4) One of the features from 1) would be the distributed nature > > of BK, I guess? Are there any thoughts on how to handle this? > > I thought I'd first think about having it work very well, locally. > We don't need it to be distributed for either of the first two > applications, that is, preparing patch sets and acting as a > repository for the patchbot. Or, another way of putting that is, > Larry already provides us a way for it to be distributed, through > his pull. And no, I haven't thought at all about how technically > difficult it will be to support a BK pull yet. There might even > be legal questions of whether Larry's patents allow us to support > a BK pull, and if so... then I think we'll suddenly find a lot > more developers on the project, so that possibility doesn't worry > me. One problem is that making something distributed sucks if the initial design doesn't allow for it. 
Rasmus From phillips at bonn-fries.net Sat May 4 06:42:27 2002 From: phillips at bonn-fries.net (Daniel Phillips) Date: Fri, 3 May 2002 22:42:27 +0200 Subject: [Prophesy] General questions In-Reply-To: <20020503201528.GE1893@jaquet.dk> References: <20020501084526.B14505@jaquet.dk> <20020503201528.GE1893@jaquet.dk> Message-ID: On Friday 03 May 2002 22:15, Rasmus Andersen wrote: > (I'm just back from business travel and am quite bombed, so this > is even terser than usual.) Terse is good. > > I'd see it being useful to other people way before being attractive > > enough to Linus to break his BK habit. I've noticed that I personally > > am spending far more time fiddling around creating and maintaining > > patch sets than I should, so... if I invested that time in getting > > some tools together instead of messing with the patch it would be a > > win already. > > Perhaps you could explain a bit more here? This seems like something > we could work into an advantage. Yes. I'd like to somehow turn a 'patch' into an object managed by the SCM system. So you'd carry not just multiple tree versions, but multiple patches forward, the way real live developers do. Since developers do it, it must be possible to automate. But every SCM I've looked at so far takes a view of the whole tree, a history of commits to it, and a tree of forks. That is somehow not the whole story. There is, in reality, much more structure to parallel development than that. > > The *immediate* purpose of this is to provide a repository that the > > patchbot can operate and that Linus can pull from, and which is not > > Bitkeeper. > > If immediacy is a goal, then there is working code out there > already... But you know that. Right, and ignoring it would be silly. I'm working with BitKeeper now, Arch and Subversion are on my list, I'm looking at some of the power features of CVS, and looking at the literature. But mainly, I'm thinking about things from first principles as is my habit. More on that later. 
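Daniel's 'patch as an SCM-managed object' idea can be sketched in a few lines of Python. This is a hypothetical illustration only: the PatchObject class, its method names, and the sample hunks are invented here, not taken from BitKeeper or any tool under discussion. It only shows a patch keeping its own history, one generation per base tree it was expressed against, using the standard difflib module.

```python
import difflib

class PatchObject:
    """A patch as a first-class object with its own history: each
    generation is the same logical patch expressed against one base
    version. (Invented class, for illustration; not BitKeeper's model.)"""
    def __init__(self, name):
        self.name = name
        self.generations = []  # list of (base_tag, unified diff text)

    def record(self, base_tag, base_lines, patched_lines):
        # Compute the patch against this particular base and remember it.
        diff = ''.join(difflib.unified_diff(
            base_lines, patched_lines,
            fromfile=base_tag, tofile=base_tag + '+' + self.name))
        self.generations.append((base_tag, diff))
        return diff

p = PatchObject('htree')
v17 = ['int lookup(dir d)\n', '{\n', '    linear_scan(d);\n', '}\n']
p.record('2.4.17', v17, [v17[0], v17[1], '    hash_tree_scan(d);\n', v17[3]])
v18 = ['int lookup(dir d, int flags)\n', '{\n', '    linear_scan(d);\n', '}\n']
p.record('2.4.18', v18, [v18[0], v18[1], '    hash_tree_scan(d);\n', v18[3]])
# The patch is now one nameable object with two generations, one per base.
```

The point is that 'htree' stays a single object while both the tree and the patch itself evolve, which is exactly the structure a whole-tree commit history fails to capture.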
> > > 4) One of the features from 1) would be the distributed nature > > > of BK, I guess? Are there any thoughts on how to handle this? > > > > I thought I'd first think about having it work very well, locally. > > We don't need it to be distributed for either of the first two > > applications, that is, preparing patch sets and acting as a > > repository for the patchbot. Or, another way of putting that is, > > Larry already provides us a way for it to be distributed, through > > his pull. And no, I haven't thought at all about how technically > > difficult it will be to support a BK pull yet. There might even > > be legal questions of whether Larry's patents allow us to support > > a BK pull, and if so... then I think we'll suddenly find a lot > > more developers on the project, so that possibility doesn't worry > > me. > > One problem is that making something distributed sucks if the > initial design doesn't allow for it. No question about that. Now, I've done just enough BitKeeping to know that, even though it's designed for distributed operation from the beginning, it still kind of sucks at it. For one thing, if somebody is cloning, the whole repository is locked for the duration - any push to the repository has to wait. So much for elegance. My strategy is: first propose the functionality, then see how it can be distributed, without making any compromise to the local functionality. Doing the whole design around distributed operation and then having to apologize for nonintuitive behavior on the user side sucks too. Another thing about BitKeeper: you have to do 'bk edit' before you edit, otherwise, if you just chmod +w the file and edit away, it gets screwed up, with no intelligible error messages. Stupid. Also, bk co and bk ci are just stupid wastes of time, in my opinion. The number one design rule as far as I'm concerned is: you can edit your repository just like a normal source tree. It looks and acts just like a normal source tree. 
The SCM takes care of the details for you. Now... how to make that work. First problem, how can the SCM tell the difference between generated files and source files? Should we just give it a list of files to ignore, as for patch? Is there a more elegant way, perhaps in conjunction with the new kbuild code? (And then, how tied to the kernel is kbuild, and do we care about managing source besides the kernel?) As far as knowing what files the user is editing, and when to update the SCM's db, we will use Linux's dnotify mechanism for that. This part of the design is under control, I think; I'll provide a writeup in due course. -- Daniel From rasmus at jaquet.dk Sat May 4 18:13:24 2002 From: rasmus at jaquet.dk (Rasmus Andersen) Date: Sat, 4 May 2002 10:13:24 +0200 Subject: [Prophesy] General questions In-Reply-To: References: <20020501084526.B14505@jaquet.dk> <20020503201528.GE1893@jaquet.dk> Message-ID: <20020504081324.GA1893@jaquet.dk> On Fri, May 03, 2002 at 10:42:27PM +0200, Daniel Phillips wrote: > > Perhaps you could explain a bit more here? This seems like something > > we could work into an advantage. > > Yes. I'd like to somehow turn a 'patch' into an object managed by the > SCM system. So you'd carry not just multiple tree versions, but multiple > patches forward, the way real live developers do. Since developers do it, > it must be possible to automate. But every SCM I've looked at so far > takes a view of the whole tree, a history of commits to it, and a tree of > forks. That is somehow not the whole story. There is, in reality, much > more structure to parallel development than that. I'm not quite following you here. I thought BK allowed you to have a number of patches/changes in a tree, update the tree (from Linus) and have the changes carried forward? Or are you talking about carrying a selected patch set forward across multiple tree versions? Or am I just not getting you? :) > > If immediacy is a goal, then there is working code out there > > already... 
But you know that. > > Right, and ignoring it would be silly. I'm working with BitKeeper now, > Arch and Subversion are on my list, I'm looking at some of the power > features of CVS, and looking at the literature. But mainly, I'm thinking > about things from first principles as is my habit. If you find literature, do share. I'll do my best to get acquainted with some of these tools as well. [distributed discussion] > My strategy is: first propose the functionality, then see how it can be > distributed, without making any compromise to the local functionality. > Doing the whole design around distributed operation and then having to > apologize for nonintuitive behavior on the user side sucks too. Agreed on both points. > Another thing about BitKeeper: you have to do 'bk edit' before you edit, > otherwise, if you just chmod +w the file and edit away, it gets screwed > up, with no intelligible error messages. Stupid. Also, bk co and bk ci > are just stupid wastes of time, in my opinion. The number one design > rule as far as I'm concerned is: you can edit your repository just like > a normal source tree. It looks and acts just like a normal source tree. > The SCM takes care of the details for you. Agreed on the design rule. > > Now... how to make that work. First problem, how can the SCM tell the > difference between generated files and source files? Should we just > give it a list of files to ignore, as for patch? Is there a more > elegant way, perhaps in conjunction with the new kbuild code? (And then, > how tied to the kernel is kbuild, and do we care about managing source > besides the kernel?) While we may make our lives a bit easier coupling the SCM tightly to the kernel, let's not pretend that we magically get a well-controlled build environment. E.g., kbuild 2.5 lets you build objects in the source tree and somewhere somebody actually has a good reason for doing that. My point: If we make crass assumptions like that, we will get flamed. 
But we could easily go for the 'normal' SCM angle of attack: Have the user say 'these files' or 'this directory'. We could then also offer 'all files in this dir forever'. > > As far as knowing what files the user is editing, and when to update the > SCM's db, we will use Linux's dnotify mechanism for that. This part of > the design is under control, I think; I'll provide a writeup in due course. Yes, dnotify is an elegant way to notice this. Rasmus From phillips at bonn-fries.net Sun May 5 00:37:45 2002 From: phillips at bonn-fries.net (Daniel Phillips) Date: Sat, 4 May 2002 16:37:45 +0200 Subject: [Prophesy] General questions In-Reply-To: <20020504081324.GA1893@jaquet.dk> References: <20020501084526.B14505@jaquet.dk> <20020504081324.GA1893@jaquet.dk> Message-ID: On Saturday 04 May 2002 10:13, Rasmus Andersen wrote: > On Fri, May 03, 2002 at 10:42:27PM +0200, Daniel Phillips wrote: > > ...I'd like to somehow turn a 'patch' into an object managed by the > > SCM system. So you'd carry not just multiple tree versions, but multiple > > patches forward, the way real live developers do. Since developers do it, > > it must be possible to automate. But every SCM I've looked at so far > > takes a view of the whole tree, a history of commits to it, and a tree of > > forks. That is somehow not the whole story. There is, in reality, much > > more structure to parallel development than that. > > I'm not quite following you here. I thought BK allowed you to have a > number of patches/changes in a tree, update the tree (from Linus) > and have the changes carried forward? > > Or are you talking about carrying a selected patch set forward across > multiple tree versions? The second, and carrying multiple, possibly conflicting patches forward. Logically, this is quite difficult, but in practice we do it all the time. So right now I'm casting around for ways of looking at the problem. > > > If immediacy is a goal, then there is working code out there > > > already... 
But you know that. > > > > Right, and ignoring it would be silly. I'm working with BitKeeper now, > > Arch and Subversion are on my list, I'm looking at some of the power > > features of CVS, and looking at the literature. But mainly, I'm thinking > > about things from first principles as is my habit. > > If you find literature, do share. I'll do my best to get acquainted > with some of these tools as well. Here's a random link I found earlier: http://citeseer.nj.nec.com/context/175867/0 > > Now... how to make that work. First problem, how can the SCM tell the > > difference between generated files and source files? Should we just > > give it a list of files to ignore, as for patch? Is there a more > > elegant way, perhaps in conjunction with the new kbuild code? (And then, > > how tied to the kernel is kbuild, and do we care about managing source > > besides the kernel?) > > While we may make our lives a bit easier coupling the SCM tightly to > the kernel, let's not pretend that we magically get a well-controlled > build environment. E.g., kbuild 2.5 lets you build objects in the > source tree and somewhere somebody actually has a good reason for > doing that. My point: If we make crass assumptions like that, we > will get flamed. > > But we could easily go for the 'normal' SCM angle of attack: Have > the user say 'these files' or 'this directory'. We could then also > offer 'all files in this dir forever'. There's another, more automatic way: start with a clean tree - at the time the SCM begins to manage it, *all* files are under management (unless told otherwise, and there would be a list of standard exceptions such as hidden files). Then, every file you create, for example, with an editor or cp, will also be managed by the SCM. The SCM will always know when a build is in progress (somehow), and it will not manage files created during the build process. 
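One possible answer to 'the SCM will always know when a build is in progress (somehow)', sketched under the assumption that the SCM itself is asked to run the build step; the helper names below are invented for illustration. Snapshot the tree before and after the build, and decline to manage anything that appeared in between:

```python
import os

def tree_files(root):
    """All file paths under root, relative to root."""
    found = set()
    for d, _, files in os.walk(root):
        for name in files:
            found.add(os.path.relpath(os.path.join(d, name), root))
    return found

def files_created_by(build, root):
    """Run one build step and report which files it created; those are
    exactly the files the SCM should refuse to manage."""
    before = tree_files(root)
    build()
    return tree_files(root) - before
```

In practice the before/after snapshot would give way to the dnotify watcher discussed earlier, marking files created between 'build started' and 'build finished' events, but the partitioning rule is the same.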
If you decide to start managing a source tree that's already been built from, then some explicit 'import' step, specifying the files to be managed, or alternatively, those not to be managed, is going to be necessary. It would be nice to be able to express this partitioning as an arbitrary logical relation, to be handled by the database; it's another way to leverage the fact that we have a full database available. Any ideas on how we can know automatically that a build is in progress? > > As far as knowing what files the user is editing, and when to update the > > SCM's db, we will use Linux's dnotify mechanism for that. This part of > > the design is under control, I think; I'll provide a writeup in due course. > > Yes, dnotify is an elegant way to notice this. It's going to need some improving though, in order to be reliable. -- Daniel From rasmus at jaquet.dk Sun May 5 00:49:35 2002 From: rasmus at jaquet.dk (Rasmus Andersen) Date: Sat, 4 May 2002 16:49:35 +0200 Subject: [Prophesy] General questions In-Reply-To: References: <20020501084526.B14505@jaquet.dk> <20020504081324.GA1893@jaquet.dk> Message-ID: <20020504144935.GG1893@jaquet.dk> On Sat, May 04, 2002 at 04:37:45PM +0200, Daniel Phillips wrote: > > Or are you talking about carrying a selected patch set forward across > > multiple tree versions? > > The second, and carrying multiple, possibly conflicting patches forward. > Logically, this is quite difficult, but in practice we do it all the time. > So right now I'm casting around for ways of looking at the problem. (I'm spelling this out in order to be sure I get your meaning.) So we are talking about the situation where you have three trees, all based on a linus one, called 'Daniel's pagethingies', 'rmap refinements' and 'aa cleanup'. All three touch mm/page stuff heavily and the rmap and aa ones are locally refined by you. When Linus updates his tree, you would like to be able to update the base tree and have all three local ones updated as well? 
And does BK not do this for you? (Apparently not, but I fail to see why not.) > Any ideas on how we can know automatically that a build is in progress? There are probably some kbuild (2.5)-only files we could do heuristics on, but that would couple us tightly to that version of kbuild... If we do this as a kernel-only SCM, we could also start out with a dontdiff filter already in place. > > Yes, dnotify is an elegant way to notice this. > > It's going to need some improving though, in order to be reliable. ? Do elaborate. Rasmus From phillips at bonn-fries.net Sun May 5 02:59:32 2002 From: phillips at bonn-fries.net (Daniel Phillips) Date: Sat, 4 May 2002 18:59:32 +0200 Subject: [Prophesy] General questions In-Reply-To: <20020504144935.GG1893@jaquet.dk> References: <20020501084526.B14505@jaquet.dk> <20020504144935.GG1893@jaquet.dk> Message-ID: On Saturday 04 May 2002 16:49, Rasmus Andersen wrote: > On Sat, May 04, 2002 at 04:37:45PM +0200, Daniel Phillips wrote: > > > Or are you talking about carrying a selected patch set forward across > > > multiple tree versions? > > > > The second, and carrying multiple, possibly conflicting patches forward. > > Logically, this is quite difficult, but in practice we do it all the time. > > So right now I'm casting around for ways of looking at the problem. > > (I'm spelling this out in order to be sure I get your meaning.) > So we are talking about the situation where you have three trees, > all based on a linus one, called 'Daniel's pagethingies', 'rmap > refinements' and 'aa cleanup'. All three touch mm/page stuff > heavily and the rmap and aa ones are locally refined by you. > > When Linus updates his tree, you would like to be able to update > the base tree and have all three local ones updated as well? And > does BK not do this for you? (Apparently not, but I fail to see > why not.) I don't know exactly what happens when you pull from Linus's tree and you have incompatible changes in yours. 
Please feel free to educate me here, as I am not an experienced BitKeeper user. A major feature of the BitKeeper model that I don't like is the tree model. In fact, independent developers don't maintain their trees according to a strict hierarchy descending from a common parent. It's more like a general net, with developers exchanging bits and pieces with each other in an arbitrary graph. Another way of looking at this is, two different developers should be able to download a Linux tarball from kernel.org, each make their own changes, then later decide they want to exchange certain changes with each other. With Bitkeeper you'd have to mess around to make that happen - one or the other of the developers would have to clone the repository of the other and drop back to the patch way of doing things, to manually import their changes. This is BS; we want such exchanges to be entirely natural. Now, what I'm thinking about here has something to do with the traditional LISP distinction between EQUAL and EQ, the latter being true when we establish that two things really are the *same* object, and don't just have the same values. In the LISP case, EQ basically means the addresses of the objects are the same. So we want our source tree to be made up of objects, and each object will have an 'address', which I will call an 'id', which is perhaps derived from the developer's email address, the date the source tree first came under management, and an object sequence number (the latter generated by a counter kept in the root of the source tree). Objects have generations, each generation being a delta. A 'default' object holds all the source in the tree that is not included in any other object. Any state of the source tree can therefore be represented as a set of (object, generation) tuples. 
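The id scheme and the (object, generation) representation of a tree state might look like this; a hypothetical sketch, with invented class and method names, of exactly what the paragraph above describes:

```python
import itertools

class ManagedTree:
    """Sketch of the proposed id scheme: an object id combines the
    developer's email address, the date the tree came under management,
    and a sequence number from a counter kept in the tree root.
    (Invented names, for illustration only.)"""
    def __init__(self, email, managed_since):
        self.email = email
        self.managed_since = managed_since
        self._counter = itertools.count(1)  # the per-tree sequence counter

    def new_id(self):
        return (self.email, self.managed_since, next(self._counter))

repo = ManagedTree('phillips@bonn-fries.net', '2002-05-04')
default_obj = repo.new_id()  # the 'default' object holding unclaimed source
htree_obj = repo.new_id()
# Any state of the source tree is then a set of (object, generation) tuples:
state = {(default_obj, 1), (htree_obj, 3)}
```

Because the email and date are baked into every id, ids minted independently in two different trees can never collide, which is what later makes it safe to talk about cross-tree correspondences.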
When two developers wish to match up their trees and exchange object deltas (aka patch set, aka change set) in a precise way, the first thing we want to do is establish a correspondence between objects in the two trees. This will be a list of tuples of the form (id, id), which establishes that the respective objects are EQ. There are a variety of strategies we could use to build the correspondence. We don't have to stick to just one strategy or require it to be entirely automatic. For example, we could consult an already-under-management version of our source tree, and import all the object IDs from it that correspond to objects that are found to be exactly EQUAL. We can also do the equality tests, somewhat less efficiently (because we also have to partition the source into objects) between our two trees. Or we can simply interpret each file as one object, for the purpose of building the correspondence, though we would still want to respect the way that the owners of the two trees have already partitioned their source into objects. Next question: what is an object? Can objects contain objects? I would like a 'change' to be an object, that is, I would like to be able to see how a patch evolves, just as we can see how the underlying source evolves as well. Bitkeeper question: once we apply a changeset to a source base, does the Bitkeeper database continue to maintain the identity of the changeset as we carry the source base through subsequent revisions? I.e., will Bitkeeper let us talk about the 'htree' patch, and let the thing evolve along with the source, so that we could pull out the 2.4.17 version of htree, or the 2.4.18 version, etc? 
-- Daniel From rasmus at jaquet.dk Mon May 6 18:39:03 2002 From: rasmus at jaquet.dk (Rasmus Andersen) Date: Mon, 6 May 2002 10:39:03 +0200 Subject: [Prophesy] General questions In-Reply-To: ; from phillips@bonn-fries.net on Sat, May 04, 2002 at 06:59:32PM +0200 References: <20020501084526.B14505@jaquet.dk> <20020504144935.GG1893@jaquet.dk> Message-ID: <20020506103903.C13935@jaquet.dk> On Sat, May 04, 2002 at 06:59:32PM +0200, Daniel Phillips wrote: > I don't know exactly what happens when you pull from Linus's tree and you > have incompatible changes in yours. Please feel free to educate me here, as > I am not an experienced BitKeeper user. I'll have to try this. I think that my current third-hard impressions are too vague for this. > > A major feature of the BitKeeper model that I don't like is the tree model. > In fact, independent developers don't maintain their trees according to a > strict hierarchy descending from a common parent. It's more like a general > net, with developers exchanging bits and pieces with each other in an > arbitrary graph. Another way of looking at this is, two different developers > should be able to download a Linux tarball from kernel.org, each make their > own changes, then later decide they want to exchange certain changes with > each other. With Bitkeeper you'd have to mess around to make that happen - > one or the other of the developers would have to clone the repository of the > other and drop back to the patch way of doing things, to manually import > their changes. This is BS; we want such exchanges to be entirely natural. The BK equivalent of two different developers downloading source from kernel.org would be cloning local repositories from linus', no? Then it seems like the above reduces to BK interfacing with the outside (non-BK) world, which is then problematic? Or am I missing something? 
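For reference, the correspondence list of (id, id) tuples that Daniel proposes could be built by the cheapest of the strategies he lists: matching objects whose contents are exactly EQUAL and declaring those EQ. This is a hedged sketch with invented object ids, not code from any existing tool, and a real implementation would also consult shared ancestry and fuzzier comparisons:

```python
import hashlib

def digest(text):
    return hashlib.sha1(text.encode()).hexdigest()

def correspondence(tree_a, tree_b):
    """Build the (id, id) list declaring objects in two trees EQ, using
    exact content equality (EQUAL) as the matching strategy. tree_a and
    tree_b map invented object ids to object contents; duplicate contents
    within one tree are not disambiguated in this sketch."""
    by_hash = {digest(content): oid for oid, content in tree_a.items()}
    pairs = []
    for oid_b, content in tree_b.items():
        oid_a = by_hash.get(digest(content))
        if oid_a is not None:
            pairs.append((oid_a, oid_b))
    return pairs

tree_a = {'rasmus/2002-05-04/1': 'int x;\n', 'rasmus/2002-05-04/2': 'int y;\n'}
tree_b = {'daniel/2002-05-01/1': 'int y;\n'}
pairs = correspondence(tree_a, tree_b)  # the two 'int y;' objects are EQ
```

Once the pairs are recorded, subsequent exchanges between the two trees can be expressed as object deltas against agreed ids rather than as raw patches.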
AFAIK, the strict 'tree' view of the revision history is due to the fundamental distributed view of the process here, with all trees being seen (designwise) as replicas of a distributed filesystem. > Now, what I'm thinking about here has something to do with the traditional > LISP distinction between EQUAL and EQ, the latter being true when we > establish that two things really are the *same* object, and don't just have > the same values. In the LISP case, EQ basically means the addresses of the > objects are the same. So we want our source tree to be made up of objects, > and each object will have an 'address', which I will call an 'id', which is > perhaps derived from the developer's email address, the date the source tree > first came under management, and an object sequence number (the latter > generated by a counter kept in the root of the source tree). Objects have > generations, each generation being a delta. A 'default' object holds all > the source in the tree that is not included in any other object. Any state > of the source tree can therefore be represented as a set of (object, > generation) tuples. I agree on this object view, even though we should call them closures, in proper LISP terminology :) Warning, this is about it for my LISP knowledge. [More object think] I agree on this. I'll have to think a bit about this in order to wrap my brain around it. > Next question: what is an object? Can objects contain objects? I would like > a 'change' to be an object, that is, I would like to be able to see how a > patch evolves, just as we can see how the underlying source evolves as well. I think objects can contain objects. That way merges can be objects too, containing the objects merged as its revision history. > > Bitkeeper question: once we apply a changeset to a source base, does the > Bitkeeper database continue to maintain the identity of the changeset as we > carry the source base through subsequent revisions? 
> I.e., will Bitkeeper
> let us talk about the 'htree' patch, and let the thing evolve along with the
> source, so that we could pull out the 2.4.17 version of htree, or the 2.4.18
> version, etc?

I don't think so, but it is a BS guess. One problem would be later
changes modifying htree code, making extraction of the htree patch
difficult.

Another question is your described effort in managing change sets
for your patches. Could you describe this a bit so we could see if
we could envision something to make that easier?

Rasmus

From rasmus at jaquet.dk Tue May 7 04:00:36 2002
From: rasmus at jaquet.dk (Rasmus Andersen)
Date: Mon, 6 May 2002 20:00:36 +0200
Subject: [Prophesy] General questions
In-Reply-To: <20020506103903.C13935@jaquet.dk>
References: <20020501084526.B14505@jaquet.dk> <20020504144935.GG1893@jaquet.dk> <20020506103903.C13935@jaquet.dk>
Message-ID: <20020506180035.GA1669@jaquet.dk>

On Mon, May 06, 2002 at 10:39:03AM +0200, Rasmus Andersen wrote:
> I'll have to try this. I think that my current third-hard impressions
> are too vague for this.

That would be 'third-hand', of course.

> > Next question: what is an object? Can objects contain objects? I would like
> > a 'change' to be an object, that is, I would like to be able to see how a
> > patch evolves, just as we can see how the underlying source evolves as well.
>
> I think objects can contain objects. That way merges can be objects too,
> containing the objects merged as its revision history.

That was perhaps a bit too short: I was thinking of a change/patch as
an object. A patchset would then be a collection of objects, still with
some of the properties of the basic object (comments, generations), and
a merge would then be a new patchset with the aggregated/resolved
constituent patchsets, comments etc.
> > Bitkeeper question: once we apply a changeset to a source base, does the
> > Bitkeeper database continue to maintain the identity of the changeset as we
> > carry the source base through subsequent reversions? I.e., will Bitkeeper
> > let us talk about the 'htree' patch, and let the thing evolve along with the
> > source, so that we could pull out the 2.4.17 version of htree, or the 2.4.18
> > version, etc?
>
> I don't think so, but it is a BS guess. One problem would be later
> changes modifying htree code, making extraction of the htree patch
> difficult.

Another clarification: The first sentence above goes for the BK
question. The rest is more general; how to keep conflicting patches
separate?

> Another question is your described effort in managing change sets
> for your patches. Could you describe this a bit so we could see if
> we could envision something to make that easier?

Rasmus

From phillips at bonn-fries.net Wed May 29 03:05:03 2002
From: phillips at bonn-fries.net (Daniel Phillips)
Date: Tue, 28 May 2002 19:05:03 +0200
Subject: [Prophesy] Postgres and Python
Message-ID: 

Hi all (especially Rasmus),

I have not forgotten about our design project here, in case you were
wondering. In fact, recent developments, specifically the threat of
further encroachment of commercialism on core open source projects,
lead me to believe more than ever that we must follow through on what
we set out to do. (And I will remark here that my attitude is
pro-commerce in the sense that core open source projects are a commons
on which even commercial users rely. It is in the interest of commerce,
as well as lovers of intellectual freedom, to protect our core
projects.)

So today's topic is Postgres. We must leverage our advantages, and
having free use of a full SQL database that we can, if we need to,
customize in any way we want, is one of them. So I have been working
with Postgres and Python, to see how well they hold up together. The
answer is 'very well'.
There are a number of packages that provide Python access to Postgres.
All these packages implement a standard database interface class called
the "Python Database API Specification":

   http://www.python.org/topics/database/DatabaseAPI-2.0.html

Basically, this lets you pass SQL query strings to a database, provides
convenient methods for retrieving the results, and miscellaneous
functions for controlling such things as commit/rollback.

The particular implementation that worked for me is a package called
"psycopg". For Debian users: apt-get install python2.2-psycopg.

Here's a sample session:

   su postgres   # normally the postgres superuser, can create other users
   createdb mydb # somewhere to start
   psql          # make sure everything worked
   \q            # out of here
   python
   >>> import psycopg
   >>> db = psycopg.connect("dbname=mydb")
   >>> cursor = db.cursor()
   >>> cursor.execute("CREATE TABLE foo (bar int, zot date)")
   >>> cursor.execute("INSERT INTO foo VALUES (123, '1/2/2002')")
   >>> cursor.execute("INSERT INTO foo VALUES (456, '2/4/2002')")
   >>> cursor.execute("SELECT * FROM foo")
   >>> data = cursor.fetchone()
   >>> print data
   >>> print cursor.fetchall()

(caveat: I haven't actually tried this example to make sure it works)

I found psycopg (psycho pig?) very nice to work with. As far as
complaints go, there is good support for result retrieval, but no
support for data insertion - it seems, you just make up SQL strings
containing the data and submit them. This needs to be strengthened.
Good thing we have the source, right?

Note that the Python db interface does not tie you to SQL; however, all
the SQL strings you have to write in order to get anything done
certainly do tie you. So, in my opinion this all has to be abstracted
more, and every application should start off by doing that.
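To show the sort of thing I mean by "abstracted more", here is a rough,
untested sketch (the helper and its interface are made up just for
illustration): rather than pasting data into SQL strings by hand, a tiny
layer builds the parameterized INSERT and leaves all quoting to the
driver:

```python
def insert_query(table, **columns):
    # Build a parameterized INSERT; returns (sql, values) so the
    # database driver does all the quoting - we never paste raw
    # data into the SQL string ourselves.
    names = sorted(columns)   # deterministic column order
    sql = "INSERT INTO %s (%s) VALUES (%s)" % (
        table, ", ".join(names), ", ".join(["%s"] * len(names)))
    return sql, [columns[k] for k in names]

# e.g. cursor.execute(*insert_query("foo", bar=123, zot="1/2/2002"))
```

With the DB-API's execute(query, values) form the driver does the
escaping, and the fallback to a literal ASCII INSERT string would only
need to live in one place.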
Oh well, the fact remains that this is an excellent place to get
started, and Python with this package is a far more capable interactive
interface to a db than, for instance, psql is, or a graphical database
shell would be.

-- Daniel

From phillips at bonn-fries.net Thu May 30 08:44:20 2002
From: phillips at bonn-fries.net (Daniel Phillips)
Date: Thu, 30 May 2002 00:44:20 +0200
Subject: [Prophesy] String transformations
Message-ID: 

I'm still not sure that it's a good idea to build a whole SCM from
scratch; however, *if* I were going to do that, I'd start with some
good basic operations, combined with some notion of how they fit in
with the grand scheme.

As far as the grand scheme goes, the idea is that we will maintain a
database of differences between predecessor versions of files. We will
also maintain information about how the files were created, deleted and
moved around in the source tree, but that's not the subject of today's
post. Instead, I'll focus on how the differences between files are to
be maintained. And I'm going to get way more detailed than perhaps is
justified at this stage, just because I feel that way today. In other
words, watch out: bit bashing alert.

I'm looking at each file as a binary string. To get a predecessor
version of a file, we retrieve some kind of transformation information
from the database and apply it to the string consisting of the current
contents of the file, yielding the immediate predecessor of the file.
Of course, every SCM does this in some way; nonetheless, I feel
compelled to invent yet another way of doing it. My philosophy is that
these string transformations are exact, and so there exist far more
efficient and general means of encoding and applying the needed
transformations than, for example, diff and patch. Not to mention
simpler. So let me jump straight into my detailed design.
We can see the process of transforming any binary string into any other
as a sequence of simple operations carried out on an input and output
string. Working from left to right in the input string, the following
three operations are sufficient:

   text - append literal text to the destination string
   copy - append bytes from the source to the destination string
   skip - skip over bytes in the source string

Additionally, we may wish to optimize the case where a transformation
simply moves text from one place to another:

   move - append text from an arbitrary location in the input string

To see why the move operation is desirable, consider that without it,
moving a block of text from one place to another in a file results in
the block of text needing to be recorded in the database, whereas using
the move operation, that same block of text can be obtained from the
current version of the file. This represents a significant space
savings. Presumably, the transformation strings are to be compressed
when stored in the database, but since the original version of the file
is not encoded in the database, it is not possible to rely on
compression to avoid encoding the moved block of text literally. My
goal, therefore, is to design the transformation encoding in a way that
works well together with compression.

Each of the above operations takes a count parameter, specifying how
many bytes to append to the output string (or, in the case of skip, how
many input bytes to pass over). Additionally, the 'text' operation
takes 'count' bytes of literal text, and the 'move' operation takes a
parameter specifying where the text to be copied is located in the
input string.

Now I need to encode these operations in a way that is hopefully
compact and easy to evaluate. I observed that in each case I have a
count and an operation, so I would like to encode that in a single
number. I also have to worry about the range of that number: I'd like
to encode most of the command/count values in a single byte.
To accomplish this, I introduce an additional primitive operation,
which supplies some high order bits for the count of a following
operation:

   high - supply additional high order count bits for a following operation

Since I am a miser with memory, I decided to allocate only two bits for
the command encoding:

   0 = text
   1 = copy
   2 = skip
   3 = high

Expressed as macros, we have:

   text(n)
   copy(n)
   skip(n)
   high(n)

each of which generates an operation code by shifting the operation
number into the high order two bits of parameter 'n', which is thus
limited to six bits. The high operation can appear several times in a
row, each occurrence supplying an additional six bits, with the most
significant bits appearing first.

The move operation cannot be accommodated in this scheme, and needs
some different encoding. That's OK; the move operation is different
anyway in that it needs two numeric parameters. Fortunately, there are
a number of possible operations that are no-ops when the numeric
parameter is zero, and these are thus available for use as escape
codes. My plan is to encode the move operation - which is only an
optimization - as a triple:

   copy(0), copy(position), copy(count)

That's enough on 'move' for now. I did not implement it, but I did
accommodate it in the design.

Additionally, text(0) indicates the end of the transformation sequence.
That leaves two escape codes, skip(0) and high(0), for future
expansion, the latter being available only when it appears in leading
position, since high(0) may well appear in the less significant bytes
of a large numeric parameter.
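To make the encoding concrete, here is a toy model of it in Python - an
untested sketch of the scheme just described, not the real
implementation:

```python
TEXT, COPY, SKIP, HIGH = 0, 1, 2, 3   # the two-bit command codes

def encode(code, count):
    # Emit high-order six-bit groups first, each tagged HIGH, then
    # the low six bits tagged with the real command code.
    groups = [count & 0x3f]
    count >>= 6
    while count:
        groups.append(count & 0x3f)
        count >>= 6
    prefix = bytes((HIGH << 6) | g for g in reversed(groups[1:]))
    return prefix + bytes([(code << 6) | groups[0]])

def decode(ops):
    # Recover (command, count) pairs, folding HIGH bytes into the
    # count six bits at a time, most significant bits first.
    result, count = [], 0
    for c in ops:
        if c == 0:       # text(0): end of the transformation sequence
            break
        count = (count << 6) | (c & 0x3f)
        if c >> 6 == HIGH:
            continue
        result.append((c >> 6, count))
        count = 0
    return result
```

A count that fits in six bits costs a single byte; copy(1000) needs one
high byte in front of it, so it costs two.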
Leaving out 'move', we end up with a simple implementation:

   #include <string.h>

   typedef unsigned char uchar;

   enum { textop, copyop, skipop, highop };   /* the two-bit commands */

   void transform(uchar *ops, uchar *src, uchar *dst)
   {
           unsigned c, count = 0;

           while ((c = *ops++)) {
                   count = (count << 6) | (c & 0x3f);
                   switch (c >> 6) {
                   case textop:
                           memcpy(dst, ops, count);
                           dst += count;
                           ops += count;
                           count = 0;
                           break;
                   case copyop:
                           memcpy(dst, src, count);
                           dst += count;
                           /* fall through */
                   case skipop:
                           src += count;
                           count = 0;
                           break;
                   }
           }
   }

The 'ops' string controls the transformation of 'src' into 'dst'. For
example, the ops string:

   copy(2), skip(4), text(6), "foobar", copy(5), 0

transforms the input string:

   "I love lucy"

into the output string:

   "I foobar lucy"

(Note that we could easily express the above in terms of stream IO
operations, since all operations are sequential. However, it's doubtful
whether there is any need to do that on modern machines, and in any
event, the move operation would present something of a problem.)

Given an operation string, we can compute the length of both the input
and output strings, as follows:

   struct transinfo { int in; int out; };

   struct transinfo transcheck(uchar *ops)
   {
           unsigned c, count = 0, ilen = 0, olen = 0;

           while ((c = *ops++)) {
                   count = (count << 6) | (c & 0x3f);
                   switch (c >> 6) {
                   case textop:
                           olen += count;
                           ops += count;
                           count = 0;
                           break;
                   case copyop:
                           olen += count;
                           /* fall through */
                   case skipop:
                           ilen += count;
                           count = 0;
                           break;
                   }
           }
           return (struct transinfo) {ilen, olen};
   }

This function ought to take the length of the operation string as a
parameter as well, and ensure that the termination of the sequence
occurs at exactly that length.

Given an input string and a transformation string, we can compute the
inverse transformation string that converts the resulting output string
back to the input string. This interesting exercise is left to the
reader ;-)

Demonstration code attached.

-- Daniel

-------------- next part --------------
A non-text attachment was scrubbed...
Name: transform.c
Type: text/x-c
Size: 1384 bytes
Desc: not available
URL: 

From phillips at bonn-fries.net Fri May 31 11:50:39 2002
From: phillips at bonn-fries.net (Daniel Phillips)
Date: Fri, 31 May 2002 03:50:39 +0200
Subject: [Prophesy] Binary data and Postgres/SQL
Message-ID: 

Today I'm thinking a little about how best to interface the scm to the
database, particularly how to get the binary transforms[1] in and out.
It seems to me the SQL COPY command does what we need:

   http://postgresql.org/users-lounge/docs/7.2/postgres/sql-copy.html

I don't like the idea at all of forming such data into ASCII strings as
part of an INSERT command. So the proposed strategy is to place the
data in a temporary file and issue a COPY command. A ramfs mount will
do nicely for this.

We can use this technique to extend Python's database interface to
handle inserts in a way similar to retrieves, if we want, and for
databases that can't handle such a thing, the interface can fall back
to forming the ASCII INSERT string, as you must do now anyway. Of
course, improving the Python interface isn't our immediate concern,
it's just nice to know that we can. In my opinion, the INSERT command
should only ever be used for data that is ASCII by nature, such as
parameters read from a config file or input by a user.

[1] See yesterday.

-- Daniel

From phillips at bonn-fries.net Fri May 31 18:00:02 2002
From: phillips at bonn-fries.net (Daniel Phillips)
Date: Fri, 31 May 2002 10:00:02 +0200
Subject: [Prophesy] Improved string transformation
In-Reply-To: 
References: 
Message-ID: 

Today I added support for the 'move' operation to the string transform,
roughly doubling the size of the state network, leading me to reflect
on how much a slight irregularity in an encoding scheme can bloat up an
implementation. Oh well, it's still reasonably tight and efficient, and
it is not going to grow more any time in the near future, except to add
more error checking in the transinfo function.
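To pin down what move does at the abstract level, ignoring the byte
encoding entirely, here is a toy Python interpreter for the four
operations - an untested sketch with made-up names, not the real
engine:

```python
def apply_ops(ops, src):
    # text appends literal text; copy passes input through; skip drops
    # input; move grabs text from anywhere in the input, without
    # advancing the input position.
    out, pos = [], 0
    for op in ops:
        if op[0] == 'text':
            out.append(op[1])
        elif op[0] == 'copy':
            out.append(src[pos:pos + op[1]])
            pos += op[1]
        elif op[0] == 'skip':
            pos += op[1]
        elif op[0] == 'move':
            _, where, count = op
            out.append(src[where:where + count])
    return ''.join(out)

# Swap the last two words of "I love lucy" -> "I lucy love":
# move fetches "lucy" from later in the input.
ops = [('copy', 2), ('move', 7, 4), ('text', ' '), ('copy', 4), ('skip', 5)]
```

The point of move is visible here: the swap costs a few bytes of
operations plus one literal space, instead of recording "lucy" in the
database literally.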
The idea is that transform itself will have little or no error
checking. We will always have run transinfo on the operation string
sometime before we run the transform. If a transform is to be stored in
the database, we will also store the lengths of the input and output
strings, as calculated by transinfo and checked against the known
lengths. Yes, this is micro-optimizing, but I like to keep the low
level things light and tight; it makes me feel better.

Notice how the three primitive operations skip, text and copy map onto
the diff codes '-', '+' and ' '. This is no accident; these are in fact
the same thing, just more loosely expressed, in human-readable form.
Which leads to the observation that we can start generating transform
strings without a whole lot of effort by converting diff files. This is
indeed something we want to do, even after we have code for generating
transform strings directly. On the general theme of using the power
tools available, I'm thinking about generating a Bison parser to parse
diff files into transforms, and doing the job properly.

Now that move is done, the transform engine is pretty much complete.
Move wasn't too hard to implement, but it will be a little tricky to
generate code for. That's OK; the transform generator can stick with
the three simple operations as long as it likes, since the move
operation is nothing more than a space-saving optimization. This is on
the theme of forward compatibility. In this case we can upgrade the
transform generator at any time, and new databases will be handled by
early versions of the system. This is a nice result when you can get
it.

The attached code demonstrates the move operation in action.

-- Daniel

-------------- next part --------------
A non-text attachment was scrubbed...
Name: transform.c
Type: text/x-c
Size: 2248 bytes
Desc: not available
URL: 

From rasmus at jaquet.dk Fri May 31 18:28:06 2002
From: rasmus at jaquet.dk (Rasmus Andersen)
Date: Fri, 31 May 2002 10:28:06 +0200
Subject: [Prophesy] Re: Improved string transformation
In-Reply-To: ; from phillips@bonn-fries.net on Fri, May 31, 2002 at 10:00:02AM +0200
References: 
Message-ID: <20020531102806.B3135@jaquet.dk>

On Fri, May 31, 2002 at 10:00:02AM +0200, Daniel Phillips wrote:
> Today I added support for the 'move' operation to the string transform,
> roughly doubling the size of the state network, leading me to reflect on how
> much a slight irregularity in an encoding scheme can bloat up an
> implementation. Oh well, it's still reasonably tight and efficient, and it
> is not going to grow more any time in the near future, except to add more
> error checking in the transinfo function.

Hi Daniel.

Trying to reply to you is like hitting a moving target :) This is a
short one; I'm otherwise occupied. BTW: I'm on the prophesy list; no
need to cc me separately.

It seems to me that we (you) are attacking some of the lower parts of
an SCM before looking at the higher level ones. I don't think that
would cause your current efforts to be wasted, but sometimes I feel
comfortable having thought roughly about things before doing them. Of
course, that often also leads me to sit on my hands all day.

Anyway, some higher level concerns I can list off the top of my head
would be:

 o branches
 o merges
 o distribution
 o providing usable change overviews and groupings based on dnotify
   recorded changes

This is a terse list and, as I tried to imply, it is probably
independent of what you are doing now.
If nothing else comes of this mail, take it as a reassurance that
somebody out here is actually reading your mails :)

Regards,
Rasmus

From phillips at bonn-fries.net Fri May 31 20:05:26 2002
From: phillips at bonn-fries.net (Daniel Phillips)
Date: Fri, 31 May 2002 12:05:26 +0200
Subject: [Prophesy] Re: Improved string transformation
In-Reply-To: <20020531102806.B3135@jaquet.dk>
References: <20020531102806.B3135@jaquet.dk>
Message-ID: 

On Friday 31 May 2002 10:28, Rasmus Andersen wrote:
> On Fri, May 31, 2002 at 10:00:02AM +0200, Daniel Phillips wrote:
> > Today I added support for the 'move' operation to the string transform,
> > roughly doubling the size of the state network, leading me to reflect on
> > how much a slight irregularity in an encoding scheme can bloat up an
> > implementation. Oh well, it's still reasonably tight and efficient, and
> > it is not going to grow more any time in the near future, except to add
> > more error checking in the transinfo function.
>
> Hi Daniel.
>
> Trying to reply to you is like hitting a moving target :) This is a
> short one; I'm otherwise occupied. BTW: I'm on the prophesy list;
> no need to cc me separately.
>
> It seems to me that we (you) are attacking some of the lower parts of
> an SCM before looking at the higher level ones.

Oh yes, very much so. I like to do that, just to help get my mind
wrapped around the problem. It's especially nice when you can see a
little part of the problem that breaks out and doesn't depend a lot on
the high level design. It's like memcpy: you don't have to think too
much about the details of how the applications are going to use it,
just make it go as fast as possible and have as simple a form as
possible.

> I don't think that would cause your current efforts to be wasted, but
> sometimes I feel comfortable having thought roughly about things
> before doing them.

You gotta speak up ;-)

> Of course, that often also leads me to sit on my hands all day.
Judging by your previous work, I'd say that's a slight exaggeration.

> Anyway, some higher level concerns I can list off the top of my
> head would be:
>
>  o branches
>  o merges
>  o distribution
>  o providing usable change overviews and groupings based on dnotify
>    recorded changes
>
> This is a terse list and, as I tried to imply, it is probably
> independent of what you are doing now.

Indeed. I'm mainly thinking about one area that's very important to me,
personally, and isn't on your list, and that is: editing. I have this
idea that the fact you're using an SCM should be nearly totally
transparent. You just edit your files and the SCM takes care of making
sure that nothing is ever forgotten. There are still a few more pieces
of that puzzle to put in place, but as soon as it gets there, we
*already have something useful*.

That said, let me do a little musing on the points you mentioned, which
are also very important. I'd like to try to see all of the points you
mentioned as high level database problems and get some primitives in
place to help us think about what we can do at the high level.

So - soon we will have nice fast transforms, and we already have the
idea that the transforms are applied backwards, starting from the
current version on disk. Since the transforms are fast, we can get lazy
and apply an awful lot of them to do certain things, i.e., to make old
versions of the code materialize quickly, for further editing, or for
comparison against other versions.

I'm just going to throw some of my random thoughts on the table. Don't
assume any of the following is correct, it's just a starting point for
discussion.

> o branches

First we should think about tree nodes. A tree node is any place that
we have set a checkpoint, that is, a tree state that we can restore.
Between any two nodes - and that includes nodes on different branches -
we have or can compute a delta. In general, we will use the structure
of the tree to compute the delta.
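To make "use the structure of the tree" a bit more concrete, here is an
untested toy sketch (all names hypothetical): if each node records only
the step leading away from its parent - modeled here as a plain
function on strings rather than a real transform - then any node's
state can be materialized by composing the steps along the path from
the root:

```python
class Node:
    # A checkpoint: knows its parent and the step (a transform,
    # modeled as a function) taking the parent's state to its own.
    def __init__(self, parent=None, step=None):
        self.parent, self.step = parent, step

    def path_from_root(self):
        path, node = [], self
        while node.parent is not None:
            path.append(node)
            node = node.parent
        return list(reversed(path))

def state_at(node, root_state):
    # Materialize a node by composing the steps down from the root.
    state = root_state
    for n in node.path_from_root():
        state = n.step(state)
    return state

root = Node()
a = Node(root, lambda s: s + " +htree")
b = Node(a, lambda s: s + " +dnotify")
```

The delta between any two nodes is then the composition of steps along
the path connecting them, with the steps on the upward leg inverted.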
A branch is something defined by the user. It's simply a name that gets
carried from node to node, as the user sets checkpoints, along with an
incrementing generation number. Sometimes the tree will fork, starting
a new branch (duh).

At one of the tree nodes we will find the current copy of the source
code, that is, the copy on disk. It does not have to be at the end of a
branch, it can be anywhere. That's why it's important to be able to
invert transformations quickly. (Hey, when is somebody going to rise to
my challenge of stating the algorithm for generating an inverse
transform?)

For safety's sake we probably want to leave 'cached' copies of tips of
branches somewhere on disk or in the database, so that we don't have to
completely rely on the transform machinery to get us from node to node
and back again. A related idea is that we sometimes want to have two
nodes of the tree expressed on disk at the same time.

There are, of course, many different ways of expressing the delta
between two nodes in terms of particular transformations. This fact is
what makes all of this interesting.

> o merges

A merge is the process of applying a subset of transformations that
express the delta between a pair of nodes to some other node. We need
to apply some extra constraints to the transformations involved, for
example, instead of just skipping text, we normally will want to ensure
that the text skipped in the original node of the delta is the same as
the text skipped in the target node. In order to accomplish that, we
may need to alter the transformations in various ways. As with good old
patch, we will sometimes need to refer to context to decide how to
change the transformations.

Merging is related to branching in much the same way that integration
is related to differentiation. We may want to borrow some of the same
techniques that are used for automatic symbolic integration.

And there is a very short - too short - treatment of merging.
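Since the inverse-transform challenge is still open, here is one
possible answer, sketched as untested Python over abstract (op, arg)
pairs rather than the real byte encoding. The rule: copy stays copy, a
skip of n input bytes becomes a text of the n bytes it skipped (which
is why the original input is needed to build the inverse), and a text
of k bytes becomes skip(k):

```python
def apply_ops(ops, src):
    # Minimal interpreter for (op, arg) pairs, for checking the inverse.
    out, pos = [], 0
    for op, arg in ops:
        if op == 'text':
            out.append(arg)
        elif op == 'copy':
            out.append(src[pos:pos + arg])
            pos += arg
        elif op == 'skip':
            pos += arg
    return ''.join(out)

def invert(ops, src):
    # Walk the forward operations in order, tracking the input
    # position so skipped text can be recovered from 'src'.
    inv, pos = [], 0
    for op, arg in ops:
        if op == 'copy':
            inv.append(('copy', arg))
            pos += arg
        elif op == 'skip':
            inv.append(('text', src[pos:pos + arg]))
            pos += arg
        elif op == 'text':
            inv.append(('skip', len(arg)))
    return inv
```

Because the forward operations consume both strings strictly left to
right, emitting the inverse operations in the same order makes them
consume the output string left to right too.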
> o distribution

I'm having a few glimmers of ideas about what to do there. I think I
mentioned one already - each node will be partitioned into regions,
each of which will have an id which is unique in the universe - more or
less (incorporating email addresses in the ids makes this come true for
practical purposes).

Anyway, essentially, we want to do a merge between two branches in two
separate repositories, so I think what we want to do is first create,
in the destination repository, clones of the two nodes in the source
repository that generated the set of transformations we have decided we
want to send. Then a normal merge is done in the destination
repository. Easy? Original for sure.

Now, the thing about those cross-repository clones is that we don't
want to actually send the whole tree. We want only to send the objects
that the destination repository doesn't already have, and this is where
the universal object ids come in. We are not going to necessarily rely
on common parentage to establish the equivalence of two objects - we
will sometimes compare the objects, and decide that they are actually
the same object. To speed this up over a remote link, we can just
compare hashes of objects. When objects are deemed equivalent in this
way, we will set up a mapping between the two repositories that
expresses the equivalence of objects, and we will also allow either of
the repositories to rename any particular object so that it is exactly
equivalent. We can call this process 'melding'.

(Hey, time to start writing patents. Well anyway, let me extract a
promise right now, that by staying on this list you are making a
promise to me to respect the confidence of this work until we release
it publicly. We do not want certain purveyors of closed source software
going off and writing patents on the work we're doing. Given all that
has happened recently, very little would surprise me any more.)
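The hash-comparison step of melding might look something like this -
an untested Python sketch with made-up structures, just to fix the
idea. Each repository summarizes its objects by content hash; only
objects whose content the destination has never seen need to cross the
link, and the rest just extend the equivalence mapping:

```python
import hashlib

def object_hash(data):
    # Content hash used to test object equivalence across repositories.
    return hashlib.sha1(data).hexdigest()

def objects_to_send(source_objects, destination_hashes):
    # source_objects: dict of universal id -> content bytes.
    # destination_hashes: dict of content hash -> the destination's id.
    # Returns the objects to transmit, plus the id mapping ('meld')
    # for objects the destination already has under another id.
    missing, meld = {}, {}
    for obj_id, data in source_objects.items():
        h = object_hash(data)
        if h in destination_hashes:
            meld[obj_id] = destination_hashes[h]
        else:
            missing[obj_id] = data
    return missing, meld
```

Over a remote link, only the hash table and the 'missing' objects ever
need to be transmitted.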
> o providing usable change overviews and groupings based on dnotify
>   recorded changes

I'm now thinking that dnotify is the wrong model, and what we really
want is to mount a magic filesystem over the mount point of the
directory we want to manage. The magic filesystem will trap all the
file changes and call the scm, then call the real filesystem. Very
simple.

As far as change overviews go, I think I'm a long way from even
thinking about that. A lot more of the basic ideas have to be in place
first. Having a full database around that we can do arbitrary queries
on should help quite a lot. If you have some specific ideas, don't be
shy...

> If nothing else comes of this mail, take it as a reassurance
> that somebody out here is actually reading your mails :)

Oh, I know you are. Everybody gets busy. I wasn't even sure I was going
to go through with this, but now I think I am - I can see something
that already works, and in a cool and unique way - not too far off.

-- Daniel

From rasmus at jaquet.dk Fri May 31 21:36:00 2002
From: rasmus at jaquet.dk (Rasmus Andersen)
Date: Fri, 31 May 2002 13:36:00 +0200
Subject: [Prophesy] CM site
Message-ID: <20020531133600.B4082@jaquet.dk>

I haven't had the time to look closer, but this site _might_ have
something interesting in their 'papers' section.

   http://www.cmtoday.com/yp/configuration_management.html

Rasmus

From rasmus at jaquet.dk Fri May 31 21:48:28 2002
From: rasmus at jaquet.dk (Rasmus Andersen)
Date: Fri, 31 May 2002 13:48:28 +0200
Subject: [Prophesy] CM site
In-Reply-To: <20020531133600.B4082@jaquet.dk>; from rasmus@jaquet.dk on Fri, May 31, 2002 at 01:36:00PM +0200
References: <20020531133600.B4082@jaquet.dk>
Message-ID: <20020531134828.C4082@jaquet.dk>

On Fri, May 31, 2002 at 01:36:00PM +0200, Rasmus Andersen wrote:
> I haven't had the time to look closer, but this site _might_
> have something interesting in their 'papers' section.
>
>    http://www.cmtoday.com/yp/configuration_management.html

But looking closer, there is not. Sorry for going off too fast.

Rasmus

From rasmus at jaquet.dk Fri May 31 21:59:41 2002
From: rasmus at jaquet.dk (Rasmus Andersen)
Date: Fri, 31 May 2002 13:59:41 +0200
Subject: [Prophesy] Re: Improved string transformation
In-Reply-To: ; from phillips@bonn-fries.net on Fri, May 31, 2002 at 12:05:26PM +0200
References: <20020531102806.B3135@jaquet.dk> 
Message-ID: <20020531135941.D4082@jaquet.dk>

On Fri, May 31, 2002 at 12:05:26PM +0200, Daniel Phillips wrote:
> I'm just going to throw some of my random thoughts on the table. Don't
> assume any of the following is correct, it's just a starting point for
> discussion.

Mine is going to be equally random, just much shorter :) I'll try to
think a bit more about this later.

> > o providing usable change overviews and groupings based on dnotify
> >   recorded changes
>
> I'm now thinking that dnotify is the wrong model, and what we really
> want is to mount a magic filesystem over the mount point of the
> directory we want to manage. The magic filesystem will trap all the
> file changes and call the scm, then call the real filesystem. Very
> simple.
>
> As far as change overviews go, I think I'm a long way from even
> thinking about that. A lot more of the basic ideas have to be in place
> first. Having a full database around that we can do arbitrary queries
> on should help quite a lot.

Like with dnotify, I think that the grouping and manageability of
changes coming through a magic FS is going to suffer. And I think that
this is one of the cardinal weak points in CVS, and thus one where we
should aim for being strong. But I have no good ideas on how to handle
this and still get transparency.

Rasmus