[Prophesy] Versioning filesystem
Daniel Phillips
phillips at arcor.de
Mon Mar 24 09:39:44 EST 2003
Chances are I'm talking to myself, after not doing anything here for nine
months or so. That doesn't mean things haven't been happening.
Specifically, I've been introspecting.
The main subject of introspection has been how I'd go about implementing a
versioning filesystem, and even whether that seems like a good thing to do. I
basically beat my head against a wrong approach for most of the nine months,
pursuing the idea of hooking out file_operations from a vfs path_walk. I
thought that would be the most efficient way to hook the thing up, because
writes could be specially handled, whereas reads would just follow the normal
path. This never worked out cleanly. The vfs just isn't set up that way,
and would have required major surgery. Besides that, I gradually realized
that I did not always want to pass reads straight through. This would limit my
design options by not allowing me to generate the read data on the fly. Then
I saw the light, by realizing that Martin Poole already had the right idea
with his newuserfs.
Newuserfs is a forward port of Jeremy Fitzhardinge's userfs, which works by
passing vfs operations through a pipe to user space.
After thinking about this a short time, I realized that I could start with
ramfs, which implements full posix semantics, and just bolt that onto a
usermode daemon with a socket. There are a number of right things about
this approach, not least of which is the fact that the stack never gets very
deep for either the task calling for file operations or the server
implementing them. This is because the kernel does a task switch to the
server each time a complex low-level file operation needs to be done, and the
stack-hungry things happen in user space. There's no recursive calling into
the kernel.
Another right thing is the way caching works with this approach, specifically
the page cache and dcache. For both, the vfs only needs help from the
usermode daemon when some name or file data isn't in its cache. So the
usermode implementation can be quite slow and the cache will cover that up.
Not that I want to make the usermode part slow, but in theory it could be,
especially if there is database access and application of a chain of file
differences going on.
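
To illustrate the caching point, here is a minimal Python sketch, not Stuf
code. CachedLookup and slow_server_read are hypothetical names: the first
stands in for the page cache or dcache, the second for a slow usermode
daemon. The slow path runs only on a miss, so repeated lookups of the same
name never reach the daemon again.

```python
import time


class CachedLookup:
    """Hypothetical stand-in for the dcache/page cache in front of a slow server."""

    def __init__(self, slow_fetch):
        self.slow_fetch = slow_fetch  # called only on a cache miss
        self.cache = {}
        self.misses = 0

    def get(self, key):
        if key not in self.cache:
            self.misses += 1
            self.cache[key] = self.slow_fetch(key)
        return self.cache[key]


def slow_server_read(name):
    # Pretend the daemon consults a database and applies a chain of diffs.
    time.sleep(0.01)
    return ("contents of " + name).encode()


if __name__ == "__main__":
    files = CachedLookup(slow_server_read)
    for _ in range(1000):
        files.get("README")
    # Only the first of the 1000 lookups paid the slow-path cost.
    print("misses:", files.misses)
```

A real cache would also need invalidation and eviction, which this sketch
leaves out.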
So I started implementing this about 10 days ago and have been occupied with
it since. Things are going pretty well, to the point I could think about a
code release in a week or two. The project has a name:
Stuf - STackable Usermode Filesystem
which is actually not specific to versioning filesystems. A particular
filesystem is implemented by a usermode server daemon that implements Stuf's
socket protocol (which I call "beads"). The server I'm working on now is
called "simple" and just passes filesystem operations through to the
underlying filesystem. After that is working reasonably well, to the point
that you can, say, compile a kernel on the stacked filesystem, I'll move on
to a versioning server.
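
As a rough illustration of a pass-through server, here is a Python sketch
under loud assumptions. The wire format (one JSON object per line) and the
names serve, call and demo are invented for the example; the actual "beads"
protocol is not described here, and the real client side is the kernel, not
a Python function. The server just turns each request into the same
operation on an underlying directory.

```python
import json
import os
import socket
import tempfile
import threading


def serve(conn, root):
    """Answer one JSON request per line by acting on the underlying directory."""
    with conn, conn.makefile("r", encoding="utf-8") as rfile, \
            conn.makefile("w", encoding="utf-8") as wfile:
        for line in rfile:
            req = json.loads(line)
            # A real server would have to reject names that escape root.
            path = os.path.join(root, req["name"])
            try:
                if req["op"] == "read":
                    with open(path, "r", encoding="utf-8") as f:
                        reply = {"ok": True, "data": f.read()}
                elif req["op"] == "write":
                    with open(path, "w", encoding="utf-8") as f:
                        f.write(req["data"])
                    reply = {"ok": True}
                else:
                    reply = {"ok": False, "error": "unknown op"}
            except OSError as e:
                reply = {"ok": False, "error": e.strerror}
            wfile.write(json.dumps(reply) + "\n")
            wfile.flush()


def call(rfile, wfile, **req):
    """Send one request and wait for its reply (stands in for the kernel side)."""
    wfile.write(json.dumps(req) + "\n")
    wfile.flush()
    return json.loads(rfile.readline())


def demo():
    root = tempfile.mkdtemp()
    client, server = socket.socketpair()
    t = threading.Thread(target=serve, args=(server, root), daemon=True)
    t.start()
    with client, client.makefile("r", encoding="utf-8") as rfile, \
            client.makefile("w", encoding="utf-8") as wfile:
        w = call(rfile, wfile, op="write", name="hello.txt", data="hi\n")
        r = call(rfile, wfile, op="read", name="hello.txt")
        missing = call(rfile, wfile, op="read", name="nope.txt")
    t.join()  # closing the client end makes the server loop see EOF
    return w, r, missing


if __name__ == "__main__":
    print(demo())
```

A versioning server could keep the same request loop and change only what
happens inside each operation.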
At this point I can mount a filesystem with the "stuff" command (Stuf
Frontend), fork the server, connect the pipe, generate and pass FDs for both
the mounted and underlying filesystem through the pipe. The server can ioctl
the virtual filesystem to take care of special needs that can't be satisfied
by (or would be too slow and racy with) posix operations. I can now pass
open(2) requests through the pipe, and am currently busy implementing
a new system call that can open a file, given a directory fd and a name.
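
For readers who have not seen descriptor passing, here is a small Python
sketch of the two mechanisms mentioned above. It assumes the "pipe" is a
Unix domain socket, since that is what SCM_RIGHTS descriptor passing needs.
The helper names send_fd, recv_fd, open_in_dir and demo are hypothetical.
Opening by directory fd and name is shown with os.open(..., dir_fd=...),
which uses openat(2), a call Linux gained later; it has the same general
shape but is not the system call being written for Stuf.

```python
import array
import os
import socket
import tempfile


def send_fd(sock, fd):
    """Pass one open descriptor over a Unix socket using SCM_RIGHTS."""
    sock.sendmsg([b"F"], [(socket.SOL_SOCKET, socket.SCM_RIGHTS,
                           array.array("i", [fd]))])


def recv_fd(sock):
    """Receive one descriptor sent by send_fd."""
    fds = array.array("i")
    msg, ancdata, flags, addr = sock.recvmsg(1, socket.CMSG_LEN(fds.itemsize))
    for level, ctype, data in ancdata:
        if level == socket.SOL_SOCKET and ctype == socket.SCM_RIGHTS:
            fds.frombytes(data[:fds.itemsize])
            return fds[0]
    raise RuntimeError("no descriptor received")


def open_in_dir(dir_fd, name):
    """Open name relative to dir_fd rather than to the current directory."""
    return os.open(name, os.O_RDONLY, dir_fd=dir_fd)


def demo():
    # Stand-in for the underlying filesystem: a temp directory with one file.
    root = tempfile.mkdtemp()
    with open(os.path.join(root, "hello.txt"), "w") as f:
        f.write("hi\n")

    frontend, server = socket.socketpair()
    dir_fd = os.open(root, os.O_RDONLY)
    with frontend, server:
        send_fd(frontend, dir_fd)    # frontend hands the directory to the server
        received = recv_fd(server)   # server now holds its own descriptor
    os.close(dir_fd)

    try:
        fd = open_in_dir(received, "hello.txt")
        try:
            return os.read(fd, 100)
        finally:
            os.close(fd)
    finally:
        os.close(received)


if __name__ == "__main__":
    print(demo())
```

Here both ends live in one process for brevity; in the setup described
above the receiver would be the forked server.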
There's been a lot of work on SCM high level design considerations done on
the Arch mailing list, including what needs to be done to satisfy the
requirements of kernel developers. It seems to me, that much of what has
been discussed is suitable for implementation as a versioning filesystem, and
so I have set out to do that.
Regards,
Daniel