[Prophesy] Improved string transformation
Daniel Phillips
phillips at bonn-fries.net
Fri May 31 18:00:02 EST 2002
Today I added support for the 'move' operation to the string transformer,
roughly doubling the size of the state network, leading me to reflect on how
much a slight irregularity in an encoding scheme can bloat up an
implementation. Oh well, it's still reasonably tight and efficient, and it
is not going to grow more any time in the near future, except to add more
error checking in the transinfo function.
The idea is that transform itself will have little or no error checking. We
will always have run transinfo on the operation string sometime before we run
the transform. If a transform is to be stored in the database, we will also
store the lengths of the input and output strings, as calculated by transinfo
and checked against the known lengths. Yes, this is micro-optimizing, but I
like to keep the low-level things light and tight; it makes me feel better.
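As a rough illustration of this split between a checking pass and an unchecked pass, here is a minimal sketch. The real operation string encoding is not shown in this message, so the sketch assumes a hypothetical array of structs instead, and the names transinfo_sketch and transform_sketch are made up for the example. It covers only the three primitive operations, not move, and it assumes skip consumes input, text emits literal bytes and copy does both.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical representation: the actual operation string format
 * used by transform.c is not described in this message. */
enum op_kind { OP_COPY, OP_SKIP, OP_TEXT };

struct op {
    enum op_kind kind;
    size_t len;        /* bytes copied, skipped or inserted */
    const char *text;  /* literal bytes for OP_TEXT, otherwise NULL */
};

/* Checking pass: validates the operations and computes how many input
 * bytes they consume and how many output bytes they produce.
 * Returns 0 on success, -1 if an operation is malformed. */
int transinfo_sketch(const struct op *ops, size_t nops,
                     size_t *in_len, size_t *out_len)
{
    size_t in = 0, out = 0;

    for (size_t i = 0; i < nops; i++) {
        switch (ops[i].kind) {
        case OP_COPY:               /* consumes input, produces output */
            in += ops[i].len;
            out += ops[i].len;
            break;
        case OP_SKIP:               /* consumes input only */
            in += ops[i].len;
            break;
        case OP_TEXT:               /* produces output only */
            if (ops[i].text == NULL)
                return -1;
            out += ops[i].len;
            break;
        default:
            return -1;
        }
    }
    *in_len = in;
    *out_len = out;
    return 0;
}

/* Unchecked pass: trusts that transinfo_sketch already succeeded and
 * that the buffers match the lengths it reported. */
void transform_sketch(const struct op *ops, size_t nops,
                      const char *in, char *out)
{
    for (size_t i = 0; i < nops; i++) {
        switch (ops[i].kind) {
        case OP_COPY:
            memcpy(out, in, ops[i].len);
            in += ops[i].len;
            out += ops[i].len;
            break;
        case OP_SKIP:
            in += ops[i].len;
            break;
        case OP_TEXT:
            memcpy(out, ops[i].text, ops[i].len);
            out += ops[i].len;
            break;
        }
    }
}
```

A caller would run the checking pass once, compare the two lengths against the known string lengths (or against the lengths stored alongside the transform), and only then call the unchecked pass.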
Notice how the three primitive operations skip, text and copy map onto the
diff codes '+', '-' and ' '. This is no accident: these are in fact the same
thing, just more loosely expressed, in human-readable form. Which leads to
the observation that we can start generating transform strings without a
whole lot of effort by converting diff files. This is indeed something we
want to do, even after we have code for generating transform strings directly.
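To make the diff conversion concrete, here is a minimal sketch of turning the body lines of one diff hunk into operations. It assumes the pairing ' ' to copy, '-' to skip and '+' to text, which is an interpretation of the operation names rather than something stated above. The struct layout and the name diff_to_ops are hypothetical, and hunk headers and file headers are assumed to have been stripped already.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical operation record, not the real operation string format. */
enum diff_op_kind { DIFF_COPY, DIFF_SKIP, DIFF_TEXT };

struct diff_op {
    enum diff_op_kind kind;
    size_t len;        /* byte count of the affected line(s) */
    const char *text;  /* inserted bytes for DIFF_TEXT, otherwise NULL */
};

/* Convert hunk body lines into operations.  Each line is a
 * NUL-terminated string that starts with its diff code and still
 * carries its trailing newline.  Adjacent copy or skip lines are
 * merged into one operation.  Returns the number of operations
 * written, or -1 on an unknown diff code or if max_ops is too small. */
int diff_to_ops(const char *const *lines, size_t nlines,
                struct diff_op *ops, size_t max_ops)
{
    size_t n = 0;

    for (size_t i = 0; i < nlines; i++) {
        enum diff_op_kind kind;

        switch (lines[i][0]) {
        case ' ': kind = DIFF_COPY; break;  /* line is in both versions */
        case '-': kind = DIFF_SKIP; break;  /* line is dropped from input */
        case '+': kind = DIFF_TEXT; break;  /* line is new in output */
        default:  return -1;
        }

        size_t len = strlen(lines[i] + 1);  /* length without the code */

        /* Merge runs of copy or skip; text ops keep their own pointer. */
        if (kind != DIFF_TEXT && n > 0 && ops[n - 1].kind == kind) {
            ops[n - 1].len += len;
            continue;
        }
        if (n == max_ops)
            return -1;
        ops[n].kind = kind;
        ops[n].len = len;
        ops[n].text = (kind == DIFF_TEXT) ? lines[i] + 1 : NULL;
        n++;
    }
    return (int)n;
}
```

The text pointers refer into the caller's line buffers, so those buffers have to stay alive for as long as the operations are in use.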
On the general theme of using the power tools available, I'm thinking about
generating a Bison parser to parse diff files into transforms, and doing the
job properly.
Now that move is done, the transform engine is pretty much complete. Move
wasn't too hard to implement, but it will be a little tricky to generate code
for. That's ok, the transform generator can stick with the three simple
operations as long as it likes, since the move operation is nothing more than
a space-saving optimization. This is on the theme of forward compatibility.
In this case we can upgrade the transform generator at any time, and new
databases will still be handled by early versions of the system. This is a
nice result when you can get it.
The attached code demonstrates the move operation in action.
--
Daniel
-------------- next part --------------
A non-text attachment was scrubbed...
Name: transform.c
Type: text/x-c
Size: 2248 bytes
Desc: not available
URL: <http://lists.ozlabs.org/pipermail/prophesy/attachments/20020531/df960684/attachment.bin>