[Prophesy] Diff to transform conversion
Daniel Phillips
phillips at bonn-fries.net
Sun Jun 2 17:50:02 EST 2002
Here's the algorithm for diff to transform conversion. We will not try to
generate the 'move' operation for the time being.
For convenience, the 'emit' operation described below will accumulate and
merge sequences of the same operation into a single operation. This just
requires remembering what the last operation was, and how long it is (the
start is just a negative offset from the current input position). When the
operation being emitted is not the same as the last operation emitted, the
appropriate operation is appended to the operation string. Finally, after
processing the entire, a copy is emitted. Zero length operations are
discarded.
The basic idea is to process the input text sequentially. We will keep track
of both current input line number and byte position. We don't have to look
at the output text at all, except that we may wish to actually apply the
patch to ensure that the result of applying the patch or the transform is the
same.
Below, when we copy or skip a line, or emit a line of text, we also account
for the trailing end-of-line. Strange things happen if there is no
end-of-line at the end of the input, output or diff file. Worry about that
later.
The algorithm proper:
Find the beginning of a patch.
The pattern is: "---" <text> <endline> "+++" <text> <endline>
While the next text is "@@" (beginning of chunk):
Get the input line number and count, and the output line number and
count from the chunk header line. Ensure the line numbers are
monotonically increasing. (The output line and count are not used
in the algorithm below, but could be used for error checking.)
Emit a copy from the current input position to the chunk's input
line number, and advance the input position to the cunk's input line
For each line of the chunk, until the current input line equals the
chunks's input line number plus the chunk's input line count:
If the line begins with '-', emit a skip as long as the line
If the line begins with '+', emit a text as long as the line
If the line begins with ' ', emit a copy as long as the line
Finally, emit a copy from the current input position to the end of
the input text, and flush it to the operation string.
I think I'll try coding this in C, with the help of the regex library, though
I know it would be easier in Python, and Rasmus has already written some nice
regex's in Python for handling diffs. However, the transform applying code
is already in C, so the diff parsing code might as well be too. Of course it
means that another job coming up very soon is: figuring out how to interface
Python to C functions.
--
Daniel
More information about the Prophesy
mailing list