[Prophesy] Diff to transform conversion

Daniel Phillips phillips at bonn-fries.net
Sun Jun 2 17:50:02 EST 2002


Here's the algorithm for diff to transform conversion.  We will not try to 
generate the 'move' operation for the time being.

For convenience, the 'emit' operation described below will accumulate and
merge sequences of the same operation into a single operation.  This just 
requires remembering what the last operation was, and how long it is (the 
start is just a negative offset from the current input position).  When the 
operation being emitted is not the same as the last operation emitted, the 
appropriate operation is appended to the operation string.  Finally, after 
processing the entire, a copy is emitted.  Zero length operations are 
discarded.

The basic idea is to process the input text sequentially.  We will keep track 
of both current input line number and byte position.  We don't have to look 
at the output text at all, except that we may wish to actually apply the 
patch to ensure that the result of applying the patch or the transform is the 
same.

Below, when we copy or skip a line, or emit a line of text, we also account 
for the trailing end-of-line.  Strange things happen if there is no 
end-of-line at the end of the input, output or diff file.  Worry about that 
later.

The algorithm proper:

   Find the beginning of a patch.
   The pattern is: "---" <text> <endline> "+++" <text> <endline>

   While the next text is "@@" (beginning of chunk):

       Get the input line number and count, and the output line number and 
       count from the chunk header line.  Ensure the line numbers are
       monotonically increasing.  (The output line and count are not used
       in the algorithm below, but could be used for error checking.)

       Emit a copy from the current input position to the chunk's input
       line number, and advance the input position to the cunk's input line

       For each line of the chunk, until the current input line equals the 
       chunks's input line number plus the chunk's input line count:

           If the line begins with '-', emit a skip as long as the line
           If the line begins with '+', emit a text as long as the line
           If the line begins with ' ', emit a copy as long as the line
      
   Finally, emit a copy from the current input position to the end of
   the input text, and flush it to the operation string.

I think I'll try coding this in C, with the help of the regex library, though 
I know it would be easier in Python, and Rasmus has already written some nice 
regex's in Python for handling diffs.  However, the transform applying code 
is already in C, so the diff parsing code might as well be too.  Of course it 
means that another job coming up very soon is: figuring out how to interface 
Python to C functions.

-- 
Daniel



More information about the Prophesy mailing list