[ccan] New module: bdelta

Sat Aug 20 13:35:45 EST 2011

Attached is an updated version of the patch.

On Fri, Aug 19, 2011 at 10:41 AM, Joey Adams <joeyadams3.14159 at gmail.com> wrote:
> On Fri, Aug 19, 2011 at 2:17 AM, Rusty Russell <rusty at rustcorp.com.au> wrote:
>> A few minor nitpicks from reading the patch (I haven't applied it yet!)
>>
>> 1) I'm always tempted to reuse errno rather than invent my
>> own error codes (ENOMEM, EINVAL and EIO?), but otherwise I would do an
>> enum for the error codes; it just makes the API more explicit.

I decided to go with an enum, called typedef enum {...} BDELTAcode.  I
couldn't find a consensus anywhere on enum names for error codes, so I
went with the convention set by libcurl (i.e. CURLcode, CURLMcode).

I did have to put the internal error codes in the header file, though
(after prefixing them with BDELTA_).

>> 2) Maybe use the term 'char arrays' rather than 'byte strings' as
>>   string has that 0-terminated sense for C coders.

I tried to change it to "byte arrays", but came to realize that
"array", when by itself, isn't as descriptive as "string".  I use the
term a lot in the module, and saying "byte array" every time seems
rather verbose.

I ended up just making simple clarifications in the header
documentation and _info file.  For example, the introduction opens up
with "This library takes two strings containing binary data, and ...".
 If the user knows they're using a "binary diff" library, they'll
probably be able to figure out what type of strings I'm talking about.

>> 4) We really need a good bitstring library: it'd be great for this!

A couple major disadvantages of bdelta:

 (1) Doesn't help with large strings (yet!).  More precisely, if the
difference between two strings is more than 1000 bytes, it reverts to
creating a patch that just dumps the new string.  I could try to
implement a second algorithm for larger strings, but I don't have that
kind of time.  bdelta does what I need it to do for now, and does it
really well.

 (2) Output format isn't compatible with other binary diff formats,
such as the ones used by libxdiff.  This is unfortunate because, for
example, PHP has xdiff bindings already, and bdelta diffs wouldn't be
usable from PHP unless a bdelta module is written for PHP as well.

(1) is simply a matter of not having the time and necessity.  (2)
would be better served by another CCAN module providing a simple
wrapper around libxdiff, one that doesn't require the user to set up
complicated structures and implement callback functions.  If I
constrain the patch format, I'll lose the flexibility needed to
implement more interesting algorithms in the future.

>> I think testing could be improved a bit.  For example, I'd test the
>> diff results directly, before testing patching.  eg. that "aaaa" vs
>> "aaaa" produced a non-diff, then "aaaa" vs "aaaab", etc.  Then once the
>> diffs pass, test those patches.

I did just that in the new patch.  See run-myers.c, which tests the
diff_myers function directly.

>> And I don't think your random test actually tests patching?

I added clarification in run-medium.c.  bdelta_diff verifies that the
patch produces the correct result, and the user can safely make that
assumption.

- Joey
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bdelta-0.1.1.patch.gz
Type: application/x-gzip
Size: 12744 bytes
Desc: not available
URL: <http://lists.ozlabs.org/pipermail/ccan/attachments/20110819/9a6d647d/attachment.bin>