[ccan] New module: bdelta

Joey Adams joeyadams3.14159 at gmail.com
Sat Aug 20 00:41:33 EST 2011


On Fri, Aug 19, 2011 at 2:17 AM, Rusty Russell <rusty at rustcorp.com.au> wrote:
> I like it!  It's a perfect example of a ccan module: it does one thing,
> and does it well.

Thanks!

> A few minor nitpicks from reading the patch (I haven't applied it yet!)
>
> 1) I'm always tempted to reuse errno rather than invent my
> own error codes (ENOMEM, EINVAL and EIO?), but otherwise I would do an
> enum for the error codes; it just makes the API more explicit.

My error strings are more descriptive.  "Patch applied to wrong
string" is far more descriptive than "Invalid argument".  Besides, I
prefer to be portable in a library, and ENOMEM/EINVAL/EIO are defined
by POSIX, not by the C standards.

I was going to use an enum, but what would I call it?  Remember that
C++ would treat enum bdelta {...} as a type definition (rather than
simply an enum tag), meaning "bdelta" could no longer be used as a
variable name.

I should probably use a typedef enum of some name, if only to annotate
which functions return error codes and which ones don't.

> 2) Maybe use the term 'char arrays' rather than 'byte strings' as
>   string has that 0-terminated sense for C coders.

I wasn't sure what to call them.  PostgreSQL calls them "byte arrays
(BYTEA)".  Haskell calls it ByteString.  char almost implies plain
text, while "byte" or "octet" suggests the library can be used with
binary data.  A "string" in many other programming languages can have
null bytes.

Another naming consideration: I'm not sure whether to call it a
"patch", "diff", or "delta".  I called the library bdelta rather than
bdiff mainly because bdiff_diff looks silly.  I should probably use
all of those terms so people using search engines can find what
they're looking for.

Note that I shunned the word "binary" in my documentation because the
diff algorithm shortens at the byte level, not the bit level.  An
insertion of one bit would change all the bytes that follow.

> 3) I wouldn't implement bdelta_perror personally, since I prefer
>   err/errx/warn/warnx.

I'm following the lead of GnuTLS here, which has a function called
gnutls_perror.  It's really convenient to have, if you want to follow
the good practice of handling all error codes but haven't become
interested in the errors themselves yet.

err/errx/warn/warnx aren't defined by C or POSIX, but are BSD extensions.

> 4) We really need a good bitstring library: it'd be great for this!

You just said "string"!  :-)

> I think testing could be improved a bit.  For example, I'd test the
> diff results directly, before testing patching.  eg. that "aaaa" vs
> "aaaa" produced a non-diff, then "aaaa" vs "aaaab", etc.  Then once the
> diffs pass, test those patches.

Indeed, bdelta could use more trivial tests.  Someone on freenode/#c
recently asked if anyone had any tests for longest common subsequence,
and I pointed to bdelta.  Having more tests would have been nice.

Also, it would be nice to have some tests that test the backend
algorithm itself (diff_myers), checking the result against expected,
hard-coded patch strings.

> As far as I can tell, if your diff never generated anything but whole
> new strings, you'd pass?

Indeed.  Otherwise, it would be a performance test, and I'm not sure
CCAN supports those yet :-)

> And I don't think your random test actually tests patching?

Actually, they do.  bdelta_diff verifies the patch before returning
it.  I was originally going to make this an assertion check, but then
users (myself included) would have to manually check the result
anyway.  One bad diff can corrupt hundreds of datums.

Thanks for the review!

- Joey


More information about the ccan mailing list