[ccan] New module: bdelta

Mon Aug 22 11:04:49 EST 2011

On Sun, Aug 21, 2011 at 7:12 AM, Rusty Russell <rusty at rustcorp.com.au> wrote:
> On Fri, 19 Aug 2011 10:41:33 -0400, Joey Adams <joeyadams3.14159 at gmail.com> wrote:
>> I was going to use an enum, but what would I call it?  Remember that
>> C++ would treat enum bdelta {...} as a type definition (rather than
>> simply an enum tag), meaning "bdelta" could no longer be used as a
>> variable name.
>
> I don't really care about C++, but enum bdelta seems wrong anyway:
> bdelta_err maybe?

I went with BDELTAcode (as mentioned in my last patch-laden e-mail).
It's slightly ugly, but it's the closest I could get to "consistency",
as it's what libcurl uses.

typedef enum {...} bdelta_err would be fine, too.  Actually, it'd
probably be better, as other modules could follow bdelta's lead
without it looking horribly ugly (e.g. OGG_TO_PCMcode).

>> Another naming consideration: I'm not sure whether to call it a
>> "patch", "diff", or "delta".  I called the library bdelta rather than
>> bdiff mainly because bdiff_diff looks silly.  I should probably use
>> all of those terms so people using search engines can find what
>> they're looking for.
>>
>> Note that I shunned the word "binary" in my documentation because the
>> diff algorithm shortens at the byte level, not the bit level.  An
>> insertion of one bit would change all the bytes that follow.
>
> Yeah, bdelta = byte delta?

That, or "binary delta".  I doubt people will actually care whether
bdelta recognizes insertions/deletions at the bit level.  Those who do
will hopefully notice it by reading the introduction or documentation
and seeing "byte", or (worst case) they'll notice during testing and
find out they have to unpack their bit representations.

>> > 3) I wouldn't implement bdelta_perror personally, since I prefer
>> >   err/errx/warn/warnx.
>>
>> I'm following the lead of GnuTLS here, which has a function called
>> gnutls_perror.  It's really convenient to have, if you want to follow
>> the good practice of handling all error codes but haven't become
>> interested in the errors themselves yet.
>
> Your implementation though, is *worse* than someone implementing it
> themselves.  I'm not arguing against bdelta_strerror(), just
> bdelta_perror().  It's stupid.
>
> If you're a library, you shouldn't be printing to stderr unless you're
> about to die (cf. err()). Otherwise, you often want a format string
> rather than a fixed string. ...

Good point.  Additionally, I suppose it would be silly if every
error-returning library was burdened with implementing its own perror.

> ...   I've found that generally when coders try to use perror it makes their code worse.

What about when they don't check error codes at all?

>> > 4) We really need a good bitstring library: it'd be great for this!
>>
>> You just said "string"!  :-)
>
> Yeah, I suck :)

You also said "bit".  Double fail ;-)

>> Thanks for the review!
>
> No worries, now I have to actually *understand* the code to give
> feedback on the implementation :)

The part I documented best was my implementation of the Myers'
algorithm.  The plumbing (string buffering, parsing, dynamic memory
allocation, and error handling) should be self-explanatory, at least
if you're me.

bdelta has its own private string buffer implementation, which would
arguably make a good CCAN library on its own.  However:

 * I don't have the time or interest to develop, document, test, and
review a second module (for the time being).

 * I strive for standalone modules, as they're easier to grab and
integrate into a project, as well as to study.  Even if I implemented
ccan/sb, I wouldn't want bdelta to depend on it.  Other diff
implementations I found were either mired in application code (e.g.
rsync, git) or in themselves (e.g. libxdiff), making them hard to
integrate.

- Joey