[c-lightning] Replicated backups

Christian Decker decker.christian at gmail.com
Wed May 29 22:53:30 AEST 2019


William Casarin <jb55 at jb55.com> writes:

> Continuing this discussion from github:
>
> The current plan is a replicated DB backup plugin, there was talk about
> potential static channel backups like LND does it (saved with data loss
> protection info), but I believe those aren't as ideal? cc @ZmnSCPxj

Static channel backups are more of a stop-gap solution: you rely on
the counterparty telling you the last commitment point to detect that
you have indeed fallen behind, and then you ask them nicely to close
the channel. If they refuse, you are stuck indefinitely.

I should say that this currently works, since all implementations are
behaving nicely, but it just doesn't feel right to have a super secure
protocol and then hand-wave away things like this.

> ZmnSCPxj:
>> For myself, the only safe channel backup is either DB replication
>> For DB replication, you are probably better off using a commodity
>> replicating self-healing filesystem, such as ZFS (turn off dedup,
>> since Lightning stores cryptographic data mainly which is very
>> unlikely to be duplicated). However, if you need to run on
>> single-board computer like RaspPi, well ZFS is not going to fit.
>> Hooking at the DB level seems questionable to me: plugins hooked there
>> can be triggered *before* `init` (because we already do some DB
>> updates just from starting up), plugins hooked there cannot safely
>> execute most (all, because things could change in the future) commands
>> because most of them touch the database, plugins get a DB sqlite3
>> query that they have to somehow understand, plugins need to coordinate
>> the timing of them sampling/copying the database file and writing the
>> query to their own write-ahead log... there is a lot of space for
>> edge-case bugs here, you are really better off with ZFS.
>
> we discussed this today on IRC: https://jb55.com/s/802aa6f679b5a336.txt
> I don't think ZFS is a reasonable thing to require of end users. I share
> your concern about the brittleness of maintaining a write-ahead log and
> replication plugin based on db hooks, but I don't see any other option.
> In the chance we fall out of sync I believe we can just start over a
> fresh snapshot, it's not ideal but should be robust?

I definitely agree: ZFS is nice, but it should always remain optional;
we can't tell users which filesystem to run. I'm also not so sure that
we actually perform any DB modifications before initializing the plugins
(and we certainly don't perform any before the `getmanifest` call, which
is where hooks are registered). If there were such a DB modification it
should be trivial to move it to after the plugin init, or we could
replay it after init.

> An alternative is to have a specialized backup for a specific subset of
> the database that could be backed up independently, or snapshotted on
> each update. I'm not sure if this is possible but it's something I'm
> looking into.

It should be possible to filter out tables that are modified often
(utxoset, ...) but are mostly used for caching. That would reduce the
stream of modifications that needs to be backed up quite significantly.
By filtering the initial DB snapshot in the same way we can also keep
the overall backup size small.
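
As a rough illustration (not an agreed-upon interface), a filter over
the statements delivered by the DB hook could simply drop anything that
only touches such cache-like tables before it gets journaled. The set
of tables to skip is illustrative; only the utxoset is named here:

    # Sketch only: drop statements that merely touch cache-like tables
    # before they are written to the journal. Table set is illustrative.
    CACHE_TABLES = {"utxoset"}

    def is_critical(statement: str) -> bool:
        """False if the statement only touches tables treated as caches."""
        lowered = statement.lower()
        return not any(table in lowered for table in CACHE_TABLES)

    def filter_writes(writes):
        """Keep only the statements worth replicating."""
        return [w for w in writes if is_critical(w)]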

My current plan is to create two backup formats:

 - Simple replay of the DB, by having a second DB open in the plugin and
   just replaying any modification as it comes in.
 - Snapshot + modification log, in which the plugin takes an initial
   snapshot of the DB if none exists, and then appends each query as it
   comes in to a journal that can be replayed later (see the sketch
   below).
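
For the second format, a minimal and untested sketch of such a plugin,
written against the pylightning `Plugin` class, could look roughly like
this (the hook name, paths, payload fields and return value are
assumptions on my part):

    #!/usr/bin/env python3
    """Rough sketch of a snapshot + journal backup plugin."""
    import os
    import shutil
    from lightning import Plugin  # pylightning

    plugin = Plugin()
    SNAPSHOT = os.path.expanduser("~/.lightning-backup/snapshot.sqlite3")
    JOURNAL = os.path.expanduser("~/.lightning-backup/journal.log")

    @plugin.init()
    def init(options, configuration, plugin):
        db = os.path.join(configuration["lightning-dir"], "lightningd.sqlite3")
        os.makedirs(os.path.dirname(SNAPSHOT), exist_ok=True)
        if not os.path.exists(SNAPSHOT):
            shutil.copyfile(db, SNAPSHOT)  # initial snapshot of the DB
            open(JOURNAL, "w").close()     # journal starts out empty

    @plugin.hook("db_write")
    def on_db_write(plugin, writes, **kwargs):
        # Append each statement to the journal and flush it to disk
        # *before* lightningd is allowed to commit, so the backup never
        # falls behind the live DB.
        with open(JOURNAL, "a") as f:
            for stmt in writes:
                f.write(stmt + ";\n")
            f.flush()
            os.fsync(f.fileno())
        return True  # let lightningd proceed with the commit

    plugin.run()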

Important here is that we have a way to detect when the backup gets out
of sync, either because lightningd was started without the backup plugin
or because of a crash before the last query was committed. The DB replay
should likely lag one step behind the live DB, and the journal should
only ever be empty right after a new snapshot. So I propose adding a DB
version variable that is incremented with every transaction that
contains modifications. Then we can detect whether the plugin is ahead
by one version (indicating a crash, and requiring a new snapshot or a
rollback of the last transaction) or whether it has fallen behind by not
running alongside the daemon (requiring a new snapshot).
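
Assuming such a counter is delivered with every hook call (called
data_version below purely for illustration), the plugin's check could
be as simple as:

    # Illustrative check: compare the version last recorded by the
    # backup with the version announced by the incoming transaction.
    def check_version(backup_version: int, incoming_version: int) -> str:
        if incoming_version == backup_version + 1:
            return "in-sync"      # normal case: exactly the next transaction
        if incoming_version <= backup_version:
            # Plugin is ahead: lightningd crashed before committing what
            # we already journaled; roll back the last journal entry or
            # take a new snapshot.
            return "plugin-ahead"
        # lightningd ran without the plugin, the journal has a gap, and a
        # fresh snapshot is required.
        return "plugin-behind"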

The snapshot + journal mechanism can also be encrypted, and it gives us
a lot more freedom in where to store our backups. I've been playing with
the idea of having a variety of storage backends that can be selected by
specifying one or more backup URIs. That could allow us to use Dropbox,
FTP, SSH, Google Cloud Storage, or some custom-built service that
guarantees replicated and consistent storage.
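
To give an idea of what I mean, the backend could be picked by
dispatching on the URI scheme. The backends and schemes below are
placeholders, not a settled interface:

    # Hypothetical dispatch from a backup URI to a storage backend.
    from urllib.parse import urlparse

    class FileBackend:
        """Append journal entries to a local file (e.g. a mounted NAS)."""
        def __init__(self, path):
            self.path = path

        def append(self, entry: str):
            with open(self.path, "a") as f:
                f.write(entry + "\n")

    class SFTPBackend:
        """Placeholder for a remote backend reached over SSH/SFTP."""
        def __init__(self, host, path):
            self.host, self.path = host, path

        def append(self, entry: str):
            raise NotImplementedError("left out of this sketch")

    def backend_from_uri(uri: str):
        parsed = urlparse(uri)
        if parsed.scheme in ("", "file"):
            return FileBackend(parsed.path)
        if parsed.scheme == "sftp":
            return SFTPBackend(parsed.hostname, parsed.path)
        raise ValueError("no backend for scheme %r" % parsed.scheme)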

I haven't spent too many cycles on the design of this and haven't
started yet, aside from some simple DB replay trials. If there are more
people interested in this we can throw together a design document in
either the c-lightning repo or the plugins repo.

> ZmnSCPxj:
>>> I don't think ZFS is a reasonable thing to require of end users. I
>>> share your concern about the brittleness of maintaining a write-ahead
>>> log and replication plugin based on db hooks, but I don't see any
>>> other option.
>
>> NIH *shrug* ZFS is safe and well-tested and a lot more people are going
>> to be invested in *keeping* it safe and well-tested. Our own boutique
>> backup will take at least a year of development and #recklessness before
>> it would be anywhere near as safe. Those who cannot wait and need for
>> some reason to handle mass amounts of Bitcoin on LN nodes ***now***
>> should really use ZFS (and should not use lnd SCB; given the major LN
>> node implementations are open-source, it is trivial for an attacker to
>> write code that exploits SCB and gets away with it before code that
>> probes for theft attempts becomes too widespread for safe stealing of
>> SCB users).
>
> how exactly do you do realtime replication with ZFS? I've only ever used
> the snapshots.
>
>>> In the chance we fall out of sync I believe we can just start over a
>>> fresh snapshot, it's not ideal but should be robust?
>
>> How do we know we are out of sync?
>
> Christian had the idea of using an update state counter, so the plugin
> could know if it's at the correct state
>
> I guess the problem is if the drive died right after updated to the
> latest state and somehow the plugin crashes or failed to commit this
> latest state to the remote server.

This is exactly why the plugin hooks are synchronous: the plugin is
called before the commit, and the plugin itself can make sure that the
update has been committed to the remote server before allowing the
daemon to continue. If desired we could also add some metadata to the
hook call and create multiple tiers of modifications. That way we can
stream the less important updates (anything that doesn't commit to a
channel state, such as processing a block) and have the plugin wait for
a synchronous ack from the service whenever the modification is
critical.
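
A sketch of how that split could look inside the hook handler; the
classification of statements and the remote client's methods are
placeholders I made up for illustration:

    # Treat statements touching channel state as critical; the table
    # names and the remote client's interface are illustrative.
    CHANNEL_TABLES = {"channels", "channel_htlcs"}

    def touches_channel_state(stmt: str) -> bool:
        lowered = stmt.lower()
        return any(table in lowered for table in CHANNEL_TABLES)

    def handle_writes(writes, remote):
        if any(touches_channel_state(w) for w in writes):
            remote.append_sync(writes)   # block until the service acks
        else:
            remote.append_async(writes)  # best-effort stream, may lag
        return True                      # only now may lightningd commit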

>> If the plugin dies at some point and `lightningd` auto-removes the hook,
>> some queries will not get backed up. Some of those queries may update
>> channels that end up not getting updated for a good while (maybe the
>> other side just goes offline for a long while), so you cannot even know
>> that you do not know of an update until too late.
>
> I figured in these scenarios you can start over from a new update
> counter. The plugin would have to save uncommitted logs in case the
> remote goes down.

We should likely mandate that plugins with registered hooks be treated
as critical, and have lightningd die alongside the plugin in those
cases.

Cheers,
Christian

