[c-lightning] Replicated backups

ZmnSCPxj ZmnSCPxj at protonmail.com
Thu May 30 10:40:09 AEST 2019


Good morning William,


Sent with ProtonMail Secure Email.

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Wednesday, May 29, 2019 1:16 AM, William Casarin <jb55 at jb55.com> wrote:

> Continuing this discussion from github:
>
> The current plan is a replicated DB backup plugin, there was talk about
> potential static channel backups like LND does it (saved with data loss
> protection info), but I believe those aren't as ideal? cc @ZmnSCPxj
>
> ZmnSCPxj:
>
> > For myself, the only safe channel backup is either DB replication
> > For DB replication, you are probably better off using a commodity
> > replicating self-healing filesystem, such as ZFS (turn off dedup,
> > since Lightning stores cryptographic data mainly which is very
> > unlikely to be duplicated). However, if you need to run on
> > a single-board computer like a RaspPi, well, ZFS is not going to fit.
> > Hooking at the DB level seems questionable to me: plugins hooked there
> > can be triggered before `init` (because we already do some DB
> > updates just from starting up), plugins hooked there cannot safely
> > execute most (all, because things could change in the future) commands
> > because most of them touch the database, plugins get a DB sqlite3
> > query that they have to somehow understand, plugins need to coordinate
> > the timing of them sampling/copying the database file and writing the
> > query to their own write-ahead log... there is a lot of space for
> > edge-case bugs here, you are really better off with ZFS.
>
> we discussed this today on IRC: https://jb55.com/s/802aa6f679b5a336.txt
> I don't think ZFS is a reasonable thing to require of end users. I share
> your concern about the brittleness of maintaining a write-ahead log and
> replication plugin based on db hooks, but I don't see any other option.
> If we fall out of sync, I believe we can just start over from a
> fresh snapshot; it's not ideal, but it should be robust?
>
> An alternative is to have a specialized backup for a specific subset of
> the database that could be backed up independently, or snapshotted on
> each update. I'm not sure if this is possible but it's something I'm
> looking into.
>
> ZmnSCPxj:
>
> > > I don't think ZFS is a reasonable thing to require of end users. I
> > > share your concern about the brittleness of maintaining a write-ahead
> > > log and replication plugin based on db hooks, but I don't see any
> > > other option.
>
> > NIH shrug ZFS is safe and well-tested and a lot more people are going
> > to be invested in keeping it safe and well-tested. Our own boutique
> > backup will take at least a year of development and #recklessness before
> > it would be anywhere near as safe. Those who cannot wait and need for
> > some reason to handle mass amounts of Bitcoin on LN nodes now
> > should really use ZFS (and should not use lnd SCB; given the major LN
> > node implementations are open-source, it is trivial for an attacker to
> > write code that exploits SCB and gets away with it before code that
> > probes for theft attempts becomes too widespread for safe stealing of
> > SCB users).
>
> how exactly do you do realtime replication with ZFS? I've only ever used
> the snapshots.

RAID-Z and mirroring.
`scrub` once a week to help the filesystem detect inconsistencies among mirrors.
Continuously monitor ZFS health, and once you start getting high error rates on a component storage device, do a graceful shutdown of `lightningd`, replace the failing device, let ZFS resilver, and then restart `lightningd`.
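
For instance, a small watchdog along these lines could do the health
monitoring (a minimal sketch, assuming a pool named `tank`,
`lightning-cli` on the PATH, and an operator who does the actual
device replacement):

    #!/usr/bin/env python3
    # Sketch of a ZFS health watchdog for a lightningd node.
    # Assumptions: the pool is named "tank", lightning-cli is on the
    # PATH, and a human replaces the failing device after shutdown.
    import subprocess
    import time

    POOL = "tank"          # hypothetical pool name
    CHECK_INTERVAL = 3600  # seconds between health checks

    def pool_is_healthy(pool):
        # `zpool status -x` prints "... is healthy" (or "all pools
        # are healthy") when there is nothing wrong to report.
        out = subprocess.run(["zpool", "status", "-x", pool],
                             capture_output=True, text=True).stdout
        return "is healthy" in out or "all pools are healthy" in out

    while True:
        if not pool_is_healthy(POOL):
            # Gracefully stop lightningd before the pool degrades
            # further; `zpool replace` is left to the operator.
            subprocess.run(["lightning-cli", "stop"])
            break
        time.sleep(CHECK_INTERVAL)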

This assumes all your hardware is in one place where ZFS can manage it.
If you need remote backup, well... GlusterFS?

A simpler alternative to ZFS is ChironFS, but I do not think it is quite as mature as ZFS, and it no longer seems maintained; it also does not auto-heal, and simply keeps going if one replica is damaged or destroyed.
(I believe ChironFS could in theory use an NFS mount as a replica, but problems occur if the NFS mount is interrupted due to network connectivity issues, and since ChironFS does not auto-heal, the NFS replica will remain out of date afterwards.)

>
> > > If we fall out of sync, I believe we can just start over from a
> > > fresh snapshot; it's not ideal, but it should be robust?
>
> > How do we know we are out of sync?
>
> Christian had the idea of using an update state counter, so the
> plugin could know whether it's at the correct state.
>
> I guess the problem is if the drive dies right after updating to the
> latest state, and the plugin somehow crashes or fails to commit this
> latest state to the remote server.

This seems plausible.
I would strongly recommend sending this statecounter in the `db_hook`.

But in any case, regardless of where you are replicating to (remote or local), this is still a form of RAID-1, and consistency issues with RAID-1 must be assumed to be possible here.
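
To make that concrete, the plugin side of the statecounter check could
look something like this (a rough sketch only: the `data_version`
field name, the payload shape, the hook response format, and the
`remote` store interface are all assumptions, not the actual `db_hook`
API):

    # Sketch of the backup plugin's consistency check.  The
    # `data_version` counter, the payload layout, and the `remote`
    # store are illustrative assumptions.

    class BackupState:
        def __init__(self, remote):
            self.remote = remote                  # replicated WAL store
            self.version = remote.last_version()  # last committed counter

        def on_db_hook(self, payload):
            incoming = payload["data_version"]
            if incoming != self.version + 1:
                # We missed at least one update, so the incremental
                # log is useless: fall back to a fresh snapshot.
                self.remote.upload_snapshot()
            else:
                self.remote.append(incoming, payload["writes"])
            self.version = incoming
            # Only after the copy is durable should lightningd be
            # allowed to commit its own transaction.
            return {"result": "continue"}  # response format assumed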

>
> > If the plugin dies at some point and `lightningd` auto-removes the hook,
> > some queries will not get backed up. Some of those queries may update
> > channels that end up not getting updated for a good while (maybe the
> > other side just goes offline for a long while), so you cannot even know
> > that you do not know of an update until too late.
>
> I figured in these scenarios you can start over from a new update
> counter. The plugin would have to save uncommitted logs in case the
> remote goes down.
>
> > Maybe add a flag that says "there must be something that is hooked to
> > `db_hook`, if not and we would update the DB, fatal ourselves rather
> > than write something to the DB that cannot be backed up".
>
> That's another approach; constantly crashing your node when the
> backup server is down is kind of annoying, but it would save the
> hassle of buffering logs.
>
> Any other ideas are welcome; I'm not against ZFS. I'm just worried that
> most people won't be able to set that up.

I would suggest that the plugin have a master/slave structure.
The master process stores a copy of each request sent to the plugin, then forwards it to the slave.
If the slave dies before it has replied to the request, the master restarts it and re-forwards the request to the slave.
The master process should be kept very simple (to reduce scope of crashes at the master).
`init` cannot be relied on in a `db_hook` plugin since `db_hook` can occur before `init`, so the slave should just respond and do nothing on `init`.
This implies that restart of the slave should leave it in the same "Waiting" state where it could handle `db_hook` at any time, since the slave cannot rely on `init` in any case.
`getmanifest` is a "read-only" command to the plugin, thus the slave getting restarted at any time should be fine with respect to `getmanifest`.
No requests other than `db_hook` itself should come from `lightningd`, again due to the `db_hook` restrictions.
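
In rough Python, the master could be as small as this (a sketch only:
the slave executable name is hypothetical, and the real plugin
protocol is JSON-RPC over stdio rather than the line-delimited framing
assumed here):

    #!/usr/bin/env python3
    # Sketch of the master side of the master/slave plugin.  It only
    # records the in-flight request, forwards it to the slave, and
    # restarts the slave (re-forwarding the same request) if the
    # slave dies before replying.
    import subprocess
    import sys

    SLAVE_CMD = ["./backup-slave"]  # hypothetical slave executable

    def spawn_slave():
        return subprocess.Popen(SLAVE_CMD, stdin=subprocess.PIPE,
                                stdout=subprocess.PIPE, text=True)

    slave = spawn_slave()

    for request in sys.stdin:        # requests from lightningd
        while True:                  # loop until the slave replies
            try:
                slave.stdin.write(request)
                slave.stdin.flush()
                reply = slave.stdout.readline()
            except BrokenPipeError:
                reply = ""           # slave died mid-request
            if reply:
                sys.stdout.write(reply)  # relay reply to lightningd
                sys.stdout.flush()
                break
            slave = spawn_slave()    # restart and re-forward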

Also, you might not have seen this ninja edit on the github thread:

> Edit: If you ***really*** want to continue here, I would suggest rather the creation of a `channel_state_update` hook that focuses on the only important DB update: revocation of old channel states. This removes a lot of the risk and complexity of using the DB statements. Then add a `restorechannel` command that requires the same information as `channel_state_update` provides, with some checks to ensure that we do not restore to a known-old channel state.

Possibly you might also want a `getchannelstate` command that gives the same information as the `channel_state_update` hook -- for example, after your plugin restarts, you might want to `getchannelstate` all live channels.
Attempting `restorechannel` on all channels we currently hold would also be doable at plugin startup.
This may be more useful than a remote backup of the entire database.
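
As a sketch of how a plugin might tie those together (to be clear,
none of `channel_state_update`, `getchannelstate`, or `restorechannel`
exist yet, and every payload field below is a guess):

    # Sketch of a plugin built on the *proposed* interfaces; the
    # hook/command names come from the discussion above, but the
    # payload fields and the `store` object are guesses.

    class ChannelBackup:
        def __init__(self, rpc, store):
            self.rpc = rpc      # lightningd JSON-RPC handle
            self.store = store  # replicated per-channel state store

        def on_channel_state_update(self, payload):
            # Persist the newest revocation state before lightningd
            # continues; this is the only DB update that matters.
            self.store.put(payload["channel_id"], payload["state"])
            return {"result": "continue"}  # response format assumed

        def on_startup(self):
            # Offer every stored state back to lightningd;
            # restorechannel is expected to refuse known-old states.
            for chan_id, state in self.store.items():
                self.rpc.call("restorechannel",
                              {"channel_id": chan_id, "state": state})
            # Then refresh our copy from the node's current view.
            for peer in self.rpc.call("listpeers")["peers"]:
                for chan in peer["channels"]:
                    snap = self.rpc.call(
                        "getchannelstate",
                        {"channel_id": chan["channel_id"]})
                    self.store.put(chan["channel_id"], snap)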

Of course, loss of invoice data is bad, but presumably your shopping cart software has its own copy of any invoice it has issued.

Regards,
ZmnSCPxj

