[c-lightning] Replicated backups

ZmnSCPxj ZmnSCPxj at protonmail.com
Fri May 31 02:11:00 AEST 2019


Good morning Christian,


> ZmnSCPxj ZmnSCPxj at protonmail.com writes:
>
> > RAID-Z and mirroring. `scrub` once a week to help the filesystem
> > detect inconsistencies among mirrors. Continuously monitor ZFS health
> > and once you start getting high error rates on a component storage
> > device, do a graceful shutdown of `lightningd`, replace the failing
> > device, have ZFS recover, restart `lightningd`.
> > This assumes all your hardware is in one place where ZFS can manage them.
> > If you need remote backup, well... GlusterFS?
> > A simpler alternative to ZFS is ChironFS but I do not think it is
> > quite as mature as ZFS, and no longer seems maintained, also does not
> > auto-heal, simply keeps going if one replica is damaged or destroyed.
> > (I believe ChironFS could in-theory use an NFS mount as a replica, but
> > problems occur in case the NFS mount is interrupted due to network
> > connectivity issues, and since ChironFS does not autoheal, the NFS
> > replica will remain non-updated afterwards)
>
> Like mentioned before I think focusing too much on the FS is the wrong
> level to look at this. The snapshot + journal approach is more flexible
> in that it can be encrypted and stored wherever we want.

You can generally expect that for any storage style you can imagine, some toy FUSE implementation exists, GlusterFS probably has a translator for it, and, if it is any good, ZFS will eventually get in on the action.

ZFS encryption: https://docs.oracle.com/cd/E23824_01/html/821-1448/gkkih.html
GlusterFS network-level encryption: https://kshlm.in/post/network-encryption-in-glusterfs/

You could even use a stack of FUSE filesystems, with a remote Google Drive FUSE mount and an encrypting FUSE layer on top, to get "store wherever" and "encrypt".
As a bonus you also get to replicate `hsm_secret` and `lightningd.log` for free.
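
A very rough sketch of that layering, assuming rclone provides the Google Drive FUSE mount and gocryptfs the encrypting layer (the remote name and paths here are purely hypothetical):

    import subprocess

    # Layer 1: FUSE mount of a Google Drive remote (assumes a
    # pre-configured rclone remote named "gdrive").
    subprocess.run(["rclone", "mount", "gdrive:backups", "/mnt/gdrive", "--daemon"],
                   check=True)

    # Layer 2: an encrypting FUSE filesystem on top; ciphertext lands on
    # Google Drive, while lightningd only ever sees the plaintext view.
    # (One-time setup beforehand: gocryptfs -init /mnt/gdrive/clightning)
    subprocess.run(["gocryptfs", "/mnt/gdrive/clightning", "/home/user/.lightning"],
                   check=True)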

I still hold that the filesystem is the correct place to do replication: storage is the responsibility of the filesystem, and so is replicated storage.

But in any case, feel free to continue.
Perhaps you can improve substantially on existing solutions.

Some things you should pay attention to:

1.  Using a FEC encoding on-network and on-disk
    * Provides resilience against dropped packets (helpful if the remote backup is *really* remote; we do not want to pay a TCP-style retransmission round trip for every dropped packet).
    * Provides resilience against bitrot on permanent storage, especially on flash / SSD media.
    * Encrypt *before* FEC encoding (a minimal sketch of this ordering follows this list).
2.  Keeping track of `db_migrations` as mentioned before.
    This may require breaking compatibility with the currently deployed `db_hook`.
    In particular, I strongly suggest that the db version counter should not be integrated into `db_migrations`, but instead be something "outside" the normal DB and hence not in `db_migrations`.
    That is, put it in a separate table that is initialized separately from the `db_migrations` array (a small sketch of such a separate table also follows this list).
    That way, even updates to `db_migrations` itself are covered by the db version counter.
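
A minimal sketch of the encrypt-then-FEC ordering from point 1, using a single XOR parity shard purely for illustration (a real deployment would use something like Reed-Solomon, and a real authenticated cipher; everything here is a placeholder):

    import os
    from hashlib import sha256

    def xor_bytes(a, b):
        return bytes(x ^ y for x, y in zip(a, b))

    def encrypt(key, plaintext):
        # Placeholder stream cipher, for illustration only: XOR against a
        # SHA256-derived keystream.  A real implementation would use an
        # authenticated cipher such as ChaCha20-Poly1305.
        keystream = b""
        counter = 0
        while len(keystream) < len(plaintext):
            keystream += sha256(key + counter.to_bytes(8, "big")).digest()
            counter += 1
        return xor_bytes(plaintext, keystream[:len(plaintext)])

    def fec_encode(ciphertext, k=4):
        # Split the *ciphertext* into k data shards and add one XOR parity
        # shard: any single lost or rotted shard can then be reconstructed
        # by XORing the remaining k shards together.
        padded = ciphertext + b"\x00" * (-len(ciphertext) % k)
        shard_len = len(padded) // k
        shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
        parity = shards[0]
        for s in shards[1:]:
            parity = xor_bytes(parity, s)
        return shards + [parity]

    key = os.urandom(32)
    update = b"...serialized channel state update..."
    shards = fec_encode(encrypt(key, update))  # encrypt first, THEN FEC-encode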
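
And a minimal sketch of point 2, keeping the data-version counter in its own table rather than in the `db_migrations`-managed schema (the table and column names are purely illustrative, not the actual C-lightning schema):

    import sqlite3

    db = sqlite3.connect("lightningd.sqlite3")

    # Created unconditionally, *outside* the db_migrations array, so that
    # runs of db_migrations themselves also pass through the counter.
    db.execute("CREATE TABLE IF NOT EXISTS backup_data_version (version INTEGER NOT NULL)")
    if db.execute("SELECT COUNT(*) FROM backup_data_version").fetchone()[0] == 0:
        db.execute("INSERT INTO backup_data_version (version) VALUES (0)")

    def bump_data_version(db):
        # Bump once per replicated write batch, *including* batches that
        # apply db_migrations updates.
        db.execute("UPDATE backup_data_version SET version = version + 1")
        return db.execute("SELECT version FROM backup_data_version").fetchone()[0]

    print(bump_data_version(db))
    db.commit()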

> The information we store about the channels is currently evolving rather more quickly than we can specify a backup format on every change.

It will still remain a fact that there must exist some transaction that can be used to perform a unilateral close, no matter what changes the update mechanism undergoes.
The format need only describe this transaction, when and how it can be redeemed, plus a sequence number for each state update (a simple way to judge whether the current in-db state is newer than the saved channel state).
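
For concreteness, a per-update channel backup record along those lines might look roughly like the following (the field names are hypothetical, not an actual C-lightning format; the `format_version` field is the versioning I mention further down):

    # Purely illustrative shape of a per-update channel backup record.
    channel_backup_record = {
        "format_version": 1,                    # bump when the record format changes
        "channel_id": "abcd1234...",
        "state_sequence": 42,                   # one per state update; higher is newer
        "unilateral_close_tx": "0200000001...", # fully-signed raw tx, hex
        "redeemable_after_blocks": 144,         # CSV delay before our output is spendable
        "redeem_script": "76a914...",           # how to claim that output
    }

    def backup_is_newer(record, db_state_sequence):
        # The sequence number makes the "is the saved state behind the db?"
        # comparison trivial.
        return record["state_sequence"] > db_state_sequence
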
But regardless, you seem quite excited about the db-query-based approach, so never mind.

The issue then becomes that plugins which extract information by querying an in-memory db will bitrot whenever we update the db format to adapt to changes in what information we need to store per channel.
The same argument applies, except that now there is greater friction, since the plugin may very well not be maintained in the C-lightning tree.
(Of course, if it *is* maintained in the C-lightning tree, this objection partly goes away, especially if the plugin is part of the automated testing; but note that in that case you have not improved upon the situation where we expose an interface tied to a specific channel-state-update scheme.)

It will always remain possible in Lightning to close a channel unilaterally using some sequence of transactions, and that is what the channel state interface can focus on; proper versioning in the channel state JSON data can handle any future changes to its format.
But in any case, I myself am satisfied with FS-level replication strategies, so good luck in your endeavors and feel free to ignore me.
Building is more important than arguing.

Regards,
ZmnSCPxj


