[c-lightning] Plug-in design requirements

Sat Dec 8 02:59:46 AEDT 2018

Hi,

From a side discussion during the Lightning summit I understood that my input
is wanted for the design of the plug-in system of C-Lightning. In this e-mail
I'll sketch what I'd like to have. In various places, I'll quote structure
names from BOLT 4.

CJP

# Roles

As in regular Lightning payments, I distinguish three roles:

* Sending node
* Forwarding node
* Receiving node

Nothing gets changed for the forwarding nodes; the focus is on the sending and
receiving node.

# Sending node behavior

Application-specific code on the sending node must be able to send out a
payment to the receiving node with an onion packet that is different from a
regular onion packet in the following ways:

* The onion data received by the receiving node indicates that, instead of
  following the regular behavior of a Lightning node, the receiving node must
  follow certain application-specific behavior. The preferred way of
  indicating this is through a non-default `realm` number, but other methods
  have been proposed as well (such as a non-existing `short_channel_id`).
* The `per_hop` data received by the receiving node contains
  application-specific data instead of, or in addition to, the regular
  `per_hop` data.

Obviously, if the `realm` number is used as indicator, and the
application-specific data completely replaces the regular `per_hop` data, this
maximizes the amount of application-specific data that can be sent.

If more data needs to be sent than fits in a single `per_hop` element, the
application might require the use of multiple `per_hop` elements, at the cost
of reducing the maximum transaction route length. However, for my application
that does not seem to be needed. Specifically, my application needs the
following data in the onion:

* 8 bytes: the amount to be forwarded in an application-specific way
* ? bytes: an ID related to some data shared between sender and receiver.
  Can be RIPEMD160 (20 bytes).
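
As a rough illustration of how such a payload could fit in one element, the
sketch below packs the two fields into a 32-byte buffer, the size of `per_hop`
in BOLT 4 at the time of writing. The field order, the big-endian encoding, the
zero padding and the names `pack_app_per_hop` / `unpack_app_per_hop` are
assumptions made for this example only; nothing here is specified by BOLT 4 or
by C-Lightning.

```python
import struct

PER_HOP_SIZE = 32  # assumed: the whole per_hop is available to the application
ID_SIZE = 20       # e.g. a RIPEMD160 digest


def pack_app_per_hop(amount: int, shared_id: bytes) -> bytes:
    """Pack an 8-byte amount and a 20-byte ID, zero-padded to per_hop size."""
    if len(shared_id) != ID_SIZE:
        raise ValueError("shared_id must be exactly 20 bytes")
    payload = struct.pack(">Q", amount) + shared_id  # 8 + 20 = 28 bytes
    return payload.ljust(PER_HOP_SIZE, b"\x00")      # 4 bytes of padding


def unpack_app_per_hop(per_hop: bytes):
    """Inverse of pack_app_per_hop; returns (amount, shared_id)."""
    if len(per_hop) != PER_HOP_SIZE:
        raise ValueError("per_hop must be exactly 32 bytes")
    (amount,) = struct.unpack_from(">Q", per_hop, 0)
    return amount, per_hop[8:8 + ID_SIZE]
```

With this hypothetical layout, 28 of the 32 bytes are used, so a single
`per_hop` element is enough for the two fields listed above.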

# Receiving node behavior

On receiving an incoming transaction that indicates application-specific
behavior (as described above), the receiving node must do one of the
following:

* If no application-specific handler is present, fail the incoming
  transaction.
* If an application-specific handler is present, accept the incoming HTLC,
  notify the handler of the incoming transaction and pass it the following
  information:

  - Incoming amount
  - Incoming CLTV
  - Incoming payment hash
  - `realm` number
  - `per_hop` data
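
The two cases above could be sketched as follows. This is only an illustration
of the intended behavior, not a proposal for the actual plug-in API: the
`IncomingHtlc` record, the `Dispatcher` class, the choice to select handlers by
`realm`, and the string return values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class IncomingHtlc:
    """The information passed to the application-specific handler."""
    amount: int
    cltv_expiry: int
    payment_hash: bytes
    realm: int
    per_hop: bytes


Handler = Callable[[IncomingHtlc], None]


class Dispatcher:
    def __init__(self) -> None:
        self._handlers: Dict[int, Handler] = {}

    def register(self, realm: int, handler: Handler) -> None:
        self._handlers[realm] = handler

    def on_incoming(self, htlc: IncomingHtlc) -> str:
        handler = self._handlers.get(htlc.realm)
        if handler is None:
            return "fail"      # no handler present: fail the transaction
        handler(htlc)          # notify the handler with the listed fields
        return "accepted"      # HTLC is held until the handler resolves it
```

Here "accepted" only means that the HTLC is kept pending; failing or finishing
it is left to the application.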

If more data needs to be sent than fits in a single `per_hop` element, the
entire `hops_data` structure may need to be passed; it is still to be
determined whether the remaining hops need to be decrypted in a certain way.
As far as my application goes, this will not be needed, since its data fits in
a single `per_hop` element.

The application-specific code must be able to fail the incoming transaction,
or to finish it by providing the transaction pre-image. Note that, generally,
the pre-image exists on the application side, and not on the Lightning node
side of the RPC, prior to finishing the transaction; this is different from
how regular incoming transactions are usually handled on a receiving node.

The Lightning node must take care that, if the application-specific code
fails or finishes a transaction, this only directly affects the last incoming
HTLC. It is possible that the transaction route passes multiple times through
the receiving node; in that case, the payment hash alone is insufficient to
identify the HTLC that must be removed. Removing all incoming HTLCs with the
same payment hash should not result in a loss of funds, but it has potential
privacy implications.
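
One way to avoid the ambiguity is to track pending HTLCs by a key that is
unique per HTLC, for instance the pair of channel and per-channel HTLC id that
BOLT 2 already uses, rather than by payment hash. The sketch below assumes
such a key is available to the plug-in interface; the `PendingHtlcs` class and
its methods are hypothetical.

```python
from typing import Dict, List, Tuple

HtlcKey = Tuple[str, int]  # (channel identifier, HTLC id within that channel)


class PendingHtlcs:
    def __init__(self) -> None:
        self._by_key: Dict[HtlcKey, bytes] = {}

    def add(self, key: HtlcKey, payment_hash: bytes) -> None:
        self._by_key[key] = payment_hash

    def resolve(self, key: HtlcKey) -> bytes:
        """Remove exactly one HTLC; others with the same hash stay pending."""
        return self._by_key.pop(key)

    def with_hash(self, payment_hash: bytes) -> List[HtlcKey]:
        """All pending HTLCs that share the given payment hash."""
        return sorted(k for k, h in self._by_key.items() if h == payment_hash)
```

If a route passes through the node twice, both HTLCs share a payment hash, but
`resolve` only touches the one the application was notified about.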

It is theoretically possible (though not in my application) that the
application-specific code triggers a new Lightning transaction with the same
payment hash, which again passes through the receiving node. Normally, that
transaction should have lower lock times and its HTLCs should be removed
before the application-specific code fails or finishes, but the possibility
that they still exist might give some interesting edge cases.

If the application-specific code neither fails nor finishes the transaction,
the Lightning node should respect the lock time of the incoming HTLC. To
prevent the peer from closing the channel, the incoming transaction should be
failed some short time before its lock time; we may want to make this time
difference configurable, maybe even on a per-application basis.
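
A minimal sketch of that rule, assuming the lock time is an absolute block
height and the margin is expressed in blocks. The default of 6 blocks, the
per-application override table and the function names are placeholders chosen
for the example, not recommended values.

```python
from typing import Dict, Optional

DEFAULT_MARGIN_BLOCKS = 6            # placeholder default
APP_MARGIN_BLOCKS: Dict[str, int] = {  # hypothetical per-application overrides
    "slow-app": 20,
}


def margin_for(app: Optional[str]) -> int:
    """Margin for a given application, falling back to the default."""
    if app is None:
        return DEFAULT_MARGIN_BLOCKS
    return APP_MARGIN_BLOCKS.get(app, DEFAULT_MARGIN_BLOCKS)


def should_fail_now(current_height: int, cltv_expiry: int,
                    app: Optional[str] = None) -> bool:
    """True once the unresolved HTLC is within the margin of its lock time."""
    return current_height >= cltv_expiry - margin_for(app)
```

The node would evaluate this on every new block for each HTLC that is still
waiting on the application.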

# Handling crashed applications

The most important thing is that a crashed application must never cause the
Lightning node to crash: the Lightning node has potentially more important
jobs to do, like guarding its channels.

The Lightning node may optionally check whether the application has crashed,
for instance whenever an incoming transaction signals application-specific
handling, and immediately fail the incoming transaction in that case.

However, once the Lightning node has sent the transaction data to the
application, and the possibility exists that the application has (partially or
fully) processed the transaction data before it crashed, the Lightning node
must not fail the incoming transaction until either the application recovers
and indicates transaction failure, or the incoming HTLC is about to time out.

To aid in crash recovery, it is recommended to add an extra interface
(function) between application and Lightning node, to allow the Lightning node
to query the application about ongoing transactions after application restart.
Informally, the Lightning node asks "are you handling this transaction?", and
the application answers yes or no. If it answers no, the Lightning node may
fail the incoming transaction. The advantage is that transactions are failed
faster than when waiting for the HTLC time-out.
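
The recovery query could look roughly like this on the Lightning node side.
The `is_handling` callback stands in for whatever RPC the real interface would
provide, and the function name is hypothetical. An application that cannot be
reached is treated as "no answer yet", so the HTLC stays pending, in line with
the rule above.

```python
from typing import Callable, Hashable, Iterable, List


def htlcs_to_fail_after_restart(
    pending: Iterable[Hashable],
    is_handling: Callable[[Hashable], bool],
) -> List[Hashable]:
    """Ask the application about each pending HTLC; return those it disowns."""
    to_fail = []
    for htlc_id in pending:
        try:
            handling = is_handling(htlc_id)
        except Exception:
            # Application still unreachable: keep holding the HTLC and let
            # the time-out rule deal with it if the application never returns.
            continue
        if not handling:
            to_fail.append(htlc_id)
    return to_fail
```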

Automatic restarting of the application may be desirable, but following the
UNIX philosophy of "do one simple thing, and do it well" I don't think it
should be the responsibility of the Lightning node. In fact, the whole concept
of a persistent application process can be application-specific; for some
applications, it might be better to kick off a new (script) process on every
incoming transaction.

# Further ideas

On the receiver side, the decision of what signals "application-specific
handling" may be delegated to application-specific code. It may be seen as a
chain of handlers: an incoming transaction is offered to the handlers one by
one, until a handler accepts the transaction. The last handler in the chain is
always the default built-in handler of C-Lightning; it cannot refuse a
transaction, only accept it or fail it immediately. So the handler chain
always has at least length one.
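
The chain could be sketched as below. Handlers are modeled as plain functions
that return `True` when they take over a transaction and `False` to pass it
on; that convention, the `dict` used for the transaction and all names are
assumptions for illustration, and `default_handler` is only a stand-in for the
built-in C-Lightning behavior.

```python
from typing import Callable, Sequence

Handler = Callable[[dict], bool]


def default_handler(tx: dict) -> bool:
    """Stand-in for the built-in handler: it never refuses a transaction."""
    return True


def offer(tx: dict, custom_handlers: Sequence[Handler]) -> Handler:
    """Offer tx to each custom handler in order; return whoever took it."""
    for handler in custom_handlers:
        if handler(tx):
            return handler
    # Nobody took it: the built-in handler is always last in the chain.
    default_handler(tx)
    return default_handler
```

With an empty list of custom handlers the chain has length one and every
transaction ends up at the built-in handler.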

A maximum chain length of two handlers may be good enough: if a user wants
more handlers, it's always possible to use a handler that further delegates a
transaction to a list of several sub-handlers. This moves a little bit of
complexity out of C-Lightning.

A potential issue here is that, since all transactions are passed through the
custom code, if the custom code has issues, it may potentially block all
incoming transactions, not just the application-specific ones. This is not the
end of the world, since writing good custom code should be just as doable as
writing good C-Lightning code. In case of this architecture, we should
encourage application developers to separate a simple handler stub from their
full application core, for instance by providing some example handler stubs.