From ZmnSCPxj at protonmail.com  Wed Jun  2 09:30:59 2021
From: ZmnSCPxj at protonmail.com (ZmnSCPxj)
Date: Tue, 01 Jun 2021 23:30:59 +0000
Subject: [c-lightning] Routing: bias against smaller channels
In-Reply-To: <A821927A-3342-44B6-85E0-793930992A01@nibnalin.me>
References: <102B86C6-73EE-4CF1-9353-84D9E60203CF@nibnalin.me>
 <7df2f90a3a31edc5611d23ff27942d72@schmoock.net>
 <jbAwjxpypYSaIljH9IA4F9BerjKR49xb9faA6t05ZP22RXT6I1gW6c_oxi_15HiLL0aOOKmv7SW5gWzqUBAiXPlN4vIr98nBAG-lam_-T6c=@protonmail.com>
 <5f4691b90d85f515cb9f7403e109e880@schmoock.net>
 <A821927A-3342-44B6-85E0-793930992A01@nibnalin.me>
Message-ID: <dzuZro1KE1FkHI8yw1dyHbhPuj2dzUJSGLlW5a6kBhcF1-rdrzGFyypGvbb2H_cLaZl48ktWzGQTxGG-bNrJQOLmkTtga3h3Sna6aECi35E=@protonmail.com>


> Hi,
>
> > In code this looks like this:
> >
> > https://github.com/lightningd/plugins/pull/246/commits/2dbff662f712c7883169e7557f5170e356c942e8
>
> Interesting! So if I understand this routine correctly, this finds a path with the least number of hops, and in the subset of paths of that length, a path with high msat (likely just the one with the?highest?msat in practice).
>
> I suppose it's not surprising that pathfinding's O(maxhop*(N log)) isn't the bottleneck for most practical usage,

Yes.

However it does somewhat assume that the published network will not grow in the future.
If it **does** grow, then the n * n log n will be significant.

> but to add to ZmnSCPxj's comment, if I'm not mistaken, I think we can redefine the edge weights in the graph and run the equivalent of your subroutine in a single pass of dijkstra.

Correct, but that is the sticking point in the reluctance to modify `getroute`!

For example, we have a `riskfactor` argument to `getroute`.
This translates the CLTV-delta cost on each hop into an `msatoshi` cost that is then added to the fee to that hop, to generate the cost ("edge weight") of that hop.

If we add another conversion factor from channel size (or log(channel size) --- and what would the base of the log be as well?) then that adds more parameters to pass to `getroute`.

Personally I do not really have an intuitive grasp of the existing `riskfactor` argument to `getroute`, and have to look up the table in the docs all the time.
I imagine that if we add another parameter to factor in the channel size, the same effect will apply (i.e. it is difficult to intuitively understand the parameter that converts channel size to the edge weight).

That is why there is an idea to just expose the routemap directly, so that plugins can provide their own exact pathfinding algorithm.

--

Another idea would be to pass in a "function" to the `getroute` method.

Basically, `getroute` would be given two optional arguments, `costfunctionmethod` and `costfunctionargs`.
The first would be a string, the second would be a generic JSON object.

* A plugin that wants to provide its own edge weight computation to `getroute` would first implement its own unique cost function as an ordinary plugin command.
* Then it would pass in the name of that command as the `costfunctionmethod` to `getroute`.
* `getroute` would then invoke the given method with the channel details (i.e. similar to an object in `listchannels` output?), which should return a `{"cost": 42.0}` object.
  * `getroute` would then cache the cost for every channel it has to open, in order to reduce the RPC-call overhead.
* Optionally, if the function requires additional arguments, the plugin could add `costfunctionargs`, and any object keys in that object will be added to the arguments to the given `costfunctionmethod`.

Basically the above just uses the command name as a function address, and implements a closure by explicitly passing in additional arguments to the function.

What is more, as I noted in my presentation in Berlin in 2019, Dijkstra, A\*, and Greedy Best First Search are really a continuum of related algorithms.
The core of this family of algorithms is that they have **two** cost functions, `G()` and `H()`.
They form the "cumulative" cost function `F() = G() + H()`.

`G()` is the "actual distance from current node to next node".
`H()` is the "estimated distance from next node to destination" i.e. the "heuristic".

In Dijkstra at one end, `H()` always returns 0.
At the other end, Greedy Best First Search, `G()` always returns 0.

In between them, A\* has non-zero `H()` (biased to underestimate; if it overestimates, it starts to move towards Greedy Best First) and non-zero `G()`.

We can instead have a `getroutealgo` command that accepts two functions, `G()` and `H()` using the above technique ("pass in the method name and any extra method args, i.e. closure"), which would also allow plugins to experiment with using A* and Greedy Best First by simply playing around with the cost functions.

Then we could also have a `payalgo` command that accepts a `getroute`-like replacement function, using the closure technique, so that plugins can experiment with various routing functions without having to reimplement the complicated `pay` logic.

--

The difficulty here is that `getroute` is really implemented over in `gossipd`, and the interfaces to the plugins are over in the main daemon `lightningd`.
So we may very well need to expose the on-disk routemap format to `lightningd` at least so that `getroute` is instead implemented there, where it can poke at the plugin RPC interface.

Thoughts, Rusty?

Regards,
ZmnSCPxj


From michael at schmoock.net  Thu Jun  3 01:46:19 2021
From: michael at schmoock.net (michael at schmoock.net)
Date: Wed, 02 Jun 2021 17:46:19 +0200
Subject: [c-lightning] Routing: bias against smaller channels
In-Reply-To: <dzuZro1KE1FkHI8yw1dyHbhPuj2dzUJSGLlW5a6kBhcF1-rdrzGFyypGvbb2H_cLaZl48ktWzGQTxGG-bNrJQOLmkTtga3h3Sna6aECi35E=@protonmail.com>
References: <102B86C6-73EE-4CF1-9353-84D9E60203CF@nibnalin.me>
 <7df2f90a3a31edc5611d23ff27942d72@schmoock.net>
 <jbAwjxpypYSaIljH9IA4F9BerjKR49xb9faA6t05ZP22RXT6I1gW6c_oxi_15HiLL0aOOKmv7SW5gWzqUBAiXPlN4vIr98nBAG-lam_-T6c=@protonmail.com>
 <5f4691b90d85f515cb9f7403e109e880@schmoock.net>
 <A821927A-3342-44B6-85E0-793930992A01@nibnalin.me>
 <dzuZro1KE1FkHI8yw1dyHbhPuj2dzUJSGLlW5a6kBhcF1-rdrzGFyypGvbb2H_cLaZl48ktWzGQTxGG-bNrJQOLmkTtga3h3Sna6aECi35E=@protonmail.com>
Message-ID: <2937df42f0cb93ed2c8e698a1ddc5290@schmoock.net>

Hi,

I made some test by pre-filtering the routes on a maxhops/capacity basis 
which prefer "shorter and bigger pipes first" approach as mentioned 
earlier.
Here are some results:

# Disable both filters (like without this PR, just the erring node 
filter=
#> lnc rebalance outgoing_scid=685247x957x0 incoming_scid=685132x1087x0 
msatoshi=0.005btc
Plugin rebalance initialized with 1000msat base / 10 ppm fee  
cltv_final:18  maxhops:0  msatfactor:1.0
maxhops:20  msatfactor:1.0  running_for:4  count_getroute:1  
time_getroute:0.16330862045288086  time_getroute_avg:0.16330862045288086 
  count_sendpay:1  time_sendpay:4.338505029678345  
time_sendpay_avg:4.338505029678345
maxhops:20  msatfactor:1.0  running_for:8  count_getroute:2  
time_getroute:0.46318650245666504  time_getroute_avg:0.23159325122833252 
  count_sendpay:2  time_sendpay:7.869954586029053  
time_sendpay_avg:3.9349772930145264
maxhops:20  msatfactor:1.0  running_for:12  count_getroute:3  
time_getroute:0.6282370090484619  time_getroute_avg:0.2094123363494873  
count_sendpay:3  time_sendpay:11.94294023513794  
time_sendpay_avg:3.980980078379313
maxhops:20  msatfactor:1.0  running_for:19  count_getroute:4  
time_getroute:0.845451831817627  time_getroute_avg:0.21136295795440674  
count_sendpay:4  time_sendpay:18.372679471969604  
time_sendpay_avg:4.593169867992401
maxhops:20  msatfactor:1.0  running_for:23  count_getroute:5  
time_getroute:1.018324375152588  time_getroute_avg:0.20366487503051758  
count_sendpay:5  time_sendpay:22.277318716049194  
time_sendpay_avg:4.4554637432098385
maxhops:20  msatfactor:1.0  running_for:27  count_getroute:6  
time_getroute:1.1925179958343506  time_getroute_avg:0.1987529993057251  
count_sendpay:6  time_sendpay:25.69105839729309  
time_sendpay_avg:4.281843066215515
maxhops:20  msatfactor:1.0  running_for:31  count_getroute:7  
time_getroute:1.4166362285614014  time_getroute_avg:0.2023766040802002  
count_sendpay:7  time_sendpay:29.830421209335327  
time_sendpay_avg:4.261488744190761
maxhops:20  msatfactor:1.0  running_for:35  count_getroute:8  
time_getroute:1.590040683746338  time_getroute_avg:0.19875508546829224  
count_sendpay:8  time_sendpay:33.221359729766846  
time_sendpay_avg:4.152669966220856
maxhops:20  msatfactor:1.0  running_for:38  count_getroute:9  
time_getroute:1.763425588607788  time_getroute_avg:0.19593617651197645  
count_sendpay:9  time_sendpay:36.21986770629883  
time_sendpay_avg:4.024429745144314
maxhops:20  msatfactor:1.0  running_for:42  count_getroute:10  
time_getroute:1.9394252300262451  time_getroute_avg:0.19394252300262452  
count_sendpay:10  time_sendpay:39.911234855651855  
time_sendpay_avg:3.9911234855651854
maxhops:20  msatfactor:1.0  running_for:44  count_getroute:11  
time_getroute:2.115779161453247  time_getroute_avg:0.19234356013211337  
count_sendpay:11  time_sendpay:42.25611186027527  
time_sendpay_avg:3.841464714570479
maxhops:20  msatfactor:1.0  running_for:49  count_getroute:12  
time_getroute:2.32531476020813  time_getroute_avg:0.19377623001734415  
count_sendpay:12  time_sendpay:46.37626266479492  
time_sendpay_avg:3.8646885553995767
maxhops:20  msatfactor:1.0  running_for:51  count_getroute:13  
time_getroute:2.5044617652893066  time_getroute_avg:0.19265090502225435  
count_sendpay:13  time_sendpay:48.622201442718506  
time_sendpay_avg:3.740169341747577
maxhops:20  msatfactor:1.0  running_for:54  count_getroute:14  
time_getroute:2.6831555366516113  time_getroute_avg:0.19165396690368652  
count_sendpay:14  time_sendpay:51.87213206291199  
time_sendpay_avg:3.705152290207999
maxhops:20  msatfactor:1.0  running_for:58  count_getroute:15  
time_getroute:2.9157254695892334  time_getroute_avg:0.19438169797261556  
count_sendpay:15  time_sendpay:55.00669240951538  
time_sendpay_avg:3.6671128273010254

# Now with the currently suggested default values maxhops := 5  and 
msatfactor :=4
Plugin rebalance initialized with 1000msat base / 10 ppm fee  
cltv_final:18  maxhops:5  msatfactor:4.0
maxhops:3  msatfactor:4.0  running_for:3  count_getroute:9  
time_getroute:0.06924104690551758  time_getroute_avg:0.00769344965616862 
  count_sendpay:1  time_sendpay:0.7183606624603271  
time_sendpay_avg:0.7183606624603271
maxhops:3  msatfactor:4.0  running_for:4  count_getroute:10  
time_getroute:0.1384265422821045  time_getroute_avg:0.01384265422821045  
count_sendpay:2  time_sendpay:1.7709376811981201  
time_sendpay_avg:0.8854688405990601
maxhops:3  msatfactor:4.0  running_for:5  count_getroute:11  
time_getroute:0.20621562004089355  
time_getroute_avg:0.018746874549172142  count_sendpay:3  
time_sendpay:2.66377592086792  time_sendpay_avg:0.8879253069559733


We can see that:

1. CPU on excessive getroute usage isn't a big deal, even on a raspberry 
pi.
2. The route was found in 5 seconds compared to not found (in 60 
seconds)
3. Calling sendpay is more expensive timewise by a factor. Exact value 
cannot be determined by the above data, as the successful route was just 
5 (3+2) hops (compared to what getroute gives us without maybe 8 total 
hops).
4. We see that sendpay takes more time than getroute, this becomes more 
painful on longer routes (which no maxhops filter gives early on)

Just wanted to tell you that for big chunk payments (w/o mpp) it seems 
to make sense to try "shorter and bigger pipes first".


Michael

On 2021-06-02 01:30, ZmnSCPxj wrote:
>> Hi,
>> 
>> > In code this looks like this:
>> >
>> > https://github.com/lightningd/plugins/pull/246/commits/2dbff662f712c7883169e7557f5170e356c942e8
>> 
>> Interesting! So if I understand this routine correctly, this finds a 
>> path with the least number of hops, and in the subset of paths of that 
>> length, a path with high msat (likely just the one with 
>> the?highest?msat in practice).
>> 
>> I suppose it's not surprising that pathfinding's O(maxhop*(N log)) 
>> isn't the bottleneck for most practical usage,
> 
> Yes.
> 
> However it does somewhat assume that the published network will not
> grow in the future.
> If it **does** grow, then the n * n log n will be significant.
> 
>> but to add to ZmnSCPxj's comment, if I'm not mistaken, I think we can 
>> redefine the edge weights in the graph and run the equivalent of your 
>> subroutine in a single pass of dijkstra.
> 
> Correct, but that is the sticking point in the reluctance to modify 
> `getroute`!
> 
> For example, we have a `riskfactor` argument to `getroute`.
> This translates the CLTV-delta cost on each hop into an `msatoshi`
> cost that is then added to the fee to that hop, to generate the cost
> ("edge weight") of that hop.
> 
> If we add another conversion factor from channel size (or log(channel
> size) --- and what would the base of the log be as well?) then that
> adds more parameters to pass to `getroute`.
> 
> Personally I do not really have an intuitive grasp of the existing
> `riskfactor` argument to `getroute`, and have to look up the table in
> the docs all the time.
> I imagine that if we add another parameter to factor in the channel
> size, the same effect will apply (i.e. it is difficult to intuitively
> understand the parameter that converts channel size to the edge
> weight).
> 
> That is why there is an idea to just expose the routemap directly, so
> that plugins can provide their own exact pathfinding algorithm.
> 
> --
> 
> Another idea would be to pass in a "function" to the `getroute` method.
> 
> Basically, `getroute` would be given two optional arguments,
> `costfunctionmethod` and `costfunctionargs`.
> The first would be a string, the second would be a generic JSON object.
> 
> * A plugin that wants to provide its own edge weight computation to
> `getroute` would first implement its own unique cost function as an
> ordinary plugin command.
> * Then it would pass in the name of that command as the
> `costfunctionmethod` to `getroute`.
> * `getroute` would then invoke the given method with the channel
> details (i.e. similar to an object in `listchannels` output?), which
> should return a `{"cost": 42.0}` object.
>   * `getroute` would then cache the cost for every channel it has to
> open, in order to reduce the RPC-call overhead.
> * Optionally, if the function requires additional arguments, the
> plugin could add `costfunctionargs`, and any object keys in that
> object will be added to the arguments to the given
> `costfunctionmethod`.
> 
> Basically the above just uses the command name as a function address,
> and implements a closure by explicitly passing in additional arguments
> to the function.
> 
> What is more, as I noted in my presentation in Berlin in 2019,
> Dijkstra, A\*, and Greedy Best First Search are really a continuum of
> related algorithms.
> The core of this family of algorithms is that they have **two** cost
> functions, `G()` and `H()`.
> They form the "cumulative" cost function `F() = G() + H()`.
> 
> `G()` is the "actual distance from current node to next node".
> `H()` is the "estimated distance from next node to destination" i.e.
> the "heuristic".
> 
> In Dijkstra at one end, `H()` always returns 0.
> At the other end, Greedy Best First Search, `G()` always returns 0.
> 
> In between them, A\* has non-zero `H()` (biased to underestimate; if
> it overestimates, it starts to move towards Greedy Best First) and
> non-zero `G()`.
> 
> We can instead have a `getroutealgo` command that accepts two
> functions, `G()` and `H()` using the above technique ("pass in the
> method name and any extra method args, i.e. closure"), which would
> also allow plugins to experiment with using A* and Greedy Best First
> by simply playing around with the cost functions.
> 
> Then we could also have a `payalgo` command that accepts a
> `getroute`-like replacement function, using the closure technique, so
> that plugins can experiment with various routing functions without
> having to reimplement the complicated `pay` logic.
> 
> --
> 
> The difficulty here is that `getroute` is really implemented over in
> `gossipd`, and the interfaces to the plugins are over in the main
> daemon `lightningd`.
> So we may very well need to expose the on-disk routemap format to
> `lightningd` at least so that `getroute` is instead implemented there,
> where it can poke at the plugin RPC interface.
> 
> Thoughts, Rusty?
> 
> Regards,
> ZmnSCPxj