Originally posted by bjohnson5 September 3, 2024
I believe I have discovered a potential deadlock situation, but I am relatively new to LND and wanted to discuss it before opening an issue, to be sure that I am not missing something. This is when using the bbolt backend database.
In lnwallet/channel.go the LightningChannel struct defines several methods that the comments explain as the "state machine which corresponds to the current commitment protocol wire spec". These methods are: SignNextCommitment, ReceiveNewCommitment, RevokeCurrentCommitment, and ReceiveRevocation. Each of these will first lock the LightningChannel: lc.lock() and then they will typically attempt to update the channel db.
When updating the channel db, sometimes the database must be re-sized and re-mapped to memory using the mmap function in bbolt's db.go file. This function first attempts to lock the mmaplock mutex.
This is all fine, except that a deadlock could occur if one of the state machine functions is called while the node is trying to find a route. The RequestRoute function in payment_session.go gets a routing graph from the db, which acquires the mmaplock on the db (for good reason: it needs to be sure the db is not re-mapped while it is being used to find a route). It eventually calls functions of the LightningChannel struct to find bandwidth, balances, etc. Those functions may be blocked by one of the state machine methods, and that state machine method could itself be stuck waiting on the mmaplock.
For example:
Thread1: RequestRoute -> NewGraphSession() -> mmaplock.lock()
         -> p.pathFinder -> availableChanBandwidth -> attempts to call LC functions, blocks waiting on lc.lock()
Thread2: ReceiveRevocation -> lc.lock()
         -> AdvanceCommitChainTail -> attempts to update db, blocks waiting on mmaplock.lock()
If anyone has experience in this area, please let me know if this is all correct and if I should open an issue. Thanks.
I analysed this issue, and I think there is absolutely no reason to keep an open transaction as part of the session implementation. It's only used there:
Discussed in #9060