Optimize Parcel Resumption and Reduce Blocking in ChainPorter #1068
Conversation
This commit moves the WaitGroup.Done() call closer to the corresponding WaitGroup.Add(1) call. The purpose of this change is to group the goroutine management code together, making it easier to read and reducing the risk of forgetting to decrement the WaitGroup counter.
This commit introduces a buffer to the ChainPorter.outboundParcels channel. By adding this buffer, the system can handle new parcels without being blocked by resumed pending parcels, improving overall efficiency and reducing potential delays.
Resume any pending parcels in a new goroutine so that we don't delay returning from the `ChainPorter.Start` method.
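The commit's first point — keeping `WaitGroup.Add(1)` right next to the goroutine whose deferred `Done()` balances it — can be sketched as follows (hypothetical `runWorkers` helper, not the actual ChainPorter code):

```go
package main

import (
	"fmt"
	"sync"
)

// runWorkers launches n workers, keeping each wg.Add(1) directly
// beside the goroutine whose deferred wg.Done() balances it, so a
// reader can verify the counter bookkeeping at a glance.
func runWorkers(n int) int {
	var (
		wg    sync.WaitGroup
		mu    sync.Mutex
		count int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			mu.Lock()
			count++
			mu.Unlock()
		}()
	}
	wg.Wait()
	return count
}

func main() {
	fmt.Println(runWorkers(3))
}
```

The point is purely about readability: separating the `Add(1)` from the `go` statement it accounts for makes it easy to miss a `Done()` during refactoring.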
```diff
@@ -132,7 +132,7 @@ func NewChainPorter(cfg *ChainPorterConfig) *ChainPorter {
 	)
 	return &ChainPorter{
 		cfg:             cfg,
-		outboundParcels: make(chan Parcel),
+		outboundParcels: make(chan Parcel, 10),
```
I think we want to leave this as unbuffered? Otherwise we would end up trying to start the state machine with multiple packets at once:

taproot-assets/tapfreighter/chain_porter.go, line 318 in 1b5a4ef:

```go
case outboundParcel := <-p.outboundParcels:
```
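The distinction the reviewer is drawing can be illustrated with simplified stand-ins (not the real `chain_porter.go` types): on an unbuffered channel every send is a synchronous handoff, so parcels enter the receiving loop strictly one at a time.

```go
package main

import "fmt"

// drain simulates a mainEventLoop-style receiver. With an unbuffered
// channel, every send in feed blocks until this loop is ready, so
// exactly one "parcel" is handed off at a time.
func drain(parcels <-chan int, done chan<- int) {
	sum := 0
	for p := range parcels {
		sum += p
	}
	done <- sum
}

func feed(n int) int {
	parcels := make(chan int) // unbuffered: synchronous handoff
	done := make(chan int)
	go drain(parcels, done)
	for i := 1; i <= n; i++ {
		parcels <- i // blocks until the loop receives it
	}
	close(parcels)
	return <-done
}

func main() {
	fmt.Println(feed(4)) // 1+2+3+4
}
```

With a buffered channel, up to `cap` sends would complete without the loop ever running, which is exactly the "multiple packets at once" concern above.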
```go
p.Wg.Add(1)
go func() {
	defer p.Wg.Done()
	startErr = p.resumePendingParcels()
```
I think `resumePendingParcels()` needs to be modified to receive Quit signals? Like this case:

taproot-assets/tapfreighter/chain_porter.go, line 330 in 1b5a4ef:

```go
case <-p.Quit:
```
But here:

taproot-assets/tapfreighter/chain_porter.go, line 194 in 1b5a4ef:

```go
p.outboundParcels <- pendingParcel
```
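The suggested fix, sketched with simplified types (the real ChainPorter gets its quit channel from an embedded helper): race the send against the quit channel so shutdown is never blocked by a full or never-drained parcel channel.

```go
package main

import "fmt"

// sendOrQuit mirrors the suggested pattern: the send on outbound is
// raced against quit in a select, so a blocked send can always be
// abandoned once shutdown begins. Returns true if the parcel was sent.
func sendOrQuit(outbound chan<- string, quit <-chan struct{}, parcel string) bool {
	select {
	case outbound <- parcel:
		return true
	case <-quit:
		return false
	}
}

func main() {
	outbound := make(chan string, 1)
	quit := make(chan struct{})

	// Buffer has room, so the send wins the select.
	fmt.Println(sendOrQuit(outbound, quit, "parcel-1"))

	// Buffer is now full and quit is closed: the quit case fires
	// instead of blocking forever on the send.
	close(quit)
	fmt.Println(sendOrQuit(outbound, quit, "parcel-2"))
}
```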
Why is it even a problem if we delay startup by waiting for pending parcels? Anything that takes a while (e.g. proof transfer) will be done in a goroutine anyway. So I don't see a pressing reason to do things async here.
Also, if we do this in another goroutine, we don't need the buffered channel as already mentioned by @jharveyb. I think it just makes the behavior less deterministic (e.g. with 9 parcels the goroutine finishes almost immediately but with 11 it blocks until complete)...
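The 9-vs-11 behavior difference comes straight from channel capacity, and can be demonstrated with a hypothetical `fits` helper (buffer size 10 as in the diff):

```go
package main

import "fmt"

// fits reports whether n sends complete on a channel of the given
// capacity without any send blocking. The default case in the select
// fires exactly when a send would have to wait for a receiver.
func fits(capacity, n int) bool {
	ch := make(chan int, capacity)
	for i := 0; i < n; i++ {
		select {
		case ch <- i:
		default:
			return false // this send would block
		}
	}
	return true
}

func main() {
	fmt.Println(fits(10, 9))  // all sends fit in the buffer
	fmt.Println(fits(10, 11)) // the 11th send would block
}
```

So with no receiver running yet, the resuming goroutine's behavior silently depends on how many pending parcels happen to exist relative to the buffer size.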
IIUC the start of each 'subsystem' is sequential, and blocking on any one `Start()` call would block the rest?

Line 185 in a8d8b5a:

```go
if err := s.cfg.ChainPorter.Start(); err != nil {
```
Even with proof transfer in a goroutine, if we're resuming more than one parcel I think the transfer process for the first one would block resumption of the second? And these wouldn't happen in parallel.
The concrete case I was thinking of was: "If I have a wide transfer (with many recipients), what happens on restart?"
I think they would attempt to be transferred, sequentially, and any issues with proof upload at that point would block the caretaker startup. Maybe I'm wrong about which errors would cause which functions to block though.
We already spin up a goroutine for each individual parcel:

taproot-assets/tapfreighter/chain_porter.go, line 337 in 9ac1baa:

```go
go p.advanceState(sendPkg, outboundParcel.kit())
```
So at startup, because the main event loop already is a goroutine and just starts another one for each parcel, we can feed in the parcels to resume synchronously and block on that.
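That argument can be sketched with simplified stand-ins for the event loop and `advanceState` (hypothetical names, not the real code): because the loop only *launches* a goroutine per parcel, the synchronous feed at startup never serializes the slow per-parcel work.

```go
package main

import (
	"fmt"
	"sync"
)

// eventLoop receives parcels and, like the real mainEventLoop, starts
// a goroutine per parcel, so feeding parcels in synchronously only
// costs one cheap handoff each; the heavy work runs concurrently.
func eventLoop(parcels <-chan int, results chan<- int, loopDone chan<- struct{}) {
	var wg sync.WaitGroup
	for p := range parcels {
		wg.Add(1)
		go func(p int) { // stand-in for go p.advanceState(...)
			defer wg.Done()
			results <- p * p // stand-in for the slow transfer work
		}(p)
	}
	wg.Wait()
	close(loopDone)
}

func resumeAndRun(pending []int) int {
	parcels := make(chan int) // unbuffered is fine: the loop is a goroutine
	results := make(chan int, len(pending))
	loopDone := make(chan struct{})
	go eventLoop(parcels, results, loopDone)

	for _, p := range pending { // synchronous feed, as at startup
		parcels <- p
	}
	close(parcels)
	<-loopDone
	close(results)

	sum := 0
	for r := range results {
		sum += r
	}
	return sum
}

func main() {
	fmt.Println(resumeAndRun([]int{1, 2, 3})) // 1+4+9
}
```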
Ah, so we can handle parcels in parallel then?
In that case, IIUC, the only startup delay would be calling `advanceState()` for each resumed parcel, which should be fast.
And then we actually don't need to modify the current behavior, and the problem I was thinking of doesn't exist.
I'm not sure if we have an existing test that handles multiple parcels at once, that would be good to have to validate this.
```diff
@@ -311,8 +314,6 @@ func (p *ChainPorter) QueryParcels(ctx context.Context,
 // requests, and attempt to complete a transfer. A response is sent back to the
 // caller if a transfer can be completed. Otherwise, an error is returned.
 func (p *ChainPorter) mainEventLoop() {
-	defer p.Wg.Done()
```
We should add a NOTE comment to this method that it MUST be run as a goroutine. Normally, the `defer p.Wg.Done()` at the start of a method indicates this to someone reading the code.
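The convention being suggested might look like this (a minimal compilable sketch with a hypothetical `porter` type, not the actual method):

```go
package main

import "fmt"

type porter struct {
	done chan struct{}
}

// mainEventLoop drains work until shutdown.
//
// NOTE: This method MUST be run as a goroutine; the caller is
// responsible for the matching Wg.Add(1) before launching it, now
// that the Wg.Done bookkeeping lives at the call site.
func (p *porter) mainEventLoop() {
	close(p.done) // stand-in for the real event loop body
}

func main() {
	p := &porter{done: make(chan struct{})}
	go p.mainEventLoop()
	<-p.done
	fmt.Println("loop exited")
}
```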
Closing, see #1068 (comment)
It would still be good to have an itest that exercises having multiple transfers running concurrently. A simple way to get that would be to delay mining a block: submit a transfer, then a second transfer - both should end up waiting for the confirmation, but proceed as normal after a block is mined.
@jharveyb good idea! I'll spin that into an issue.
issue: #1081
This PR addresses this concern by @jharveyb: #1055 (comment)
Changes
- Improved Goroutine Management: The `WaitGroup.Done()` call has been repositioned to align closely with the corresponding `WaitGroup.Add(1)` call, making the code easier to read and reducing the risk of errors.
- Buffered Channel for Outbound Parcels: A buffer has been added to the `outboundParcels` channel, allowing the system to handle new parcels without being blocked by pending ones, which enhances overall performance.
- Concurrent Resumption of Pending Parcels: The `ChainPorter.resumePendingParcels` method is now executed in a separate goroutine, ensuring that resuming pending parcels does not delay the startup process.