Timeout #251

Closed
wants to merge 12 commits into from

Contributor

@theref theref commented Jul 17, 2023

Type of PR:

  • Feature

Required reviews:

How many reviews does the PR author need?

  • 2

Fixes #249

@theref theref changed the base branch from main to alpha July 17, 2023 12:54
Member

@manumonti manumonti left a comment

The code looks good to me (I made a little suggestion).

Thinking aloud, since the "timer" that runs in nucypher-ts and the "timer" that runs in the Coordinator contract are not the same and probably, due to the network delays, one of them will reach the deadline before the other... I wonder if the difference between these two deadlines is significant or not 🤔

src/dkg.ts Outdated
provider: ethers.providers.Web3Provider
): Promise<number> {
const Coordinator = await this.connectReadOnly(provider);
const timeout = await Coordinator.timeout();
Contributor

Timeout is the number of seconds counted since the ritual started, right? What happens if my transaction gets stuck (low gas etc.) but I'm still counting from when I send the transaction? Should we instead count the timeout from the "ritual start" event or some other cut-off?

Contributor Author

No, timeout is a variable on the Coordinator https://github.com/nucypher/nucypher-contracts/blob/da35b9d9f13ddebd7ce7bbbc3fbc7d5fb0d95411/contracts/contracts/coordination/Coordinator.sol#L61

It's set at Coordinator level and applies to all rituals

Contributor

I think I'm missing something here. Timeout is defined as ritual.initTimestamp + timeout < block.timestamp, and the initial value of ritual.initTimestamp is taken from block.timestamp. So it seems to me like the timeout denotes some number of seconds, and the timeout occurs after we reach some number of seconds after the ritual started.

Do you see any edge cases here? If we use setTimeout, is it going to match exactly the calculation performed in ritual state checks? I.e. ritual.initTimestamp + timeout < block.timestamp
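
As an illustration of the check being discussed, the contract-side condition can be mirrored in TypeScript. This is only a sketch: `hasTimedOut` and its parameter names are hypothetical and not part of nucypher-ts, and it assumes all values are plain numbers in seconds.

```typescript
// Hypothetical helper mirroring the contract-side condition quoted above:
//   ritual.initTimestamp + timeout < block.timestamp
// All arguments are assumed to be in seconds (Unix time or a duration).
function hasTimedOut(
  initTimestamp: number,
  timeout: number,
  blockTimestamp: number
): boolean {
  // Strict inequality: a block mined at exactly initTimestamp + timeout
  // does not yet count as timed out.
  return initTimestamp + timeout < blockTimestamp;
}

// Illustrative values only.
const ritualStart = 1000;
const timeoutSeconds = 60;
console.log(hasTimedOut(ritualStart, timeoutSeconds, 1060)); // false
console.log(hasTimedOut(ritualStart, timeoutSeconds, 1061)); // true
```

A local timer of `timeout` seconds only approximates this condition, because it measures wall-clock time from the moment the client starts waiting rather than block time from `initTimestamp`.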

Contributor Author

> timeout denotes some number of seconds, and the timeout occurs after we reach some number of seconds after the ritual started.

Yes, agree

setTimeout is changing the config of Coordinator, so not sure that it helps us.

But I see what your original comment implied now. When we do:

const ritualId = await DkgCoordinatorAgent.initializeRitual(
  web3Provider,
  ursulas.sort()
);

The fact that we await doesn't always mean that the transaction went through, does it? It could therefore be sitting in the mempool while the Promise.race starts counting down. Is that correct?
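
To make the distinction concrete: with ethers v5, awaiting a contract write resolves to a transaction response once the transaction has been submitted, and awaiting `wait()` on that response resolves once it has been mined. The sketch below models that two-step pattern with locally declared types so that it is self-contained. `sendAndConfirm` is a hypothetical helper, and whether `initializeRitual` already waits internally is not something this sketch shows.

```typescript
// Minimal shapes modelled on ethers v5's transaction response and receipt.
// They are declared locally so the sketch has no external dependencies.
interface TxReceipt {
  blockNumber: number;
  status?: number; // 1 = success, 0 = reverted
}

interface TxResponse {
  hash: string;
  wait(confirmations?: number): Promise<TxReceipt>;
}

// Hypothetical helper: resolves only after the transaction has been mined.
async function sendAndConfirm(
  send: () => Promise<TxResponse>,
  confirmations = 1
): Promise<TxReceipt> {
  // Step 1: resolves once the transaction is submitted. It may still be
  // pending in the mempool at this point.
  const tx = await send();
  // Step 2: resolves once the transaction is included in a block with the
  // requested number of confirmations.
  const receipt = await tx.wait(confirmations);
  if (receipt.status === 0) {
    throw new Error(`Transaction ${tx.hash} reverted`);
  }
  return receipt;
}
```

Starting a countdown only after step 2 avoids counting time that the transaction spent waiting to be mined.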

Contributor

initializeRitual is blocked until the ritual is started (and some event is emitted, etc.) so it may not be an issue after all.

I was trying to give a concrete example of something that @manumonti already hinted at in his comment:

> Thinking aloud, since the "timer" that runs in nucypher-ts and the "timer" that runs in the Coordinator contract are not the same and probably, due to the network delays, one of them will reach the deadline before the other... I wonder if the difference between these two deadlines is significant or not

I didn't make a full analysis of the different edge cases we may have here, but another one that comes to mind is where we're waiting for the timeout and the Coordinator admin changes the value of timeout. I think this case, and a basket of other cases, can be handled by replacing setTimeout with setInterval and redoing the contract calculation ritual.initTimestamp + timeout < block.timestamp in JS. Alternatively, we could query the contract for the ritual state. The latter seems more robust.
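
The first option above (periodically redoing the contract calculation in JS) might look roughly like the sketch below. It uses a sleep-based polling loop rather than `setInterval`, so that polls never overlap. `CoordinatorReader` and its methods are hypothetical stand-ins for whatever nucypher-ts exposes; they are not real API names.

```typescript
// Hypothetical read-only view of the Coordinator and the chain.
interface CoordinatorReader {
  getTimeout(): Promise<number>; // seconds
  getInitTimestamp(): Promise<number>; // Unix seconds
  getLatestBlockTimestamp(): Promise<number>; // Unix seconds
}

const sleep = (ms: number): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, ms));

// Re-evaluates `initTimestamp + timeout < block.timestamp` on every poll,
// re-reading `timeout` each time so that a value changed by the admin is
// picked up on the next iteration.
async function waitForContractTimeout(
  reader: CoordinatorReader,
  pollIntervalMs: number
): Promise<void> {
  const initTimestamp = await reader.getInitTimestamp();
  for (;;) {
    const timeout = await reader.getTimeout();
    const blockTimestamp = await reader.getLatestBlockTimestamp();
    if (initTimestamp + timeout < blockTimestamp) {
      return;
    }
    await sleep(pollIntervalMs);
  }
}
```

Querying the ritual state directly, the second option, avoids duplicating the contract's arithmetic on the client.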

theref and others added 2 commits July 18, 2023 13:03
Co-authored-by: Manuel Montenegro <manuel@nucypher.com>
ritualId
);
if (!isSuccessful) {
const timeout = await DkgCoordinatorAgent.getTimeout(web3Provider);
Member

Haven't thought this through fully just yet, but you may be able to leverage the Coordinator contract to do the bulk of the timeout work for you. See https://github.com/nucypher/nucypher-contracts/blob/main/contracts/contracts/coordination/Coordinator.sol#L75.

The ritual state will return TIMEOUT if the Ritual has not completed within the timeout window. Of course, you don't want to loop hitting the Coordinator contract, so only hitting it periodically could be simpler.

Basically the end case for waiting is you receive EndRitual or the returned state from the Coordinator contract is INVALID / TIMEOUT / FINALIZED.
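
A periodic state check of this kind could be sketched as follows. Only `TIMEOUT`, `INVALID`, and `FINALIZED` are named in this thread; the other state names in the type are an assumption about the contract's enum, and `isFinalState` and `pollUntilFinal` are hypothetical helpers.

```typescript
// The three final states come from the comment above. The remaining names
// are assumed and may not match the contract's enum exactly.
type RitualState =
  | "NON_INITIATED"
  | "AWAITING_TRANSCRIPTS"
  | "AWAITING_AGGREGATIONS"
  | "TIMEOUT"
  | "INVALID"
  | "FINALIZED";

function isFinalState(state: RitualState): boolean {
  switch (state) {
    case "TIMEOUT":
    case "INVALID":
    case "FINALIZED":
      return true;
    default:
      return false;
  }
}

// Calls `getState` every `pollIntervalMs` until a final state is returned.
async function pollUntilFinal(
  getState: () => Promise<RitualState>,
  pollIntervalMs: number
): Promise<RitualState> {
  for (;;) {
    const state = await getState();
    if (isFinalState(state)) {
      return state;
    }
    await new Promise<void>((resolve) => setTimeout(resolve, pollIntervalMs));
  }
}
```

The poll interval trades responsiveness against load on the provider; the right value depends on block time and is left open here.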

src/dkg.ts Outdated
);
if (!isSuccessful) {
const timeout = await DkgCoordinatorAgent.getTimeout(web3Provider);
const bufferedTimeout = timeout * 1.1;
Member

@derekpierre derekpierre Jul 19, 2023

I don't think you want to do this.

At the moment, the timeout is 1 day = 86400 seconds (at least for testnet). An additional 10% is 8640 s = 144 minutes = 2.4 hours, which is pretty inefficient.

Even if the timeout were shorter, say 4 hours, 10% would be 24 minutes.

Of course this is in the worst case when EndRitual isn't received - but still.

See comment above about potential way to be more efficient by possibly leveraging periodic calls to the Coordinator contract. Just something to consider that may/may not help.
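
The arithmetic above can be reproduced with a small helper. `bufferSeconds` is a hypothetical name used only for this illustration; it takes the buffer as an integer percentage to keep the result a whole number of seconds.

```typescript
// Extra wait, in seconds, added by padding a timeout by `percent` percent.
function bufferSeconds(timeoutSeconds: number, percent: number): number {
  return Math.round((timeoutSeconds * percent) / 100);
}

const oneDay = 24 * 60 * 60; // 86400 s
const fourHours = 4 * 60 * 60; // 14400 s

console.log(bufferSeconds(oneDay, 10)); // 8640 (seconds)
console.log(bufferSeconds(oneDay, 10) / 60); // 144 (minutes)
console.log(bufferSeconds(fourHours, 10) / 60); // 24 (minutes)
```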

Contributor Author

yeah i was trying to avoid potential infinite loops, but i agree that the addition of 10% is too much, it was really there as a placeholder to see how people react

@derekpierre
Member

Repeating my comment from Discord (https://discord.com/channels/866378471868727316/866378471868727319/1131960750836031618) to this PR:

Just so that I get the full picture:

  • call DKGCoordinatorAgent.initializeRitual(...) which doesn't return until there is at least one block confirmation
  • then you basically wait for the EndRitual event - this is the ideal path - but of course you want to be robust. There is the case that the ritual times out, in which case there is no EndRitual event, so you want to time out waiting for the EndRitual event
  • If you wait the ritual timeout, you can then call the Coordinator contract to determine the status of the ritual - https://github.com/nucypher/nucypher-contracts/blob/main/contracts/contracts/coordination/Coordinator.sol#L75. In all likelihood by this time the Contract would return a final state (TIMEOUT, INVALID, or FINALIZED), but the concern (which is a good one 👍 ) is that you may have some slight discrepancy between the time that the Contract started its timer and the time that you did.

Assuming this is the correct premise...

You can probably use the ritual's initTimestamp value stored in the Ritual object in the contract (https://github.com/nucypher/nucypher-contracts/blob/main/contracts/contracts/coordination/Coordinator.sol#L49) to calculate when you need to wait until, which is basically timestamp_to_wait_until = initTimestamp + ritual timeout. Then wait until current_block_time > timestamp_to_wait_until and call getRitualState(), and the proper status will be returned. The getRitualState() function uses the same calc - https://github.com/nucypher/nucypher-contracts/blob/main/contracts/contracts/coordination/Coordinator.sol#L84C20-L84C35

wdyt?
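
The approach proposed above could be sketched roughly as follows. `RitualClock` and `stateAfterDeadline` are hypothetical names; in practice the accessors would wrap the Coordinator contract and the web3 provider, and the state is returned as a plain string to avoid assuming the contract's enum.

```typescript
// Hypothetical accessors for the values used in the proposal.
interface RitualClock {
  initTimestamp: number; // ritual.initTimestamp, Unix seconds
  timeout: number; // Coordinator timeout, seconds
  getBlockTimestamp(): Promise<number>; // latest block.timestamp
  getRitualState(): Promise<string>; // e.g. "TIMEOUT" or "FINALIZED"
}

async function stateAfterDeadline(
  clock: RitualClock,
  pollIntervalMs: number
): Promise<string> {
  const endTimestamp = clock.initTimestamp + clock.timeout;
  // Wait on block time rather than local time, because the contract
  // compares against block.timestamp. The loop exits only once the block
  // time is strictly greater than the deadline.
  while ((await clock.getBlockTimestamp()) <= endTimestamp) {
    await new Promise<void>((resolve) => setTimeout(resolve, pollIntervalMs));
  }
  // At this point the contract's own check should agree that the window
  // has closed, so the returned state is expected to be final.
  return clock.getRitualState();
}
```

This sketch always waits out the full window; it does not return early if the ritual finishes before the deadline.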

@theref
Contributor Author

theref commented Jul 24, 2023

> Just so that I get the full picture:

Absolutely correct in your outline here.

> You can probably use the ritual's initTimestamp value stored in the Ritual object in the contract

Hadn't thought of this, love it 🚀

@codecov-commenter

Codecov Report

Merging #251 (16903d7) into alpha (49fa8ab) will decrease coverage by 1.50%.
The diff coverage is 0.00%.

@@            Coverage Diff             @@
##            alpha     #251      +/-   ##
==========================================
- Coverage   80.56%   79.06%   -1.50%     
==========================================
  Files          37       37              
  Lines        1055     1075      +20     
  Branches      144      145       +1     
==========================================
  Hits          850      850              
- Misses        196      215      +19     
- Partials        9       10       +1     
Impacted Files Coverage Δ
src/agents/coordinator.ts 23.07% <0.00%> (-4.20%) ⬇️
src/dkg.ts 42.39% <0.00%> (-6.36%) ⬇️

Contributor

@piotr-roslaniec piotr-roslaniec left a comment

Looks good, just a few nitpicks - Please address them at will.

I see there are no tests added to this PR. Were these changes tested manually?

src/dkg.ts Outdated
web3Provider,
ritualId
);
if (!isSuccessful) {
const timeout = await DkgCoordinatorAgent.getTimeout(web3Provider);
const endTime = initTimestamp + timeout;
Contributor

Suggested change
- const endTime = initTimestamp + timeout;
+ const endTimestamp = initTimestamp + timeout;

src/dkg.ts Outdated
const endTime = initTimestamp + timeout;

// Wait until the current time is past the endTime
while (Math.floor(Date.now() / 1000) < endTime) {
Contributor

Suggested change
- while (Math.floor(Date.now() / 1000) < endTime) {
+ const nowTimestamp = Math.floor(Date.now() / 1000);
+ while (nowTimestamp < endTimestamp) {

Contributor Author

will nowTimestamp be updated at the end of every loop? i thought it would be calculated once and then remain static

Contributor

Oh yeah, you're right. I just wanted to suggest naming this variable.
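
One way to get a named value without freezing it is to wrap the clock read in a function and call it in the loop condition. The sketch below is illustrative only; `currentTimestamp` and `waitUntil` are hypothetical names, and the `now` parameter exists so the clock can be replaced in tests.

```typescript
// Current Unix time in whole seconds.
function currentTimestamp(): number {
  return Math.floor(Date.now() / 1000);
}

async function waitUntil(
  endTimestamp: number,
  pollIntervalMs = 1000,
  now: () => number = currentTimestamp
): Promise<void> {
  // `now()` is evaluated on every pass, so the comparison always uses a
  // fresh reading. Assigning `const nowTimestamp = now()` once above the
  // loop would freeze the value, and a loop that was entered would never
  // exit.
  while (now() < endTimestamp) {
    await new Promise<void>((resolve) => setTimeout(resolve, pollIntervalMs));
  }
}
```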

src/dkg.ts Outdated
do {
const block = await web3Provider.getBlock('latest');
currentBlockTime = block.timestamp;
if (currentBlockTime < endTime) {
Contributor

Suggested change
- if (currentBlockTime < endTime) {
+ if (currentBlockTime < endTimestamp) {

src/dkg.ts Outdated
if (currentBlockTime < endTime) {
await new Promise((resolve) => setTimeout(resolve, 1000)); // Wait for 1 second before checking again
}
} while (currentBlockTime < endTime);
Contributor

These two blocks (checking for timeout and the one below, line 147) are large enough to be refactored into separate functions. It could make them somewhat more readable.


// Wait until the current time is past the endTime
while (Math.floor(Date.now() / 1000) < endTime) {
await new Promise((resolve) => setTimeout(resolve, 1000)); // Wait for 1 second before checking again
Member

Could use some help understanding this code.

Does this while loop and the one below it just loop every second until the timeout / endTime?

Contributor Author

yeah, i don't think there's an equivalent of python's time.sleep - this seemed to be how people would wait a certain amount of time
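
For reference, the usual stand-in for a blocking sleep in JavaScript is a promise wrapped around `setTimeout`, awaited from an async function. A minimal sketch (the names `sleep` and `measureSleep` are arbitrary):

```typescript
// Resolves after roughly `ms` milliseconds without blocking the event loop.
function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Usage: pause inside an async function and report the elapsed time.
async function measureSleep(ms: number): Promise<number> {
  const startedAt = Date.now();
  await sleep(ms);
  return Date.now() - startedAt;
}
```

Unlike Python's `time.sleep`, this only pauses the awaiting function; other callbacks and promises keep running in the meantime.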

Member

@derekpierre derekpierre left a comment

Chatted with James a bit, about some issues we may need to address in this PR before merging:

  1. It seems that onRitualEndEvent(...) uses Coordinator.once(...) which always (?) expects the EndRitual event to occur. However in the case of TIMEOUT state there is no EndRitual event - timeouts don't produce an EndRitual event because a tx is required to emit an event, and timeouts by nature imply no response, and therefore no tx.

  2. Currently the code waits for clock time to expire, then waits for block time to expire (the contract uses block time for timeout), then tries to get the EndRitual event. We need to be able to short-circuit this check if the ritual finishes quickly, i.e. it's possible/likely (😅 ) that the ritual completes (FINALIZED or INVALID) before the timeout, but it seems the code still waits the entire timeout, which is unnecessary.
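
One possible shape for addressing both points is to listen for the event and poll the ritual state at the same time, settling on whichever reports first. This is a sketch under stated assumptions, not the PR's implementation: `RitualWatcher`, `waitForRitual`, and the `successful` flag on the event are hypothetical stand-ins for the real contract bindings.

```typescript
// Final states named in this thread.
type FinalState = "FINALIZED" | "INVALID" | "TIMEOUT";

type RitualResult =
  | { via: "event"; successful: boolean }
  | { via: "state"; state: FinalState };

// Hypothetical abstraction over the Coordinator contract.
interface RitualWatcher {
  // Subscribes to EndRitual once; returns a function that removes the listener.
  onceEndRitual(listener: (successful: boolean) => void): () => void;
  // Resolves to a final state, or null while the ritual is still in progress.
  getFinalState(): Promise<FinalState | null>;
}

function waitForRitual(
  watcher: RitualWatcher,
  pollIntervalMs: number
): Promise<RitualResult> {
  return new Promise<RitualResult>((resolve, reject) => {
    let settled = false;
    let timer: ReturnType<typeof setTimeout> | undefined;
    let unsubscribe: () => void = () => undefined;

    // Stops polling and removes the listener; false if already settled.
    const settle = (): boolean => {
      if (settled) return false;
      settled = true;
      if (timer !== undefined) clearTimeout(timer);
      unsubscribe();
      return true;
    };

    // Fast path: an EndRitual event short-circuits the wait.
    unsubscribe = watcher.onceEndRitual((successful) => {
      if (settle()) resolve({ via: "event", successful });
    });

    // Fallback: a timed-out ritual emits no event, so poll the state.
    const poll = (): void => {
      watcher.getFinalState().then(
        (state) => {
          if (settled) return;
          if (state !== null) {
            if (settle()) resolve({ via: "state", state });
          } else {
            timer = setTimeout(poll, pollIntervalMs);
          }
        },
        (error) => {
          if (settle()) reject(error);
        }
      );
    };
    if (!settled) poll();
  });
}
```

Because the contract itself reports `TIMEOUT` once its window has passed, the client does not need a separate local timer in this design.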

@piotr-roslaniec
Contributor

Ritual initialization will be disabled during 7.0.0-beta. Shall we close this PR?

@piotr-roslaniec piotr-roslaniec added the "do not merge" label (Open for review but do not merge please) Sep 8, 2023
@piotr-roslaniec piotr-roslaniec deleted the branch nucypher:alpha October 18, 2023 08:26
Development

Successfully merging this pull request may close these issues.

Handle ritual initialization timeout
5 participants