-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Timeout #251
Timeout #251
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks good to me (I made a little suggestion).
Thinking aloud, since the "timer" that runs in nucypher-ts
and the "timer" than runs in Coordinator contract
are not the same and probably, due to the network delays, one of them will reach the deadline before the other... I wonder if the difference between these two deadlines is significant or not 🤔
provider: ethers.providers.Web3Provider | ||
): Promise<number> { | ||
const Coordinator = await this.connectReadOnly(provider); | ||
const timeout = await Coordinator.timeout(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Timeout is the number of seconds counted since the ritual started, right? What happens if my transaction gets stuck (low gas etc.) but I'm still counting from when I send the transaction? Should we instead count the timeout from the "ritual start" event or some other cut-off?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, timeout
is a variable on the Coordinator https://github.com/nucypher/nucypher-contracts/blob/da35b9d9f13ddebd7ce7bbbc3fbc7d5fb0d95411/contracts/contracts/coordination/Coordinator.sol#L61
It's set at Coordinator level and applies to all rituals
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I'm missing something here. Timeout is defined as ritual.initTimestamp + timeout < block.timestamp
, and the initial value of ritual.initTimestamp
is taken from block.timestamp
. So it seems to me like the timeout
denotes some number of seconds, and the timeout occurs after we reach some number of seconds after the ritual started.
Do you see any edge cases here? If we use setTimeout
, is it going to match exactly the calculation performed in ritual state checks? I.e. ritual.initTimestamp + timeout < block.timestamp
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
timeout
denotes some number of seconds, and the timeout occurs after we reach some number of seconds after the ritual started.
Yes, agree
setTimeout
is changing the config of Coordinator, so not sure that it helps us.
But i see what your original comment implied now. When we do:
const ritualId = await DkgCoordinatorAgent.initializeRitual(
web3Provider,
ursulas.sort()
);
The fact that we await
doesn't always mean that the transaction went through? and it could therefore be sitting around in the mempool whilst the Promise.race
starts counting down. Is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
initializeRitual
is blocked until the ritual is started (and some event is emitted, etc.) so it may not be an issue after all.
I was trying to give a concrete example of something that @manumonti already hinted on in his comment:
Thinking aloud, since the "timer" that runs in nucypher-ts and the "timer" than runs in Coordinator contract are not the same and probably, due to the network delays, one of them will reach the deadline before the other... I wonder if the difference between these two deadlines is significant or not
I didn't make a full analysis on what are the different edge cases we may have here, but one other that comes to mind is where we're awaiting for timeout
and the Coordinator
admin changes the value of timeout
. I think this case and some basket of other cases can be handled by replacing setTimeout
by setInterval
and by redoing the contract calculation ritual.initTimestamp + timeout < block.timestamp
in JS. Or alternatively, querying the contract for the ritual state. The latter seems to be more robust.
Co-authored-by: Manuel Montenegro <manuel@nucypher.com>
… return a new error
ritualId | ||
); | ||
if (!isSuccessful) { | ||
const timeout = await DkgCoordinatorAgent.getTimeout(web3Provider); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haven't thought this through fully just yet, but you may be able to leverage the Coordinator contract to do the bulk of the timeout work for you. See https://github.com/nucypher/nucypher-contracts/blob/main/contracts/contracts/coordination/Coordinator.sol#L75.
The ritual state will return TIMEOUT if the Ritual has not completed within the timeout window. Of course, you don't want to loop hitting the Coordinator contract, so only hitting it periodically could be simpler.
Basically the end case for waiting is you receive EndRitual
or the returned state from the Coordinator contract is INVALID / TIMEOUT / FINALIZED.
src/dkg.ts
Outdated
); | ||
if (!isSuccessful) { | ||
const timeout = await DkgCoordinatorAgent.getTimeout(web3Provider); | ||
const bufferedTimeout = timeout * 1.1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you want to do this.
At the moment, the timeout is 1 day = 86400 seconds (at least for testnet). An additional 10% is 8640s = 144 minnutes = 2.4 hrs which is pretty inefficient.
Even if the timeout was less, say 4 hours, 10% would be 24 minutes.
Of course this is in the worst case when EndRitual
isn't received - but still.
See comment above about potential way to be more efficient by possibly leveraging periodic calls to the Coordinator contract. Just something to consider that may/may not help.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah i was trying to avoid potential infinite loops, but i agree that the addition of 10% is too much, it was really there as a placeholder to see how people react
Repeating my comment from Discord (https://discord.com/channels/866378471868727316/866378471868727319/1131960750836031618) to this PR: Just so that I get the full picture:
Assuming this is the correct premise... You can probably use the ritual's wdyt? |
Absolutely correct in your outline here.
Hadn't thought of this, love it 🚀 |
Codecov Report
@@ Coverage Diff @@
## alpha #251 +/- ##
==========================================
- Coverage 80.56% 79.06% -1.50%
==========================================
Files 37 37
Lines 1055 1075 +20
Branches 144 145 +1
==========================================
Hits 850 850
- Misses 196 215 +19
- Partials 9 10 +1
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just a few nitpicks - Please address them at will.
I see there are no tests added to this PR. Were these changes tested manually?
src/dkg.ts
Outdated
web3Provider, | ||
ritualId | ||
); | ||
if (!isSuccessful) { | ||
const timeout = await DkgCoordinatorAgent.getTimeout(web3Provider); | ||
const endTime = initTimestamp + timeout; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
const endTime = initTimestamp + timeout; | |
const endTimestamp = initTimestamp + timeout; |
src/dkg.ts
Outdated
const endTime = initTimestamp + timeout; | ||
|
||
// Wait until the current time is past the endTime | ||
while (Math.floor(Date.now() / 1000) < endTime) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while (Math.floor(Date.now() / 1000) < endTime) { | |
const nowTimestamp = Math.floor(Date.now() / 1000); | |
while (nowTimestamp < endTimestamp) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will nowTimestamp
be updated at the end of every loop? i thought it would be calculated once and then remain static
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah, you're right. I just wanted to suggest naming this variable.
src/dkg.ts
Outdated
do { | ||
const block = await web3Provider.getBlock('latest'); | ||
currentBlockTime = block.timestamp; | ||
if (currentBlockTime < endTime) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (currentBlockTime < endTime) { | |
if (currentBlockTime < endTimestamp) { |
src/dkg.ts
Outdated
if (currentBlockTime < endTime) { | ||
await new Promise((resolve) => setTimeout(resolve, 1000)); // Wait for 1 second before checking again | ||
} | ||
} while (currentBlockTime < endTime); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These two blocks (checking for timeout and the one below, line 147) are large enough to be refactored into separate functions. It could make them somewhat more readable.
|
||
// Wait until the current time is past the endTime | ||
while (Math.floor(Date.now() / 1000) < endTime) { | ||
await new Promise((resolve) => setTimeout(resolve, 1000)); // Wait for 1 second before checking again |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could use some help understanding this code.
Does this while loop and the one below it just loop every second until the timeout / endTime
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, i don't think there's an equivalent of python's time.sleep
- this seemed to be how people would wait a certain amount of time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Chatted with James a bit, about some issues we may need to address in this PR before merging:
-
It seems that
onRitualEndEvent(...)
usesCoordinator.once(...)
which always (?) expects theEndRitual
event to occur. However in the case ofTIMEOUT
state there is no EndRitual event - timeouts don't produce anEndRitual
event because a tx is required to emit an event, and timeouts by nature imply no response, and therefore no tx. -
Currently the code waits for clock time to expire, then waits for block time to expire(the contract uses block time for timeout), then tries to get the
EndRitual
event. We need to be able to short-circuit this check if the ritual finishes quickly i.e. it's possible/likely (😅 ) that the ritual completes (FINALIZED
orINVALID
) before the timeout, but it seems the code still waits the entire timeout which is unnecessary.
Ritual initialization will be disabled during 7.0.0-beta. Shall we close this PR? |
Type of PR:
Required reviews:
Fixes #249