Backoff still not working well #769
Debugging

So I think the reason for this could be that the

```python
session = await client.connect_async(reconnecting=True)
result = await session.execute(query)
```

It should also be noted that decorator effects don't get passed to methods that are called inside the wrapped method, so it is possible that the code in metamist/metamist/graphql/__init__.py Lines 166 to 169 in f1b9811 does not pass the decorator effects to the

Potential Solution

An approach that could be worth trying is to modify the RequestsHTTPTransport instance here (metamist/metamist/graphql/__init__.py Lines 71 to 74 in f1b9811) to:

```python
from gql.transport.requests import RequestsHTTPTransport

# url, token and get_sm_url come from the surrounding metamist code
transport = RequestsHTTPTransport(
    url=url or get_sm_url(),
    headers={'Authorization': f'Bearer {token}'},
    retries=6,
    retry_backoff_factor=1.0,
)
```

This would make 6 retries, spaced at 1s, 2s, 4s, 8s, 16s, 32s, before giving up. I believe this can be tried without the

Alternatively, we could just instantiate the
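For reference on the exact timings: as far as I can tell, gql's `RequestsHTTPTransport` passes `retries` and `retry_backoff_factor` into a urllib3 `Retry` object. urllib3's documentation gives the sleep for a backoff factor of 0.1 as `[0.0s, 0.2s, 0.4s, 0.8s, ...]`, meaning the first retry is immediate and later ones grow as `backoff_factor * 2 ** (n - 1)` after `n` consecutive errors (capped by `backoff_max`, 120s by default). The sketch below re-implements that formula with the standard library only. `urllib3_style_waits` is a hypothetical helper for illustration, not part of urllib3 or gql.

```python
from __future__ import annotations


def urllib3_style_waits(retries: int, backoff_factor: float, backoff_max: float = 120.0) -> list[float]:
    """Sleep (seconds) before each retry, following urllib3's documented Retry backoff.

    After n consecutive errors the sleep is backoff_factor * 2 ** (n - 1),
    except that a single error gives 0, so the first retry happens immediately.
    This is a re-implementation for illustration, not urllib3 itself.
    """
    waits = []
    for n in range(1, retries + 1):  # n = consecutive errors seen so far
        if n == 1:
            waits.append(0.0)
        else:
            waits.append(min(backoff_max, backoff_factor * 2 ** (n - 1)))
    return waits


if __name__ == "__main__":
    waits = urllib3_style_waits(retries=6, backoff_factor=1.0)
    print(waits, "total:", sum(waits))  # [0.0, 2.0, 4.0, 8.0, 16.0, 32.0] total: 62.0
```

Under that formula, `retries=6, retry_backoff_factor=1.0` would sleep for about a minute in total across the 7 requests (1 initial attempt plus 6 retries) before giving up, with the waits starting at 0s rather than 1s.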
Happy to try out the retries/backoff in that object/client. I only went with the backoff decorator because it was working A-ok in my local experimentation and looked simple. After a few revisions I'm kinda sick of it now 😓 The backoff/retry was working, just not necessarily according to the timing options I set. It did seem to recognise the error types to catch, etc., so I'm unsure why some, but not all, of the arguments are being respected...
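One thing that might explain the shortened timings (a guess, not confirmed in this thread, since the decorator arguments at Line 145 aren't reproduced here): the `backoff` library jitters its waits by default. `backoff.on_exception` applies `backoff.full_jitter` unless told otherwise, which draws each wait uniformly between 0 and the nominal `backoff.expo` value (1, 2, 4, ... seconds with the defaults), so the expected total sleep is only about half the nominal total. The sketch below re-implements that arithmetic with the standard library; `expo_waits` and `jittered_waits` are illustrative names, not part of `backoff`.

```python
from __future__ import annotations

import random


def expo_waits(max_tries: int, base: float = 2, factor: float = 1) -> list[float]:
    """Nominal waits between attempts, following backoff.expo: factor * base ** n for n = 0, 1, 2, ...

    max_tries counts every attempt (including the first), so there are max_tries - 1 waits.
    """
    return [factor * base ** n for n in range(max_tries - 1)]


def jittered_waits(max_tries: int, rng: random.Random) -> list[float]:
    """Apply full jitter (uniform between 0 and the nominal value) to each nominal wait."""
    return [rng.uniform(0, w) for w in expo_waits(max_tries)]


if __name__ == "__main__":
    rng = random.Random(42)
    nominal = expo_waits(max_tries=6)
    print("nominal waits:", nominal, "total:", sum(nominal))  # [1, 2, 4, 8, 16] total: 31
    totals = [sum(jittered_waits(6, rng)) for _ in range(10_000)]
    # The mean should land near 15.5, i.e. half of the nominal 31s.
    print("mean jittered total:", round(sum(totals) / len(totals), 1))
```

If jitter does turn out to be the cause, passing `jitter=None` to `backoff.on_exception` should give the deterministic 1s, 2s, 4s, ... schedule instead.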
Mmmmm, yeah, this does look tricky. I'd investigate it further, but I'm not too clear on how to reproduce this, and I also don't have the perms to see that batch, so most of this is a shot in the dark for me. If you have the bandwidth to try the retries/backoff in the object/client, we can rule things out and assess next steps from there, and I'll look into it a bit more then :(
Can we just roll our own? This should be fairly straightforward to implement.
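If we do roll our own, a minimal sketch could look like the following. It assumes an async call site like the `session.execute(query)` one above; `retry_async` and its parameters are hypothetical names, not existing metamist or gql APIs. It uses deterministic exponential backoff with no jitter and logs the elapsed time and next delay on every failure, so the timings in the batch logs can be checked directly against the configured schedule.

```python
from __future__ import annotations

import asyncio
import logging
import time
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


async def retry_async(
    func: Callable[[], Awaitable[T]],
    *,
    max_tries: int = 6,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Await func(), retrying on `retry_on` exceptions with exponential backoff.

    After failed attempt k (1-based) it sleeps min(max_delay, base_delay * 2 ** (k - 1))
    seconds, i.e. 1, 2, 4, 8, ... with the defaults. No jitter is applied.
    """
    if max_tries < 1:
        raise ValueError("max_tries must be at least 1")
    start = time.monotonic()
    for attempt in range(1, max_tries + 1):
        try:
            return await func()
        except retry_on as exc:
            elapsed = time.monotonic() - start
            if attempt == max_tries:
                logger.error("Giving up after %d tries (%.1fs elapsed): %r", attempt, elapsed, exc)
                raise
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            logger.warning(
                "Attempt %d/%d failed after %.1fs (%r); retrying in %.1fs",
                attempt, max_tries, elapsed, exc, delay,
            )
            await sleep(delay)
    raise AssertionError("unreachable")  # the loop always returns or raises
```

At the existing call site this might be used as `result = await retry_async(lambda: session.execute(query))`, with `retry_on` narrowed to whichever exception types the current decorator catches. The `sleep` parameter is only injectable so the schedule can be tested without actually waiting.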
From the logging messages emitted during GraphQL queries, it looks like the backoff isn't running for nearly as long as the code suggests it should.
Ref: metamist/metamist/graphql/__init__.py Line 145 in f1b9811
Batch example: https://batch.hail.populationgenomics.org.au/batches/445352/jobs/1
Logging: