Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Backoff still not working well #769

Open
MattWellie opened this issue May 9, 2024 · 4 comments
Open

Backoff still not working well #769

MattWellie opened this issue May 9, 2024 · 4 comments
Assignees
Labels
low priority don't sweat it

Comments

@MattWellie
Copy link
Contributor

MattWellie commented May 9, 2024

From the logging messages emitted during graphQL queries it looks like the backoff isn't running for nearly as long as the code suggests it should.

Ref:

@backoff.on_exception(

Batch example: https://batch.hail.populationgenomics.org.au/batches/445352/jobs/1

Logging:

2024-05-09 00:39:35 INFO (backoff 105): Backing off query(...) for 0.4s (gql.transport.exceptions.TransportServerError: 504 Server Error: Gateway Timeout for url: https://sample-metadata-api-mnrpw3mdza-ts.a.run.app/graphql)
2024-05-09 00:44:36 ERROR (backoff 120): Giving up query(...) after 2 tries (gql.transport.exceptions.TransportServerError: 504 Server Error: Gateway Timeout for url: https://sample-metadata-api-mnrpw3mdza-ts.a.run.app/graphql)
> docker run australia-southeast1-docker.pkg.dev/cpg-common/images/cpg_workflows:latest pip show metamist
Name: metamist
Version: 6.10.1
...
@MattWellie MattWellie added the low priority don't sweat it label May 9, 2024
@nevoodoo nevoodoo self-assigned this May 30, 2024
@nevoodoo
Copy link
Collaborator

Debugging

So I think the reason for this could be that the backoff decorator only works when you have a session instantiated like:

session = await client.connect_async(reconnecting=True)

result = await session.execute(query)

Should also be noted that decorator effects don't get passed to methods that are called inside the the wrapped method so it is possible that the code in

response = (client or configure_sync_client()).execute_sync(
_query if isinstance(_query, DocumentNode) else gql(_query),
variable_values=variables,
)

does not pass the decorator effects to the session object that is initiated inside execute_sync as needed.

Potential Solution

An approach that could be worth trying is to modify the RequestsHTTPTransport instance here:

transport = RequestsHTTPTransport(
url=url or get_sm_url(),
headers={'Authorization': f'Bearer {token}'},
)

to

transport = RequestsHTTPTransport(
            url=url or get_sm_url(),
            headers={'Authorization': f'Bearer {token}'},
            retries=6,
            retry_backoff_factor=1.0,
        )

This would make 6 retries, spaced at 1s, 2s, 4s, 8s, 16s, 32s before giving up. I believe this can be tried without the backoff decorator.

Alternatively, we could just instantiate the session variable in the code and try to use the decorator like we are right now and see if that helps.

@MattWellie
Copy link
Contributor Author

Happy to try out the retries/backoff in that object/client. I only went with the backoff decorator as it was working A-ok in my local experimentation and looked to be simple. After a few revisions I'm kinda sick of it now 😓

The backoff/retry was working, just not necessarily according to the timing options I set. It did seem to recognise the error types to catch etc. so I'm unsure about how some but not all of the arguments are being respected...

@nevoodoo
Copy link
Collaborator

Mmmmm, yeah this does look tricky. I'd investigate it further but I'm not too clear on how to reproduce this, and I also do not have the perms to see that batch so most of this is a shot in the dark for me. If you have the bandwidth to try the retries/backoff in the object/client, we can rule out/assess next steps from there and I'll look into it a bit more then :(

@illusional
Copy link
Collaborator

Can we just roll our own? This should be fairly straightforward to implement.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
low priority don't sweat it
Projects
None yet
Development

No branches or pull requests

3 participants