Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add robustness to gql SG query #950

Merged
merged 2 commits into from
Oct 24, 2024
Merged

Conversation

MattWellie
Copy link
Contributor

@MattWellie MattWellie commented Oct 24, 2024

The backoff wrapper around query in metamist here, which I @MattWellie implemented, is trash. It looks (to me) like it's parameterised appropriately, but the pauses between re-attempts are a joke, e.g.:

2024-10-24 01:23:33 INFO (backoff 105): Backing off query(...) for 0.3s (gql.transport.exceptions.TransportServerError: 504 Server Error: Gateway Timeout for url: https://sample-metadata-api-mnrpw3mdza-ts.a.run.app/graphql)
2024-10-24 01:23:33 INFO (root 342): Getting sequencing groups for dataset perth-neuro
2024-10-24 01:23:34 INFO (root 342): Getting sequencing groups for dataset ravenscroft-arch
2024-10-24 01:28:34 INFO (backoff 105): Backing off query(...) for 0.5s (gql.transport.exceptions.TransportServerError: 504 Server Error: Gateway Timeout for url: https://sample-metadata-api-mnrpw3mdza-ts.a.run.app/graphql)
2024-10-24 01:33:35 ERROR (backoff 120): Giving up query(...) after 2 tries (gql.transport.exceptions.TransportServerError: 504 Server Error: Gateway Timeout for url: https://sample-metadata-api-mnrpw3mdza-ts.a.run.app/graphql)

Three failures and one successful call, all in 2 seconds. The pipeline startup still does a ton of Metamist interaction, and when setting up for more than ~5 projects, the startup routinely fails. Example job.

Elsewhere in metamist.py @milo-hyben has added tenacity decorators to retry failing calls, usually with a wrapper method to avoid directly wrapping the gql query. I don't know whether this or that is a better fix, but please can we have some more robustness during pipeline setup.

@codecov-commenter
Copy link

codecov-commenter commented Oct 24, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 78.42%. Comparing base (dcc2f18) to head (fad586e).
Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #950      +/-   ##
==========================================
+ Coverage   78.41%   78.42%   +0.01%     
==========================================
  Files          10       10              
  Lines        1793     1794       +1     
==========================================
+ Hits         1406     1407       +1     
  Misses        387      387              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Collaborator

@milo-hyben milo-hyben left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey Matt, very good point regarding inconsistency.
Usually REST API do not implement backoff, it is up to a client consuming those APIs.
I guess that would be reason why none of our metamist REST API have it.
I had created my wrapper around RESTFull metamist API in production pipelines.

Regarding Metamist GraphQL Query, that is kind of different API, it is not RESTfull and I suspect idea was to add backoff there at some stage in the past.

I do not see any problem with your implementation of backoff in production-pipelines, but we should really get it fixed in Metamist GraphQL Query part.
I had created a JIRA task for fixing metamist (SET-220).

@MattWellie MattWellie merged commit 1126264 into main Oct 24, 2024
4 checks passed
@MattWellie MattWellie deleted the add-robustness-to-gql-calls branch October 24, 2024 23:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants