Spike 2 big query #3773
Draft: hannako wants to merge 8 commits into main from spike_2_big_query
The local env file is used only for local testing, so that secrets are not accidentally pushed.
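As a rough illustration of the local-only settings idea, here is a minimal sketch of a loader that copies values from a YAML file into the environment when the file exists. The helper name `load_local_env` and the key used in the usage note are assumptions made for illustration. They are not taken from this PR or from the dev docs.

```ruby
require "yaml"

# Hypothetical helper (not from the PR): copies key/value pairs from a
# local YAML file into `env`. Returns the list of keys found in the file.
# If the file is missing (for example in production), nothing is loaded.
def load_local_env(path, env: ENV)
  return [] unless File.exist?(path)

  settings = YAML.safe_load(File.read(path)) || {}
  # Values already present in the environment are left untouched.
  settings.each { |key, value| env[key.to_s] ||= value.to_s }
  settings.keys.map(&:to_s)
end
```

A call such as `load_local_env("config/local_env.yml")` in a development-only initializer would be one way to use it. Keeping the file out of version control is what stops the secrets being pushed.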
- Create a BigQuery service so the app can talk to BigQuery. NB: to use this you must add the credentials to config/local_env.yml. There are instructions on how to do this in the [dev docs](https://docs.publishing.service.gov.uk/repos/content-data-api/google_analytics_setup.html).
- Create a popular tasks namespace for these two services. Ideally BigQuery would have a simple table per browse page, and Collections would just be scraping the top 6 links. We would need to chat to a PA to see which table structure is most cost-effective (it may be better to have a single table for all browse pages). For performance reasons, I think we should rely on processing the data on the BigQuery side, and Collections should fetch the minimum amount of data with little or no processing on the application side. This is a different approach to that taken in 046d618. This commit needs some work on the tests.
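The "minimum data, little or no processing" approach could look roughly like the sketch below. Everything named here is an assumption for illustration: the `PopularTasks::Fetcher` class, the `example_dataset.popular_tasks` table, and its `browse_page`, `title`, `link` and `rank` columns. The `query_runner` is any callable that takes SQL plus named parameters and returns row-like hashes. In the app it might wrap the BigQuery client's `query` method, which accepts a `params:` hash.

```ruby
# Hypothetical sketch: class, table and column names are illustrative only.
module PopularTasks
  class Fetcher
    LINK_LIMIT = 6

    # query_runner: responds to #call(sql, params) and returns an array of
    # hashes with :title and :link keys.
    def initialize(query_runner:, table: "example_dataset.popular_tasks")
      @query_runner = query_runner
      @table = table
    end

    # Ordering and limiting happen in the query, so the application only
    # maps rows into the shape the view needs.
    def links_for(browse_page_path)
      sql = <<~SQL
        SELECT title, link
        FROM `#{@table}`
        WHERE browse_page = @browse_page
        ORDER BY rank
        LIMIT #{LINK_LIMIT}
      SQL
      rows = @query_runner.call(sql, { browse_page: browse_page_path })
      rows.map { |row| { title: row[:title], link: row[:link] } }
    end
  end
end
```

Passing the browse page path as a named parameter rather than interpolating it keeps user-derived values out of the SQL string. Whether one table per browse page or a single shared table is cheaper is still the open question raised above.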
hannako force-pushed the spike_2_big_query branch from 16b5c6a to 120d27d (September 20, 2024 21:16)
To parse the raw data
hannako force-pushed the spike_2_big_query branch from 120d27d to 426aa27 (September 20, 2024 21:18)
hannako force-pushed the spike_2_big_query branch from be16af9 to 4cf28ca (September 20, 2024 22:21)
BigQuery error reference: https://cloud.google.com/bigquery/docs/error-messages#errortable. If the cache expires and we can't fetch fresh data, we should fall back to a backup that is rebuilt by a nightly job.
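The fallback behaviour described above can be sketched as a small in-memory cache. This is not the implementation in the PR. The `PopularTasksCache` class, its `backup` callable (standing in for whatever the nightly job rebuilds) and the injectable `clock` are all assumptions made so the example is self-contained.

```ruby
# Hypothetical sketch of "serve cached, else fetch fresh, else use backup".
class PopularTasksCache
  Entry = Struct.new(:value, :expires_at)

  # backup: callable taking a key and returning the last known-good data.
  # clock:  callable returning the current time (injectable for tests).
  def initialize(ttl_seconds:, backup:, clock: -> { Time.now })
    @ttl_seconds = ttl_seconds
    @backup = backup
    @clock = clock
    @entries = {}
  end

  # The block fetches fresh data (for example a BigQuery query).
  def fetch(key)
    entry = @entries[key]
    return entry.value if entry && entry.expires_at > @clock.call

    begin
      value = yield
      @entries[key] = Entry.new(value, @clock.call + @ttl_seconds)
      value
    rescue StandardError
      # Fresh data is unavailable, so serve the backup instead of failing.
      @backup.call(key)
    end
  end
end
```

Rescuing `StandardError` is deliberately broad for the sketch. In practice it may be better to rescue only the client errors that correspond to the BigQuery error table linked above, so that programming mistakes still surface.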
Small refactor, so that we can re-use this method to fetch data without having an expiry (maybe!)
hannako force-pushed the spike_2_big_query branch from 4cf28ca to 31e5c2e (September 20, 2024 22:25)
Follow these steps if you are doing a Rails upgrade.