Spike 2 big query #3773

Draft
wants to merge 8 commits into
base: main

Conversation

hannako
Contributor

@hannako hannako commented Sep 20, 2024

⚠️ This repo is Continuously Deployed: make sure you follow the guidance ⚠️

Follow these steps if you are doing a Rails upgrade.

beccapearce and others added 3 commits September 20, 2024 11:12
Local env is used only for local testing; this ensures secrets
are not accidentally pushed.
- Create a BigQuery service so the app can talk to BigQuery

NB in order to use this you must add the credentials to
config/local_env.yml. There are instructions on how to do this in the
[dev docs](https://docs.publishing.service.gov.uk/repos/content-data-api/google_analytics_setup.html)
- Create a popular tasks namespace for these two services
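
As a rough illustration of the local-credentials step above, here is a minimal sketch of loading `config/local_env.yml` into the environment at boot. The helper name `load_local_env` and the key `BIGQUERY_PROJECT` are hypothetical; the real key names are whatever the dev docs specify, and the PR may wire this up differently.

```ruby
require "yaml"

# Hypothetical helper: copies key/value pairs from a local YAML file into
# an environment-like object (ENV by default). It is a no-op when the file
# is absent, which is the expected state outside local development.
def load_local_env(path, env = ENV)
  return env unless File.exist?(path)

  settings = YAML.safe_load(File.read(path)) || {}
  # Values already present in the environment take precedence over the file.
  settings.each { |key, value| env[key.to_s] ||= value.to_s }
  env
end
```

In a Rails app this could be called from an initializer as `load_local_env(Rails.root.join("config/local_env.yml"))`, with the file itself listed in `.gitignore` so the secrets stay local.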

- Ideally BigQuery would have a simple table per browse page, and Collections would just fetch the top 6 links.
Would need to chat to a PA to see which table structure is most cost-effective (it may be better to have a single table for all browse pages).
For performance reasons, I think we should rely on the data being processed on the BigQuery side, and Collections should just
fetch the minimum amount of data with little or no processing on the application side.
This is a different approach to that taken in 046d618
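
To make the "process in BigQuery, fetch the minimum" idea concrete, here is a hedged sketch. The class name, table name, and column names (`title`, `link`, `clicks`, `browse_page`) are all hypothetical placeholders, and it assumes the single-table option. The client is any object that responds to `query(sql, params:)` and yields rows as hashes with symbol keys; that is intended to mirror the `google-cloud-bigquery` gem, but treat the match as an assumption.

```ruby
# Hypothetical thin fetcher: ordering and limiting happen in the SQL, so the
# application only maps rows to the two fields it renders.
class PopularTasksFetcher
  LINK_LIMIT = 6

  def initialize(client:, table:)
    @client = client
    @table = table
  end

  # Placeholder query; the real schema would be agreed with the analysts.
  def sql
    <<~SQL
      SELECT title, link
      FROM `#{@table}`
      WHERE browse_page = @browse_page
      ORDER BY clicks DESC
      LIMIT #{LINK_LIMIT}
    SQL
  end

  def fetch(browse_page)
    rows = @client.query(sql, params: { browse_page: browse_page })
    rows.map { |row| { title: row[:title], link: row[:link] } }
  end
end
```

Injecting the client keeps the class testable with a stub, so tests never need to reach BigQuery.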

This commit needs some work on the tests.
To parse the raw data
https://cloud.google.com/bigquery/docs/error-messages#errortable

If the cache expires and we can't fetch fresh data we should fall back
to a backup that is rebuilt via a nightly job.
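
The fallback behaviour described in this commit could look roughly like the sketch below. It uses an in-memory store so it runs standalone; in the app the expiring layer would more likely be `Rails.cache.fetch(key, expires_in: ...)`. The class name `CachedWithBackup` and its methods are hypothetical, not taken from the PR.

```ruby
# Hypothetical in-memory sketch of "expiring cache, then backup".
class CachedWithBackup
  Entry = Struct.new(:value, :expires_at)

  def initialize(ttl:, clock: -> { Time.now })
    @ttl = ttl
    @clock = clock
    @cache = {}
    @backup = {}
  end

  # Would be called by the nightly job to rebuild the backup copy.
  def write_backup(key, value)
    @backup[key] = value
  end

  # Returns the cached value while it is fresh. Otherwise runs the block to
  # fetch fresh data, and serves the backup if that fetch raises.
  def fetch(key)
    entry = @cache[key]
    return entry.value if entry && entry.expires_at > @clock.call

    value = yield
    @cache[key] = Entry.new(value, @clock.call + @ttl)
    value
  rescue StandardError
    # No backup available: re-raise the original error.
    raise unless @backup.key?(key)

    @backup[key]
  end
end
```

Note that a failed fetch does not repopulate the expiring cache, so the next request retries BigQuery rather than pinning the backup data.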
Small refactor, so that we can re-use this method to fetch data
without having an expiry (maybe!)
@govuk-ci govuk-ci temporarily deployed to collections-pr-3773 September 20, 2024 22:25 Inactive
3 participants