Spike generate popular tasks using BigQuery #3761
Draft
beccapearce wants to merge 6 commits into main from SPIKE-generate-popular-tasks-extra
Commits
39029ed Add local_env to the gitignore (beccapearce)
4cc4901 Initial BigQuery setup (beccapearce)
046d618 Add a BigQuery query to get some initial data (beccapearce)
794a1c2 Render popular tasks in view (beccapearce)
bcebb64 Implement basic caching for the popular tasks (beccapearce)
b6d0eff Implement backup cache for popular tasks (beccapearce)
@@ -28,3 +28,6 @@
 # vim swap files and tags
 *.sw[a-z]
 /tags
+
+# Ignore local config
+config/local_env.yml
@@ -1,27 +1,33 @@
 module BrowseHelper
-  def display_popular_links_for_slug?(slug)
-    I18n.exists?(slug.to_s, scope: "browse.popular_links")
+  def slug(path = base_path)
+    path.sub(%r{.*(?=/browse/)}, "")
   end
 
+  def display_popular_tasks_for_slug?(slug)
+    %w[benefits business].include?(slug)
+  end
+
   def popular_links_for_slug(slug)
-    links = I18n.t(slug.to_s, scope: "browse.popular_links")
-    count = links.length
-    links.map.with_index(1) do |link, index|
-      {
-        text: link[:title],
-        href: link[:url],
-        data_attributes: {
-          module: "ga4-link-tracker",
-          ga4_track_links_only: "",
-          ga4_link: {
-            event_name: "navigation",
-            type: "action",
-            index_link: index,
-            index_total: count,
-            text: link[:title],
-          },
-        },
-      }
+    browse_page = slug(slug)
+
+    # Cache keys for the specific browse page
+    cache_key_latest = "popular_tasks_#{browse_page}_#{Date.yesterday.strftime("%Y-%m-%d")}"
+    cache_key_backup = "popular_tasks_backup_#{browse_page}"
+
+    # Try to fetch the latest cache first
+    popular_task_data = Rails.cache.read(cache_key_latest)
+
+    # If the latest cache doesn't exist, fall back to the backup cache
+    if popular_task_data.nil?
+      # Falling back to backup cache
+      popular_task_data = Rails.cache.read(cache_key_backup)
+    end
+
+    # If both caches are empty, fetch fresh data and cache it
+    if popular_task_data.nil?
+      popular_task_data = PopularTasks.new.fetch_data("/browse/#{browse_page}")
     end
+
+    popular_task_data
   end
 end
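Reading the diff alone, the helper builds its cache keys from `browse_page`, while `PopularTasks#fetch_data` is called with `"/browse/#{browse_page}"` and builds its own keys from that argument, so the keys the helper reads may never match the keys that get written. The sketch below illustrates the apparent mismatch and one hedged way to avoid it. It assumes the slug is a bare value such as `"benefits"` (as `display_popular_tasks_for_slug?` suggests); `PopularTaskKeys` is a hypothetical module, not part of the PR.

```ruby
require "date"

# Hypothetical module: one place that builds both cache keys, so the reader
# (the helper) and the writer (PopularTasks) cannot drift apart.
module PopularTaskKeys
  def self.latest(browse_path, date)
    "popular_tasks_#{browse_path}_#{date.strftime('%Y-%m-%d')}"
  end

  def self.backup(browse_path)
    "popular_tasks_backup_#{browse_path}"
  end
end

# Keys as the diff appears to build them for a slug of "benefits".
date = Date.new(2024, 7, 14)
helper_key = "popular_tasks_benefits_#{date.strftime('%Y-%m-%d')}"
writer_key = "popular_tasks_/browse/benefits_#{date.strftime('%Y-%m-%d')}"
puts helper_key == writer_key # false: the helper's read would miss

# With a shared builder, both sides pass the same path and get the same key.
shared = PopularTaskKeys.latest("/browse/benefits", date)
puts shared == writer_key # true
```

If the mismatch is real, the backup entry would be written but never read by the helper, which is worth confirming before relying on the fallback.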
@@ -0,0 +1,25 @@
+require "google/cloud/bigquery"
+require "googleauth"
+
+class Bigquery
+  include Google::Auth
+
+  def self.build
+    new.build
+  end
+
+  def build
+    credentials = {
+      "client_email" => ENV["BIGQUERY_CLIENT_EMAIL"],
+      "private_key" => ENV["BIGQUERY_PRIVATE_KEY"],
+    }
+
+    Google::Cloud::Bigquery.new(
+      project_id: ENV["BIGQUERY_PROJECT"],
+      credentials: Google::Auth::ServiceAccountCredentials.make_creds(
+        json_key_io: StringIO.new(credentials.to_json),
+        scope: ["https://www.googleapis.com/auth/bigquery"],
+      ),
+    )
+  end
+end
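`Bigquery#build` reads three environment variables and passes them on without checking them. A minimal sketch of a guard that could sit in front of it is shown here. `bigquery_key_json` is a hypothetical helper, and the `\n` unescaping is an assumption about how the private key might be stored in a local env file, not something the PR states.

```ruby
require "json"

REQUIRED_BIGQUERY_VARS = %w[BIGQUERY_PROJECT BIGQUERY_CLIENT_EMAIL BIGQUERY_PRIVATE_KEY].freeze

# Hypothetical helper: returns the JSON key string that the diff wraps in
# StringIO, or raises a readable error naming the missing variables.
def bigquery_key_json(env = ENV)
  missing = REQUIRED_BIGQUERY_VARS.select { |name| env[name].to_s.empty? }
  raise ArgumentError, "Missing BigQuery config: #{missing.join(', ')}" unless missing.empty?

  {
    "client_email" => env["BIGQUERY_CLIENT_EMAIL"],
    # Assumption: the key may be stored on one line with literal "\n" sequences.
    "private_key" => env["BIGQUERY_PRIVATE_KEY"].gsub('\n', "\n"),
  }.to_json
end

sample_env = {
  "BIGQUERY_PROJECT" => "example-project",
  "BIGQUERY_CLIENT_EMAIL" => "spike@example.invalid",
  "BIGQUERY_PRIVATE_KEY" => 'line-one\nline-two',
}
puts JSON.parse(bigquery_key_json(sample_env))["private_key"].lines.count # 2
```

Failing early with the variable names makes a misconfigured local environment easier to diagnose than an authentication error raised later by the client library.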
@@ -0,0 +1,71 @@
+class PopularTasks
+  CACHE_EXPIRATION = 24.hours # Set the cache expiration time
+  BACKUP_CACHE_EXPIRATION = 7.days # Backup cache can have a longer expiration
+
+  def initialize; end
+
+  def client
+    @client ||= Bigquery.build
+  end
+
+  def fetch_data(browse_page, date: Date.yesterday)
+    @fetch_data = client
+    @date = date.strftime("%Y-%m-%d")
+
+    cache_key_latest = "popular_tasks_#{browse_page}_#{@date}"
+    cache_key_backup = "popular_tasks_backup_#{browse_page}"
+
+    Rails.cache.fetch(cache_key_latest, expires_in: CACHE_EXPIRATION) do
+      # If cache is empty, this block is executed
+      query = <<~SQL
+        WITH cte1 as (SELECT
+        event_date,
+        event_name,
+        search_term,
+        cleaned_page_location,
+        cleaned_page_referrer,
+        link_url,
+        count(event_name) as click,
+
+        FROM `ga4-analytics-352613.flattened_dataset.flattened_daily_ga_data_*`
+        WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 2 DAY))
+        -- WHERE _table_suffix IN ('20240708', '20240709','20240710','20240711','20240712','20240713','20240714')
+        group by 1,2,3,4,5,6),
+
+        CTE2 as (SELECT
+        event_date,
+        sum(click) as clicks,
+        cleaned_page_referrer as BrowsePage,
+        search_term,
+        ROW_NUMBER() OVER(PARTITION BY cleaned_page_referrer ORDER BY click DESC) Rank,
+        link_url as SearchDestPage
+        FROM cte1
+        WHERE event_name = 'select_item'
+        AND cleaned_page_referrer = '#{browse_page}'
+        AND cleaned_page_location = '/search/all'
+        group by click,event_date,cleaned_page_referrer,search_term,link_url
+        order by cleaned_page_referrer,Rank asc)
+
+        SELECT
+        *
+        FROM CTE2
+        WHERE Rank <6
+      SQL
+
+      data = @fetch_data.query(query).all
+      @results = data.map do |row|
+        {
+          url: row[:SearchDestPage], # Using SearchDestPage as the link URL
+          browse_page: row[:BrowsePage], # Using BrowsePage as the L1 browse
+          rank: row[:Rank], # Rank to order the links
+        }
+      end
+      @results.sort_by { |link| link[:rank] } # Order the links by their rank
+
+      # Cache the results in the backup cache as well
+      Rails.cache.write(cache_key_backup, @results, expires_in: BACKUP_CACHE_EXPIRATION)
+
+      @results
+    end
+  end
+end
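Two details in `fetch_data` may be worth a second look: `browse_page` is interpolated straight into the SQL string, and the return value of `sort_by` is discarded, since `sort_by` returns a new array rather than sorting in place. The sketch below shows one hedged way to address both. It assumes the `params:` option of the google-cloud-bigquery `query` method with a named `@browse_page` parameter. `FakeClient`, `top_links`, and the shortened `QUERY` are hypothetical stand-ins so the example runs without BigQuery, and the `.all` paging call from the diff is left out for brevity.

```ruby
# Shortened, hypothetical query: the real CTEs from the diff would go here.
QUERY = <<~SQL
  SELECT link_url AS SearchDestPage, cleaned_page_referrer AS BrowsePage, 1 AS Rank
  FROM some_table
  WHERE cleaned_page_referrer = @browse_page
SQL

# Stand-in for the BigQuery client so the sketch runs offline.
class FakeClient
  attr_reader :last_params

  def query(_sql, params: {})
    @last_params = params
    [
      { SearchDestPage: "/b", BrowsePage: params[:browse_page], Rank: 2 },
      { SearchDestPage: "/a", BrowsePage: params[:browse_page], Rank: 1 },
    ]
  end
end

def top_links(client, browse_page)
  # The value travels as a query parameter instead of being spliced into SQL.
  rows = client.query(QUERY, params: { browse_page: browse_page })
  links = rows.map do |row|
    { url: row[:SearchDestPage], browse_page: row[:BrowsePage], rank: row[:Rank] }
  end
  links.sort_by { |link| link[:rank] } # return the sorted copy
end

puts top_links(FakeClient.new, "/browse/benefits").map { |l| l[:url] }.inspect
# prints ["/a", "/b"]
```

Because the query already filters on `Rank`, the sort mainly guards against rows arriving in a different order than expected.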
I think we need to think a bit more about how this would work.
If the BigQuery data were unavailable for more than 7 days, what happens then?
I can think of other ways to do it, but this feels like a problem that must have been solved many times before, i.e. only expire the cache if fresh data is available to fill it.
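One way to read "only expire the cache if fresh data is available to fill it" is a last-known-good cache: the stored value never expires by itself, it only records when it was fetched, and a refresh that fails leaves the old value in place. A minimal sketch under that reading follows. `LastKnownGood`, the Hash store, and the injected clock are all hypothetical; in the app the store would presumably be `Rails.cache` written without `expires_in`.

```ruby
# Hypothetical last-known-good cache: values are replaced, never expired.
class LastKnownGood
  def initialize(max_age:, clock: -> { Time.now })
    @max_age = max_age
    @clock = clock
    @store = {}
  end

  # Returns the cached value, refreshing it first when it is older than
  # max_age. If the refresh raises, the stale value is served instead.
  def fetch(key)
    entry = @store[key]
    return entry[:value] if entry && (@clock.call - entry[:fetched_at]) < @max_age

    begin
      value = yield
      @store[key] = { value: value, fetched_at: @clock.call }
      value
    rescue StandardError
      raise unless entry # nothing cached yet, so surface the error

      entry[:value]
    end
  end
end

now = Time.at(0)
cache = LastKnownGood.new(max_age: 24 * 60 * 60, clock: -> { now })

cache.fetch("benefits") { ["/child-benefit"] }
now += 10 * 24 * 60 * 60 # ten days later, BigQuery is still down
stale = cache.fetch("benefits") { raise "BigQuery unavailable" }
puts stale.inspect # ["/child-benefit"]
```

The trade-off is that stale links can be served indefinitely, so some form of alerting on repeated refresh failures would likely be needed alongside it.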
I've caught up now... the cache will expire regardless of whether or not the API responds so I understand the need for a backup. And I like the idea of writing to the backup at the same time as you fetch the fresh data.
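The behaviour described in this comment, and the seven-day question raised earlier in the thread, can be traced with a small simulation. This is a sketch only: `TtlStore` is a hypothetical in-memory stand-in for `Rails.cache` with an injected clock, the fixed key names simplify the dated key used in the PR, and the 24-hour and 7-day values mirror `CACHE_EXPIRATION` and `BACKUP_CACHE_EXPIRATION` from the diff.

```ruby
DAY = 24 * 60 * 60

# Hypothetical in-memory store with per-entry expiry, loosely mirroring
# the read/write calls the diff makes on Rails.cache.
class TtlStore
  def initialize(clock)
    @clock = clock
    @entries = {}
  end

  def write(key, value, expires_in:)
    @entries[key] = { value: value, expires_at: @clock.call + expires_in }
  end

  def read(key)
    entry = @entries[key]
    entry[:value] if entry && @clock.call < entry[:expires_at]
  end
end

now = Time.at(0)
store = TtlStore.new(-> { now })

# Day 0: a successful fetch writes both entries, as in the PR.
links = ["/child-benefit"]
store.write("latest", links, expires_in: 1 * DAY)
store.write("backup", links, expires_in: 7 * DAY)

[0, 3, 8].each do |day|
  now = Time.at(day * DAY)
  puts "day #{day}: latest=#{store.read('latest').inspect} backup=#{store.read('backup').inspect}"
end
```

On day 3 only the backup entry is still readable, and on day 8 both reads return `nil`, which is the case the first comment asks about.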