Commit
Add dispatchable workflow for deleting test model runs (#60)
* Add delete-model-runs workflow
* Temporarily run a readonly version of the delete-model-runs workflow for testing
* Install libgit2-dev for git2r in delete-model-runs.yaml
* Fix R styler error in delete_current_year_model_runs.R
* Clean up docstrings and test against 2023-11-14-frosty-jacob model
* Satisfy pre-commit
* Revert to workflow_dispatch event trigger for delete-model-runs.yaml
* Revert "Revert to workflow_dispatch event trigger for delete-model-runs.yaml" This reverts commit 51843f3.
* Fix typo in Delete model runs step of delete-model-runs workflow
* Test obviously bogus value for run-ids to delete-model-runs workflow
* Raise an error in delete_current_year_model_runs.R if no objects were deleted
* Disable renv sandbox to speed up install/run times
* Run delete_current_year_model_runs.R on all run IDs at once
* Check for validity of run IDs before issuing delete operations in delete_current_year_model_runs.R
* Refactor run ID validity check in delete_current_year_model_runs to be nondestructive
* Try deleting multiple invalid run IDs
* Refactor delete_current_year_model_runs.R to accept a comma-delimited string of run IDs
* Test a delete-model-runs workflow where one run is valid and one isn't
* Test a delete-model-runs workflow where all run IDs are valid
* Revert to workflow_dispatch trigger for delete-model-runs.yaml
* Remove extraneous print statement from delete_current_year_model_runs.R
* Clean up delete-model-runs and associated script in response to review
* Temporarily run delete-model-runs on pull_request event for testing
* Revert "Temporarily run delete-model-runs on pull_request event for testing" This reverts commit 09cfa5b.
1 parent bb5d2c7, commit c5dda8e
Showing 3 changed files with 159 additions and 14 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,59 @@ | ||
# Workflow that can be manually dispatched to delete test model runs that | ||
# do not need to be persisted indefinitely. | ||
# | ||
# Gated such that it's impossible to delete runs older than the current | ||
# assessment cycle, where each assessment cycle starts in May. | ||
|
||
name: delete-model-runs | ||
|
||
on: | ||
workflow_dispatch: | ||
inputs: | ||
run-ids: | ||
description: > | ||
Run IDs: Space-delimited list of IDs of model runs to delete. Note | ||
that the workflow assumes these IDs correspond to model runs for the | ||
current assessment cycle, and if that's not the case the deletion | ||
script will raise an error. | ||
required: true | ||
type: string | ||
default: 2024-01-01-foo-bar 2024-01-02-bar-baz | ||
|
||
jobs: | ||
delete-model-runs: | ||
runs-on: ubuntu-latest | ||
permissions: | ||
# Needed to interact with GitHub's OIDC Token endpoint so we can auth AWS | ||
contents: read | ||
id-token: write | ||
steps: | ||
- name: Checkout repo code | ||
uses: actions/checkout@v4 | ||
|
||
- name: Setup R | ||
uses: r-lib/actions/setup-r@v2 | ||
with: | ||
use-public-rspm: true | ||
|
||
- name: Install system dependencies | ||
run: sudo apt-get install libgit2-dev | ||
shell: bash | ||
|
||
- name: Disable renv sandbox to speed up install time | ||
run: echo "RENV_CONFIG_SANDBOX_ENABLED=FALSE" >> .Renviron | ||
shell: bash | ||
|
||
- name: Setup renv | ||
uses: r-lib/actions/setup-renv@v2 | ||
|
||
- name: Configure AWS credentials | ||
uses: aws-actions/configure-aws-credentials@v4 | ||
with: | ||
role-to-assume: ${{ secrets.AWS_IAM_ROLE_MODEL_DELETION_ARN }} | ||
aws-region: us-east-1 | ||
|
||
- name: Delete model runs | ||
run: Rscript ./R/delete_current_year_model_runs.R "${RUN_IDS// /,}" | ||
shell: bash | ||
env: | ||
RUN_IDS: ${{ inputs.run-ids }} |
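
The `Delete model runs` step relies on bash pattern substitution, `${RUN_IDS// /,}`, to turn the space-delimited workflow input into the comma-delimited argument the R script expects. The following is a minimal standalone sketch of that expansion; the function name `to_comma_delimited` is hypothetical and is not part of this commit.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not in the commit): convert a space-delimited list of
# run IDs into a comma-delimited string, as the workflow step does inline.
to_comma_delimited() {
  local run_ids="$1"
  # ${var//pattern/replacement} replaces every occurrence of the pattern,
  # here a single space, with a comma.
  printf '%s\n' "${run_ids// /,}"
}

# Example using the default value of the run-ids workflow input
to_comma_delimited "2024-01-01-foo-bar 2024-01-02-bar-baz"
# prints 2024-01-01-foo-bar,2024-01-02-bar-baz
```

Note that each space becomes its own comma, so an input containing two consecutive spaces would produce an empty field between the commas. The workflow appears to assume that IDs are separated by exactly one space.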
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,83 @@ | ||
# Script to delete a list of model runs by ID from AWS. | ||
# | ||
# Accepts one argument, a comma-delimited list of run IDs for model runs | ||
# whose artifacts should be deleted. | ||
# | ||
# Assumes that model runs are restricted to the current assessment cycle, where | ||
# each assessment cycle starts in May. Raises an error if no objects matching | ||
# a given ID for the current year could be located in S3. This error will get | ||
# raised before any deletion occurs, so if one or more IDs are invalid then | ||
# no objects will be deleted. | ||
# | ||
# Example usage (delete the runs 123, 456, and 789 in the current year): | ||
# | ||
# delete_current_year_model_runs.R 123,456,789 | ||
|
||
library(glue) | ||
library(here) | ||
library(magrittr) | ||
source(here("R", "helpers.R")) | ||
|
||
current_date <- as.POSIXct(Sys.Date()) | ||
current_month <- current_date %>% format("%m") | ||
current_year <- current_date %>% format("%Y") | ||
|
||
# The following heuristic determines the upcoming assessment cycle year: | ||
# | ||
# * From May to December (post assessment), `year` = next year | ||
# * From January to April (during assessment), `year` = current year | ||
year <- if (current_month < "05") { | ||
current_year | ||
} else { | ||
as.character(as.numeric(current_year) + 1) | ||
} | ||
|
||
# Convert the comma-delimited input to a vector of run IDs. Accepting one or | ||
# more positional arguments would be a cleaner UX, but since this script is | ||
# intended to be called from a dispatched GitHub workflow, it's easier to parse | ||
# one comma-delimited string than convert a space-separated string passed as a | ||
# workflow input to an array of function arguments | ||
raw_run_ids <- commandArgs(trailingOnly = TRUE) | ||
run_ids <- raw_run_ids %>% | ||
strsplit(split = ",", fixed = TRUE) %>% | ||
unlist() | ||
|
||
"Confirming artifacts exist for run IDs in year {year}: {raw_run_ids}" %>% | ||
glue::glue() %>% | ||
print() | ||
|
||
# We consider a run ID to be valid if it has any matching data in S3 for | ||
# the current year | ||
run_id_is_valid <- function(run_id, year) { | ||
return( | ||
model_get_s3_artifacts_for_run(run_id, year) %>% | ||
sapply(aws.s3::object_exists) %>% | ||
any() | ||
) | ||
} | ||
|
||
# We check for validity separately from the deletion operation for two reasons: | ||
# | ||
# 1. The aws.s3::delete_object API does not raise an error if an object does | ||
# not exist, so a delete operation alone won't alert us for an incorrect | ||
# ID | ||
# 2. Even if aws.s3::delete_object could raise an error for missing objects, | ||
# we want to alert the caller that one or more of the IDs were incorrect | ||
# before deleting any objects so that this script is nondestructive | ||
# in the case of a malformed ID | ||
valid_run_ids <- run_ids %>% sapply(run_id_is_valid, year = year) | ||
|
||
if (!all(valid_run_ids)) { | ||
invalid_run_ids <- run_ids[which(valid_run_ids == FALSE)] %>% | ||
paste(collapse = ", ") | ||
|
||
"Some run IDs are missing all S3 artifacts for {year}: {invalid_run_ids}" %>% | ||
glue::glue() %>% | ||
stop() | ||
} | ||
|
||
"Deleting S3 artifacts for run IDs in year {year}: {raw_run_ids}" %>% | ||
glue::glue() %>% | ||
print() | ||
|
||
run_ids %>% sapply(model_delete_run, year = year) |
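
The assessment-cycle heuristic in the script above (January to April maps to the current year, May to December maps to the next year) can be checked in isolation. The sketch below restates it in bash, under the assumption that the month is passed as a zero-padded two-digit string, as `format("%m")` produces in the R script. The function name `assessment_year` is hypothetical and does not appear in the commit.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not in the commit): print the assessment cycle year
# for a given month and calendar year.
#   $1: zero-padded two-digit month ("01" to "12")
#   $2: four-digit calendar year
assessment_year() {
  local month="$1" year="$2"
  # The 10# prefix forces base-10 parsing so that "08" and "09" are not
  # rejected as invalid octal literals.
  if (( 10#$month < 5 )); then
    # January to April: the cycle year is the current year
    echo "$year"
  else
    # May to December: the cycle year is the next year
    echo $(( year + 1 ))
  fi
}

# Cycle year for today's date
assessment_year "$(date +%m)" "$(date +%Y)"
```

For example, `assessment_year 04 2024` yields 2024, while `assessment_year 05 2024` yields 2025, which matches the branch taken by `current_month < "05"` in the R script.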