-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add dispatchable workflow for deleting test model runs #60
Conversation
145f5a7
to
e5e9404
Compare
…ete_current_year_model_runs.R
e5e9404
to
c624b0b
Compare
42d4fc9
to
2c8edb9
Compare
2c8edb9
to
17fcf90
Compare
2e7a3ba
to
cf4ff3c
Compare
R/delete_current_year_model_runs.R
Outdated
# Convert the comma-delimited input to a vector of run IDs. Accepting one or | ||
# more positional arguments would be a cleaner UX, but since this script is | ||
# intended to be called from a dispatched GitHub workflow, it's easier to parse | ||
# one comma-delimited string than split a space-separated string passed as a | ||
# workflow input |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agonized over this interface, perhaps unnecessarily; I'm curious if you agree that a single comma-delimited string is the right compromise!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine! Though we might want to use a space-delimited string in order to match build-and-test-dbt. It would have to be quoted i.e. not positional args. I'm really fine with either though as long as it's clear what the input is supposed to be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool, I switched the workflow input to accept a space-delimited string in a4895c2.
R/delete_current_year_model_runs.R
Outdated
# The following heuristic determines the current upcoming assessment cycle year: | ||
# | ||
# * From April to December (post assessment), `year` = next year | ||
# * From January to March (during assessment), `year` = current year |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this heuristic seem reasonable to you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally! I would bump the cutoff month to May though, as sometimes we still do model touchup stuff in April.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in a4895c2!
"Deleting S3 artifacts run IDs in year {year}: {run_ids}" %>% | ||
glue::glue() %>% | ||
print() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To test deletion, I copied over run artifacts for two different model runs from year=2023/
to year=2024/
and deleted the 2024 versions. Example workflow here: https://github.com/ccao-data/model-res-avm/actions/runs/6908697011/job/18798517688
In order to delete existing test model runs from year=2023/
, my thinking was that we could make a list of model runs we want to delete, and then I can move them from year=2023/
to year=2024/
and delete them. I'll need your signoff on this list to make sure I don't delete anything important, and I'll plan to do this task after merging this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That works for me! Let's also delete other test runs created during the model infra construction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, will do!
@jeancochrane Just a quick answer to the PR body before a full review: no, we don't need to port this over to the condo model, since it shares the exact same run/output structure. We can just use this workflow to delete runs from both models, even if it's a bit confusing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great @jeancochrane. This will be quite handy once we start flooding S3 with useless test models :D
run: Rscript ./R/delete_current_year_model_runs.R "$RUN_IDS" | ||
shell: bash | ||
env: | ||
RUN_IDS: ${{ inputs.run-ids }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nitpick: Can't you just pass the input directly here?
run: Rscript ./R/delete_current_year_model_runs.R "$RUN_IDS" | |
shell: bash | |
env: | |
RUN_IDS: ${{ inputs.run-ids }} | |
run: Rscript ./R/delete_current_year_model_runs.R ${{ inputs.run-ids }} | |
shell: bash |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be indeed be cleaner! I decided to switch the format of the run-ids
input variable to expect a space-separated string in a4895c2, however, so I made a slightly different change here, instead using the RUN_IDS
env var to strip spaces and replace them with commas. (My thinking is that spaces provide a clearer interface for the GitHub workflow, but on the off chance we ever need to run R/delete_current_year_model_runs.R
manually, a single space-delimited string would be a much more confusing command line interface than a single comma-delimited string. Accepting a space-delimited string in the GitHub UI but a comma-delimited string via the command line seems like a reasonable compromise to me.)
R/delete_current_year_model_runs.R
Outdated
# Convert the comma-delimited input to a vector of run IDs. Accepting one or | ||
# more positional arguments would be a cleaner UX, but since this script is | ||
# intended to be called from a dispatched GitHub workflow, it's easier to parse | ||
# one comma-delimited string than split a space-separated string passed as a | ||
# workflow input |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's fine! Though we might want to use a space-delimited string in order to match build-and-test-dbt. It would have to be quoted i.e. not positional args. I'm really fine with either though as long as it's clear what the input is supposed to be.
workflow_dispatch: | ||
inputs: | ||
run-ids: | ||
description: > |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggestion: IMO it'd be really helpful here to have a default input showing an example (nonexistent) run ID.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, done in a4895c2! Note that the trade-off here is that default arguments are populated as actual values in the workflow input form, rather than placeholder text, so there's a chance of a caller misinterpreting the form and submitting it with the (fake) default values. This should just raise an error and not actually delete any data, however, so the risk is worth it in my view.
"Deleting S3 artifacts run IDs in year {year}: {run_ids}" %>% | ||
glue::glue() %>% | ||
print() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That works for me! Let's also delete other test runs created during the model infra construction.
R/delete_current_year_model_runs.R
Outdated
# The following heuristic determines the current upcoming assessment cycle year: | ||
# | ||
# * From April to December (post assessment), `year` = next year | ||
# * From January to March (during assessment), `year` = current year |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Totally! I would bump the cutoff month to May though, as sometimes we still do model touchup stuff in April.
…esting" This reverts commit 09cfa5b.
Thanks for the excellent review @dfsnow! I made some tweaks to the interface based on your comments, so I'm requesting one final review to make sure the new interface looks good to you. I tested via a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me @jeancochrane!
@@ -52,7 +53,7 @@ jobs: | |||
aws-region: us-east-1 | |||
|
|||
- name: Delete model runs | |||
run: Rscript ./R/delete_current_year_model_runs.R "$RUN_IDS" | |||
run: Rscript ./R/delete_current_year_model_runs.R "${RUN_IDS// /,}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
praise: Good thinking!
This PR adds a new workflow,
delete-model-runs
that can be dispatched with a comma-delimited string of model run IDs to delete models that have been run for the current assessment cycle.Models in previous assessment cycles are protected from deletion in two ways:
One high level question about the condo model: I've been assuming that we need to port these changes over to the condo model before we can count #56 as complete, but taking a look at the
file_dict.csv
s for both repos, it seems like there aren't actually any differences in the output paths for the model artifacts; the models are simply distinguished by their autogenerated run IDs. Do we need to port this code over to the condo model, or is it good enough to use the workflow in this repo to delete condo model runs? I think the tradeoff is that it would be more confusing from an operational perspective to need to delete condo model runs from the context of amodel-res-avm
workflow, but at the same time it would make long-term maintenance easier since we don't need to maintain two copies of thedelete-model-runs
workflow (I'm assuming that it's not worth extracting this logic out into a shared external workflow as we did forbuild-and-run-model
because I don't imagine that this logic will be particularly useful outside the context of the res and condo model repos).Connects #56.