
Edit DVC deps to include cmd run files #240

Merged
34 changes: 34 additions & 0 deletions dvc.lock
@@ -2,6 +2,11 @@ schema: '2.0'
stages:
  ingest:
    cmd: Rscript pipeline/00-ingest.R
    deps:
    - path: pipeline/00-ingest.R
      hash: md5
      md5: c04f9224e873b1ee29a64fa68aa6c8d9
      size: 23355
    params:
      params.yaml:
        assessment:
@@ -61,6 +66,10 @@ stages:
  train:
    cmd: Rscript pipeline/01-train.R
    deps:
    - path: pipeline/01-train.R
      hash: md5
      md5: 46115d48cf066d35b0db14dc13a8d9b3
      size: 17448
    - path: input/training_data.parquet
      hash: md5
      md5: 680e07bdb2a55166b7070155c4ff5a38
@@ -332,6 +341,10 @@ stages:
  assess:
    cmd: Rscript pipeline/02-assess.R
    deps:
    - path: pipeline/02-assess.R
      hash: md5
      md5: 5e8c9b7d547ea41d9ec9441465e6e275
      size: 22749
    - path: input/assessment_data.parquet
      hash: md5
      md5: 5450bfd412c9b552a1a2722b04e49706
@@ -520,6 +533,10 @@ stages:
  evaluate:
    cmd: Rscript pipeline/03-evaluate.R
    deps:
    - path: pipeline/03-evaluate.R
      hash: md5
      md5: d33c8e642e5e29a0683463ce885771f8
      size: 16292
    - path: output/assessment_pin/model_assessment_pin.parquet
      hash: md5
      md5: f5641cb4506847814181996692064b6e
@@ -577,6 +594,10 @@ stages:
  interpret:
    cmd: Rscript pipeline/04-interpret.R
    deps:
    - path: pipeline/04-interpret.R
      hash: md5
      md5: 1cc57c0bcdaf2725fa343c6d88c1592c
      size: 9619
    - path: input/assessment_data.parquet
      hash: md5
      md5: 582a6197429e99ee24271a3d4f9e9323
@@ -700,6 +721,10 @@ stages:
  finalize:
    cmd: Rscript pipeline/05-finalize.R
    deps:
    - path: pipeline/05-finalize.R
      hash: md5
      md5: 5c5a5100ebae2013bc24e8f9333d136b
      size: 8762
    - path: output/intermediate/timing/model_timing_assess.parquet
      hash: md5
      md5: 5f93cb109c073d91a9c9b55b3a56755b
@@ -991,6 +1016,11 @@ stages:
      size: 73
  export:
    cmd: Rscript pipeline/07-export.R
    deps:
    - path: pipeline/07-export.R
      hash: md5
      md5: b4615315b52165eed4a030c94def015b
      size: 33718
    params:
      params.yaml:
        assessment.year: '2023'
@@ -1024,6 +1054,10 @@ stages:
  upload:
    cmd: Rscript pipeline/06-upload.R
    deps:
    - path: pipeline/06-upload.R
      hash: md5
      md5: 3b7d11c518447cf6c14ec7668c488968
      size: 11733
    - path: output/assessment_card/model_assessment_card.parquet
      hash: md5
      md5: 7f558cd27ce54a39390180383a0af3fc
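Each new `deps` entry in `dvc.lock` records the script's `md5` and `size`, which is how DVC will now notice when an R script changes. The following is a minimal Python sketch for reproducing such an entry locally. It assumes the DVC 3.x meaning of `hash: md5`: a plain MD5 over the raw file bytes with no line-ending normalization, and `size` as the file size in bytes. `lock_entry` is a hypothetical helper, not part of DVC's API.

```python
import hashlib
import os


def lock_entry(path: str, chunk_size: int = 1 << 20) -> dict:
    """Hypothetical helper: build a dvc.lock-style dep entry for one file.

    Assumes DVC 3.x semantics for `hash: md5`, i.e. a plain MD5 over the
    raw file bytes with no line-ending conversion.
    """
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Stream the file in chunks so large inputs need not fit in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return {
        "path": path,
        "hash": "md5",
        "md5": digest.hexdigest(),
        "size": os.path.getsize(path),  # size in bytes
    }
```

For example, `lock_entry("pipeline/01-train.R")` run from the repository root should match the `train` entry above if the assumption about the hash holds and the script is unchanged since the lock was written.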
11 changes: 10 additions & 1 deletion dvc.yaml
@@ -4,6 +4,8 @@ stages:
    desc: >
      Ingest training and assessment data from Athena + generate townhome
      complex identifiers
    deps:
      - pipeline/00-ingest.R
    params:
      - assessment
      - input
@@ -15,14 +17,14 @@ stages:
      - input/land_nbhd_rate_data.parquet
      - input/land_site_rate_data.parquet
      - input/training_data.parquet
    frozen: true

  train:
    cmd: Rscript pipeline/01-train.R
    desc: >
      Train a LightGBM model with cross-validation. Generate model objects,
      data recipes, and predictions on the test set (most recent 10% of sales)
    deps:
      - pipeline/01-train.R
      - input/training_data.parquet
    params:
      - cv
@@ -58,6 +60,7 @@ stages:
      County. Also generate flags, calculate land values, and make any
      post-modeling changes
    deps:
      - pipeline/02-assess.R
      - input/training_data.parquet
      - input/assessment_data.parquet
      - input/complex_id_data.parquet
@@ -86,6 +89,7 @@ stages:
2. An assessor-specific ratio study comparing estimated assessments to
the previous year's sales
    deps:
      - pipeline/03-evaluate.R
      - output/test_card/model_test_card.parquet
      - output/assessment_pin/model_assessment_pin.parquet
    params:
@@ -109,6 +113,7 @@ stages:
      Generate SHAP values for each card and feature as well as feature
      importance metrics for each feature
    deps:
      - pipeline/04-interpret.R
      - input/assessment_data.parquet
      - input/training_data.parquet
      - output/assessment_card/model_assessment_card.parquet
@@ -134,6 +139,7 @@ stages:
      Save run timings and run metadata to disk and render a performance report
      using Quarto.
    deps:
      - pipeline/05-finalize.R
      - output/intermediate/timing/model_timing_train.parquet
      - output/intermediate/timing/model_timing_assess.parquet
      - output/intermediate/timing/model_timing_evaluate.parquet
@@ -164,6 +170,7 @@ stages:
      outputs prior to upload and attach a unique run ID. This step requires
      access to the CCAO Data AWS account, and so is assumed to be internal-only
    deps:
      - pipeline/06-upload.R
      - output/parameter_final/model_parameter_final.parquet
      - output/parameter_range/model_parameter_range.parquet
      - output/parameter_search/model_parameter_search.parquet
@@ -189,6 +196,8 @@ stages:
      Generate Desk Review spreadsheets and iasWorld upload CSVs from a finished
      run. NOT automatically run since it is typically only run once. Manually
      run once a model is selected
    deps:
      - pipeline/07-export.R
    params:
      - assessment.year
      - input.min_sale_year
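This change enforces one invariant: every stage's `cmd` script must also be listed under `deps`, so that DVC reruns the stage when the script changes. That invariant can be checked mechanically. The following is a minimal sketch. It assumes that `dvc.yaml` has already been parsed into plain Python dicts and lists (for example with PyYAML's `safe_load`) and that each `cmd` is a single string of the simple `Rscript <script>` form used in this pipeline. `missing_script_deps` is a hypothetical helper, not part of DVC.

```python
import shlex


def missing_script_deps(stages: dict) -> list:
    """Hypothetical check: return names of stages whose `Rscript <file>`
    command file is not listed in that stage's `deps`.

    Assumes `stages` is the `stages:` mapping from dvc.yaml, already parsed
    into dicts/lists, and that each `cmd` is a single string.
    """
    missing = []
    for name, stage in stages.items():
        argv = shlex.split(stage.get("cmd", ""))
        # Only handle the simple `Rscript path/to/script.R` form
        if len(argv) < 2 or argv[0] != "Rscript":
            continue
        script = argv[1]
        deps = stage.get("deps") or []
        if script not in deps:
            missing.append(name)
    return missing


# Usage: a stage that omits its own script from deps is reported
stages = {
    "train": {
        "cmd": "Rscript pipeline/01-train.R",
        "deps": ["pipeline/01-train.R", "input/training_data.parquet"],
    },
    "export": {"cmd": "Rscript pipeline/07-export.R", "deps": []},
}
print(missing_script_deps(stages))  # ['export']
```

Run against the pre-change `dvc.yaml`, a check like this would flag every stage. After this diff, it should report none.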