Skip to content

Commit

Permalink
Rename tag for iasWorld data tests and refactor script to run tests (#…
Browse files Browse the repository at this point in the history
…571)

* Update README.md to clarify role of dbt generics

* Make some updates to dbt/README.md to sketch out QC doc structure

* Add link to QC tests and reports section to root README.md

* Continue fleshing out dbt test docs in dbt/README.md

* Clean up data testing section of dbt/README.md

* Update dbt/README.md to add docs for running QC reports

* Finish fleshing out docs on QC reports in dbt/README.md

* Move transform_dbt_test_results.py script to dbt/scripts/ subdirectory

* Fix small issues in dbt/README.md

* Clean up some trailing whitespace in dbt/README.md

* Run pip install step in dbt/ subdir in test_dbt_models workflow

* Clarify bug described in unit tests section of dbt/README.md

* Clarify --rebuild docs in dbt/README.md

* Small clarification in dbt/README.md

* Apply a few small nitpicks from Dan's code review to dbt/README.md

Co-authored-by: Dan Snow <31494343+dfsnow@users.noreply.github.com>

* Standardize on "Valuations staff" in dbt/README.md

* Use relative paths in dbt/README.md

* Refactor meta.export_template option to not expect file extensions for export_models script

* Refactor docs to use clearer terminology for iasWorld data tests

* Rename tag for iasWorld data tests and refactor script to run tests

* Small tweak to dbt/README.md

* Better error handling for dbt invocation in run_iasworld_data_tests.py

* Rename `data_test_iasworld` selector -> `select_data_test_iasworld`

* Add --target flag to run_iasworld_data_tests script

---------

Co-authored-by: Dan Snow <31494343+dfsnow@users.noreply.github.com>
  • Loading branch information
jeancochrane and dfsnow authored Aug 19, 2024
1 parent bd0c7bc commit 348d7ec
Show file tree
Hide file tree
Showing 49 changed files with 1,799 additions and 296 deletions.
6 changes: 3 additions & 3 deletions .github/workflows/build_and_test_dbt.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -84,14 +84,14 @@ jobs:
if [[ $MODIFIED_RESOURCES_ONLY == 'true' ]]; then
if [[ $MANUALLY_DISPATCHED == 'true' ]]; then
echo "Running tests on manually selected resources"
dbt test -t "$TARGET" -s ${{ inputs.models }} --exclude "tag:test_qc*" --defer --state "$STATE_DIR"
dbt test -t "$TARGET" -s ${{ inputs.models }} --exclude "tag:data_test_iasworld" --defer --state "$STATE_DIR"
else
echo "Running tests on modified/new resources only"
dbt test -t "$TARGET" -s state:modified state:new --exclude "tag:test_qc*" --defer --state "$STATE_DIR"
dbt test -t "$TARGET" -s state:modified state:new --exclude "tag:data_test_iasworld" --defer --state "$STATE_DIR"
fi
else
echo "Running tests on all resources"
dbt test -t "$TARGET" --exclude "tag:test_qc*"
dbt test -t "$TARGET" --exclude "tag:data_test_iasworld"
fi
working-directory: ${{ env.PROJECT_DIR }}
shell: bash
Expand Down
29 changes: 3 additions & 26 deletions .github/workflows/test_dbt_models.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -23,37 +23,14 @@ jobs:
role-to-assume: ${{ secrets.AWS_IAM_ROLE_TO_ASSUME_ARN }}

- name: Install Python requirements
run: pip install -r scripts/requirements.transform_dbt_test_results.txt
run: pip install -r scripts/requirements.run_iasworld_data_tests.txt
working-directory: ${{ env.PROJECT_DIR }}
shell: bash

- name: Run tests
run: |
# dbt doesn't differentiate between test failures and errors, but we
# need to since we expect failures. Do this by capturing the output
# and checking its contents to look for errors
if output=$(dbt test --target "$TARGET" --selector qc_tests --store-failures); then
echo "$output"
else
status_code="$?"
if [[ "$output" =~ "Runtime Error" || "$output" =~ "Compilation Error" ]]; then
# The presence of an error string indicates that these tests
# failed due to an error, so print the output and fail the
# pipeline
echo "$output"
exit "$status_code"
else
# The tests must have failed rather than errored out, so
# print the output but don't raise an error
echo "$output"
fi
fi
working-directory: ${{ env.PROJECT_DIR }}
shell: bash

- name: Extract test results
run: |
python3 scripts/transform_dbt_test_results.py \
python3 scripts/run_iasworld_data_tests.py \
--target "$TARGET" \
--output-dir ./qc_test_results/
working-directory: ${{ env.PROJECT_DIR }}
shell: bash
Expand Down
46 changes: 17 additions & 29 deletions dbt/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -541,14 +541,9 @@ built for iasWorld data tests.

#### Running iasWorld data tests

The iasWorld data test suite can be run using the [`dbt test`
command](https://docs.getdbt.com/reference/commands/test) with a dedicated
[selector](https://docs.getdbt.com/reference/node-selection/yaml-selectors)
and the [`--store-failures`
flag](https://docs.getdbt.com/reference/resource-configs/store_failures),
and its output can be transformed for review and analysis using the
[`transform_dbt_test_results` script](./scripts/transform_dbt_test_results.py).
This script reads the metadata for the most recent `dbt test` run and outputs a number of
The iasWorld data test suite can be run using the [`run_iasworld_data_tests`
script](./scripts/run_iasworld_data_tests.py).
This script runs the tests and reads the metadata for the run to output a number of
different artifacts with information about the tests:

* An Excel workbook with detailed information on each failure to aid in resolving
Expand All @@ -572,25 +567,19 @@ Typically, Valuations staff will ask for test output for a specific township. We
[township code](https://github.com/ccao-data/wiki/blob/master/Data/Townships.md) for this township
using the bash variable `$TOWNSHIP_CODE`.

First, run the tests locally using dbt and the [iasWorld data test
selector](./selectors.yml):
Run the tests locally using the [`run_iasworld_data_tests`
script](./scripts/run_iasworld_data_tests.yml):

```bash
# Make sure you're in the dbt subdirectory with the virtualenv activated
cd dbt
source venv/bin/activate

# Run the tests and store failures in Athena
dbt test --selector qc_tests --store-failures
# Run the script
python3 scripts/run_iasworld_data_tests.py --township $TOWNSHIP_CODE
```

Next, transform the results for the township that Valuations staff requested:

```bash
python3 scripts/transform_dbt_test_results.py --township $TOWNSHIP_CODE
```

Finally, spot check the Excel workbook that the script produced to make sure it's formatted
Then, check the Excel workbook that the script produced to make sure it's formatted
correctly, and send it to Valuations staff for review.

#### Adding iasWorld data tests
Expand All @@ -601,14 +590,14 @@ by the script:

* One of either the test or the model that the test is defined on must be
[tagged](https://docs.getdbt.com/reference/resource-configs/tags) with
the tag `test_qc_iasworld`
the tag `data_test_iasworld`
* Prefer tagging the model, and fall back to tagging the test if for
some reason the model cannot be tagged (e.g. if it has some non-QC
tests defined on it)
* If you would like to disable a data test but you don't want to remove it
altogether, you can tag it or its model with `test_qc_exclude_from_workbook`,
altogether, you can tag it or its model with `data_test_iasworld_exclude_from_workbook`,
which will prevent the test (or all of the model's tests, if you tagged
the model) from running as part of the `qc_tests` selector
the model) from running as part of the `select_data_test_iasworld` selector
* The test definition must supply a few specific parameters:
* `name` must be set and follow the pattern
`iasworld_<table_name>_<test_description>`
Expand All @@ -622,14 +611,14 @@ the tag `test_qc_iasworld`
test you're using
* `config.where` should typically set to provide a filter expression
that restricts tests to unique rows and to rows matching a date range
set by the `test_qc_year_start` and `test_qc_year_end`
set by the `data_test_iasworld_year_start` and `data_test_iasworld_year_end`
[project variables](https://docs.getdbt.com/docs/build/project-variables)
* `meta` should be set with a few specific string attributes:
* `description` (required): A short human-readable description of the test
* `category` (optional): A workbook category for the test, required if
a category is not defined for the test's generic in the `TEST_CATEGORIES`
constant in the [`transform_dbt_test_results`
script](./scripts/transform_dbt_test_results.py)
constant in the [`run_iasworld_data_tests`
script](./scripts/run_iasworld_data_tests.py)
* `table_name` (optional): The name of the table to report in the output
workbook, if the workbook should report a different table name than the
name of the model that the test is defined on
Expand Down Expand Up @@ -674,8 +663,8 @@ do so, you have two options:
and other tests. You'll also need to follow a few extra steps that are specific
to our environment:
1. Add a default category for your generic test in
the `TEST_CATEGORIES` constant in the [`transform_dbt_test_results`
script](./scripts/transform_dbt_test_results.py)
the `TEST_CATEGORIES` constant in the [`run_iasworld_data_tests`
script](./scripts/run_iasworld_data_tests.py)
2. Make sure that your generic test supports the `additional_select_columns`
parameter that most of our generic tests support, making use
of the `format_additional_select_columns` macro to format the
Expand Down Expand Up @@ -767,8 +756,7 @@ model during export:
reports at the same time, so we tag each model with the `qc_report_town_close` tag such that
we can select them all at once when running the `export_models` script using
`--select tag:qc_report_town_close`. For consistency, prefer tags that start with the `qc_report_*`
prefix, but beware not to use the `test_qc_*` prefix, which is instead used for [QC
tests](#adding-qc-tests).
prefix.
* **Filtering**: Since the `export_models` script can filter your model using the `--where`
option, you should define your model such that it selects any fields that you want to use
for filtering in the `SELECT` clause. It's common to filter reports by `taxyr` and
Expand Down
21 changes: 10 additions & 11 deletions dbt/dbt_project.yml
Original file line number Diff line number Diff line change
Expand Up @@ -22,19 +22,18 @@ clean-targets:
- "dbt_packages"

# Variables that can change with invocation of dbt commands. Note that we parse
# the test_* variables by name in scripts, so if you change their names, make
# sure you check all of our scripts and adjust them accordingly
# the data_test_iasworld_* variables by name in scripts, so if you change their
# names, make sure you check all of our scripts and adjust them accordingly
vars:
# Start year for testing data using test_qc_* tagged dbt tests. Typically
# this should be the current year, since errors in past data cannot usually
# be amended once records are closed. Set as an integer for compatibility
# with comparison operators and SQL's BETWEEN
test_qc_year_start: 2024
# Start year for iasWorld data tests. Typically this should be the current
# year, since errors in past data cannot usually be amended once records are
# closed. Set as an integer for compatibility with comparison operators and
# SQL's BETWEEN
data_test_iasworld_year_start: 2024

# End year for testing data using test_qc_* tagged dbt tests. Typically set
# to a date in the future, but can also be use to select specific time
# frames for testing
test_qc_year_end: 2030
# End year for iasWorld data tests. Typically set to a date in the future,
# but can also be use to select specific time frames for testing
data_test_iasworld_year_end: 2030

# Configuring models
# Full documentation: https://docs.getdbt.com/docs/configuring-models
Expand Down
2 changes: 1 addition & 1 deletion dbt/models/default/schema/default.vw_pin_address.yml
Original file line number Diff line number Diff line change
Expand Up @@ -64,7 +64,7 @@ models:
- mail_address_zipcode_1
- mail_address_zipcode_2
config:
where: CAST(year AS int) BETWEEN {{ var('test_qc_year_start') }} AND {{ var('test_qc_year_end') }}
where: CAST(year AS int) BETWEEN {{ var('data_test_iasworld_year_start') }} AND {{ var('data_test_iasworld_year_end') }}
error_if: ">900000"
- unique_combination_of_columns:
name: default_vw_pin_address_unique_by_14_digit_pin_and_year
Expand Down
4 changes: 2 additions & 2 deletions dbt/models/default/schema/default.vw_pin_appeal.yml
Original file line number Diff line number Diff line change
Expand Up @@ -91,7 +91,7 @@ models:
- mailed_tot
- certified_tot
config:
where: CAST(year AS int) BETWEEN {{ var('test_qc_year_start') }} AND {{ var('test_qc_year_end') }}
where: CAST(year AS int) BETWEEN {{ var('data_test_iasworld_year_start') }} AND {{ var('data_test_iasworld_year_end') }}
error_if: ">266719"
# `change` should be an enum
- expression_is_true:
Expand All @@ -103,7 +103,7 @@ models:
- case_no
- change
config:
where: CAST(year AS int) BETWEEN {{ var('test_qc_year_start') }} AND {{ var('test_qc_year_end') }}
where: CAST(year AS int) BETWEEN {{ var('data_test_iasworld_year_start') }} AND {{ var('data_test_iasworld_year_end') }}
- row_count:
name: default_vw_pin_appeal_row_count
above: 8407667 # as of 2023-11-22
Expand Down
18 changes: 9 additions & 9 deletions dbt/models/default/schema/default.vw_pin_value.yml
Original file line number Diff line number Diff line change
Expand Up @@ -80,59 +80,59 @@ models:
name: default_vw_pin_value_mailed_class_not_null
column_name: mailed_class
config:
where: CAST(year AS int) < {{ var('test_qc_year_start') }}
where: CAST(year AS int) < {{ var('data_test_iasworld_year_start') }}
error_if: ">289"
- not_null:
name: default_vw_pin_value_certified_class_not_null
column_name: certified_class
config:
where: CAST(year AS int) < {{ var('test_qc_year_start') }}
where: CAST(year AS int) < {{ var('data_test_iasworld_year_start') }}
error_if: ">15"
- not_null:
name: default_vw_pin_value_board_class_not_null
column_name: board_class
config:
where: CAST(year AS int) < {{ var('test_qc_year_start') }} - 1
where: CAST(year AS int) < {{ var('data_test_iasworld_year_start') }} - 1
error_if: ">1260"
- not_null:
name: default_vw_pin_value_mailed_tot_not_null
column_name: mailed_tot
config:
where: CAST(year AS int) < {{ var('test_qc_year_start') }}
where: CAST(year AS int) < {{ var('data_test_iasworld_year_start') }}
error_if: ">310"
- not_null:
name: default_vw_pin_value_certified_tot_not_null
column_name: certified_tot
config:
where: CAST(year AS int) < {{ var('test_qc_year_start') }}
where: CAST(year AS int) < {{ var('data_test_iasworld_year_start') }}
error_if: ">15"
- not_null:
name: default_vw_pin_value_board_tot_not_null
column_name: board_tot
config:
where: CAST(year AS int) < {{ var('test_qc_year_start') }} - 1
where: CAST(year AS int) < {{ var('data_test_iasworld_year_start') }} - 1
error_if: ">1260"
- not_null:
name: default_vw_pin_value_mailed_tot_mv_not_null
column_name: mailed_tot_mv
config:
where: |
CAST(year AS int) < {{ var('test_qc_year_start') }} AND
CAST(year AS int) < {{ var('data_test_iasworld_year_start') }} AND
year >= '2021'
error_if: ">310"
- not_null:
name: default_vw_pin_value_certified_tot_mv_not_null
column_name: certified_tot_mv
config:
where: |
CAST(year AS int) < {{ var('test_qc_year_start') }} AND
CAST(year AS int) < {{ var('data_test_iasworld_year_start') }} AND
year >= '2021'
error_if: ">15"
- not_null:
name: default_vw_pin_value_board_tot_mv_not_null
column_name: board_tot_mv
config:
where: |
CAST(year AS int) < {{ var('test_qc_year_start') }} - 1 AND
CAST(year AS int) < {{ var('data_test_iasworld_year_start') }} - 1 AND
year >= '2020'
error_if: ">1260"
2 changes: 1 addition & 1 deletion dbt/models/iasworld/schema/iasworld.aasysjur.yml
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ sources:
loaded_at_field: date_parse(loaded_at, '%Y-%m-%d %H:%i:%S.%f')
tags:
- load_auto
- test_qc_iasworld
- data_test_iasworld

tables:
- name: aasysjur
Expand Down
8 changes: 4 additions & 4 deletions dbt/models/iasworld/schema/iasworld.addn.yml
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ sources:
loaded_at_field: date_parse(loaded_at, '%Y-%m-%d %H:%i:%S.%f')
tags:
- load_auto
- test_qc_iasworld
- data_test_iasworld
- type_res

tables:
Expand All @@ -30,7 +30,7 @@ sources:
- wen
config: &unique-conditions
where: |
CAST(taxyr AS int) BETWEEN {{ var('test_qc_year_start') }} AND {{ var('test_qc_year_end') }}
CAST(taxyr AS int) BETWEEN {{ var('data_test_iasworld_year_start') }} AND {{ var('data_test_iasworld_year_end') }}
AND cur = 'Y'
AND deactivat IS NULL
meta:
Expand Down Expand Up @@ -75,7 +75,7 @@ sources:
additional_select_columns: *select-columns
config:
where: |
CAST(taxyr AS int) BETWEEN {{ var('test_qc_year_start') }} AND {{ var('test_qc_year_end') }}
CAST(taxyr AS int) BETWEEN {{ var('data_test_iasworld_year_start') }} AND {{ var('data_test_iasworld_year_end') }}
AND class != 'EX'
AND cur = 'Y'
AND deactivat IS NULL
Expand All @@ -93,7 +93,7 @@ sources:
additional_select_columns: *select-columns
config:
where: |
CAST(taxyr AS int) BETWEEN {{ var('test_qc_year_start') }} AND {{ var('test_qc_year_end') }}
CAST(taxyr AS int) BETWEEN {{ var('data_test_iasworld_year_start') }} AND {{ var('data_test_iasworld_year_end') }}
meta:
description: cur should be 'Y' or 'D'
- name: deactivat
Expand Down
2 changes: 1 addition & 1 deletion dbt/models/iasworld/schema/iasworld.addrindx.yml
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ sources:
loaded_at_field: date_parse(loaded_at, '%Y-%m-%d %H:%i:%S.%f')
tags:
- load_auto
- test_qc_iasworld
- data_test_iasworld

tables:
- name: addrindx
Expand Down
6 changes: 3 additions & 3 deletions dbt/models/iasworld/schema/iasworld.aprval.yml
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ sources:
loaded_at_field: date_parse(loaded_at, '%Y-%m-%d %H:%i:%S.%f')
tags:
- load_auto
- test_qc_iasworld
- data_test_iasworld

tables:
- name: aprval
Expand Down Expand Up @@ -34,7 +34,7 @@ sources:
- wen
config: &unique-conditions
where: |
CAST(taxyr AS int) BETWEEN {{ var('test_qc_year_start') }} AND {{ var('test_qc_year_end') }}
CAST(taxyr AS int) BETWEEN {{ var('data_test_iasworld_year_start') }} AND {{ var('data_test_iasworld_year_end') }}
AND cur = 'Y'
AND deactivat IS NULL
meta:
Expand Down Expand Up @@ -121,7 +121,7 @@ sources:
additional_select_columns: *select-columns
config:
where: |
CAST(taxyr AS int) BETWEEN {{ var('test_qc_year_start') }} AND {{ var('test_qc_year_end') }}
CAST(taxyr AS int) BETWEEN {{ var('data_test_iasworld_year_start') }} AND {{ var('data_test_iasworld_year_end') }}
meta:
description: cur should be 'Y' or 'D'
- name: deactivat
Expand Down
Loading

0 comments on commit 348d7ec

Please sign in to comment.