Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add class checks and other data validation prior to final db export #31

Open
dfsnow opened this issue Mar 18, 2024 · 2 comments
Open

Add class checks and other data validation prior to final db export #31

dfsnow opened this issue Mar 18, 2024 · 2 comments
Assignees
Labels
enhancement New feature or request

Comments

@dfsnow
Copy link
Member

dfsnow commented Mar 18, 2024

Now that we've set up dbt tests in the data-architecture repo, we should set up something similar for PTAXSIM.

All scripts in data-raw/ actually populate an S3 bucket that feed the tax database in Athena. It should be relatively simple to add dbt tests on top of the tax data which need to pass prior to PTAXSIM export. Some ideas for tests:

  • Class code of each PIN is contained in our class dictionary
  • PIN-level AVs match iasWorld records
  • PIN-level class codes match iasWorld
  • Tax code of each PIN matches iasWorld
@dfsnow dfsnow added the enhancement New feature or request label Mar 18, 2024
@jeancochrane
Copy link
Contributor

Changes have been made in ccao-data/data-architecture#360 that should support these tests @Damonamajor! Here's how they fit into the tests listed above:

  • Class code of each PIN is contained in our class dictionary: This should use the relationships test to compare tax.pin.class to ccao.class_dict.class_code (example)
    • The additional_select_columns should be pin and year in the new test
    • The new test does not need the config or meta attributes that are present on the example test
  • PIN-level AVs match iasWorld records: Let's hold off on this one for now, since we need to figure out how to properly compare 2021/2022 records.
  • PIN-level class codes match iasWorld: This should use the res_class_matches_pardat test to compare tax.pin.class to iasworld.pardat.class (example)
    • We need to add the arguments parid_column_name: pin and taxyr_column_name: year to the new test since tax.pin uses a different naming scheme for these two fields than iasworld
    • The new test does not need the config or meta attributes that are present on the example test
    • Contrary to the example test, the only additional_select_column entry we need for this test is year, which can be aggregated with agg_func: max since tax.pin is unique by (pin, year)
  • Tax code of each PIN matches iasWorld: This should use the row_values_match_after_join test to compare tax.pin.tax_code_num to the relevant field in iasworld (source code). Currently there are no existing instances of row_values_match_after_join in the data, so this will be an exercise to figure it out. As a hint, here are the parameters you'll need to set:
    • external_model: Whatever model in iasworld has the equivalent of tax_code_num. This should be specified using the source() function, like external_model: source("iasworld", "<tablename>").
    • external_column_name: The name of the equivalent of tax_code_num in the iasworld table specified by external_model.
    • column_alias: Set this to tax_code_num.
    • external_column_alias: Set this to whatever the value of external_column_name is.
    • group_by: This should be a list containing two values, year and pin.
    • additional_select_columns: We will need to select pin and year using the dictionary form of additional_select_columns (example here). pin should have the alias parid and year should have the alias taxyr, and both fields should use max for agg_func.
    • join_condition: This should be a string that represents the SQL ON or USING clause of the join between tax.pin and the iasworld table in question, using the tablename model to refer to tax.pin and external_model to refer to the iasworld table in question. For example (alter this to reflect the correct columns to join on, and the correct columns to assure currency and uniqueness in the iasworld table):
join_condition: |
    ON  model.pin = external_model.parid
    AND model.year = external_model.taxyr
    AND external_model.cur = 'Y'
    AND external_model.deactivat IS NULL

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants