Add conditional uniqueness tests for iasWorld tables #98
Changes from 12 commits
1eb0a2b
006ade8
d9366cb
913db16
4fafd6f
a829bfb
671dc6d
cf32372
2ca0f81
8f0e011
6a0f10e
a6f876a
8e72252
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -30,6 +30,20 @@ sources:
           filter: &latest_taxyr taxyr >= date_format(current_date - interval '1' year, '%Y')
           warn_after: &24_hours {count: 24, period: hour}
           error_after: &48_hours {count: 48, period: hour}
+        tests:
+          - unique_combination_of_columns:
+              name: asmt_all_unique_by_parid_procname_and_taxyr
+              combination_of_columns:
+                - parid
+                - procname
+                - taxyr
+              where: >-
+                cur = 'Y' and
+                deactivat is null and
+                procname in ('CCAOVALUE', 'CCAOFINAL', 'BORVALUE') and
+                valclass is null
+              config:
+                error_if: ">125"
       - name: asmt_hist
       - name: cname
       - name: comdat
@@ -73,6 +87,17 @@ sources:
           filter: *latest_taxyr
           warn_after: *24_hours
           error_after: *48_hours
+        tests:
+          - unique_combination_of_columns:
+              name: htpar_unique_by_parid_caseno_taxyr_subkey
+              combination_of_columns:
+                - parid
+                - caseno
+                - taxyr
+                - subkey
+              where: cur = 'Y' and deactivat is null
+              config:
+                error_if: ">2"
       - name: land
         description: '{{ doc("land") }}'
         tests:
@@ -89,7 +114,7 @@ sources:
               name: legdat_unique_by_parid_taxyr
               combination_of_columns:
                 - parid
-                - taxyr
+                - taxyr
       - name: lpmod
       - name: lpnbhd
       - name: oby
@@ -109,21 +134,28 @@ sources:
               name: owndat_unique_by_parid_taxyr
               combination_of_columns:
                 - parid
-                - taxyr
+                - taxyr
       - name: pardat
         description: '{{ doc("pardat") }}'
         tests:
           - unique_combination_of_columns:
               name: pardat_unique_by_parid_taxyr
               combination_of_columns:
                 - parid
-                - taxyr
+                - taxyr
       - name: permit
         freshness:
           filter: date_format(date_parse(permdt, '%Y-%m-%d %H:%i:%s.0'), '%Y') >= date_format(current_date - interval '1' year, '%Y')
           warn_after: *48_hours
           error_after: &72_hours {count: 72, period: hour}
       - name: rcoby
       - name: sales
+        tests:
+          - unique_combination_of_columns:
+              name: sales_unique_by_parid_instruno
+              combination_of_columns:
+                - parid
+                - instruno
+              where: substr(saledt, 1, 4) >= '2023'

issue (non-blocking): This conditional hopefully shouldn't be necessary to make the test pass (sales should be unique by document number). If it is, we may need to borrow some of the more complex logic of the sales view, i.e.:

    WHERE sales.instruno IS NOT NULL
        AND sales.deactivat IS NULL
        AND tc.township_code IS NOT NULL

@wrridgeway Any thoughts here?

I don't fully understand the data context here, but I wanted to confirm that the test doesn't pass if you remove the

I'm still curious what @wrridgeway thinks, but I'm going to go ahead and merge in the meantime to unblock test development.

for sales, yeah, we can just do unique by

@wrridgeway Does that mean we should remove this test and replace it with one like this?

    - name: sales
      tests:
        - unique_combination_of_columns:
            name: sales_unique_by_doc_no
            combination_of_columns:
              - doc_no
            config:
              where: is_multisale = false
              error_if: ">2"

@wrridgeway You're thinking of

this is true, but in that case, we do not expect

       - name: splcom
       - name: valclass
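
To make the interaction between `where` and `error_if` in the tests above more concrete, here is a minimal Python sketch; it is not the project's code. It assumes dbt's default behavior of counting the rows a test query returns and comparing that count against `error_if`. The sample rows and the helper names `duplicate_groups` and `exceeds_error_if` are hypothetical and exist only for illustration.

```python
from collections import Counter

# Hypothetical htpar-like rows; all values are made up for illustration.
ROWS = [
    {"parid": "A", "caseno": "1", "taxyr": "2023", "subkey": "1", "cur": "Y", "deactivat": None},
    {"parid": "A", "caseno": "1", "taxyr": "2023", "subkey": "1", "cur": "Y", "deactivat": None},
    {"parid": "A", "caseno": "1", "taxyr": "2023", "subkey": "1", "cur": "N", "deactivat": None},
    {"parid": "B", "caseno": "2", "taxyr": "2023", "subkey": "1", "cur": "Y", "deactivat": "2023-05-01"},
    {"parid": "B", "caseno": "2", "taxyr": "2023", "subkey": "1", "cur": "Y", "deactivat": None},
]


def duplicate_groups(rows, columns, predicate=None):
    """Return column combinations that occur more than once among the rows
    that pass `predicate` (all rows when `predicate` is None)."""
    kept = [r for r in rows if predicate is None or predicate(r)]
    counts = Counter(tuple(r[c] for c in columns) for r in kept)
    return {key: n for key, n in counts.items() if n > 1}


def exceeds_error_if(failures, threshold):
    """Mimic an `error_if: ">N"` setting: error only when failures exceed N."""
    return failures > threshold


columns = ["parid", "caseno", "taxyr", "subkey"]


def is_active(row):
    # Python equivalent of: cur = 'Y' and deactivat is null
    return row["cur"] == "Y" and row["deactivat"] is None


unfiltered = duplicate_groups(ROWS, columns)
filtered = duplicate_groups(ROWS, columns, is_active)

print(len(unfiltered), "duplicate groups without the filter")
print(len(filtered), "duplicate groups among active rows")
print("error raised:", exceeds_error_if(len(filtered), 2))
```

In this sketch the filter removes the inactive and deactivated rows before grouping, so fewer combinations count as duplicates, and the threshold then decides whether the remaining duplicates are tolerated.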
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,22 +1,26 @@
 -- Test that a given set of columns are unique, with an optional
--- threshold indicating an allowable number of duplicates.
+-- threshold indicating an allowable number of duplicates and an optional
+-- filter clause to restrict the set of rows that should be unique.
 --
 -- For example, test that a given PIN has been sold no more than
--- twice in one year.
+-- twice in one year, for only active rows.
 --
 -- The duplicate threshold defaults to 1, in which case this is a standard
--- uniqueness test.
+-- uniqueness test. The where clause defaults to null, which indicates
+-- that the full set of rows should be unique on the given columns.
 --
 -- Adapted from dbt_utils.unique_combination_of_columns, and adjusted to add the
--- optional duplicate threshold and to only report one row for each dupe.
+-- optional duplicate threshold, to add the optional where clause, and to only
+-- report one row for each dupe.
 {% test unique_combination_of_columns(
-    model, combination_of_columns, allowed_duplicates=0
+    model, combination_of_columns, allowed_duplicates=0, where=null
 ) %}

     {%- set columns_csv = combination_of_columns | join(", ") %}

     select {{ columns_csv }}, count(*) as num_duplicates
     from {{ model }}
+    {% if where %} where ({{ where }}) {% endif %}

This line is really funny if you don't know what's going on.
question (non-blocking): Am I correct in my understanding that dbt/jinja interpret any non-null value as truthy here?

That's true! I would be on board for changing it to

     group by {{ columns_csv }}
     having count(*) > {{ allowed_duplicates }} + 1

thought: We may run into lots of cases where this dupe test triggers due to many

thought (dreadful): It's going to be so much fun to figure out what all these errors are...
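
As a rough illustration of what the macro above renders, the following Python sketch builds a query of the same shape and runs it against an in-memory SQLite table. It is an approximation under stated assumptions, not the project's implementation: `render_test_sql` is a hypothetical helper that uses plain string formatting instead of Jinja, the table contents are made up, and SQLite stands in for the real warehouse. Python's truthiness check on `where` behaves like Jinja's `{% if where %}` for `None` and the empty string.

```python
import sqlite3


def render_test_sql(table, combination_of_columns, allowed_duplicates=0, where=None):
    """Build SQL shaped like the macro's output (hypothetical helper)."""
    columns_csv = ", ".join(combination_of_columns)
    # None and "" are both falsy, so either one skips the WHERE clause,
    # mirroring the `{% if where %}` check in the template.
    where_clause = f"where ({where})" if where else ""
    return (
        f"select {columns_csv}, count(*) as num_duplicates "
        f"from {table} {where_clause} "
        f"group by {columns_csv} "
        f"having count(*) > {allowed_duplicates} + 1"
    )


conn = sqlite3.connect(":memory:")
conn.execute("create table sales (parid text, instruno text, saledt text)")
conn.executemany(
    "insert into sales values (?, ?, ?)",
    [
        ("P1", "D1", "2019-03-01"),  # pair of duplicates before 2023
        ("P1", "D1", "2019-03-01"),
        ("P2", "D2", "2023-06-15"),  # three copies in 2023
        ("P2", "D2", "2023-06-15"),
        ("P2", "D2", "2023-06-15"),
        ("P3", "D3", "2023-07-01"),  # unique row
    ],
)

cols = ["parid", "instruno"]

# Standard uniqueness test over every row.
all_rows = conn.execute(render_test_sql("sales", cols)).fetchall()

# Conditional test: only rows from 2023 onward must be unique.
recent = conn.execute(
    render_test_sql("sales", cols, where="substr(saledt, 1, 4) >= '2023'")
).fetchall()

# Allow one duplicate per combination: only groups of three or more fail.
lenient = conn.execute(
    render_test_sql("sales", cols, allowed_duplicates=1)
).fetchall()

print(sorted(all_rows))
print(recent)
print(lenient)
```

Each returned row represents one duplicated combination rather than one offending record, which matches the macro's stated goal of reporting a single row per dupe.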