Add dbt documentation to sale.* assets (#202)
* Add sale table column defs

* Add sale table docs

* Capitalize MyDec

* Fix year typo in sales schema

* Add @wagnerlmichael suggestions
dfsnow authored Oct 31, 2023
1 parent f1f2a90 commit 6940e16
Showing 4 changed files with 215 additions and 14 deletions.
2 changes: 1 addition & 1 deletion dbt/models/default/schema/default.vw_pin_sale.yml
@@ -14,7 +14,7 @@ models:
- name: is_multisale
description: '{{ doc("shared_column_sale_is_multisale") }}'
- name: is_mydec_date
description: Indicator for whether or not the observation uses the myDec sale date
description: Indicator for whether or not the observation uses the MyDec sale date
- name: nbhd
description: '{{ doc("shared_column_nbhd_code") }}'
- name: num_parcels_sale
60 changes: 54 additions & 6 deletions dbt/models/sale/docs.md
@@ -1,19 +1,67 @@
# flag

{% docs flag %}
This table holds the flag information from the sales val program.
PIN-level sales validation flags created by
[model-sales-val](https://github.com/ccao-data/model-sales-val).

This is the primary sales validation output table. Flags within this table
should be possible to reconstruct using the other sales validation tables:
`sale.group_mean`, `sale.parameter`, and `sale.metadata`.

**Primary Key**: `meta_sale_document_number`, `run_id`, `version`
{% enddocs %}

# foreclosure

{% docs foreclosure %}
Foreclosure data ingested from Illinois Public Records (RIS).

**Primary Key**: `pin`, `document_number`
{% enddocs %}

# parameter

{% docs parameter %}
This table holds information about the specifications used to flag outliers in the sales val program.
Parameters used for each run of
[model-sales-val](https://github.com/ccao-data/model-sales-val),
including the statistical bounds, groupings, window sizes, etc.

**Primary Key**: `run_id`
{% enddocs %}

# group_mean

{% docs group_mean %}
This table holds group mean information which we can utilize to explain exactly why an outlier was flagged.
Information about groups used to calculate statistical deviations
for sales validation.

**Primary Key**: `run_id`, `group`
{% enddocs %}

# metadata

{% docs metadata %}
View to help the upload process of sales validation flags into iasWorld.
Information about the code used for a sales validation run, as well as
the start time and type of run.

**Primary Key**: `run_id`
{% enddocs %}

# mydec

{% docs mydec %}
MyDec data from the Illinois Department of Revenue (IDOR). Includes property
transfer declarations (sales) used to fill in missing data in `iasworld.sales`
and as an input to sales validation flagging.

**Primary Key**: `document_number`, `year_of_sale`
{% enddocs %}

# vw_ias_salesval_upload

{% docs vw_ias_salesval_upload %}
View to help the upload process of sales validation flags into iasWorld.
{% enddocs %}
View for sales validation outputs to create an upload format compatible
with iasWorld.

**Primary Key**: `salekey`, `run_id`
{% enddocs %}
150 changes: 146 additions & 4 deletions dbt/models/sale/schema.yml
@@ -1,17 +1,159 @@
sources:
- name: sale
tags:
- load_auto
tables:
- name: flag
description: '{{ doc("flag") }}'
- name: parameter
description: '{{ doc("parameter") }}'
tags:
- load_auto

columns:
- name: ptax_flag_original
description: |
Whether or not this sale was flagged on Q10 of the
PTAX-203 form (regardless of statistical deviation)
- name: meta_sale_document_number
description: '{{ doc("shared_column_document_number") }}'
- name: rolling_window
description: |
Rolling window period used to calculate statistics
for flagging this sale
- name: run_id
description: '{{ doc("shared_column_sv_run_id") }}'
- name: sv_is_heuristic_outlier
description: '{{ doc("shared_column_sv_is_heuristic_outlier") }}'
- name: sv_is_ptax_outlier
description: '{{ doc("shared_column_sv_is_ptax_outlier") }}'
- name: sv_is_outlier
description: '{{ doc("shared_column_sv_is_outlier") }}'
- name: sv_outlier_type
description: '{{ doc("shared_column_sv_outlier_type") }}'
- name: version
description: '{{ doc("shared_column_sv_version") }}'

- name: foreclosure
description: '{{ doc("foreclosure") }}'
tags:
- load_manual

- name: group_mean
description: '{{ doc("group_mean") }}'
tags:
- load_auto

columns:
- name: group
description: |
Group string used as a unique identifier.
Typically a combination of year, township, and class
- name: group_size
description: Number of properties in the group
- name: run_id
description: '{{ doc("shared_column_sv_run_id") }}'
- name: mean_price
description: Mean price of the group, in FMV
- name: mean_price_per_sqft
description: Mean price per sqft (of building) of the group, in FMV

- name: parameter
description: '{{ doc("parameter") }}'
tags:
- load_auto

columns:
- name: condo_stat_groups
description: |
Groups used to calculate flagging statistics (std. dev.)
for condominium (class 299, 399) properties
- name: dev_bounds
description: |
Boundaries for standard deviation flagging.
Sales with prices beyond these boundaries are flagged.
- name: earliest_data_ingest
description: |
Date of earliest sale used in validation.
This is inclusive of the rolling window period used for
calculating statistical groups. In other words, if the earliest
sale to-be-flagged is 2013-12-01 and the rolling window period
is 9 months, then the earliest sale *used* would be 2013-03-01
- name: iso_forest_cols
description: Columns used as features in the isolation forest model
- name: latest_data_ingest
description: Date of latest sale used in validation
- name: min_group_thresh
description: |
Minimum number of sales required for statistical flagging.
If the minimum number of sales in our group methodology
(township, class, rolling window) is below N, these sales
are not flagged and are set to `Not outlier`
- name: ptax_sd
description: |
Boundaries for standard deviation flagging in combination
with a PTAX-203 flag
- name: res_stat_groups
description: |
Groups used to calculate flagging statistics (std. dev.)
for residential (class 2) properties
- name: rolling_window
description: |
Rolling window size, in months.
For each target sale, calculate statistics (std. dev.,
group size) using all sales in the period N months prior,
inclusive of the month of the sale itself
- name: run_id
description: '{{ doc("shared_column_sv_run_id") }}'
- name: sales_flagged
description: |
Total number of sales flagged.
Inclusive of both sales flagged as outliers *and* sales
flagged as non-outliers
- name: short_term_owner_threshold
description: |
Properties with a significant price change and multiple
sales within this time duration (in days) are flagged
- name: metadata
description: '{{ doc("metadata") }}'
tags:
- load_auto

columns:
- name: long_commit_sha
description: Full commit SHA of the code used for the model run
- name: run_id
description: '{{ doc("shared_column_sv_run_id") }}'
- name: run_timestamp
description: Start timestamp of the model run
- name: run_type
description: |
Type of model run.
Variable can be one of `initial_flagging`, `recurring`,
or `manual_update`
- name: short_commit_sha
description: Short commit SHA of the code used for the model run

- name: mydec
description: '{{ doc("mydec") }}'
tags:
- load_manual

models:
- name: sale.vw_ias_salesval_upload
description: '{{ doc("vw_ias_salesval_upload") }}'

columns:
- name: run_id
description: '{{ doc("shared_column_sv_run_id") }}'
- name: salekey
description: '{{ doc("shared_column_sale_key") }}'
- name: sv_is_outlier
description: '{{ doc("shared_column_sv_is_outlier") }}'
- name: sv_outlier_type
description: '{{ doc("shared_column_sv_outlier_type") }}'
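
The `earliest_data_ingest` and `min_group_thresh` descriptions in this schema can be illustrated with a small Python sketch. This is not code from model-sales-val: the function names `earliest_sale_used` and `apply_min_group_thresh` are hypothetical, and the sketch assumes the rolling window is counted in whole calendar months and snapped to the first of the month. That assumption reproduces the 2013-12-01 / 9-month example in the column description, but the actual implementation may differ.

```python
from datetime import date


def earliest_sale_used(earliest_flagged: date, rolling_window: int) -> date:
    """Return the first day of the month `rolling_window` months before
    `earliest_flagged` (hypothetical helper, whole-month arithmetic)."""
    # Count months since year 0 so divmod handles year boundaries
    total_months = earliest_flagged.year * 12 + (earliest_flagged.month - 1)
    year, month_index = divmod(total_months - rolling_window, 12)
    return date(year, month_index + 1, 1)


def apply_min_group_thresh(
    group_size: int, min_group_thresh: int, outlier_type: str
) -> str:
    """Keep a statistical flag only when the group has enough sales
    (hypothetical helper; `outlier_type` is assumed to be computed elsewhere)."""
    if group_size < min_group_thresh:
        # Too few sales in the group: the sale is set to `Not outlier`
        return "Not outlier"
    return outlier_type


# Example from the `earliest_data_ingest` description
print(earliest_sale_used(date(2013, 12, 1), 9))  # 2013-03-01
```

Working in whole months keeps the arithmetic in the standard library and avoids day-of-month edge cases, at the cost of ignoring the day of the earliest flagged sale.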
17 changes: 14 additions & 3 deletions dbt/models/shared_columns.md
@@ -989,7 +989,7 @@ prorated, but the building value is.
{% docs shared_column_buyer_name %}
Name of property buyer, as listed on deed.

Can be truncated by myDec/IDOR. See Clerk/Recorder of Deeds for full name.
Can be truncated by MyDec/IDOR. See Clerk/Recorder of Deeds for full name.
{% enddocs %}

## document_number
@@ -1049,15 +1049,15 @@ iasWorld internal sale identifier
{% docs shared_column_sale_price %}
Sale price of a PIN, as recorded on the deed.

Sales are sourced from myDec/IDOR. This serves as the outcome variable in regression models
Sales are sourced from MyDec/IDOR. This serves as the outcome variable in regression models
{% enddocs %}

## seller_name

{% docs shared_column_seller_name %}
Name of property seller, as listed on deed.

Can be truncated by myDec/IDOR. See Clerk/Recorder of Deeds for full name.
Can be truncated by MyDec/IDOR. See Clerk/Recorder of Deeds for full name.
{% enddocs %}

# Sale Validation
@@ -1077,6 +1077,8 @@ See [model-sales-val](https://github.com/ccao-data/model-sales-val) for full det
{% docs shared_column_sv_is_ptax_outlier %}
Outlier flagged due to certain answers on Q10 of the PTAX-203 form.

Must have a Q10 flag _in addition to_ a statistical flag.

See [model-sales-val](https://github.com/ccao-data/model-sales-val) for more details
{% enddocs %}

@@ -1090,6 +1092,15 @@ with `sv_is_ptax_outlier` (using OR logic).
NOTE: Outlier flags only exist for sales _after_ 2014.
{% enddocs %}
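
The relationship between the outlier flags described in these docs can be sketched in Python. This is an illustration only, not the model-sales-val implementation: the function names and boolean inputs are hypothetical, and it assumes that `sv_is_outlier` is `sv_is_heuristic_outlier` combined with `sv_is_ptax_outlier` using OR logic.

```python
def is_ptax_outlier(has_q10_flag: bool, has_statistical_flag: bool) -> bool:
    """A PTAX outlier needs a Q10 flag in addition to a statistical flag."""
    return has_q10_flag and has_statistical_flag


def is_outlier(is_heuristic_outlier: bool, is_ptax_outlier_flag: bool) -> bool:
    """Overall outlier flag (assumed): heuristic OR PTAX outlier."""
    return is_heuristic_outlier or is_ptax_outlier_flag


# A Q10 flag alone is not enough to make a sale a PTAX outlier
ptax = is_ptax_outlier(has_q10_flag=True, has_statistical_flag=False)
print(ptax, is_outlier(is_heuristic_outlier=False, is_ptax_outlier_flag=ptax))
```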

## sv_outlier_type

{% docs shared_column_sv_outlier_type %}
Heuristic or model used to flag an outlier.

See the [model-sales-val](https://github.com/ccao-data/model-sales-val) repo
for a list of possible flags.
{% enddocs %}

## sv_run_id

{% docs shared_column_sv_run_id %}