Add dbt documentation to sale.* assets #202

Merged on Oct 31, 2023 (5 commits)
2 changes: 1 addition & 1 deletion dbt/models/default/schema/default.vw_pin_sale.yml
@@ -14,7 +14,7 @@ models:
- name: is_multisale
description: '{{ doc("shared_column_sale_is_multisale") }}'
- name: is_mydec_date
-description: Indicator for whether or not the observation uses the myDec sale date
+description: Indicator for whether or not the observation uses the MyDec sale date
Member Author

Just capitalizing MyDec correctly in all places in the documentation. Didn't want to put it in a separate PR.

- name: nbhd
description: '{{ doc("shared_column_nbhd_code") }}'
- name: num_parcels_sale
60 changes: 54 additions & 6 deletions dbt/models/sale/docs.md
@@ -1,19 +1,67 @@
# flag

{% docs flag %}
Contributor

[Thought, non-blocking] I like the way you've been using prefixes to hint at the purpose of column descriptions, i.e. column_* and shared_column_*. I wonder if that's worth doing for tables, too? E.g. table_flag instead of flag? Views already follow a built-in vw_* naming convention, so it seems like tables are the odd ones out.

Member Author

I've already started to do that in the upcoming #203! So we're definitely on the same page.

-This table holds the flag information from the sales val program.
+PIN-level sales validation flags created by
+[model-sales-val](https://github.com/ccao-data/model-sales-val).
+
+This is the primary sales validation output table. Flags within this table
+should be possible to reconstruct using the other sales validation tables:
+`sale.group_mean`, `sale.parameter`, and `sale.metadata`.
+
+**Primary Key**: `meta_sale_document_number`, `run_id`, `version`
dfsnow marked this conversation as resolved.
{% enddocs %}

# foreclosure

{% docs foreclosure %}
Foreclosure data ingested from Illinois Public Records (RIS).

**Primary Key**: `pin`, `document_number`
{% enddocs %}

# parameter

{% docs parameter %}
-This table holds information about the specifications used to flag outliers in the sales val program.
+Parameters used for each run of
+[model-sales-val](https://github.com/ccao-data/model-sales-val),
+including the statistical bounds, groupings, window sizes, etc.

**Primary Key**: `run_id`
{% enddocs %}

# group_mean
Member Author

I've added markdown headers to the doc assets just to make them easier to read, navigate, and fold.


{% docs group_mean %}
-This table holds group mean information which we can utilize to explain exactly why an outlier was flagged.
+Information about groups used to calculate statistical deviations
+for sales validation.

**Primary Key**: `run_id`, `group`
{% enddocs %}

# metadata

{% docs metadata %}
-View to help the upload process of sales validation flags into iasWorld.
+Information about the code used for a sales validation run, as well as
+the start time and type of run.

**Primary Key**: `run_id`
{% enddocs %}

# mydec

{% docs mydec %}
MyDec data from the Illinois Department of Revenue (IDOR). Includes property
transfer declarations (sales) used to fill in missing data in `iasworld.sales`
and as an input to sales validation flagging.

**Primary Key**: `document_number`, `year_of_sale`
{% enddocs %}

# vw_ias_salesval_upload

{% docs vw_ias_salesval_upload %}
-View to help the upload process of sales validation flags into iasWorld.
-{% enddocs %}
+View for sales validation outputs to create an upload format compatible
+with iasWorld.
+
+**Primary Key**: `salekey`, `run_id`
+{% enddocs %}
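
The `{% docs name %}` blocks in this file are what the YAML descriptions point at through `{{ doc("name") }}`. As a rough illustration of that link, the Python sketch below lists references that have no matching docs block. The function name `missing_doc_blocks` and both regular expressions are hypothetical and only cover the simple forms that appear in this diff; dbt resolves these references itself, so this is a reading aid rather than a replacement for it.

```python
import re

# Hypothetical patterns, written only for the forms seen in this diff:
#   {% docs flag %}            in a Markdown docs file
#   '{{ doc("flag") }}'        in a YAML description
DOCS_BLOCK = re.compile(r"\{%\s*docs\s+(\w+)\s*%\}")
DOC_REF = re.compile(r"""doc\(\s*["'](\w+)["']\s*\)""")


def missing_doc_blocks(markdown_text: str, yaml_text: str) -> set:
    """Return doc() names referenced in the YAML but not defined in the Markdown."""
    defined = set(DOCS_BLOCK.findall(markdown_text))
    referenced = set(DOC_REF.findall(yaml_text))
    return referenced - defined


if __name__ == "__main__":
    md = "{% docs flag %}\nPIN-level sales validation flags.\n{% enddocs %}\n"
    yml = (
        "- name: flag\n"
        "  description: '{{ doc(\"flag\") }}'\n"
        "- name: parameter\n"
        "  description: '{{ doc(\"parameter\") }}'\n"
    )
    # "parameter" is referenced but has no docs block in this toy input
    print(missing_doc_blocks(md, yml))
```

A check like this could also be pointed at prefixed names such as `table_flag`, since the pattern matches any word-character name.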
150 changes: 146 additions & 4 deletions dbt/models/sale/schema.yml
@@ -1,17 +1,159 @@
sources:
- name: sale
tags:
- load_auto
tables:
- name: flag
description: '{{ doc("flag") }}'
- name: parameter
description: '{{ doc("parameter") }}'
tags:
- load_auto

columns:
- name: ptax_flag_original
description: |
Whether or not this sale was flagged on Q10 of the
PTAX-203 form (regardless of statistical deviation)
- name: meta_sale_document_number
description: '{{ doc("shared_column_document_number") }}'
- name: rolling_window
description: |
Rolling window period used to calculate statistics
for flagging this sale
- name: run_id
description: '{{ doc("shared_column_sv_run_id") }}'
- name: sv_is_heuristic_outlier
description: '{{ doc("shared_column_sv_is_heuristic_outlier") }}'
- name: sv_is_ptax_outlier
description: '{{ doc("shared_column_sv_is_ptax_outlier") }}'
- name: sv_is_outlier
description: '{{ doc("shared_column_sv_is_outlier") }}'
- name: sv_outlier_type
description: '{{ doc("shared_column_sv_outlier_type") }}'
- name: version
description: '{{ doc("shared_column_sv_version") }}'

- name: foreclosure
description: '{{ doc("foreclosure") }}'
tags:
- load_manual
Comment on lines +33 to +36
Contributor

[Question, non-blocking] Do we want docs for sale.foreclosure columns as well?

Member Author

Probably, but there are a billion columns and I don't want to add docs for them right now. I added this as a subtask to #201.


- name: group_mean
description: '{{ doc("group_mean") }}'
tags:
- load_auto

columns:
- name: group
description: |
Group string used as a unique identifier.

Typically a combination of year, township, and class
- name: group_size
description: Number of properties in the group
- name: run_id
description: '{{ doc("shared_column_sv_run_id") }}'
- name: mean_price
description: Mean price of the group, in FMV
- name: mean_price_per_sqft
description: Mean price per sqft (of building) of the group, in FMV

- name: parameter
description: '{{ doc("parameter") }}'
tags:
- load_auto

columns:
- name: condo_stat_groups
description: |
Groups used to calculate flagging statistics (std. dev.)
for condominium (class 299, 399) properties
- name: dev_bounds
description: |
Boundaries for standard deviation flagging.

Sales with prices beyond these boundaries are flagged.
- name: earliest_data_ingest
description: |
Date of earliest sale used in validation.

This is inclusive of the rolling window period used for
calculating statistical groups. In other words, if the earliest
sale to-be-flagged is 2013-12-01 and the rolling window period
is 9 months, then the earliest sale *used* would be 2013-03-01
- name: iso_forest_cols
description: Columns used as features in the isolation forest model
- name: latest_data_ingest
description: Date of latest sale used in validation
- name: min_group_thresh
description: |
Minimum number of sales required for statistical flagging.

If the minimum number of sales in our group methodology
(township, class, rolling window) is below N, these sales
are not flagged and are set to `Not outlier`
- name: ptax_sd
description: |
Boundaries for standard deviation flagging in combination
with a PTAX-203 flag
- name: res_stat_groups
description: |
Groups used to calculate flagging statistics (std. dev.)
for residential (class 2) properties
- name: rolling_window
description: |
Rolling window size, in months.

For each target sale, calculate statistics (std. dev.,
group size) using all sales in the period N months prior,
inclusive of the month of the sale itself
- name: run_id
description: '{{ doc("shared_column_sv_run_id") }}'
- name: sales_flagged
description: |
Total number of sales flagged.

Inclusive of both sales flagged as outliers *and* sales
flagged as non-outliers
- name: short_term_owner_threshold
description: |
Properties with a significant price change and multiple
sales within this time duration (in days) are flagged


- name: metadata
description: '{{ doc("metadata") }}'
tags:
- load_auto

columns:
- name: long_commit_sha
description: Full commit SHA of the code used for the model run
- name: run_id
description: '{{ doc("shared_column_sv_run_id") }}'
- name: run_timestamp
description: Start timestamp of the model run
- name: run_type
description: |
Type of model run.

Variable can be one of `initial_flagging`, `recurring`,
or `manual_update`
- name: short_commit_sha
description: Short commit SHA of the code used for the model run

- name: mydec
description: '{{ doc("mydec") }}'
tags:
- load_manual

models:
- name: sale.vw_ias_salesval_upload
description: '{{ doc("vw_ias_salesval_upload") }}'

columns:
- name: run_id
description: '{{ doc("shared_column_sv_run_id") }}'
- name: salekey
description: '{{ doc("shared_column_sale_key") }}'
- name: sv_is_outlier
description: '{{ doc("shared_column_sv_is_outlier") }}'
- name: sv_outlier_type
description: '{{ doc("shared_column_sv_outlier_type") }}'
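
The `earliest_data_ingest` description above works through a date calculation: with a 9 month rolling window and an earliest sale to-be-flagged of 2013-12-01, the earliest sale used is 2013-03-01. A minimal Python sketch of that arithmetic follows. The function name `earliest_sale_used` is hypothetical, and snapping the result to the first of the month is an assumption made for illustration, not something taken from model-sales-val.

```python
from datetime import date


def earliest_sale_used(earliest_flagged: date, rolling_window_months: int) -> date:
    """Step back a whole number of months from the earliest flagged sale.

    Assumption: the result is reported as the first day of the month.
    """
    # Express the date as a count of months, subtract the window,
    # then convert back to a (year, month) pair.
    total_months = (
        earliest_flagged.year * 12
        + (earliest_flagged.month - 1)
        - rolling_window_months
    )
    year, month_index = divmod(total_months, 12)
    return date(year, month_index + 1, 1)


if __name__ == "__main__":
    # Values taken from the earliest_data_ingest description
    print(earliest_sale_used(date(2013, 12, 1), 9))
```

With the values from the description, `earliest_sale_used(date(2013, 12, 1), 9)` returns `date(2013, 3, 1)`. The month-count approach also handles windows that cross a year boundary without needing a third-party date library.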
17 changes: 14 additions & 3 deletions dbt/models/shared_columns.md
@@ -989,7 +989,7 @@ prorated, but the building value is.
{% docs shared_column_buyer_name %}
Name of property buyer, as listed on deed.

-Can be truncated by myDec/IDOR. See Clerk/Recorder of Deeds for full name.
+Can be truncated by MyDec/IDOR. See Clerk/Recorder of Deeds for full name.
{% enddocs %}

## document_number
@@ -1049,15 +1049,15 @@ iasWorld internal sale identifier
{% docs shared_column_sale_price %}
Sale price of a PIN, as recorded on the deed.

-Sales are sourced from myDec/IDOR. This serves as the outcome variable in regression models
+Sales are sourced from MyDec/IDOR. This serves as the outcome variable in regression models
{% enddocs %}

## seller_name

{% docs shared_column_seller_name %}
Name of property seller, as listed on deed.

-Can be truncated by myDec/IDOR. See Clerk/Recorder of Deeds for full name.
+Can be truncated by MyDec/IDOR. See Clerk/Recorder of Deeds for full name.
{% enddocs %}

# Sale Validation
@@ -1077,6 +1077,8 @@ See [model-sales-val](https://github.com/ccao-data/model-sales-val) for full det
{% docs shared_column_sv_is_ptax_outlier %}
Outlier flagged due to certain answers on Q10 of the PTAX-203 form.

+Must have a Q10 flag _in addition to_ a statistical flag.
+
See [model-sales-val](https://github.com/ccao-data/model-sales-val) for more details
{% enddocs %}

@@ -1090,6 +1092,15 @@ with `sv_is_ptax_outlier` (using OR logic).
NOTE: Outlier flags only exist for sales _after_ 2014.
{% enddocs %}

+## sv_outlier_type
+
+{% docs shared_column_sv_outlier_type %}
+Heuristic or model used to flag an outlier.
+
+See the [model-sales-val](https://github.com/ccao-data/model-sales-val) repo
+for a list of possible flags.
+{% enddocs %}
+
## sv_run_id

{% docs shared_column_sv_run_id %}