-
Notifications
You must be signed in to change notification settings - Fork 4
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Update dbt docs overview page (#207)
* Update dbt overview page * Swap span colors for swatches * Fix natural language linting errors * Update dbt/models/overview.md Co-authored-by: William Ridgeway <10358980+wrridgeway@users.noreply.github.com> * Update dbt/models/overview.md Co-authored-by: William Ridgeway <10358980+wrridgeway@users.noreply.github.com> --------- Co-authored-by: William Ridgeway <10358980+wrridgeway@users.noreply.github.com>
- Loading branch information
1 parent
f9b2f78
commit 368233d
Showing
6 changed files
with
70 additions
and
17 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,18 +1,13 @@ | ||
--- | ||
title: CCAO Data Flow Diagram | ||
--- | ||
flowchart TD | ||
A[Mainframe + AS/400] & B[User input] --> C[(iasWorld)] | ||
A[Mainframe + AS/400] & B[User input] & M[MyDec]--> C[(iasWorld)] | ||
C -- service-sqoop-iasworld --> D[(AWS Athena\nwarehouse)] | ||
E["Public data sources (e.g. Census, OSM, BetterSchools)"] & F["Private data sources (e.g. RPIE, sales)"] -- R extraction scripts --> D | ||
E["Public data sources\n(e.g. Census, OSM, GIS)"] & F["Private data sources\n(e.g. RPIE, sales)"] -- R extraction scripts --> D | ||
D -- R transformation scripts --> D | ||
D --> I[dbt] --> D | ||
D --> J[CTAs] --> D | ||
D --> I{dbt} --> D | ||
D ----> K[AWS Glue jobs] | ||
K ---> L(Ratio stats) -- reporting database --> D | ||
K ---> M(Res reporting) -- reporting database --> D | ||
K ---> N(Sales flagging) -- sale database --> D | ||
D --> O[On-prem modeling and dev. server] -- Socrata agent --> P[Open data portal] | ||
O -- R modeling pipeline --> D | ||
L & M --> Q[Tableau reports] | ||
D --> O[On-prem dev. server] -- Socrata agent --> P[Open data portal] | ||
L --> Q[Tableau reports] | ||
D -- Scheduled extracts --> Q | ||
D -- GitHub Actions --> Z{AWS Batch\nmodeling\ninstances} --> D |
Binary file not shown.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,17 +1,75 @@ | ||
{% docs __overview__ %} | ||
# Cook County Assessor Data Department Catalog | ||
# Cook County Assessor's Office Data Catalog and Documentation | ||
|
||
This site documents the data infrastructure created and used by | ||
the Data Department of the Cook County Assessor's Office. | ||
|
||
These docs are under active development and are generated automatically using | ||
[dbt](https://docs.getdbt.com/docs/introduction). You can view the source code | ||
on [GitHub](https://github.com/ccao-data/data-architecture/). | ||
### Using This Site | ||
|
||
## Data Flow Diagram | ||
The site is under active development and is generated automatically | ||
using [dbt](https://docs.getdbt.com/docs/introduction). It contains | ||
descriptions and documentation for nearly all of the actively used tables | ||
and views in the Data Department's | ||
[AWS Athena](https://aws.amazon.com/athena/)-based data warehouse. | ||
Assets (tables and views) are grouped by topic or source into databases within | ||
a single AWS data catalog. To view the documentation of a single table: | ||
|
||
The following diagram summarizes the Data Department's current infrastructure and data flows. | ||
1. Click the **Database** tab in the top left of this page, | ||
then **awsdatacatalog** to view a list of all current databases | ||
2. Click one of the database names (e.g. **model**) to view all the tables | ||
and/or views within that database | ||
3. Click a table name (e.g. **model** > **metadata**) to view a description | ||
of the table, its tags, any related tables or tests, and the SQL | ||
source code that created it (if applicable) | ||
4. Click a column name within a table (e.g. **model** > **metadata** > **run_id**) | ||
to get a full description of that column | ||
|
||
You can also view the | ||
[lineage graph](https://docs.getdbt.com/terms/data-lineage) of any view | ||
or table by clicking the teal button on the bottom right of this page. This | ||
lets you explore the upstream and downstream dependencies of any asset. | ||
Assets in the graph are color-coded by type: | ||
|
||
- ![source](/data-architecture/assets/swatch-source.png) [Sources](https://docs.getdbt.com/docs/build/sources) - Data loaded into the warehouse by scripts and other tools | ||
- ![model](/data-architecture/assets/swatch-model.png) [Models](https://docs.getdbt.com/docs/build/models) - Data transformation and assets created and managed by dbt | ||
- ![exposure](/data-architecture/assets/swatch-exposure.png) [Exposures](https://docs.getdbt.com/docs/build/exposures) - Downstream consumers of dbt assets e.g. [Open Data](https://datacatalog.cookcountyil.gov/browse?tags=cook+county+assessor) | ||
|
||
For more information on the function and construction of this system, visit the | ||
[Data Department's dbt README on GitHub](https://github.com/ccao-data/data-architecture/tree/master/dbt#readme). | ||
|
||
### Overall Architecture | ||
|
||
The overall data architecture outlined on this site is designed to be as | ||
simple as possible while supporting the Data Department's two primary duties: | ||
residential modeling and overall assessment reporting. | ||
|
||
The architecture is primarily built around an | ||
[AWS Athena](https://aws.amazon.com/athena/)-based data "lakehouse" which | ||
facilitates SQL querying of static files stored on [S3](https://aws.amazon.com/s3/) | ||
(mostly Parquet files). The core of this warehouse is built around a mirror | ||
of the County's property tax system-of-record, iasWorld. The Data Department | ||
collects, cleans, and appends additional data to this core to support residential | ||
modeling. It also aggregates and exports data to support valuation reporting | ||
via Tableau and the Assessor's site. The following diagram summarizes the | ||
Data Department's current infrastructure and data flows. | ||
|
||
#### Data Flow Diagram | ||
|
||
![Data Flow Diagram](/data-architecture/assets/dataflow-diagram.svg) | ||
|
||
#### iasWorld Entity Relationship Diagram | ||
|
||
The following entity relation diagram (ERD) represents the Data Department's | ||
best current understanding of the main iasWorld tables used by the CCAO. Note | ||
that this is _not_ an official diagram, only an approximation. The source for this | ||
diagram can be [found here](https://lucid.app/lucidchart/da854c6c-eead-4d15-8989-8f2060e3ba71/edit?invitationId=inv_f226cccb-c40c-4260-8334-f2f6bae216aa). | ||
|
||
![iasWorld ERD](/data-architecture/assets/iasworld-erd.pdf) | ||
|
||
### Additional Resources | ||
|
||
- [CCAO Data Department Wiki](https://github.com/ccao-data/wiki) | ||
- [iasWorld Tables and Columns Overview](https://github.com/ccao-data/wiki/raw/master/Data/iasWorld-Tables.xlsx) | ||
- [iasWorld Primary and Foreign Key List](https://github.com/ccao-data/wiki/raw/master/Data/iasWorld-PK-FK-2021-06-14.xlsx) | ||
|
||
{% enddocs %} |