Initialize data catalog overview docs with data flow diagram #102

Merged
19 changes: 17 additions & 2 deletions .github/workflows/deploy_dbt_docs.yaml
@@ -27,17 +27,32 @@ jobs:
with:
role-to-assume: ${{ secrets.AWS_IAM_ROLE_TO_ASSUME_ARN }}

- name: Setup node
uses: actions/setup-node@v3

- name: Install docs build dependencies
run: npm install -g @mermaid-js/mermaid-cli

- name: Prepare Mermaid assets for docs
run: |
for file in assets/*.mmd; do
mmdc -i "$file" -o "${file/.mmd/.svg}"
done
working-directory: ${{ env.PROJECT_DIR }}
shell: bash

- name: Generate docs
run: dbt docs generate --target "$TARGET"
run: dbt docs generate --target prod
working-directory: ${{ env.PROJECT_DIR }}
shell: bash

- name: Package doc files for upload
run: |
mkdir _site
mkdir -p _site/assets
for file in index.html catalog.json manifest.json; do
cp "target/$file" "_site/$file"
done
cp -R target/assets/* _site/assets
working-directory: ${{ env.PROJECT_DIR }}
shell: bash
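
The packaging step above can be tried locally with a sketch like the following. `package_docs` is a hypothetical helper name that does not appear in the workflow, and the directory arguments are assumptions that default to the `target` and `_site` paths used in the step. It copies the contents of the assets directory via `assets/.` rather than `assets/*`, which is one way to avoid a failure when the directory exists but is empty.

```shell
#!/usr/bin/env bash
# Sketch only: package_docs is a hypothetical helper, not part of the workflow.

# Copy the generated dbt docs files and assets into a site directory.
#   $1: directory holding dbt's generated files (default: target)
#   $2: destination directory for the static site (default: _site)
package_docs() {
  local src="${1:-target}"
  local dest="${2:-_site}"
  local file

  # -p creates both the site directory and its assets subdirectory.
  mkdir -p "$dest/assets"

  for file in index.html catalog.json manifest.json; do
    cp "$src/$file" "$dest/$file" || return 1
  done

  # Copy directory contents; skip quietly if no assets were generated.
  if [ -d "$src/assets" ]; then
    cp -R "$src/assets/." "$dest/assets/"
  fi
}
```

Calling `package_docs` with no arguments from the dbt project directory would mirror the defaults used in the workflow step.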

2 changes: 2 additions & 0 deletions .gitignore
@@ -9,3 +9,5 @@
*.csv
!open_data_info.csv
aws-s3/parcel-tmp
package*.json
node_modules/
1 change: 1 addition & 0 deletions dbt/.gitignore
@@ -4,3 +4,4 @@ venv/
target/
dbt_modules/
dbt_packages/
assets/*.svg
47 changes: 36 additions & 11 deletions dbt/README.md
@@ -30,49 +30,74 @@ dbt deps

## Usage

Make sure you have the virtual environment activated:
To run dbt commands, make sure you have the virtual environment activated:

```
source venv/bin/activate
```

Authenticate with AWS MFA if you haven't already today:
You must also authenticate with AWS using MFA if you haven't already today:

```
aws-mfa
```

Build the models to create views in our Athena warehouse:
### Build tables and views

Build the models to create tables and views in our Athena warehouse:

```
dbt run
```

By default, all `dbt` commands will run against the `dev` environment, which
namespaces the resources it creates by prefixing target database names with
your Unix `$USER` name (e.g. `jecochr-default` for the `default` database when
`dbt` is run on Jean's machine). To instead **run commands against prod**,
your Unix `$USER` name (e.g. `dev_jecochr_default` for the `default` database
when `dbt` is run on Jean's machine). To instead **run commands against prod**,
use the `--target` flag:

```
dbt run --target prod
```
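
As an illustration of the dev naming convention described above, the following sketch builds the expected database name from a user name and a base database name. `dev_database_name` is a hypothetical helper written for this example. The project's actual naming logic lives in its dbt configuration, which is not shown here, so treat the `dev_<user>_<database>` pattern as inferred from the `dev_jecochr_default` example only.

```shell
#!/usr/bin/env bash
# Sketch only: dev_database_name is a hypothetical helper; the real prefixing
# is done by the project's dbt configuration, not by this function.

# Print the dev database name for a given user and base database.
#   $1: Unix user name
#   $2: base database name
dev_database_name() {
  local user="$1"
  local database="$2"
  printf 'dev_%s_%s\n' "$user" "$database"
}

# Show the name that would apply to the current user for the default database.
dev_database_name "${USER:-unknown}" default
```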

Generate the documentation:
### Generate documentation

Note that we configure dbt's [`asset-paths`
attribute](https://docs.getdbt.com/reference/project-configs/asset-paths) in
order to link to images in our documentation. Some of those images, like the
Mermaid diagram defined in `assets/dataflow-diagram.mmd`, are generated
automatically during the `deploy-dbt-docs` deployment workflow. To generate
them locally, make sure you have
[`mermaid-cli`](https://github.com/mermaid-js/mermaid-cli) installed (we
recommend a [local
installation](https://github.com/mermaid-js/mermaid-cli#install-locally)) and
run the following command:

```bash
for file in assets/*.mmd; do
./node_modules/.bin/mmdc -i "$file" -o "${file/.mmd/.svg}"
done
```
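
A slightly more defensive variant of the loop above is sketched below. The helper names `svg_path_for` and `render_diagrams` and the `MMDC` override variable are assumptions introduced for this example. The sketch strips only a trailing `.mmd` with `${1%.mmd}`, whereas `${file/.mmd/.svg}` replaces the first `.mmd` found anywhere in the path, and it skips the loop body when no `.mmd` files match.

```shell
#!/usr/bin/env bash
# Sketch only: svg_path_for, render_diagrams, and MMDC are hypothetical names.

# Map a path ending in .mmd to the matching .svg path.
svg_path_for() {
  printf '%s\n' "${1%.mmd}.svg"
}

# Render every Mermaid file in a directory (default: assets).
# MMDC can point at another mmdc binary; the default assumes a local install.
render_diagrams() {
  local dir="${1:-assets}"
  local mmdc="${MMDC:-./node_modules/.bin/mmdc}"
  local file
  for file in "$dir"/*.mmd; do
    # An unmatched glob stays literal, so skip anything that is not a file.
    [ -e "$file" ] || continue
    "$mmdc" -i "$file" -o "$(svg_path_for "$file")"
  done
}
```

Running `render_diagrams` from the dbt project directory would write each `.svg` next to its `.mmd` source, matching the `assets/*.svg` ignore rule.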

Then, generate the documentation:

```
dbt docs generate
```

This will create a new file `target/index.html` representing the static
docs site.
This will create a set of static files in the `target/` subdirectory that can
be used to serve the docs site.

You can also serve the docs locally:
To serve the docs locally:

```
dbt docs serve
```

Then, navigate to http://localhost:8080 to view the site.

### Run tests

Run the tests:

```
@@ -91,7 +116,7 @@ Run tests for dbt macros:
dbt run-operation test_all
```

## Debugging dbt test failures
#### Debugging dbt test failures

Most of our dbt tests are simple SQL statements that we run against our
models in order to confirm that models conform to spec. If a test is
@@ -115,7 +140,7 @@ helpful for spotting it in the list of recent queries.
Open the query in the Athena query editor, and edit or run it as necessary
to debug the test failure.

### A special note on failures related to code changes
#### A special note on failures related to code changes

To quickly rule out a failure related to a code change, you can switch to the
main branch of this repository (or to an earlier commit where we know tests
18 changes: 18 additions & 0 deletions dbt/assets/dataflow-diagram.mmd
Original file line number Diff line number Diff line change
@@ -0,0 +1,18 @@
---
title: CCAO Data Flow Diagram
---
flowchart TD
A[Mainframe + AS/400] & B[User input] --> C[(iasWorld)]
C -- service-sqoop-iasworld --> D[(AWS Athena\nwarehouse)]
E["Public data sources (e.g. Census, OSM, BetterSchools)"] & F["Private data sources (e.g. RPIE, sales)"] -- R extraction scripts --> D
D -- R transformation scripts --> D
D --> I[dbt] --> D
D --> J[CTAs] --> D
D ----> K[AWS Glue jobs]
K ---> L(Ratio stats) -- reporting database --> D
K ---> M(Res reporting) -- reporting database --> D
K ---> N(Sales flagging) -- sale database --> D
D --> O[On-prem modeling and dev. server] -- Socrata agent --> P[Open data portal]
O -- R modeling pipeline --> D
L & M --> Q[Tableau reports]
D -- Scheduled extracts --> Q
1 change: 1 addition & 0 deletions dbt/dbt_project.yml
@@ -15,6 +15,7 @@ test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]
asset-paths: ["assets"]


# Directories to be removed by `dbt clean`
17 changes: 17 additions & 0 deletions dbt/models/overview.md
@@ -0,0 +1,17 @@
{% docs __overview__ %}
# Cook County Assessor Data Department Catalog

This site documents the data infrastructure created and used by
the Data Department of the Cook County Assessor's Office.

These docs are under active development and are generated automatically using
[dbt](https://docs.getdbt.com/docs/introduction). You can view the source code
on [GitHub](https://github.com/ccao-data/data-architecture/).

## Data Flow Diagram

The following diagram summarizes the Data Department's current infrastructure and data flows.

![Data Flow Diagram](/data-architecture/assets/dataflow-diagram.svg)

{% enddocs %}