Flesh out docs on how to develop our data catalog #117

Merged
224 changes: 224 additions & 0 deletions .github/ISSUE_TEMPLATE/new-dbt-model.md
@@ -0,0 +1,224 @@
---
name: Add a new dbt model
description: Request the addition of a new model to the dbt DAG.
title: Add a new dbt model
---

_(Replace or delete anything in parentheses with your own issue content.)_

# New dbt model

_(Brief description of the task here.)_

## Model attributes

* **Name**: _(What should the model be called? See [Model
naming](/ccao-data/data-architecture#model-naming) for guidance.)_
_Review comment from the PR author:_

I went with an absolute link here instead of a relative link because it seemed like a safer choice to link to the project homepage rather than assume a specific file structure going forward. But I could be convinced to switch to a relative link like:

Suggested change:
- naming](/ccao-data/data-architecture#model-naming) for guidance.)_
+ naming](../../README.md#model-naming) for guidance.)_

* **Materialization**: _(Should the model be a table or a view? See [Model
materialization](/ccao-data/data-architecture#model-materialization) for
guidance.)_
* **Tests**:
  * _(Add a bulleted list of tests here. See [Model
    tests](/ccao-data/data-architecture#model-tests) for guidance.)_
* **Description**: _(Provide a rich description for this model that will be
displayed in documentation. Markdown is supported, and encouraged for more
complex models. See [Model
description](/ccao-data/data-architecture#model-description) for guidance.)_

## Short checklist

_(Use this checklist if the assignee already knows how to add a dbt model.
Otherwise, delete it in favor of the long checklist in the following section.)_

- [ ] Define the SQL query that creates the model in the `aws-athena/` directory
- [ ] Optionally configure model materialization
- [ ] Confirm that a subdirectory for this model's database exists in
the `dbt/models/` directory, and if not, create one, add a new `schema.yml`
file, and update `dbt_project.yml` to document the `+schema`
- [ ] Add a symlink from the appropriate subfolder of the `dbt/models/`
directory to the new SQL query in the `aws-athena/` directory
- [ ] Add docs for the model to the subdirectory `docs.md` file
- [ ] Update the `schema.yml` file in the subfolder of `dbt/models/` where you
created your symlink to add a definition for your model
- [ ] Add tests to your new model definition in `schema.yml`
- [ ] If your model definition requires any new macros, make sure those macros
are tested in `dbt/macros/tests/test_all.sql`
- [ ] Commit your changes to a branch and open a pull request

## Checklist

Complete the following checklist to add the model:

- [ ] Define the SQL query that creates the model in the appropriate subfolder
of the `aws-athena/` directory. Views should live in `aws-athena/views/`
while tables should live in `aws-athena/ctas/`. When naming the file for the
query, the period in the model name that separates the entity name from the
database namespace should be changed to a hyphen (e.g. `default.new_model`
should become `default-new_model` for the purpose of the SQL filename).


```bash
# View example
touch aws-athena/views/default-vw_new_model.sql

# Table example
touch aws-athena/ctas/default-new_model.sql
```
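
The naming rule above can also be written as a small shell function. This is a minimal sketch, assuming a POSIX-compatible shell; `model_to_sql_filename` is a hypothetical helper for illustration and is not part of this repository.

```bash
# Hypothetical helper (not part of the repository): convert a model name
# such as `default.new_model` into its SQL filename.
model_to_sql_filename() {
    # ${1%%.*} is everything before the first period (the database namespace);
    # ${1#*.} is everything after it (the entity name).
    printf '%s-%s.sql\n' "${1%%.*}" "${1#*.}"
}

model_to_sql_filename "default.vw_new_model"
# prints: default-vw_new_model.sql
```

The sketch assumes the model name contains a period, as all of the model names in this template do.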

- [ ] Use
[`source()`](https://docs.getdbt.com/reference/dbt-jinja-functions/source)
and [`ref()`](https://docs.getdbt.com/reference/dbt-jinja-functions/ref) to
reference other models where possible in your query.


```sql
-- View or table example
-- Either aws-athena/views/default-vw_new_model.sql
-- or aws-athena/ctas/default-new_model.sql
select pin10, year
from {{ source('raw', 'foobar') }}
join {{ ref('default.vw_pin_universe') }}
using (pin10, year)
```

- [ ] Optionally configure model materialization. If the output of the query
should be a view, no action is necessary, since all models in this repository
materialize as views by default. If the output should be a table, with table
data stored in S3, add a config block to the top of the query file to
configure materialization.

```sql
-- Table example
-- aws-athena/ctas/default-new_model.sql
{{
    config(
        materialized='table',
        partitioned_by=['year'],
        bucketed_by=['pin10'],
        bucket_count=1
    )
}}
select pin10, year
from {{ source('raw', 'foobar') }}
join {{ ref('default.vw_pin_universe') }}
using (pin10, year)
```

- [ ] Confirm that a subdirectory for this model's database exists in
the `dbt/models/` directory (e.g. `dbt/models/default/` for
the `default.new_model` model). If a subdirectory does not yet exist, create
one, add a `schema.yml` file to the directory to store [model
properties](https://docs.getdbt.com/reference/model-properties), and update
`dbt_project.yml` to document the new directory under the `models.athena`
key with a `+schema` attribute.

```yaml
# Table example (only the model name would change for a view)
# schema.yml
version: 2


models:
- name: default.new_model
```

```diff
# View or table example
--- a/dbt/dbt_project.yml
+++ b/dbt/dbt_project.yml

 models:
   athena:
     +materialized: view
+    default:
+      +schema: default
     census:
       +schema: census
```

- [ ] Add a symlink from the appropriate subfolder of the `dbt/models/`
directory to the SQL query you created in the `aws-athena/` directory.

```bash
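# A relative symlink target resolves from the link's own directory, so the
# target path climbs from dbt/models/default/ back to the repository root.

# View example
ln -s ../../../aws-athena/views/default-vw_new_model.sql dbt/models/default/default.vw_new_model.sql

# Table example
ln -s ../../../aws-athena/ctas/default-new_model.sql dbt/models/default/default.new_model.sql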
```
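
Because a relative symlink target is resolved from the directory that contains the link, it can be worth confirming that the new link actually resolves before committing. The following is a minimal sketch, assuming a POSIX `find`; `find_dangling_symlinks` is a hypothetical helper for illustration and is not part of this repository.

```bash
# Hypothetical helper (not part of the repository): print every symlink
# under the given directory whose target does not exist.
find_dangling_symlinks() {
    # -type l selects symlinks; `test -e` follows the link, so it fails
    # (and the path is printed) when the target is missing.
    find "$1" -type l ! -exec test -e {} \; -print
}
```

Run from the repository root, `find_dangling_symlinks dbt/models` should print nothing if every model symlink points at an existing SQL file.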

- [ ] Add docs for your model to the `docs.md` file in the `dbt/models/`
subdirectory that contains your symlink, creating the file if it does not
yet exist.


```diff
# Table example (only the model name would change for a view)
--- a/dbt/models/default/docs.md
+++ b/dbt/models/default/docs.md

`spatial.township` is not yearly.
{% enddocs %}

+{% docs new_model %}
+
+Your Markdown docs go here.
+
+{% enddocs %}
+
{% docs vw_pin_value %}
CCAO mailed total, CCAO final, and BOR final values for each PIN by year.
```

- [ ] Update the `schema.yml` file in the subfolder of `dbt/models/` where you
created your symlink to add a definition for your model.

```diff
# Table example (only the model name would change for a view)
--- a/dbt/models/default/schema.yml
+++ b/dbt/models/default/schema.yml

 models:
+  - name: default.new_model
+    description: '{{ doc("new_model") }}'
+    columns:
+      - name: pin10
+        description: 10-digit PIN
+      - name: year
+        description: Year
   - name: default.vw_pin_history
     description: '{{ doc("vw_pin_history") }}'
     tests:
```

- [ ] Add tests to your new model definition in `schema.yml`.

```diff
# Table example (only the model name would change for a view)
--- a/dbt/models/default/schema.yml
+++ b/dbt/models/default/schema.yml

 models:
   - name: default.new_model
     description: '{{ doc("new_model") }}'
     columns:
       - name: pin10
         description: 10-digit PIN
       - name: year
         description: Year
+    tests:
+      - unique_combination_of_columns:
+          name: new_model_unique_by_pin_and_year
+          combination_of_columns:
+            - pin10
+            - year
   - name: default.vw_pin_history
     description: '{{ doc("vw_pin_history") }}'
     tests:
```

- [ ] If your model definition requires any new macros, make sure those macros
are tested in `dbt/macros/tests/test_all.sql`. If any tests need implementing,
follow the pattern set by existing tests to implement them.

- [ ] Commit your changes to a branch and open a pull request to build your
model and run tests in a CI environment.