Flesh out docs on how to develop our data catalog #117
jeancochrane merged 7 commits into `master` from `jeancochrane/67-data-catalog-flesh-out-dev-docs-on-how-to-build-and-test-with-dbt` on Sep 11, 2023
Commits:
- d07717d: Flesh out dbt developer documentation
- 8545bc1 (jeancochrane): Add new-dbt-model.md issue template
- 97cc502 (jeancochrane): Tweak wording in dbt/README.md
- c0cd6a8 (jeancochrane): Update new-dbt-model.md to fix typos
- 0ab0c16 (jeancochrane): Resolve natural language errors to appease superlinter
- 9fa3e16 (jeancochrane): Apply Dan's suggestions from code review to dbt/README.md
- 7426ff0 (jeancochrane): Tweak note on column descriptions in dbt/README.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,224 @@ | ||
--- | ||
name: Add a new dbt model | ||
description: Request the addition of a new model to the dbt DAG. | ||
title: Add a new dbt model | ||
--- | ||
|
||
_(Replace or delete anything in parentheses with your own issue content.)_ | ||
|
||
# New dbt model | ||
|
||
_(Brief description of the task here.)_ | ||
|
||
## Model attributes | ||
|
||
* **Name**: _(What should the model be called? See [Model | ||
naming](/ccao-data/data-architecture#model-naming) for guidance.)_ | ||
* **Materialization**: _(Should the model be a table or a view? See [Model | ||
materialization](/ccao-data/data-architecture#model-materialization) for | ||
guidance.)_ | ||
* **Tests**: | ||
* _(Add a bulleted list of tests here. See [Model | ||
tests](/ccao-data/data-architecture#model-tests) for guidance.)_ | ||
* **Description**: _(Provide a rich description for this model that will be | ||
displayed in documentation. Markdown is supported, and encouraged for more | ||
complex models. See [Model | ||
description](/ccao-data/data-architecture#model-description) for guidance.)_ | ||
|
||
## Short checklist | ||
|
||
_(Use this checklist if the assignee already knows how to add a dbt model. | ||
Otherwise, delete it in favor of the long checklist in the following section.)_ | ||
|
||
- [ ] Define the SQL query that creates the model in the `aws-athena/` directory | ||
- [ ] Optionally configure model materialization | ||
- [ ] Confirm that a subdirectory for this model's database exists in | ||
the `dbt/models/` directory, and if not, create one, add a new `schema.yml` | ||
file, and update `dbt_project.yml` to document the `+schema` | ||
- [ ] Add a symlink from the appropriate subfolder of the `dbt/models/` | ||
directory to the new SQL query in the `aws-athena/` directory | ||
- [ ] Add docs for the model to the subdirectory's `docs.md` file | |
- [ ] Update the `schema.yml` file in the subfolder of `dbt/models/` where you | ||
created your symlink to add a definition for your model | ||
- [ ] Add tests to your new model definition in `schema.yml` | ||
- [ ] If your model definition requires any new macros, make sure those macros | ||
are tested in `dbt/macros/tests/test_all.sql` | ||
- [ ] Commit your changes to a branch and open a pull request | ||
|
||
## Checklist | ||
|
||
Complete the following checklist to add the model: | ||
|
||
- [ ] Define the SQL query that creates the model in the appropriate subfolder | ||
of the `aws-athena/` directory. Views should live in `aws-athena/views/` | ||
while tables should live in `aws-athena/ctas/`. When naming the file for the | ||
query, the period in the model name that separates the entity name from the | ||
database namespace should be changed to a hyphen (e.g. `default.new_model` | ||
should become `default-new_model` for the purpose of the SQL filename). | ||
|
||
|
||
```bash | ||
# View example | ||
touch aws-athena/views/default-vw_new_model.sql | ||
|
||
# Table example | ||
touch aws-athena/ctas/default-new_model.sql | ||
``` | ||
|
||
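The naming rule above can be sketched as a small shell function. The helper name `model_to_filename` is hypothetical and is not part of the repository; it only illustrates swapping the first period for a hyphen and appending `.sql`.

```bash
# Hypothetical helper (not part of the repository): derive the SQL filename
# from a fully qualified model name such as default.new_model.
model_to_filename() {
  # ${1%%.*} is the text before the first period (the database namespace);
  # ${1#*.} is the text after it (the entity name).
  printf '%s-%s.sql\n' "${1%%.*}" "${1#*.}"
}

model_to_filename "default.new_model"     # prints default-new_model.sql
model_to_filename "default.vw_new_model"  # prints default-vw_new_model.sql
```

Only the first period is replaced, so the database namespace is always the part before the hyphen.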
- [ ] Use | ||
[`source()`](https://docs.getdbt.com/reference/dbt-jinja-functions/source) | ||
and [`ref()`](https://docs.getdbt.com/reference/dbt-jinja-functions/ref) to | ||
reference other models where possible in your query. | ||
|
||
|
||
```sql | ||
-- View or table example | ||
-- Either aws-athena/views/default-vw_new_model.sql | ||
-- or aws-athena/ctas/default-new_model.sql | ||
select pin10, year | ||
from {{ source('raw', 'foobar') }} | ||
join {{ ref('default.vw_pin_universe') }} | ||
using (pin10, year) | ||
``` | ||
|
||
- [ ] Optionally configure model materialization. If the output of the query | ||
should be a view, no action is necessary, since the default for all models in | ||
this repository is to materialize as views; but if the output should be a | ||
table, with table data stored in S3, then you'll need to add a config block | ||
to the top of the SQL file to configure materialization. | |
|
||
```sql | ||
-- Table example | ||
-- aws-athena/ctas/default-new_model.sql | ||
{{ | ||
config( | ||
materialized='table', | ||
partitioned_by=['year'], | ||
bucketed_by=['pin10'], | ||
bucket_count=1 | ||
) | ||
}} | ||
select pin10, year | ||
from {{ source('raw', 'foobar') }} | ||
join {{ ref('default.vw_pin_universe') }} | ||
using (pin10, year) | ||
``` | ||
|
||
- [ ] Confirm that a subdirectory for this model's database exists in | ||
the `dbt/models/` directory (e.g. `dbt/models/default/` for | ||
the `default.new_model` model). If a subdirectory does not yet exist, create | ||
one, add a `schema.yml` file to the directory to store [model | ||
properties](https://docs.getdbt.com/reference/model-properties), and update | ||
`dbt_project.yml` to document the new directory under the `models.athena` | ||
key with a `+schema` attribute. | ||
|
||
```yaml | ||
# Table example (only the model name would change for a view) | ||
# schema.yml | ||
version: 2 | ||
|
||
|
||
models: | ||
- name: default.new_model | ||
``` | ||
|
||
```diff | ||
# View or table example | ||
--- a/dbt/dbt_project.yml | ||
+++ b/dbt/dbt_project.yml | ||
|
||
models: | ||
athena: | ||
+materialized: view | ||
+ default: | ||
+ +schema: default | ||
census: | ||
+schema: census | ||
``` | ||
|
||
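As a rough sketch of the directory check described above, the following script creates the subdirectory and a minimal `schema.yml` only when they are missing. It runs against a scratch directory created with `mktemp`, which stands in for a real checkout; the `default` database name is taken from the example above, and the `dbt_project.yml` change still has to be made by hand.

```bash
# Scratch directory standing in for a checkout of the repository (assumption:
# in real use, point `repo` at your working copy instead).
repo="$(mktemp -d)"
model_dir="$repo/dbt/models/default"

if [ ! -d "$model_dir" ]; then
  mkdir -p "$model_dir"
  # Minimal schema.yml; model definitions are added in a later step.
  printf 'version: 2\n\nmodels:\n' > "$model_dir/schema.yml"
  echo "created $model_dir; remember to add +schema to dbt_project.yml"
fi
```

If the directory already exists, the script does nothing, so it is safe to run before adding any model.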
- [ ] Add a symlink from the appropriate subfolder of the `dbt/models/` | ||
directory to the SQL query you created in the `aws-athena/` directory. | ||
|
||
```bash | ||
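# A relative symlink target is resolved from the link's own directory, | |
# so the target needs a ../../../ prefix to reach the repository root. | |
# View example | |
ln -s ../../../aws-athena/views/default-vw_new_model.sql dbt/models/default/default.vw_new_model.sql | |
|
||
# Table example | |
ln -s ../../../aws-athena/ctas/default-new_model.sql dbt/models/default/default.new_model.sql | |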
``` | ||
|
||
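Because a relative symlink target is resolved from the directory that contains the link, not from the current working directory, it is worth checking that a new link is not dangling. The sketch below does this in a scratch directory so that no real checkout is touched; the paths mirror the table example above.

```bash
# Scratch directory standing in for a checkout of the repository.
repo="$(mktemp -d)"
mkdir -p "$repo/aws-athena/ctas" "$repo/dbt/models/default"
touch "$repo/aws-athena/ctas/default-new_model.sql"

link="$repo/dbt/models/default/default.new_model.sql"

# The target is resolved relative to dbt/models/default/, hence the three
# `..` components that climb back to the repository root.
ln -s ../../../aws-athena/ctas/default-new_model.sql "$link"

# -L: the path is a symlink; -e: its target exists (the link is not dangling).
if [ -L "$link" ] && [ -e "$link" ]; then
  echo "symlink resolves"
else
  echo "symlink is broken"
fi
```

The same two `[ ... ]` checks can be run against a link in a real working copy before committing it.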
- [ ] Add docs for your model to the docs file in the `dbt/models/` subdirectory | |
that contains your symlink, creating the file if it does not yet exist. | |
|
||
|
||
```diff | ||
# Table example (only the model name would change for a view) | ||
--- a/dbt/models/default/docs.md | ||
+++ b/dbt/models/default/docs.md | ||
|
||
`spatial.township` is not yearly. | ||
{% enddocs %} | ||
|
||
+{% docs new_model %} | ||
+ | ||
+Your Markdown docs go here. | ||
+ | ||
+{% enddocs %} | ||
+ | ||
{% docs vw_pin_value %} | ||
CCAO mailed total, CCAO final, and BOR final values for each PIN by year. | ||
``` | ||
|
||
- [ ] Update the `schema.yml` file in the subfolder of `dbt/models/` where you | ||
created your symlink to add a definition for your model. | ||
|
||
```diff | ||
# Table example (only the model name would change for a view) | ||
--- a/dbt/models/default/schema.yml | ||
+++ b/dbt/models/default/schema.yml | ||
|
||
models: | ||
+ - name: default.new_model | ||
+ description: '{{ doc("new_model") }}' | ||
+ columns: | ||
+ - name: pin10 | ||
+ description: 10-digit PIN | ||
+ - name: year | ||
+ description: Year | ||
- name: default.vw_pin_history | ||
description: '{{ doc("vw_pin_history") }}' | ||
tests: | ||
``` | ||
|
||
- [ ] Add tests to your new model definition in `schema.yml`. | ||
|
||
```diff | ||
# Table example (only the model name would change for a view) | ||
--- a/dbt/models/default/schema.yml | ||
+++ b/dbt/models/default/schema.yml | ||
|
||
models: | ||
- name: default.new_model | ||
description: '{{ doc("new_model") }}' | ||
columns: | ||
- name: pin10 | ||
description: 10-digit PIN | ||
- name: year | ||
description: Year | ||
+ tests: | ||
+ - unique_combination_of_columns: | ||
+ name: new_model_unique_by_pin_and_year | ||
+ combination_of_columns: | ||
+ - pin10 | |
+ - year | ||
- name: default.vw_pin_history | ||
description: '{{ doc("vw_pin_history") }}' | ||
tests: | ||
``` | ||
|
||
- [ ] If your model definition requires any new macros, make sure those macros | |
are tested in `dbt/macros/tests/test_all.sql`. If you need to write new tests, | |
follow the pattern set by the existing ones. | |
|
||
- [ ] Commit your changes to a branch and open a pull request to build your | ||
model and run tests in a CI environment. |
I went with an absolute link here instead of a relative link because it seemed like a safer choice to link to the project homepage rather than assume a specific file structure going forward. But I could be convinced to switch to a relative link like: