Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Separate dbt target and state directories to properly support slim CI #69

Conversation

jeancochrane
Copy link
Contributor

@jeancochrane jeancochrane commented Aug 14, 2023

One underdocumented subtlety of dbt's state selection options is that when selecting resources based on state attributes (e.g. selecting only resources that are new or modified since last run, for "slim CI") you have to point to a separate state directory containing a manifest file from the previous run, you can't just point to the default state directory that dbt uses (target/). I had assumed it was best practice to use one state directory, so I designed the build-and-test-dbt workflow to save/restore the state cache from the default target/ state directory; however, it appears that dbt updates its own manifest file before comparing it to the state manifest file to see which resources have changed, so if those manifest files are the same file then dbt will never think that any resources have changed.

Another subtlety is that state:modified and state:new are two separate selectors, so you can't just use state:modified if you want to pick up modified and new models, you have to use both. These two problems are causing the bug described in #68 which is preventing us from deploying changes to our models.

This PR fixes our build-and-test-dbt workflow such that we compare dbt state to the cached manifest file using a new state/ directory that is separate from target/. It also tweaks the build and test commands to filter for state:new models in addition to state:modified.

Here's an example of a run for a test commit that adds a new model (default.vw_test) and modifies an existing one (default.vw_pin_universe). You can check the output for the Build models and Test models steps to confirm that CI is only building and testing new or modified models.

Closes #68.

@jeancochrane jeancochrane force-pushed the jeancochrane/68-data-catalog-dbt-ci-is-not-properly-selecting-for-new-and-modified-models branch from b46d734 to c1d7d20 Compare August 14, 2023 19:49
@jeancochrane jeancochrane force-pushed the jeancochrane/68-data-catalog-dbt-ci-is-not-properly-selecting-for-new-and-modified-models branch from 402d400 to 257ba0f Compare August 14, 2023 20:02
@@ -31,15 +31,15 @@ models:
- pin
- year
config:
error_if: ">280662"
error_if: ">280679"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As per usual these days, I also had to bump these thresholds to get tests to pass. Billy is still investigating but his hunch is that the assumptions behind these two tests are not actually correct.

@jeancochrane jeancochrane marked this pull request as ready for review August 14, 2023 20:15
@jeancochrane jeancochrane requested a review from a team as a code owner August 14, 2023 20:15
Copy link
Member

@dfsnow dfsnow left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great! Nice debugging @jeancochrane. Merge at your leisure.

@jeancochrane jeancochrane merged commit b1e3182 into data-catalog Aug 15, 2023
3 checks passed
@jeancochrane jeancochrane deleted the jeancochrane/68-data-catalog-dbt-ci-is-not-properly-selecting-for-new-and-modified-models branch August 15, 2023 19:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants