Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature] Support run multiple dbt incremental models in the same table in parallel #1362

Open
3 tasks done
leoricklin opened this issue Oct 3, 2024 · 1 comment
Open
3 tasks done
Labels
enhancement New feature or request triage

Comments

@leoricklin
Copy link

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt-bigquery functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Context

dbt creates a temporary table and uses a merge statement on Bigquery to refresh incremental models. See https://discourse.getdbt.com/t/bigquery-dbt-incremental-changes/982 for more details.

Current Status

In the current implementation, it automatically generates a temporary table name by appending __dbt_tmp to the existing target table name. This limitation prevents parallel execution of incremental models that target the same table. See https://github.com/dbt-labs/dbt-bigquery/blob/main/dbt/include/bigquery/macros/materializations/incremental.sql#L80C26-L80C44 and https://github.com/dbt-labs/dbt-adapters/blob/main/dbt/include/global_project/macros/adapters/relation.sql#L9C12-L9C31).

Request

Provide a setting to create unique temporary table names for parallel execution of multiple incremental models targeting the same table. There is a similar feature in dbt-athena See “unique_tmp_table_suffix” section in https://github.com/dbt-labs/dbt-athena/blob/main/README.md

Describe alternatives you've considered

No response

Who will this benefit?

Developers using GCP Bigquery

Are you interested in contributing this feature?

Yes

Anything else?

No response

@leoricklin leoricklin added enhancement New feature or request triage labels Oct 3, 2024
@leoricklin
Copy link
Author

This code snippet proposes improvements to the https://github.com/dbt-labs/dbt-bigquery/blob/main/dbt/include/bigquery/macros/materializations/incremental.sql by introducing a new table configuration unique_tmp_table_suffix which provides similar capabilities in the dbt-athena :

  • For incremental models using merge strategy on Bigquery table
  • Replace the __dbt_tmp suffix used as temporary table name suffix by the unique invocation_id
  • Useful if you are looking to run multiple dbt build inserting in the same table in parallel
{########## updated ##########}
{% macro generate_unique_temporary_table_suffix() %}

  {%- set tmp_table_suffix = '__dbt_tmp_' ~ invocation_id -%}

  {{ return(tmp_table_suffix) }}

{% endmacro %}

{% materialization incremental, adapter='bigquery', supported_languages=['sql', 'python'] -%}

  {%- set unique_key = config.get('unique_key') -%}
  {%- set full_refresh_mode = (should_full_refresh()) -%}
  {%- set language = model['language'] %}

  {%- set target_relation = this %}
  {%- set existing_relation = load_relation(this) %}


  {########## updated ##########}
  {%- set unique_tmp_table_suffix = config.get('unique_tmp_table_suffix', False) | as_bool -%}
  {% if unique_tmp_table_suffix == True and strategy == 'merge' %}
    {% set tmp_table_suffix = generate_unique_temporary_table_suffix() %}
  {% else %}
    {% set tmp_table_suffix = '__dbt_tmp' %}
  {% endif %}
  {%- set tmp_relation = make_temp_relation(this, suffix=tmp_table_suffix) -%}  

@leoricklin leoricklin reopened this Oct 3, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request triage
Projects
None yet
Development

No branches or pull requests

1 participant