[Feature] Support running SQL models on Google Cloud Dataproc Serverless #1353

Open
3 tasks done
gddezero opened this issue Sep 20, 2024 · 1 comment
Labels
enhancement New feature or request

Comments

@gddezero

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Context

Google Cloud Dataproc Serverless lets you run Spark workloads without requiring you to provision and manage your own Dataproc cluster. Use the Google Cloud console, Google Cloud CLI, or Dataproc API to submit a batch workload to the Dataproc Serverless service. The service will run the workload on a managed compute infrastructure, autoscaling resources as needed.

Dataproc Serverless is widely used by GCP customers to build data pipelines. A typical use case is submitting Spark SQL jobs to Dataproc Serverless to transform data and build a data warehouse.

Current Status

dbt currently supports only Python models on Dataproc Serverless, as a companion service to BigQuery:
https://docs.getdbt.com/docs/core/connect-data-platform/bigquery-setup#running-python-models-on-dataproc
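
For reference, a Python model that opts into Dataproc Serverless looks roughly like the sketch below. The model name `stg_orders` is hypothetical, and the sketch assumes the BigQuery profile already sets the Dataproc region and GCS bucket described in the linked docs.

```python
# Sketch of a dbt Python model (e.g. models/orders_serverless.py).
# "stg_orders" is a hypothetical upstream model used only for illustration.
def model(dbt, session):
    # Ask dbt-bigquery to submit this model as a Dataproc Serverless batch
    # instead of sending it to an existing Dataproc cluster.
    dbt.config(materialized="table", submission_method="serverless")

    # dbt.ref() returns the upstream relation as a DataFrame.
    orders = dbt.ref("stg_orders")

    # Whatever DataFrame is returned gets materialized as the model's table.
    return orders
```

SQL models have no comparable submission option today, which is the gap this request is about.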

Request

Support running SQL models on Dataproc Serverless

Describe alternatives you've considered

No response

Who will this benefit?

Customers using Google Cloud

Are you interested in contributing this feature?

No response

Anything else?

No response

@gddezero added the enhancement and triage labels on Sep 20, 2024
@gddezero changed the title from "[Feature] Support Google Cloud Serverless" to "[Feature] Support running SQL models on Google Cloud Dataproc Serverless" on Sep 20, 2024
@dbeatty10 transferred this issue from dbt-labs/dbt-core on Sep 23, 2024
@amychen1776

Hello @gddezero, could you provide more context on why you prefer running SQL on Dataproc rather than directly on BQ?
