Commit
feat(ingest/dremio): Dremio Source Ingestion (#11598)
Co-authored-by: Jonny Dixon <jonny.dixon@acryl.io>
Co-authored-by: Jonny Dixon <45681293+acrylJonny@users.noreply.github.com>
Co-authored-by: Mayuri Nehate <33225191+mayurinehate@users.noreply.github.com>
4 people authored Nov 5, 2024
1 parent 5c58128 commit 5d17ecb
Showing 29 changed files with 11,812 additions and 0 deletions.
4 changes: 4 additions & 0 deletions datahub-web-react/src/app/ingest/source/builder/constants.ts
@@ -19,6 +19,7 @@ import clickhouseLogo from '../../../../images/clickhouselogo.png';
import cockroachdbLogo from '../../../../images/cockroachdblogo.png';
import trinoLogo from '../../../../images/trinologo.png';
import dbtLogo from '../../../../images/dbtlogo.png';
import dremioLogo from '../../../../images/dremiologo.png';
import druidLogo from '../../../../images/druidlogo.png';
import elasticsearchLogo from '../../../../images/elasticsearchlogo.png';
import feastLogo from '../../../../images/feastlogo.png';
@@ -52,6 +53,8 @@ export const COCKROACHDB = 'cockroachdb';
export const COCKROACHDB_URN = `urn:li:dataPlatform:${COCKROACHDB}`;
export const DBT = 'dbt';
export const DBT_URN = `urn:li:dataPlatform:${DBT}`;
export const DREMIO = 'dremio';
export const DREMIO_URN = `urn:li:dataPlatform:${DREMIO}`;
export const DRUID = 'druid';
export const DRUID_URN = `urn:li:dataPlatform:${DRUID}`;
export const DYNAMODB = 'dynamodb';
@@ -139,6 +142,7 @@ export const PLATFORM_URN_TO_LOGO = {
[CLICKHOUSE_URN]: clickhouseLogo,
[COCKROACHDB_URN]: cockroachdbLogo,
[DBT_URN]: dbtLogo,
[DREMIO_URN]: dremioLogo,
[DRUID_URN]: druidLogo,
[DYNAMODB_URN]: dynamodbLogo,
[ELASTICSEARCH_URN]: elasticsearchLogo,
8 changes: 8 additions & 0 deletions datahub-web-react/src/app/ingest/source/builder/sources.json
@@ -302,5 +302,13 @@
"description": "Configure a custom recipe using YAML.",
"docsUrl": "https://datahubproject.io/docs/metadata-ingestion/",
"recipe": "source:\n type: <source-type>\n config:\n # Source-type specific configs\n <source-configs>"
},
{
"urn": "urn:li:dataPlatform:dremio",
"name": "dremio",
"displayName": "Dremio",
"description": "Import Spaces, Sources, Tables and statistics from Dremio.",
"docsUrl": "https://datahubproject.io/docs/metadata-ingestion/",
"recipe": "source:\n type: dremio\n config:\n # Coordinates\n hostname: null\n port: null\n #true if https, otherwise false\n tls: true\n\n #For cloud instance\n #is_dremio_cloud: True\n #dremio_cloud_project_id: <project_id>\n\n #Credentials with personal access token\n authentication_method: PAT\n password: pass\n\n #Or Credentials with basic auth\n #authentication_method: password\n #username: null\n #password: null\n\n stateful_ingestion:\n enabled: true"
}
]
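The `recipe` value in the Dremio entry above is a single JSON-escaped string; for readability, it unescapes to the following YAML:

```yaml
source:
  type: dremio
  config:
    # Coordinates
    hostname: null
    port: null
    # true if https, otherwise false
    tls: true

    # For cloud instance
    # is_dremio_cloud: True
    # dremio_cloud_project_id: <project_id>

    # Credentials with personal access token
    authentication_method: PAT
    password: pass

    # Or credentials with basic auth
    # authentication_method: password
    # username: null
    # password: null

    stateful_ingestion:
      enabled: true
```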
Binary file added datahub-web-react/src/images/dremiologo.png
1 change: 1 addition & 0 deletions docs/cli.md
@@ -705,6 +705,7 @@ Please see our [Integrations page](https://datahubproject.io/integrations) if yo
| [datahub-lineage-file](./generated/ingestion/sources/file-based-lineage.md) | _no additional dependencies_ | Lineage File source |
| [datahub-business-glossary](./generated/ingestion/sources/business-glossary.md) | _no additional dependencies_ | Business Glossary File source |
| [dbt](./generated/ingestion/sources/dbt.md) | _no additional dependencies_ | dbt source |
| [dremio](./generated/ingestion/sources/dremio.md) | `pip install 'acryl-datahub[dremio]'` | Dremio Source |
| [druid](./generated/ingestion/sources/druid.md) | `pip install 'acryl-datahub[druid]'` | Druid Source |
| [feast](./generated/ingestion/sources/feast.md) | `pip install 'acryl-datahub[feast]'` | Feast source (0.26.0) |
| [glue](./generated/ingestion/sources/glue.md) | `pip install 'acryl-datahub[glue]'` | AWS Glue source |
11 changes: 11 additions & 0 deletions metadata-ingestion/docs/sources/dremio/README.md
@@ -0,0 +1,11 @@
### Concept Mapping

The **Concept Mapping** table below shows how entities and concepts in Dremio map to their corresponding entities in DataHub:

| Source Concept | DataHub Concept | Notes |
| -------------------------- | --------------- | ---------------------------------------------------------- |
| **Physical Dataset/Table** | `Dataset` | Subtype: `Table` |
| **Virtual Dataset/Views** | `Dataset` | Subtype: `View` |
| **Spaces** | `Container` | Mapped to DataHub’s `Container` aspect. Subtype: `Space` |
| **Folders** | `Container` | Mapped as a `Container` in DataHub. Subtype: `Folder` |
| **Sources** | `Container` | Represented as a `Container` in DataHub. Subtype: `Source` |
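As an illustration of the mapping above, datasets and containers surface in DataHub as URNs on the `dremio` platform (the URN shapes follow DataHub's general conventions; the helpers below are simplified stand-ins for illustration, not the connector's actual code):

```python
# Illustrative sketch of the Dremio -> DataHub URN shapes implied by the
# mapping table above. Simplified stand-ins, not the connector's own code.
PLATFORM = "dremio"

def make_dataset_urn(path: str, env: str = "PROD") -> str:
    """Physical and virtual datasets both become Dataset URNs
    (distinguished by subtype: Table vs. View)."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{PLATFORM},{path},{env})"

def make_container_urn(guid: str) -> str:
    """Spaces, folders, and sources all become Container URNs
    (distinguished by subtype: Space, Folder, Source)."""
    return f"urn:li:container:{guid}"

# A virtual dataset (view) living under a Dremio space:
view_urn = make_dataset_urn("my_space.reporting.daily_sales")
```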
29 changes: 29 additions & 0 deletions metadata-ingestion/docs/sources/dremio/dremio.md
@@ -0,0 +1,29 @@
### Starter Recipe for a Dremio Cloud Instance

```yaml
source:
  type: dremio
  config:
    # Authentication details
    authentication_method: PAT # Use Personal Access Token for authentication
    password: <your_api_token> # Replace <your_api_token> with your Dremio Cloud API token
    is_dremio_cloud: True # Set to True for Dremio Cloud instances
    dremio_cloud_project_id: <project_id> # Provide the Project ID for Dremio Cloud

    # Enable query lineage tracking
    include_query_lineage: True

    # Optional
    source_mappings:
      - platform: s3
        source_name: samples

    # Optional
    schema_pattern:
      allow:
        - "<source_name>.<table_name>"

sink:
  # Define your sink configuration here
```
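The `sink:` section is left open in the starter recipe above. A common choice (an assumption here, not part of this commit) is DataHub's `datahub-rest` sink pointed at your GMS endpoint:

```yaml
sink:
  type: datahub-rest
  config:
    server: http://localhost:8080
    # token: <datahub_api_token>  # if your DataHub instance requires auth
```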
25 changes: 25 additions & 0 deletions metadata-ingestion/docs/sources/dremio/dremio_pre.md
@@ -0,0 +1,25 @@
### Setup

This integration pulls metadata directly from the Dremio APIs.

You'll need to have a Dremio instance up and running with access to the necessary datasets, and API access should be enabled with a valid token.

The API token should have the necessary permissions to **read metadata** and **retrieve lineage**.

#### Steps to Get the Required Information

1. **Generate an API Token**:

- Log in to your Dremio instance.
- Navigate to your user profile in the top-right corner.
- Select **Generate API Token** to create an API token for programmatic access.

2. **Permissions**:

- The token should have **read-only** or **admin** permissions that allow it to:
- View all datasets (physical and virtual).
- Access all spaces, folders, and sources.
- Retrieve dataset and column-level lineage information.

3. **Verify External Data Source Permissions**:
- If Dremio is connected to external data sources (e.g., AWS S3, relational databases), ensure that Dremio has access to the credentials required for querying those sources.
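Before running an ingestion, it can help to smoke-test the token against Dremio's REST API. The sketch below is an assumption to verify against your Dremio version's documentation: Dremio's v3 catalog endpoint is used as the probe, session tokens from `/apiv2/login` conventionally use the `_dremio{token}` scheme, and personal access tokens are typically sent as a Bearer token:

```python
import json
from urllib import request

# Sketch of a token smoke test against the Dremio REST API.
# Header schemes and the /api/v3/catalog probe are assumptions to
# check against the Dremio docs for your deployment and version.
def auth_header(token: str, token_type: str = "pat") -> dict:
    if token_type == "session":
        # Session tokens obtained from /apiv2/login
        return {"Authorization": f"_dremio{token}"}
    # Personal access tokens
    return {"Authorization": f"Bearer {token}"}

def check_token(base_url: str, token: str) -> bool:
    """Return True if the token can list the top-level catalog."""
    req = request.Request(f"{base_url}/api/v3/catalog",
                          headers=auth_header(token))
    with request.urlopen(req) as resp:  # network call; needs a live instance
        return resp.status == 200 and "data" in json.load(resp)
```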
34 changes: 34 additions & 0 deletions metadata-ingestion/docs/sources/dremio/dremio_recipe.yml
@@ -0,0 +1,34 @@
source:
  type: dremio
  config:
    # Coordinates
    hostname: localhost
    port: 9047
    tls: true

    # Credentials with personal access token (recommended)
    authentication_method: PAT
    password: pass
    # OR credentials with basic auth
    # authentication_method: password
    # username: user
    # password: pass

    # For cloud instance
    # is_dremio_cloud: True
    # dremio_cloud_project_id: <project_id>

    include_query_lineage: True

    # Optional
    source_mappings:
      - platform: s3
        source_name: samples

    # Optional
    schema_pattern:
      allow:
        - "<source_name>.<table_name>"

sink:
  # sink configs
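The `schema_pattern` block in the recipe above follows DataHub's usual allow/deny pattern convention: entries are regular expressions, a name passes if it matches an allow entry and no deny entry, and the default allow list matches everything. A simplified sketch of that evaluation (not the connector's exact logic):

```python
import re

# Simplified sketch of DataHub-style allow/deny pattern matching, as used
# by options like `schema_pattern`. Entries are regexes; a name is kept if
# it matches some allow entry and no deny entry. Illustrative only.
def allowed(name: str, allow=(".*",), deny=()) -> bool:
    if any(re.match(d, name) for d in deny):
        return False
    return any(re.match(a, name) for a in allow)

# With allow: ["samples\\..*"], only datasets under the `samples`
# source would be ingested.
```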
3 changes: 3 additions & 0 deletions metadata-ingestion/setup.py
@@ -396,6 +396,7 @@
"delta-lake": {*data_lake_profiling, *delta_lake},
"dbt": {"requests"} | dbt_common | aws_common,
"dbt-cloud": {"requests"} | dbt_common,
"dremio": {"requests"} | sql_common,
"druid": sql_common | {"pydruid>=0.6.2"},
"dynamodb": aws_common | classification_lib,
# Starting with 7.14.0 python client is checking if it is connected to elasticsearch client. If its not it throws
@@ -616,6 +617,7 @@
"clickhouse-usage",
"cockroachdb",
"delta-lake",
"dremio",
"druid",
"elasticsearch",
"feast",
@@ -714,6 +716,7 @@
"s3 = datahub.ingestion.source.s3:S3Source",
"dbt = datahub.ingestion.source.dbt.dbt_core:DBTCoreSource",
"dbt-cloud = datahub.ingestion.source.dbt.dbt_cloud:DBTCloudSource",
"dremio = datahub.ingestion.source.dremio.dremio_source:DremioSource",
"druid = datahub.ingestion.source.sql.druid:DruidSource",
"dynamodb = datahub.ingestion.source.dynamodb.dynamodb:DynamoDBSource",
"elasticsearch = datahub.ingestion.source.elastic_search:ElasticsearchSource",
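The `"dremio = datahub.ingestion.source.dremio.dremio_source:DremioSource"` line registers the new source as a plugin entry point, so `type: dremio` in a recipe resolves to the `DremioSource` class. A simplified sketch of how a `name = module:attr` spec is parsed (real resolution goes through `importlib.metadata` entry points):

```python
# Simplified sketch of parsing a "name = module:attr" entry-point spec,
# such as the dremio registration above. Illustrative only; DataHub's
# actual plugin loading uses importlib.metadata.
def resolve_entry_point(spec: str):
    name, _, target = (part.strip() for part in spec.partition("="))
    module_path, _, attr = target.partition(":")
    return name, module_path, attr

name, module_path, attr = resolve_entry_point(
    "dremio = datahub.ingestion.source.dremio.dremio_source:DremioSource"
)
# Loading the class would then be roughly:
#   getattr(import_module(module_path), attr)
```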
Empty file.
