DataHub Roadmap

The DataHub Roadmap has a new home!

Please refer to the new DataHub Roadmap for the most up-to-date details of what we are working on!

If you have suggestions about what we should consider in future cycles, feel free to submit a feature request and/or upvote existing feature requests so we can gauge how important each one is to the community!

Historical Roadmap

The following represents the progress made on historical roadmap items as of January 2022. For incomplete roadmap items, we have created Feature Requests to gauge current community interest and impact, to be considered in future cycles. If you see something that is still of high interest to you, please upvote it via the Feature Request portal link and subscribe to the post for updates as we progress through the work in future cycles.

Q4 2021 [Oct - Dec 2021]

Data Lake Ecosystem Integration

Spark Delta Lake - View in Feature Request Portal
Apache Iceberg - Included in Q1 2022 Roadmap - Community-Driven Metadata Ingestion Sources
Apache Hudi - View in Feature Request Portal

Metadata Trigger Framework

View in Feature Request Portal

Stateful sensors for Airflow
Receive events that you can use to send alerts and emails
Slack integration

ML Ecosystem

Features (Feast)
Models (Sagemaker)
Notebooks - View in Feature Request Portal (https://feature-requests.datahubproject.io/admin/p/jupyter-integration)

Metrics Ecosystem

View in Feature Request Portal

Measures, Dimensions
Relationships to Datasets and Dashboards

Data Mesh oriented features

Data Product modeling
Analytics to enable Data Meshification

Collaboration

View in Feature Request Portal

Conversations on the platform
Knowledge Posts (Gdocs, Gslides, Gsheets)

Q3 2021 [Jul - Sept 2021]

Data Profiling and Dataset Previews

Use Case: See sample data for a dataset and statistics on the shape of the data (column distribution, nullability etc.)

Support for data profiling and preview extraction through ingestion pipeline (column samples, not rows)

Data Quality

Included in Q1 2022 Roadmap - Display Data Quality Checks in the UI

Support for data profiling and time-series views
Support for data quality visualization
Support for data health score based on data quality results and pipeline observability
Integration with systems like Great Expectations, AWS deequ, dbt test etc.

Fine-grained Access Control for Metadata

Support for role-based access control to edit metadata
Scope: Access control on entity-level, aspect-level and within aspects as well.

Column-level lineage

Included in Q1 2022 Roadmap - Column Level Lineage

Metadata Model
SQL Parsing

Operational Metadata

Partitioned Datasets - View in Feature Request Portal
Support for operational signals like completeness, freshness etc.

Q2 2021 [Apr - Jun 2021]

Cloud Deployment

Production-grade Helm charts for Kubernetes-based deployment
How-to guides for deploying DataHub to all the major cloud providers
- AWS
- Azure
- GCP

Product Analytics for DataHub

Helping you understand how your users are interacting with DataHub
Integration with common systems like Google Analytics etc.

Usage-Based Insights

Display frequently used datasets, etc.
Improved search relevance through usage data

Role-based Access Control

Support for fine-grained access control for metadata operations (read, write, modify)
Scope: Access control on entity-level, aspect-level and within aspects as well.
This provides the foundation for Tag Governance, Dataset Preview access control etc.

No-code Metadata Model Additions

Use Case: Developers should be able to add new entities and aspects to the metadata model easily

No need to write any code (in Java or Python) to store, retrieve, search and query metadata
No need to write any code (in GraphQL or UI) to visualize metadata

Q1 2021 [Jan - Mar 2021]

React UI

Build a new UI based on React
Deprecate open-source support for Ember UI

Python-based Metadata Integration

Build a Python-based Ingestion Framework
Support common people repositories (LDAP)
Support common data repositories (Kafka, SQL databases, AWS Glue, Hive)
Support common transformation sources (dbt, Looker)
Support for push-based metadata emission from Python (e.g. Airflow DAGs)

Dashboards and Charts

Support for dashboard and chart entity page
Support browse, search and discovery

SSO for Authentication

Support for authentication (login) using OIDC providers (Okta, Google etc.)

Business Glossary

Support for business glossary model (definition + storage)
Browse taxonomy
UI support for attaching business terms to entities and fields

Jobs, Flows / Pipelines

Use case: Search and Discover your Pipelines (e.g. Airflow DAGs) and understand data lineage with datasets

Support for Metadata Models + Backend Implementation
Metadata Integrations with systems like Airflow.

Data Profiling and Dataset Previews

Use Case: See sample data for a dataset and statistics on the shape of the data (column distribution, nullability etc.)

Support for data profiling and preview extraction through ingestion pipeline
Out of scope for Q1: Access control of data profiles and sample data
