DataHub v0.8.28
Release Highlights
Notable UI-Based Features
Quickly view, search, and filter the downstream dependencies of any Entity! By using the Impact Analysis Lineage view, you can now see the full set of downstream entities that may be impacted by a change to a given entity. You can also search, filter, and export the list of entities to CSV; try it for yourself here.
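To illustrate what "the full set of downstream entities" means, here is a minimal sketch of a transitive downstream traversal with CSV export. It is not DataHub's implementation: the `LINEAGE` graph, the entity names, and the helper functions are all hypothetical and exist only to show the idea of walking lineage edges breadth-first.

```python
import csv
import io
from collections import deque

# Hypothetical lineage edges: upstream entity -> direct downstream entities.
LINEAGE = {
    "raw.events": ["staging.events"],
    "staging.events": ["mart.daily_events", "mart.user_sessions"],
    "mart.daily_events": ["dashboard.traffic"],
    "mart.user_sessions": ["dashboard.traffic"],
}


def downstream_entities(graph, start):
    """Return (entity, hops) for everything reachable downstream of `start`,
    in breadth-first order. Each entity is reported once, at its shortest distance."""
    seen = {start}
    queue = deque([(start, 0)])
    result = []
    while queue:
        node, depth = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                result.append((child, depth + 1))
                queue.append((child, depth + 1))
    return result


def to_csv(rows):
    """Serialize (entity, hops) rows to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["entity", "degree"])
    writer.writerows(rows)
    return buf.getvalue()


impacted = downstream_entities(LINEAGE, "staging.events")
print(to_csv(impacted))
```

Tracking visited entities matters here because lineage graphs often converge (two tables feeding one dashboard), and without it the same entity would be listed more than once.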
View Dataset- and Column-Level Data Validation outcomes in DataHub. We now support surfacing outcomes from Great Expectations validations in Dataset Entities! Easily view the full history of validation outcomes to understand the trustworthiness of your data.
User Groups, Policies, and Tags have a new look!
- The User Group page has a new look, allowing you to assign an email address, Slack Channel, Group Owner, and more. Easily add/remove Group Members from the UI - test it out here.
- We refreshed the Policies Page, allowing you to see Policy membership and status at a glance.
- The Tag Details page has been overhauled! You can now edit the definition, assigned owners, and tag color via the UI (try it here).
Notable Metadata Model & Ingestion-Based Features
First Milestone: Column-Level Lineage is complete! The Metadata Model now supports “fine-grained” lineage for Datasets; see the documentation here for details, including how to add fine-grained lineage to a Dataset or a DataJob.
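As a rough picture of what a fine-grained lineage entry relates, the sketch below builds column (schema field) URNs and pairs upstream columns with a downstream column. It uses plain strings and dicts rather than the DataHub SDK. The key names (`upstreamType`, `upstreams`, `downstreamType`, `downstreams`) and the `FIELD_SET`/`FIELD` values reflect our understanding of the model and should be treated as assumptions; the dataset and column names are hypothetical. Check the linked documentation for the authoritative shape.

```python
import json


def dataset_urn(platform, name, env="PROD"):
    """Build a dataset URN string for the given platform, name, and environment."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


def schema_field_urn(dataset, field_path):
    """Build a schema field (column) URN that points into a dataset."""
    return f"urn:li:schemaField:({dataset},{field_path})"


# Hypothetical datasets and columns, for illustration only.
upstream = dataset_urn("hive", "orders_raw")
downstream = dataset_urn("hive", "orders_clean")

# Assumed shape: a set of upstream columns feeding one downstream column.
fine_grained_lineage = {
    "upstreamType": "FIELD_SET",
    "upstreams": [
        schema_field_urn(upstream, "price"),
        schema_field_urn(upstream, "quantity"),
    ],
    "downstreamType": "FIELD",
    "downstreams": [schema_field_urn(downstream, "total")],
}

print(json.dumps(fine_grained_lineage, indent=2))
```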
Define Dataset-to-Dataset lineage via YAML. As demonstrated in the February 2022 Town Hall, you can now set Dataset-level lineage via YAML. This is great for teams with bespoke lineage needs that cannot be auto-extracted by the currently supported ingestion sources.
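The sketch below renders what such a lineage file might look like. The layout (`version`, `lineage`, `entity`, `upstream`, and the `name`/`type`/`env`/`platform` keys) is an assumption based on our reading of the file-based lineage source, and the topic names and helper functions are hypothetical; confirm the exact schema against the ingestion source documentation before relying on it.

```python
def entity_block(name, platform, env, indent):
    """Render the key/value lines describing one dataset entity."""
    pad = " " * indent
    return (
        f"{pad}name: {name}\n"
        f"{pad}type: dataset\n"
        f"{pad}env: {env}\n"
        f"{pad}platform: {platform}\n"
    )


def render_lineage_yaml(downstream, upstreams, platform="kafka", env="DEV"):
    """Render one downstream dataset and its upstream datasets as YAML text.

    The file layout is an assumed one, not a confirmed schema.
    """
    text = "version: 1\nlineage:\n  - entity:\n"
    text += entity_block(downstream, platform, env, indent=6)
    text += "    upstream:\n"
    for name in upstreams:
        text += "      - entity:\n"
        text += entity_block(name, platform, env, indent=10)
    return text


# Hypothetical example: topic3 is derived from topic1 and topic2.
print(render_lineage_yaml("topic3", ["topic1", "topic2"]))
```

In practice the file would usually be written by hand; generating it is mainly useful when lineage is already tracked in another system.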
Track all changes to entities using the Timeline API. This unified timeline of changes to entities in the metadata graph provides a robust picture of how your metadata has evolved over time. Upcoming work will support surfacing this detail via the DataHub UI. See the overview from Town Hall here.
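For orientation, here is a small sketch of how a Timeline request URL could be assembled. The endpoint path (`/openapi/timeline/v1/<urn>`), the `categories`, `start`, and `end` query parameters, the `TECHNICAL_SCHEMA` category name, and the `localhost:8080` host are assumptions drawn from our reading of the Timeline documentation, not something these notes confirm. The sketch only builds the URL and sends no request.

```python
from urllib.parse import quote, urlencode


def timeline_url(base_url, urn, categories, start_ms=None, end_ms=None):
    """Build a Timeline API URL for one entity (endpoint shape is assumed).

    The URN is percent-encoded so that ':', ',', '(' and ')' survive as a
    single path segment. Timestamps are epoch milliseconds.
    """
    params = {"categories": ",".join(categories)}
    if start_ms is not None:
        params["start"] = start_ms
    if end_ms is not None:
        params["end"] = end_ms
    # Keep commas literal so multiple categories stay readable in the query.
    query = urlencode(params, safe=",")
    return f"{base_url}/openapi/timeline/v1/{quote(urn, safe='')}?{query}"


# Hypothetical dataset URN and local deployment.
url = timeline_url(
    "http://localhost:8080",
    "urn:li:dataset:(urn:li:dataPlatform:mysql,doctest.User,PROD)",
    ["TECHNICAL_SCHEMA"],
)
print(url)
```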
Miscellaneous Metadata Ingestion Updates:
- Incubating: PowerBI Ingestion Source
- BigQuery Profiling: ability to disable profiling by partition
- Tableau improvements: Workbooks are now modeled as “Containers”
What's Changed
- doc(adoption): adds Adevinta as DataHub adopter by @sgomezvillamor in #4227
- fix(policies): Remove multiple privileges for GENERATE_PERSONAL_ACCESS_TOKEN by @jjoyce0510 in #4239
- Added Drawer to show the tag profile data by @Ankit-Keshari-Vituity in #4132
- fix(ui) Misc styling fixes (truncated filter values, updating tag color on clickaway) by @jjoyce0510 in #4246
- feat(ingest): switch telemetry endpoint to Mixpanel by @kevinhu in #4238
- docs: Add Udemy & Adevinta logos by @maggiehays in #4247
- fix(ingest) kafka-connect: Pass the env variable as part of making dataset by @arunvasudevan in #4244
- feat(lineage): Column level lineage model by @rslanka in #4248
- feat(docker): add multiplatform docker support for arm64 (m1) by @zhaofengnian18 in #4221
- fix(ingest): fix tableau sheets external url ingestion by @maaaikoool in #4231
- fix(docs): fix broken link by @anshbansal in #4242
- fix(docs): fix reference to the credential step in okta guide by @bskim45 in #4243
- feat(ingest): add ability to provide lineage described from within a file by @eddyv in #4116
- Add support for url_prefix in elastisearch source by @pppsunil in #4214
- feat(search): supporting chinese glossaryterm full text retrieval(#3914) by @Huyueeer in #3956
- feat(platform): Schema version history timeline. by @rslanka in #4252
- fix(mae-comsumer): wrong aspect name in usage event transformer by @zhoxie-cisco in #4249
- feat(ml): Add searchable annotation for features in feature table by @dexter-mh-lee in #4216
- fix(ingest): fix telemetry profile emission by @kevinhu in #4253
- bug(logo): add platform to chart relationship query by @RyanHolstien in #4255
- chore(docs): cleanup location of guide, gitignore generated from git by @anshbansal in #4256
- feat(ingest): Spark-free data lake ingestion by @kevinhu in #4131
- Openapi new auth by @vlavorini in #4086
- feat(docs) add fine-grained lineage docs and examples by @ksrinath in #4260
- feat(ingest): add option to copy URN, fix graphql docs by @anshbansal in #4209
- chore(managed ingestion): add variables for default val, update vals by @anshbansal in #4186
- docs(ui): Adding guide for adding users to DataHub. by @jjoyce0510 in #4262
- fix(docs): doc build failing by @anshbansal in #4267
- fix(doc) fix spelling mistake in dataset doc by @ksrinath in #4264
- feat(ingest): add lineage_client_project_id field to the BigQuery config by @vcs9 in #4138
- feat(graphql): Adding resolved users and groups to policies by @jjoyce0510 in #4272
- bug(schema_version_history): fix semantic version ordering by @RyanHolstien in #4271
- fix(recs ui): fixing tag color in recommendations by @jjoyce0510 in #4274
- fix(ingestion): Fix snowflake lineage + logging & reporting improvements. by @rslanka in #4276
- fix(docs): Error in running airflow locally by @buggythepirate in #4259
- feat: powerbi source plugin by @mohdsiddique in #4201
- test(timeline): fix smoke test by @RyanHolstien in #4285
- fix(ingest) - always display CLI version by @aditya-radhakrishnan in #4282
- feat(lineage) Bigquery: Supporting v2 audit metadata on Bigquery by @treff7es in #4233
- fix(recipe-parsing): fix recipe config parsing for $ by @MugdhaHardikar-GSLab in #4258
- docs: add details to update users when using helm by @anshbansal in #4268
- feat(model): adds PRE in the FabricType enum by @sgomezvillamor in #4226
- fix(ingestion): Fix snowflake view upstream lineages to eliminate false edges. by @rslanka in #4284
- fix(docs): fix frontend docs to replace port 9001 -> 9002 by @gabe-lyons in #4280
- fix(docker): use exec form to start container main process by @gmcoringa in #4245
- fix(ingestion): revert positional arg change by @anshbansal in #4266
- feat(profiling) - Bigquery: Ability to disable partition profiling by @treff7es in #4228
- fix(ingest): clarify s3/s3a requirements and platform defaults by @kevinhu in #4263
- Remove <> from add-users.md doc by @jjoyce0510 in #4293
- fix(ingestion): Fix bigquery stateful ingestion checkpoint reconstruction. by @rslanka in #4295
- fix(telemetry): telemetry fail should not cause the CLI to fail by @anshbansal in #4302
- fix(search): Update urn tokenizer to tokenize on periods and slashes by @dexter-mh-lee in #4085
- Fixed the UI issue: Height issue of editor and the spacing issue between logo and description by @Ankit-Keshari-Vituity in #4300
- fix(vulnerabilities): Fix new vulnerabilities by upgrading libraries by @dexter-mh-lee in #4297
- docs(readmes): Update module READMEs to reflect the current state of the world by @jjoyce0510 in #4294
- Resign the policy tab by @Ankit-Keshari-Vituity in #4232
- refactor(extractor): Move extractors to entity-registry by @dexter-mh-lee in #4307
- refactor(ui) Minor policies styling improvements. by @jjoyce0510 in #4309
- feat(ui): Introducing New group profile by @jjoyce0510 in #4308
- refactor(ui): Simplify process of adding user.props (w/ docs) by @jjoyce0510 in #4296
- feat (ingest) Kafka-connect: Adding Auth to Kafka Connect API by @arunvasudevan in #4298
- fix(doc): Add warning on using AWS glue schema registry by @dexter-mh-lee in #4306
- fix(ingestion) Removing python restriction by @treff7es in #4312
- fix(ingest) bigquery: Remove unneeded warning by @treff7es in #4317
- doc: improve doc on adding source by @anshbansal in #4316
- fix: revert changes to OpenApi casing by @anshbansal in #4291
- feat(assertions): Adding Assertions Entity & Great Expectations BETA by @jjoyce0510 in #4305
- feat(tableau): emit workbook as container entity in tableau source, some minor fixes in tableau source by @mayurinehate in #4261
- fix(ui) Misc UI fixes & styling improvements. by @jjoyce0510 in #4311
- fix(tags) - map tags to globalTags for entities by @aditya-radhakrishnan in #4310
- fix(quickstart): Pin actions pod + add volume mount for datahub-frontend by @jjoyce0510 in #4318
- fix(ui): Correct display name for users in UI by @jjoyce0510 in #4323
- feat(Impact Analysis): Support impact analysis to check all downstreams of given entity by @dexter-mh-lee in #4322
- fix(ui): minor ui fixes by @jjoyce0510 in #4325
- fix(lineage): Fix issue where downstream of datajobs do not appear by @dexter-mh-lee in #4326
- fix(ingestion): Insulate 'datahub' and child loggers from external modules. by @rslanka in #4324
- feat(aws-docs): Add section on attaching policies to the datahub-actions pod by @dexter-mh-lee in #4334
- feat(ingest): transformers - add support for processing MCP-s by @swaroopjagadish in #4337
- Allow elasticsearch to authenticate without username and password by @salihcaan in #4329
- docs: Update postgres.md by @BoyuanZhangDE in #4292
- fix(ci): wait more for add/remove user test by @gabe-lyons in #4339
- feat(impact analysis): bugfixes for Impact Analysis by @gabe-lyons in #4336
- fix(ingestion): add logging, make job more resilient to errors by @anshbansal in #4331
New Contributors
- @zhaofengnian18 made their first contribution in #4221
- @bskim45 made their first contribution in #4243
- @eddyv made their first contribution in #4116
- @Huyueeer made their first contribution in #3956
- @vcs9 made their first contribution in #4138
- @mohdsiddique made their first contribution in #4201
- @gmcoringa made their first contribution in #4245
- @salihcaan made their first contribution in #4329
- @BoyuanZhangDE made their first contribution in #4292
Full Changelog: v0.8.27...v0.8.28