Releases: datahub-project/datahub
DataHub v0.8.30
v0.8.30
Release Highlights
- Fix for OIDC encryption bug from v0.8.29
- Adds the platform instance id to container id generation, and supports migrating old container ids to the new ones via the datahub migrate CLI.
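To illustrate why adding the platform instance changes container ids, here is a minimal sketch. It assumes ids are derived by hashing a serialized key; the function name, key fields, and hashing scheme are illustrative assumptions, not DataHub's actual implementation, and no datahub migrate options are shown because the release notes do not list them.

```python
import hashlib
import json
from typing import Optional


def container_guid(platform: str, database: str, instance: Optional[str] = None) -> str:
    """Hash a container key into a stable id (illustrative scheme only)."""
    key = {"platform": platform, "database": database}
    if instance is not None:
        # Including the platform instance keeps same-named containers
        # from different instances from colliding on one id.
        key["instance"] = instance
    serialized = json.dumps(key, sort_keys=True, separators=(",", ":"))
    return hashlib.md5(serialized.encode("utf-8")).hexdigest()


# Hypothetical platform, database, and instance names.
old_id = container_guid("snowflake", "analytics")
new_id_a = container_guid("snowflake", "analytics", instance="prod_us")
new_id_b = container_guid("snowflake", "analytics", instance="prod_eu")

# A migration has to rewrite every reference from the old id to the new one.
migration = {old_id: new_id_a}
print(migration)
```

Under this scheme the same database in two instances gets two distinct ids, which is why previously ingested containers need a one-time migration.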
Notable UI-Based Features
- Showing recent searches in autocomplete.
What's Changed
- fix(ui): some small ui fixes for lineage by @gabe-lyons in #4381
- fix(docs): change cabify link by @maaaikoool in #4373
- Fixed Bug: Alpha slider doesn’t move, only the color slider is movabe in tag color picker by @Ankit-Keshari-Vituity in #4359
- feat(GE): add option to disable sql parsing, use default parser by @mayurinehate in #4377
- fix(removed): Make sure removed entities do not appear on recommendations by @dexter-mh-lee in #4353
- fix(browse): fix browse double click issue by @gabe-lyons in #4382
- fix(oidc): Update group membership each login (and make group extraction disabled by default) by @jjoyce0510 in #4380
- feat(ingestion): add java protobuf schema ingestion by @leifker in #4178
- Docs/update docs by @RyanHolstien in #4393
- Revert "Fixed Bug: Alpha slider doesn’t move, only the color slider is movabe in tag color picker" by @gabe-lyons in #4390
- feat(ingestion): improve logging, docs for bigquery, snowflake, redshift by @anshbansal in #4344
- fix(ingest) Azure AD: support nested groups (#4367) by @cccs-eric in #4368
- fix: add missing logo by @anshbansal in #4386
- feat(spark-lineage): add support to custom env and platform_instance by @MugdhaHardikar-GSLab in #4208
- fix(containers) - configure domain resolver for containers by @aditya-radhakrishnan in #4404
- feat(*): Support setting owner type when assigning ownership by @jjoyce0510 in #4354
- fix: telemetry failure should not cause CLI failure by @anshbansal in #4406
- feat(autocomplete): Show recent searches + improved autocomplete by @jjoyce0510 in #4400
- fix(ingestion): Fix mypy error stateful committable & restore mypy version. by @rslanka in #4408
- build(markupsafe): update markupsafe pinning for Airflow compatibility by @set5think in #4388
- feat(search): Add flag to enable caching on search service by @dexter-mh-lee in #4335
- fix(query_combiner): add try block to handle queries of type str by @WaStCo in #4397
- fix(ingestion): read all tables from redshift by @Abhiram98 in #4345
- fix(ingestion): Invoke SqlLineageSQLParser's implementation in a separate process by @rslanka in #4391
- fix(ingest): handle endpoints without 200 response in openapi by @JorgenEvens in #4332
- feat(ingestion): Add the ability to query the latest timeseries aspect value via the get_cli. by @rslanka in #4395
- Refactoring the queries into a single one to get the search results on Home Page by @Ankit-Keshari-Vituity in #4372
- feat(lineage): hide soft deleted nodes in lineage & adds banner in entity page by @gabe-lyons in #4410
- fix(lineage): Move lineage registry to entity-registry module by @dexter-mh-lee in #4412
- feat(cli) Changes rollback behaviour to apply soft deletes by default by @pedro93 in #4358
- fix(looker): various looker fixes by @gabe-lyons in #4394
- fix(oidc): Fixing OIDC encryption bug in v0.8.29 by @jjoyce0510 in #4418
- feat(oidc): Adding support for extracting single string groups claim by @jjoyce0510 in #4419
- fix: change log levels to debug by @anshbansal in #4411
- tests(cypress): reduce cypress flakiness by retrying login on failure by @gabe-lyons in #4423
- fix(ingest): extract redshift platform correctly from sqlalchemy uri by @mayurinehate in #4421
- build: Fix line endings for Windows check-out by @mattmatravers in #4370
- feat(gql): make gql layer resistant to unresolvable relationships by @gabe-lyons in #4424
- fix(ingestion) containers: Adding platform instance to container keys by @treff7es in #4279
- fix: don't set None default by @anshbansal in #4422
- Flexible search on soft delete by @pedro93 in #4405
- fix(no-code metadata models in ui): fixes bug with rendering renderSpec aspects by @gabe-lyons in #4430
New Contributors
- @set5think made their first contribution in #4388
- @Abhiram98 made their first contribution in #4345
- @JorgenEvens made their first contribution in #4332
- @mattmatravers made their first contribution in #4370
Full Changelog: v0.8.29...v0.8.30
DataHub v0.8.29
v0.8.29
NOTICE
This version is affected by an OIDC (SSO) related issue with the following stack trace:
datahub-datahub-frontend-8d7f7cf6f-xvjwm datahub-frontend Caused by: java.security.InvalidKeyException: Invalid AES key length: 30 bytes
datahub-datahub-frontend-8d7f7cf6f-xvjwm datahub-frontend at com.sun.crypto.provider.AESCrypt.init(AESCrypt.java:87)
The DataHub core team is working to address this. For now, we recommend staying on v0.8.28 if you actively use OIDC.
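For context on the error message: AES accepts only 16-, 24-, or 32-byte keys, so a 30-byte key is rejected. The sketch below checks key length and shows one generic way to obtain a valid key from a secret of arbitrary length by hashing it. The helper names are hypothetical, and this is not necessarily the fix applied in DataHub.

```python
import hashlib

# AES-128, AES-192, and AES-256 key sizes in bytes.
VALID_AES_KEY_LENGTHS = (16, 24, 32)


def is_valid_aes_key(key: bytes) -> bool:
    """Return True if the key has a length AES accepts."""
    return len(key) in VALID_AES_KEY_LENGTHS


def derive_aes_key(secret: str) -> bytes:
    """Map a secret of any length to a 32-byte key via its SHA-256 digest."""
    return hashlib.sha256(secret.encode("utf-8")).digest()


secret = "x" * 30  # placeholder 30-character secret, matching the length in the error
raw_key = secret.encode("utf-8")

print(is_valid_aes_key(raw_key))                 # False: 30 bytes is not a valid AES key size
print(is_valid_aes_key(derive_aes_key(secret)))  # True: the digest is always 32 bytes
```

A plain SHA-256 digest is used here only to show the length requirement; a production system would normally use a dedicated key derivation function.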
Release Highlights
- Fix for MAE & MCE consumer healthcheck
- Upgrade to Java 11 and Gradle 6
Full Commit Log
- #4360 @maaaikoool Add cabify as adopter
- #4365 @dexter-mh-lee fix(vulnerabilities): Fix vulnerabilities in datahub-frontend
- #4361 @jjoyce0510 fix(ui): Supporting unknown data platform type
- #4363 @rslanka feat(ingest): Add memory leak detection capability to the datahub cli command.
- #4366 @RyanHolstien fix(metadata-jobs): fix root context for springboot
- #4340 @leifker feat(build): upgrade to gradle 6 for toolchain to support java 11
- #4357 @anshbansal feat: change quickstart to use head tag for actions
- #4356 @treff7es fix(ingest): bigquery - Fixing missing attribute error if credential was not set
- #4319 @vcs9 feat(ingest): mysql - add database_alias functionality
- #4352 @dexter-mh-lee fix(ci): fix model generation workflow
- #4351 @jjoyce0510 fix(frontend): Fix common OIDC issues
- #4111 @treff7es fix(ingest) bigquery-usage: Adding credential support for bigquery usage
- #4343 @Ankit-Keshari-Vituity Fixed the Small Project issue
- #4350 @MugdhaHardikar-GSLab fix(config-parsing): add support for variable expansion for in variables in between string
- #4330 @anshbansal fix(hive): clean protocol for hive source
- #4338 @maggiehays doc(platforms) adding PowerBI logo to docs website
- #4342 @anshbansal feat(quickstart): restart actions pod in case of failures
- #4347 @mayurinehate fix(GE): fix dependencies for GE DataHubValidationAction, logic for s…
- #4349 @gabe-lyons query for custom properties on containers
- #4341 @anshbansal fix(doc): remove duplicate entry for permission
- #9 @shirshanka fix(ci): fix datahub jar publish action
- #8 @shirshanka feat(ci): fix jar publish action
- #7 @czbernard fix(ci): fixing tag computation for docker image build
- #6 @shirshanka feat(ci): adding dockerfile and action for datahub-airflow image
- #5 @shirshanka fix(ci): pin python version to 3.9.9 for release action
- #4 @dexter-mh-lee fix(ci): docker-ingestion - update acryl workflow
- #3 @shirshanka fix(pypi): fixing package metadata to reflect source and changelog correctly
- #2 @shirshanka fix(ci): check for tagged reference to kick off pypi push
DataHub v0.8.28
Release Highlights
Notable UI-Based Features
Quickly view, search, and filter the downstream dependencies of any Entity! By using the Impact Analysis Lineage view, you can now see the full set of downstream entities that may be impacted by a change to a given entity. You can also search, filter, and export the list of entities to CSV; try it for yourself here.
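Conceptually, the full set of downstream entities is the transitive closure of the lineage graph starting at one entity. A minimal sketch of that idea, using a breadth-first traversal over a hypothetical in-memory edge map (the entity names and data structure are assumptions for illustration, not DataHub's implementation):

```python
from collections import deque
from typing import Dict, List, Set


def downstream_impact(edges: Dict[str, List[str]], start: str) -> Set[str]:
    """Collect every entity reachable downstream of `start`."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in edges.get(node, []):
            # Skip entities already visited, so shared or cyclic paths end.
            if child not in seen and child != start:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical lineage: each key lists its direct downstream entities.
lineage = {
    "raw_orders": ["clean_orders"],
    "clean_orders": ["orders_dashboard", "revenue_model"],
    "revenue_model": ["orders_dashboard"],
}

print(sorted(downstream_impact(lineage, "raw_orders")))
# ['clean_orders', 'orders_dashboard', 'revenue_model']
```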
View Dataset- and Column-Level Data Validation outcomes in DataHub. We now support surfacing outcomes from Great Expectations validations in Dataset Entities! Easily view the full history of validation outcomes to understand the trustworthiness of your data.
User Groups, Policies, and Tags have a new look!
- The User Group page has a new look, allowing you to assign an email address, Slack Channel, Group Owner, and more. Easily add/remove Group Members from the UI - test it out here.
- We refreshed the Policies Page, allowing you to see Policy membership and status at a glance.
- The Tag Details page has been overhauled! You can now edit the definition, assigned owners, and tag color via the UI (try it here).
Notable Metadata Model & Ingestion-Based Features
First Milestone: Column-Level Lineage is complete! The Metadata Model now supports “fine-grained” lineage for Datasets; see documentation here for details, including adding fine-grained lineage to a dataset or a datajob.
Define Dataset-to-Dataset lineage via YAML. As demonstrated in the February 2022 Town Hall, you can now set Dataset-level lineage via YAML. This is great for teams that have more bespoke lineage needs that cannot be auto-extracted by the current set of supported ingestion sources.
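As a rough illustration of what file-defined lineage amounts to, the sketch below turns a parsed lineage description into upstream-to-downstream dataset URN pairs. The dictionary layout and field names are assumptions standing in for a parsed YAML file; consult the ingestion documentation for the actual file schema. Only the dataset URN format is taken from DataHub.

```python
from typing import Dict, List, Tuple


def dataset_urn(platform: str, name: str, env: str = "PROD") -> str:
    """Build a DataHub dataset URN string."""
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


# Hypothetical structure, as a YAML loader might return it.
lineage_spec: List[Dict] = [
    {
        "entity": {"platform": "kafka", "name": "topic3", "env": "DEV"},
        "upstream": [
            {"platform": "kafka", "name": "topic1", "env": "DEV"},
            {"platform": "kafka", "name": "topic2", "env": "DEV"},
        ],
    }
]


def lineage_edges(spec: List[Dict]) -> List[Tuple[str, str]]:
    """Return (upstream_urn, downstream_urn) pairs for each declared edge."""
    edges = []
    for item in spec:
        downstream = dataset_urn(**item["entity"])
        for upstream in item["upstream"]:
            edges.append((dataset_urn(**upstream), downstream))
    return edges


for upstream_urn, downstream_urn in lineage_edges(lineage_spec):
    print(upstream_urn, "->", downstream_urn)
```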
Track all changes to entities using the Timeline API. This unified timeline of changes to entities in the metadata graph provides a robust picture of how your metadata has evolved over time. Upcoming work will support surfacing this detail via the DataHub UI. See the overview from Town Hall here.
Miscellaneous Metadata Ingestion Updates:
- Incubating: PowerBI Ingestion Source
- BigQuery Profiling: ability to disable profiling by partition
- Tableau improvements: Workbooks are now modeled as “Containers”
What's Changed
- doc(adoption): adds Adevinta as DataHub adopter by @sgomezvillamor in #4227
- fix(policies): Remove multiple privileges for GENERATE_PERSONAL_ACCESS_TOKEN by @jjoyce0510 in #4239
- Added Drawer to show the tag profile data by @Ankit-Keshari-Vituity in #4132
- fix(ui) Misc styling fixes (truncated filter values, updating tag color on clickaway) by @jjoyce0510 in #4246
- feat(ingest): switch telemetry endpoint to Mixpanel by @kevinhu in #4238
- docs: Add Udemy & Adevinta logos by @maggiehays in #4247
- fix(ingest) kafka-connect: Pass the env variable as part of making dataset by @arunvasudevan in #4244
- feat(lineage): Column level lineage model by @rslanka in #4248
- feat(docker): add multiplatform docker support for arm64 (m1) by @zhaofengnian18 in #4221
- fix(ingest): fix tableau sheets external url ingestion by @maaaikoool in #4231
- fix(docs): fix broken link by @anshbansal in #4242
- fix(docs): fix reference to the credential step in okta guide by @bskim45 in #4243
- feat(ingest): add ability to provide lineage described from within a file by @eddyv in #4116
- Add support for url_prefix in elastisearch source by @pppsunil in #4214
- feat(search): supporting chinese glossaryterm full text retrieval(#3914) by @Huyueeer in #3956
- feat(platform): Schema version history timeline. by @rslanka in #4252
- fix(mae-comsumer): wrong aspect name in usage event transformer by @zhoxie-cisco in #4249
- feat(ml): Add searchable annotation for features in feature table by @dexter-mh-lee in #4216
- fix(ingest): fix telemetry profile emission by @kevinhu in #4253
- bug(logo): add platform to chart relationship query by @RyanHolstien in #4255
- chore(docs): cleanup location of guide, gitignore generated from git by @anshbansal in #4256
- feat(ingest): Spark-free data lake ingestion by @kevinhu in #4131
- Openapi new auth by @vlavorini in #4086
- feat(docs) add fine-grained lineage docs and examples by @ksrinath in #4260
- feat(ingest): add option to copy URN, fix graphql docs by @anshbansal in #4209
- chore(managed ingestion): add variables for default val, update vals by @anshbansal in #4186
- docs(ui): Adding guide for adding users to DataHub. by @jjoyce0510 in #4262
- fix(docs): doc build failing by @anshbansal in #4267
- fix(doc) fix spelling mistake in dataset doc by @ksrinath in #4264
- feat(ingest): add lineage_client_project_id field to the BigQuery config by @vcs9 in #4138
- feat(graphql): Adding resolved users and groups to policies by @jjoyce0510 in #4272
- bug(schema_version_history): fix semantic version ordering by @RyanHolstien in #4271
- fix(recs ui): fixing tag color in recommendations by @jjoyce0510 in #4274
- fix(ingestion): Fix snowflake lineage + logging & reporting improvements. by @rslanka in #4276
- fix(docs): Error in running airflow locally by @buggythepirate in #4259
- feat: powerbi source plugin by @mohdsiddique in #4201
- test(timeline): fix smoke test by @RyanHolstien in #4285
- fix(ingest) - always display CLI version by @aditya-radhakrishnan in #4282
- feat(lineage) Bigquery: Supporting v2 audit metadata on Bigquery by @treff7es in #4233
- fix(recipe-parsing): fix recipe config parsing for $ by @MugdhaHardikar-GSLab in #4258
- docs: add details to update users when using helm by @anshbansal in #4268
- feat(model): adds PRE in the FabricType enum by @sgomezvillamor in #4226
- fix(ingestion): Fix snowflake view upstream lineages to eliminate false edges. by @rslanka in #4284
- fix(docs): fix frontend docs to replace port 9001 -> 9002 by @gabe-lyons in #4280
- fix(docker): use exec form to start container main process by @gmcoringa in #4245
- fix(ingestion): revert positional arg change by @anshbansal in #4266
- feat(profiling) - Bigquery: Ability to disable partition profiling by @treff7es in #4228
- fix(ingest): clarify s3/s3a requirements and platform defaults by @kevinhu in #4263
- Remove <> from add-users.md doc by @jjoyce0510 in #4293
- fix(ingestion): Fix bigquery stateful ingestion checkpoint reconstruction. by @rslanka in #4295
- fix(telemetry): telemetry fail should not cause the CLI to fail by @anshbansal in #4302
- fix(search): Update urn tokenizer to tokenize on periods and slashes by @dexter-mh-lee in #4085
- Fixed the UI issue: Height issue of editor and the spacing issue between logo and description by @Ankit-Keshari-Vituity in #4300
- fix(vulnerabilities): Fix new vulnerabilities by upgrading libraries by @dexter-mh-lee in #4297
- docs(readmes): Update module READMEs to reflect the current state of the world by @jjoyce0510 in #4294
- Resign the policy tab by @Ankit-Keshari-Vituity in #4232
- refactor(extractor): Move extractors to entity-registry by @dexter-mh-lee in #4307
- refactor(ui) Minor policies styling improvements. by @jjoyce0510 in #4309
- feat(ui): Introducing New group profile by @jjoyce0510 i...
DataHub Release Candidate v0.8.28 (rc1)
DataHub v0.8.28 Release Candidate 1
What's Changed
- doc(adoption): adds Adevinta as DataHub adopter by @sgomezvillamor in #4227
- fix(policies): Remove multiple privileges for GENERATE_PERSONAL_ACCESS_TOKEN by @jjoyce0510 in #4239
- Added Drawer to show the tag profile data by @Ankit-Keshari-Vituity in #4132
- fix(ui) Misc styling fixes (truncated filter values, updating tag color on clickaway) by @jjoyce0510 in #4246
- feat(ingest): switch telemetry endpoint to Mixpanel by @kevinhu in #4238
- docs: Add Udemy & Adevinta logos by @maggiehays in #4247
- fix(ingest) kafka-connect: Pass the env variable as part of making dataset by @arunvasudevan in #4244
- feat(lineage): Column level lineage model by @rslanka in #4248
- feat(docker): add multiplatform docker support for arm64 (m1) by @zhaofengnian18 in #4221
- fix(ingest): fix tableau sheets external url ingestion by @maaaikoool in #4231
- fix(docs): fix broken link by @anshbansal in #4242
- fix(docs): fix reference to the credential step in okta guide by @bskim45 in #4243
- feat(ingest): add ability to provide lineage described from within a file by @eddyv in #4116
- Add support for url_prefix in elastisearch source by @pppsunil in #4214
- feat(search): supporting chinese glossaryterm full text retrieval(#3914) by @Huyueeer in #3956
- feat(platform): Schema version history timeline. by @rslanka in #4252
- fix(mae-comsumer): wrong aspect name in usage event transformer by @zhoxie-cisco in #4249
- feat(ml): Add searchable annotation for features in feature table by @dexter-mh-lee in #4216
- fix(ingest): fix telemetry profile emission by @kevinhu in #4253
- bug(logo): add platform to chart relationship query by @RyanHolstien in #4255
- chore(docs): cleanup location of guide, gitignore generated from git by @anshbansal in #4256
- feat(ingest): Spark-free data lake ingestion by @kevinhu in #4131
- Openapi new auth by @vlavorini in #4086
- feat(docs) add fine-grained lineage docs and examples by @ksrinath in #4260
- feat(ingest): add option to copy URN, fix graphql docs by @anshbansal in #4209
- chore(managed ingestion): add variables for default val, update vals by @anshbansal in #4186
- docs(ui): Adding guide for adding users to DataHub. by @jjoyce0510 in #4262
- fix(docs): doc build failing by @anshbansal in #4267
- fix(doc) fix spelling mistake in dataset doc by @ksrinath in #4264
- feat(ingest): add lineage_client_project_id field to the BigQuery config by @vcs9 in #4138
- feat(graphql): Adding resolved users and groups to policies by @jjoyce0510 in #4272
- bug(schema_version_history): fix semantic version ordering by @RyanHolstien in #4271
- fix(recs ui): fixing tag color in recommendations by @jjoyce0510 in #4274
- fix(ingestion): Fix snowflake lineage + logging & reporting improvements. by @rslanka in #4276
- fix(docs): Error in running airflow locally by @buggythepirate in #4259
- feat: powerbi source plugin by @mohdsiddique in #4201
- test(timeline): fix smoke test by @RyanHolstien in #4285
- fix(ingest) - always display CLI version by @aditya-radhakrishnan in #4282
- feat(lineage) Bigquery: Supporting v2 audit metadata on Bigquery by @treff7es in #4233
- fix(recipe-parsing): fix recipe config parsing for $ by @MugdhaHardikar-GSLab in #4258
- docs: add details to update users when using helm by @anshbansal in #4268
- feat(model): adds PRE in the FabricType enum by @sgomezvillamor in #4226
- fix(ingestion): Fix snowflake view upstream lineages to eliminate false edges. by @rslanka in #4284
- fix(docs): fix frontend docs to replace port 9001 -> 9002 by @gabe-lyons in #4280
- fix(docker): use exec form to start container main process by @gmcoringa in #4245
- fix(ingestion): revert positional arg change by @anshbansal in #4266
- feat(profiling) - Bigquery: Ability to disable partition profiling by @treff7es in #4228
- fix(ingest): clarify s3/s3a requirements and platform defaults by @kevinhu in #4263
- Remove <> from add-users.md doc by @jjoyce0510 in #4293
- fix(ingestion): Fix bigquery stateful ingestion checkpoint reconstruction. by @rslanka in #4295
- fix(telemetry): telemetry fail should not cause the CLI to fail by @anshbansal in #4302
- fix(search): Update urn tokenizer to tokenize on periods and slashes by @dexter-mh-lee in #4085
- Fixed the UI issue: Height issue of editor and the spacing issue between logo and description by @Ankit-Keshari-Vituity in #4300
- fix(vulnerabilities): Fix new vulnerabilities by upgrading libraries by @dexter-mh-lee in #4297
- docs(readmes): Update module READMEs to reflect the current state of the world by @jjoyce0510 in #4294
- Resign the policy tab by @Ankit-Keshari-Vituity in #4232
- refactor(extractor): Move extractors to entity-registry by @dexter-mh-lee in #4307
- refactor(ui) Minor policies styling improvements. by @jjoyce0510 in #4309
- feat(ui): Introducing New group profile by @jjoyce0510 in #4308
- refactor(ui): Simplify process of adding user.props (w/ docs) by @jjoyce0510 in #4296
- feat (ingest) Kafka-connect: Adding Auth to Kafka Connect API by @arunvasudevan in #4298
- fix(doc): Add warning on using AWS glue schema registry by @dexter-mh-lee in #4306
- fix(ingestion) Removing python restriction by @treff7es in #4312
- fix(ingest) bigquery: Remove unneeded warning by @treff7es in #4317
- doc: improve doc on adding source by @anshbansal in #4316
- fix: revert changes to OpenApi casing by @anshbansal in #4291
- feat(assertions): Adding Assertions Entity & Great Expectations BETA by @jjoyce0510 in #4305
- feat(tableau): emit workbook as container entity in tableau source, some minor fixes in tableau source by @mayurinehate in #4261
- fix(ui) Misc UI fixes & styling improvements. by @jjoyce0510 in #4311
- fix(tags) - map tags to globalTags for entities by @aditya-radhakrishnan in #4310
- fix(quickstart): Pin actions pod + add volume mount for datahub-frontend by @jjoyce0510 in #4318
- fix(ui): Correct display name for users in UI by @jjoyce0510 in #4323
- feat(Impact Analysis): Support impact analysis to check all downstreams of given entity by @dexter-mh-lee in #4322
New Contributors
- @zhaofengnian18 made their first contribution in #4221
- @bskim45 made their first contribution in #4243
- @eddyv made their first contribution in #4116
- @Huyueeer made their first contribution in #3956
- @vcs9 made their first contribution in #4138
- @mohdsiddique made their first contribution in #4201
- @gmcoringa made their first contribution in #4245
Full Changelog: v0.8.27...v0.8.28rc1
Release Candidate v0.8.28
Release Candidate for Version 0.8.28.
What's Changed
- doc(adoption): adds Adevinta as DataHub adopter by @sgomezvillamor in #4227
- fix(policies): Remove multiple privileges for GENERATE_PERSONAL_ACCESS_TOKEN by @jjoyce0510 in #4239
- Added Drawer to show the tag profile data by @Ankit-Keshari-Vituity in #4132
- fix(ui) Misc styling fixes (truncated filter values, updating tag color on clickaway) by @jjoyce0510 in #4246
- feat(ingest): switch telemetry endpoint to Mixpanel by @kevinhu in #4238
- docs: Add Udemy & Adevinta logos by @maggiehays in #4247
- fix(ingest) kafka-connect: Pass the env variable as part of making dataset by @arunvasudevan in #4244
- feat(lineage): Column level lineage model by @rslanka in #4248
- feat(docker): add multiplatform docker support for arm64 (m1) by @zhaofengnian18 in #4221
- fix(ingest): fix tableau sheets external url ingestion by @maaaikoool in #4231
- fix(docs): fix broken link by @anshbansal in #4242
- fix(docs): fix reference to the credential step in okta guide by @bskim45 in #4243
- feat(ingest): add ability to provide lineage described from within a file by @eddyv in #4116
- Add support for url_prefix in elastisearch source by @pppsunil in #4214
- feat(search): supporting chinese glossaryterm full text retrieval(#3914) by @Huyueeer in #3956
- feat(platform): Schema version history timeline. by @rslanka in #4252
- fix(mae-comsumer): wrong aspect name in usage event transformer by @zhoxie-cisco in #4249
- feat(ml): Add searchable annotation for features in feature table by @dexter-mh-lee in #4216
- fix(ingest): fix telemetry profile emission by @kevinhu in #4253
- bug(logo): add platform to chart relationship query by @RyanHolstien in #4255
- chore(docs): cleanup location of guide, gitignore generated from git by @anshbansal in #4256
- feat(ingest): Spark-free data lake ingestion by @kevinhu in #4131
- Openapi new auth by @vlavorini in #4086
- feat(docs) add fine-grained lineage docs and examples by @ksrinath in #4260
- feat(ingest): add option to copy URN, fix graphql docs by @anshbansal in #4209
- chore(managed ingestion): add variables for default val, update vals by @anshbansal in #4186
- docs(ui): Adding guide for adding users to DataHub. by @jjoyce0510 in #4262
- fix(docs): doc build failing by @anshbansal in #4267
- fix(doc) fix spelling mistake in dataset doc by @ksrinath in #4264
- feat(ingest): add lineage_client_project_id field to the BigQuery config by @vcs9 in #4138
- feat(graphql): Adding resolved users and groups to policies by @jjoyce0510 in #4272
- bug(schema_version_history): fix semantic version ordering by @RyanHolstien in #4271
- fix(recs ui): fixing tag color in recommendations by @jjoyce0510 in #4274
- fix(ingestion): Fix snowflake lineage + logging & reporting improvements. by @rslanka in #4276
- fix(docs): Error in running airflow locally by @buggythepirate in #4259
- feat: powerbi source plugin by @mohdsiddique in #4201
- test(timeline): fix smoke test by @RyanHolstien in #4285
- fix(ingest) - always display CLI version by @aditya-radhakrishnan in #4282
- feat(lineage) Bigquery: Supporting v2 audit metadata on Bigquery by @treff7es in #4233
- fix(recipe-parsing): fix recipe config parsing for $ by @MugdhaHardikar-GSLab in #4258
- docs: add details to update users when using helm by @anshbansal in #4268
- feat(model): adds PRE in the FabricType enum by @sgomezvillamor in #4226
- fix(ingestion): Fix snowflake view upstream lineages to eliminate false edges. by @rslanka in #4284
- fix(docs): fix frontend docs to replace port 9001 -> 9002 by @gabe-lyons in #4280
- fix(docker): use exec form to start container main process by @gmcoringa in #4245
- fix(ingestion): revert positional arg change by @anshbansal in #4266
- feat(profiling) - Bigquery: Ability to disable partition profiling by @treff7es in #4228
- fix(ingest): clarify s3/s3a requirements and platform defaults by @kevinhu in #4263
- Remove <> from add-users.md doc by @jjoyce0510 in #4293
- fix(ingestion): Fix bigquery stateful ingestion checkpoint reconstruction. by @rslanka in #4295
- fix(telemetry): telemetry fail should not cause the CLI to fail by @anshbansal in #4302
- fix(search): Update urn tokenizer to tokenize on periods and slashes by @dexter-mh-lee in #4085
- Fixed the UI issue: Height issue of editor and the spacing issue between logo and description by @Ankit-Keshari-Vituity in #4300
- fix(vulnerabilities): Fix new vulnerabilities by upgrading libraries by @dexter-mh-lee in #4297
- docs(readmes): Update module READMEs to reflect the current state of the world by @jjoyce0510 in #4294
- Resign the policy tab by @Ankit-Keshari-Vituity in #4232
- refactor(extractor): Move extractors to entity-registry by @dexter-mh-lee in #4307
- refactor(ui) Minor policies styling improvements. by @jjoyce0510 in #4309
- feat(ui): Introducing New group profile by @jjoyce0510 in #4308
- refactor(ui): Simplify process of adding user.props (w/ docs) by @jjoyce0510 in #4296
- feat (ingest) Kafka-connect: Adding Auth to Kafka Connect API by @arunvasudevan in #4298
- fix(doc): Add warning on using AWS glue schema registry by @dexter-mh-lee in #4306
- fix(ingestion) Removing python restriction by @treff7es in #4312
- fix(ingest) bigquery: Remove unneeded warning by @treff7es in #4317
- doc: improve doc on adding source by @anshbansal in #4316
- fix: revert changes to OpenApi casing by @anshbansal in #4291
- feat(assertions): Adding Assertions Entity & Great Expectations BETA by @jjoyce0510 in #4305
- feat(tableau): emit workbook as container entity in tableau source, some minor fixes in tableau source by @mayurinehate in #4261
- fix(ui) Misc UI fixes & styling improvements. by @jjoyce0510 in #4311
- fix(tags) - map tags to globalTags for entities by @aditya-radhakrishnan in #4310
- fix(quickstart): Pin actions pod + add volume mount for datahub-frontend by @jjoyce0510 in #4318
- fix(ui): Correct display name for users in UI by @jjoyce0510 in #4323
- feat(Impact Analysis): Support impact analysis to check all downstreams of given entity by @dexter-mh-lee in #4322
New Contributors
- @zhaofengnian18 made their first contribution in #4221
- @bskim45 made their first contribution in #4243
- @eddyv made their first contribution in #4116
- @Huyueeer made their first contribution in #3956
- @vcs9 made their first contribution in #4138
- @mohdsiddique made their first contribution in #4201
- @gmcoringa made their first contribution in #4245
Full Changelog: v0.8.27...RC-v0.8.28
DataHub v0.8.27
Release Highlights
Notable UI-Based Features
- The User Page has a new look! You can now quickly filter & search for entities owned by a User, update/edit the user profile, and see details of which Groups the User belongs to. See it in action here.
- Search for Entities by Owner - Easily filter search results by User/Group Owner
- Edit existing Glossary Terms - you can now edit/update Glossary Term descriptions via the UI. Future work will allow creating Terms from the UI as well - stay tuned!
- Improved Metadata Analytics - keep tabs on your DataHub entities across Domains, Platforms, Glossary Terms, Environments, & more. Check out the new & improved Analytics tab!
Notable Metadata Model & Ingestion-Based Features
- ClickHouse integration is now incubating! This is a 100% Community-led integration - huge shoutout to @ne1r0n & @havramar for pushing initial code & moving this work through!
- Kafka Stateful Ingestion - shoutout to @claudio-benfatto for building this out!
- Extract Airflow Task Description - big thanks to @guidoturtu for the contrib!
- BigQuery: profile latest Partition/Shard - We know that Data Profiling can be computationally expensive for partitioned/sharded BQ instances. We now support profiling only the latest partition/shard to minimize processing load.
Notable Docs Updates
- NEW! Tips for Searching within DataHub - Ever wondered how to make the most of Searching within DataHub? Check out this doc put together by @xiphl
- Improvements to Metadata Model Docs - This is a huge win for the Community - we’re taking a big step toward providing auto-generated & curated docs related to the Metadata Model - take a look here.
What's Changed
- feat(deprecation): Entity Deprecation Backend by @jjoyce0510 in #4073
- Fixed auto complete pr comments by @Ankit-Keshari-Vituity in #4072
- fix(ingestion): enforce correct behaviour for commit policy by @claudio-benfatto in #4092
- fix(aggregate): Fix NPE in aggregate api by @dexter-mh-lee in #4095
- add Haibo corp by @wangqinghuan in #4082
- fix(ingestion): Add psutil dependency required for stateful ingestion reporting. by @rslanka in #4099
- docs(kafka): add example for using domains, change for clarity by @anshbansal in #4100
- feat(ui): Add display name & title to editable corp user properties. by @jjoyce0510 in #4097
- fix(ingestion): Enhance BigQuery source logging. by @rslanka in #4101
- fix(glossary terms): fix add glossary term flow by @gabe-lyons in #4106
- (docs) Add Zynga & Tableau logos by @maggiehays in #4109
- fix(ingestion): Add sql lineage to redshift-usage plugin by @dexter-mh-lee in #4103
- feat(ui): Add svg datahub satellite loading logo by @eburairu in #4067
- fix(ingestion): resolve oracle issue with large view definitions by @hsheth2 in #4027
- fix(ingest): ignore Postgres information_schema tables by default by @kevinhu in #4069
- fix(ingest) - close event loops in Okta source and add additional debug logging by @aditya-radhakrishnan in #4077
- chore(ingest): remove unused groupby_unsorted utility by @hsheth2 in #4011
- fix(docs): fixing metadata model doc generation script and updating png by @swaroopjagadish in #4120
- fix(ci): fix formatting in doc generation action yaml by @swaroopjagadish in #4121
- fix(ci): fix formatting for action yaml by @swaroopjagadish in #4122
- feat(Tags/Terms): Backend support for tag & term mutations by @jjoyce0510 in #4096
- docs(backup): add doc for taking backup by @anshbansal in #3917
- fix(docs): make intro to metadata ingestion easier for beginners by @anshbansal in #4039
- fix(ingest) Athena: db filter was not applied by @treff7es in #4127
- fix(ui) - move book logo to right of glossary term by @aditya-radhakrishnan in #4125
- fix(docs) Fix doc on modelDocUpload by @daha in #4112
- fix(cypress): force clicks on tag mutation test by @gabe-lyons in #4102
- feat(ingest) Athena: Getting table properties for Athena datasets by @treff7es in #4123
- fix(logging): Fix Restli Logging Filter to print full stack trace on error by @dexter-mh-lee in #4136
- docs : markdown fixes for db retention table by @satyamkrishna in #4133
- docs : markdown fixes for db retention table by @satyamkrishna in #4148
- feat(ingestion): Kafka stateful ingestion by @claudio-benfatto in #4028
- fix(docs): update graphql docs to reference new graphql file by @gabe-lyons in #4139
- Feature/oss/update to v2 endpoints by @RyanHolstien in #4128
- fix(cli): add timeout for telemetry calls by @anshbansal in #4135
- chore(cli): update default cli version pinned in the UI based ingestion by @anshbansal in #4150
- fix(docs): fix example of delta lake by @anshbansal in #4149
- fix(ui): Fix cutoff profiling axis labels by @jjoyce0510 in #4154
- feat(ingest): Glue - Support for domains and containers by @treff7es in #4110
- feat(ui): Host platform images on datahub-web-react by @ngamanda in #4118
- bug(seedData): adds a key to the root user seed data and fixes corner case check for missing key aspects by @RyanHolstien in #4162
- UI Fix: Modal close on Enter press, autofocus on modal, added split panel, alignment of button by @Ankit-Keshari-Vituity in #4155
- feat(ui): Edit glossary term descriptions via UI by @jjoyce0510 in #4156
- Update querying-entities.md -> Documentation Error by @buggythepirate in #4157
- refactor(metadata-io/test): common ElasticsearchContainer and ability to override from environment. by @stephenp-gr in #4152
- feat(ingestion): Add support for snowflake view lineage. by @rslanka in #4163
- Update the doc to including options to include Views by @cuong-pham in #4164
- fix(ingest): Use lower-case dataset names in the dataset urns for all SQL-styled datasets. by @rslanka in #4140
- chore(ingestion): upgrade mypy by @hsheth2 in #4141
- ci(ingestion): fix airflow 1 deps for tox by @hsheth2 in #4083
- fix(ingest) Glue: Removing sqlalchemy dependency from glue by @treff7es in #4168
- fix(ingest) Athena: Generating propert containers for Athena by @treff7es in #4167
- Feature/users and groups UI updated as per new design by @ShubhamThakre in #4134
- chore(docs): various cleanup for docs-website by @hsheth2 in #4143
- bugfix(logging): reduce log noise from authentication chain by @RyanHolstien in #4173
- bug(glossaryTermLabels): fix glossary term labels missing and add cypress test by @RyanHolstien in #4171
- fixes(ui): Misc UI fixes + Adding Owners to Search Filters by @jjoyce0510 in #4175
- BugFixes/user-and-groups-minor-ui-fixes by @ShubhamThakre in #4181
- feat(groups): Adding editable group properties in the backend by @jjoyce0510 in #4166
- fix(python build): Pinning markupsafe by @treff7es in #4188
- feat(analytics): Improve analytics page by adding more charts regarding metadata ingested by @dexter-mh-lee in #4176
- docs(model): auto-generated docs and hand-written docs for the metada… by @swaroopjagadish in #4189
- minor fixes(ui): Small UI display fixes by @jjoyce0510 in #4190
- fix(ui): Return empty search response on invalid characters in search by @jjoyce0510 in #4193
- refactor(spark-line...
DataHub v0.8.26
This is a bugfix release that addresses the issue in v0.8.25 where Glossary Terms could not be added to Dataset fields.
Release Highlights
- Fixes the bug in the previous release (v0.8.25) where Glossary Terms could not be added to Dataset fields.
DataHub v0.8.25
Known Issues
- Adding Glossary Terms to schema fields does not work with this version due to a bug. Upgrade to v0.8.26 for the fix.
Release Highlights
Buckle up, folks! v0.8.25 brings some very exciting (and highly-requested!) updates.
Notable UI-Based Features
- UI-based Ingestion - as demoed in December Town Hall, we now support creating, configuring, scheduling, & executing batch metadata ingestion using the DataHub user interface. This makes getting metadata into DataHub easier by minimizing the overhead required to operate custom integration pipelines.
- Data Domains - DataHub now supports grouping data assets into logical collections called Domains. Domains are curated, top-level folders or categories where related assets can be explicitly grouped. Read the guide here!
- Data Containers are now supported! A Container is a physical grouping of entities, e.g. a Schema is a container of 1 or more Datasets; a Dashboard is a container of 1 or more Charts.
Notable Metadata Model & Ingestion-Based Features
- Data Quality test results are now supported in the DataHub metadata model. This is the first milestone toward surfacing Dataset & Column-level Data Quality results in the UI (read full scope of work here). Future releases will include a Great Expectations integration & UI support - we’re on track to complete this in Q1 as planned.
- Avro files are now supported in the Data Lake File ingestion source
- Ingest metadata from multiple instances of the same platform type. This has been a very common use case within the Community - you can now differentiate multiple instances of the same platform type! If you already have pre-existing entries, use the datahub migrate command to migrate them over to platform instances.
- Ignore users from Top Users calculation
- BigQuery - Data Profiling on only the latest partition/shard
- (feat)(Business Glossary) add tabular schema and new UI for business glossary by @saxo-lalrishav in #3813
Notable Fixes
- Fix to support View in Looker
  - feat(looker): Adding optional Looker external url base url config by @jjoyce0510 in #3985
- fix(graphql): support group display name in ownership by @thomasplarsson in #3979
- fix(profiling): Enabling profiling for low cardinality number columns by @treff7es in #3990
- fix(ingestion): match default username for Azure OIDC and Azure ingestion source by @iasoon in #3926
DataHub Usage Guides
- docs(domains): Adding a User Guide for Domains by @jjoyce0510 in #4038
- docs(ingest): Adding UI ingestion guide by @jjoyce0510 in #4048
What's Changed
- fix(vulnerability): Upgrade gms base image by @dexter-mh-lee in #3962
- logging(frontend): Improve OIDC debug logs by @jjoyce0510 in #3967
- docs(delete): add curl request example to delete entity by @anshbansal in #3928
- fix(ingestion): match default username for Azure OIDC and Azure ingestion source by @iasoon in #3926
- Feature/dynamic platform icons by @RyanHolstien in #3968
- refactor(ingestion): remove duplicate aspect type by @hsheth2 in #3972
- fix(example): fix typo by @anshbansal in #3907
- fix(ingestion): Restrict python to <=3.9.9 by @treff7es in #3961
- feat(build): remove requirement for git directory for builds by @swaroopjagadish in #3977
- fix(ingestion): tighten conditions for restli json transformation by @hsheth2 in #3973
- fix(ingestion): don't dump variables for config errors by @hsheth2 in #3974
- Bugfix/increase socket timeout by @RyanHolstien in #3982
- feat(ingest): support for Avro data lake files by @kevinhu in #3913
- fix(build): exclude old log4j core by @RickardCardell in #3966
- fix(quickstart): Pin Quickstart version to v0.8.23. by @jjoyce0510 in #3983
- feat(looker): Adding optional Looker external url base url config by @jjoyce0510 in #3985
- fix(graphql): support group display name in ownership by @thomasplarsson in #3979
- fix(quickstart): Assign correct mysql-setup container for M1 and remove "head" default version. by @jjoyce0510 in #3987
- feat(embedded search results): support custom endpoints in embedded search result by @gabe-lyons in #3986
- fix(docker): datahub-gms - build in native, copy to target by @swaroopjagadish in #3992
- fix(ci): moving defaults back to head now that docker builds are green by @swaroopjagadish in #3993
- feat(ui): UI-based ingestion (as featured in Dec Townhall) by @jjoyce0510 in #3975
- quickstart: Adding UI ingestion to quickstart YAML by @jjoyce0510 in #3994
- feat(domains): Adding backend for Asset Domains (p1) by @jjoyce0510 in #3952
- Bug: a bug fix to bigquery_to_datahub.yml file by @dipeshmaurya in #3988
- fix(ingest): check if feature data type is present by @maaaikoool in #3932
- feat(platform-instance): a simple client-only change to support platf… by @swaroopjagadish in #3996
- docs(metadata-model): Adding to Metadata model docs by @jjoyce0510 in #3998
- Add Stash Logo & new Source Icons by @maggiehays in #4002
- feat(domains): UI for Asset Domains (p2) by @jjoyce0510 in #3995
- docs: add missing back tick for metadata-ingestion/README.md by @nickwu241 in #4003
- Bugfix/add missing classes by @RyanHolstien in #4000
- fix(superset): fix connection for redshift by @anshbansal in #3944
- fix(setup): fix setup for M1 by @anshbansal in #3958
- docs:add Optum logo by @maggiehays in #4005
- Refining Metadata Model docs further by @jjoyce0510 in #4001
- fix(docker): Alpine based multiplatform docker build for kafka-setup by @treff7es in #3991
- Bugfix/graph concurrency issue by @RyanHolstien in #4007
- feat(ingest): Add additional snowflake auth by @MikeSchlosser16 in #4009
- fix(ci): Reverting unnecessary domain test changes by @jjoyce0510 in #4013
- fix(metrics): Add metrics for mcl hooks by @dexter-mh-lee in #4008
- feat(platform) - Update FabricType enum to represent more fabrics by @aditya-radhakrishnan in #3997
- feat(ingest): emit flags and stats for profiling telemetry by @kevinhu in #3969
- fix(formatting): fix linting lib version requirement by @anshbansal in #3939
- fix(docs): fix business glossary docs by @anshbansal in #3916
- fix(profiling): Enabling profiling for low cardinality number columns by @treff7es in #3990
- fix(docs): update gms link by @lhvubtqn in #3927
- fix(ingest): lint fix a few files by @swaroopjagadish in #4016
- fix(ingest): adding platform instance urn to data platform instance aspects by @swaroopjagadish in #4015
- feat(ingest): use trino python client for sqlalchemy, supports python… by @mayurinehate in #3888
- fix(spark-lineage): select mock server port dynamically for unit test by @MugdhaHardikar-GSLab in #4018
- (feat)(Business Glossary) add tabular schema and new UI for business glossary by @saxo-lalrishav in #3813
- Test/add concurrency issue smoke test by @RyanHolstien in #4014
- feat(glossary-terms): Index glossary term custom properties by @jjoyce0510 in #3960
- feat(ingestion): Adding ability to ignore users from top users...
DataHub v0.8.24
Release Highlights
- Adding support for nested Glue schemas
- Adding Data Lake Files ingestion source to support data profiling for local files and files stored in AWS S3; supported file types are CSV, TSV, Parquet, and JSON
- Improved readability of large numbers in the UI: thousands separators are added, and large numbers are rounded to millions with the raw value available via tooltip
- Miscellaneous bug fixes & improvements
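The large-number formatting described in the highlights above can be pictured with a short sketch. This is only an illustration of the idea (thousands separators, rounding to millions, raw value kept for a tooltip), not the DataHub UI code; the function name `format_count` and the one-decimal rounding are assumptions made for the example.

```python
from typing import Tuple


def format_count(n: int) -> Tuple[str, str]:
    """Return (display, tooltip) strings for a count.

    Hypothetical helper for illustration; not taken from DataHub.
    """
    # The tooltip always carries the raw value, with thousands separators.
    tooltip = f"{n:,}"
    if abs(n) >= 1_000_000:
        # Assumed rounding: one decimal place, suffixed with "M".
        display = f"{n / 1_000_000:.1f}M"
    else:
        display = tooltip
    return display, tooltip


print(format_count(1234))
print(format_count(2_500_000))
```

With this sketch, values below one million are shown in full with separators, while larger values are shortened and the exact figure stays available through the second element of the tuple.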
What's Changed
- fix(workflow) docker-ingestion is failing bc of an invalid sed command by @dexter-mh-lee in #3896
- refactor(graphql): Migrating Datasets, Charts, Dashboards, Jobs, Flows to Entity V2 endpoint by @jjoyce0510 in #3897
- fix(ingest): populate system metadata for all metadata events (mcp, mcpw) by @swaroopjagadish in #3900
- perf: add/change scripts for tests by @anshbansal in #3840
- fix(glossary): owner should be optional as per docs by @anshbansal in #3858
- feat(ingestion): Support for nested glue schemas by @rslanka in #3895
- docs: change roadmap link by @jeffmerrick in #3904
- feat(kafka): support confluent references by @anshbansal in #3862
- docs (elasticsearch): config error by @JIWEI0 in #3901
- feat(ingestion): Data lake profiling by @kevinhu in #3656
- refactor(search): refactor NUM_RETRIES in esindexbuilder to be configurable by @senni0418 in #3870
- fix(ingest): nifi - replace hardcode password with config variable by @lhvubtqn in #3902
- feat(authentication): propagate expired token exceptions to end user by @gabe-lyons in #3894
- fix(docs): update data lake docs with path_spec details by @kevinhu in #3905
- ci(smoke-test): make tags&terms smoke test wait for ingestion to complete by @gabe-lyons in #3812
- Revert "fix(glossary): owner should be optional as per docs (#3858)" by @anshbansal in #3910
- fix(ingest): operational stats - check if optional fields are present by @aditya-radhakrishnan in #3911
- fix(typo): fix typo in docs by @anshbansal in #3908
- refactor(gql/ui): Misc refactorings by @jjoyce0510 in #3921
- feat(config): make check for frontend instead of gms more robust by @anshbansal in #3919
- feat(spark-lineage): simplified jars, config, auto publish to maven by @swaroopjagadish in #3924
- Bugfix/telemetry soft fail by @RyanHolstien in #3934
- fix(log): fix log levels and formats by @anshbansal in #3943
- docs(metadata-ingestion): fix command for running fast unit tests by @anshbansal in #3942
- fix(ui): update login title css to fit on one line by @aditya-radhakrishnan in #3922
- fix(docs): Clarify available no-code rendering formats in DataQualityRules.pdl by @gabe-lyons in #3912
- docs(links): add links to some recent case studies and blog posts by @anshbansal in #3941
- fix(docs): fix openapi docs by @anshbansal in #3940
- Adding Snappy Lib and JKS File by @arunvasudevan in #3898
- Feature/Issue resolved- Improve table stats readability in UI by @ShubhamThakre in #3889
- refactor(ui): Allow DocumentationTab to optionally use updateDescription mutation by @jjoyce0510 in #3935
- (docs)add moloco logo by @maggiehays in #3945
- refactor(bootstrap data): Add usage and profiles to bootstrap_mce.json by @jjoyce0510 in #3947
- docs(metadata): update relationship query in docs by @gabe-lyons in #3951
- fix(ingestion): Snowflake Usage should continue to emit usage workunits with include_operational_stats enabled. by @rslanka in #3949
- feat(ingestion): Add support for extracting S3->Snowflake and S3->Glue lineages. by @rslanka in #3946
- fix(graphQL): Fixing set ordering in batchGet of entities by @jjoyce0510 in #3950
- feat(elastic-search): changing default bulk index request batch to 1000 by @swaroopjagadish in #3957
- docs (metadata modeling): Fix broken links and doc fixes by @arunvasudevan in #3954
New Contributors
- @JIWEI0 made their first contribution in #3901
- @senni0418 made their first contribution in #3870
- @lhvubtqn made their first contribution in #3902
- @ShubhamThakre made their first contribution in #3889
Full Changelog: v0.8.23...v0.8.24
DataHub v0.8.23
Release Highlights
- Fix critical Dashboard / Charts bug from 0.8.22, where Chart inputs were not being ingested successfully.
- Adding currently deployed version to the UI (under top-right dropdown menu). Also available via the GMS /config endpoint.
- Robustness improvements to DataHub Java Client Package
- Introducing a new Elasticsearch ingestion connector!
- Misc bug fixes & improvements.
What's Changed
- build: include correct version in metadata-ingestion docker image by @hsheth2 in #3857
- fix(metabase): fix crashes on missing values by @iasoon in #3859
- fix(datahub-client): fix shadow jar build, correct spark-lineage url … by @swaroopjagadish in #3871
- feat(git-version): Add version to the UI and config endpoint by @dexter-mh-lee in #3866
- fix(build): fix shadow jar checker to allow new git.properties by @swaroopjagadish in #3875
- feat(metadata-ingestion): Make datahub-rest client more robust by configurable retries. (#3826) by @RickardCardell in #3860
- fix(github-workflow): Remove duplicate context in kafka setup workflow by @dexter-mh-lee in #3876
- docs(azure-ad): correct default value for username attr by @iasoon in #3861
- docs: fix endpoint URL by @anshbansal in #3852
- fix(cli): disable telemetry in CLI tests by @kevinhu in #3877
- feat(metabase): allow configuring how database engines get mapped to platforms by @iasoon in #3869
- doc(graphql): add some examples by @anshbansal in #3867
- fix(search): Fix issue with filters and autocomplete by @dexter-mh-lee in #3868
- fix(build): remove jcenter from gradle build by @aditya-radhakrishnan in #3882
- (docs)Roadmap, Townhall, & Feature Request link updates by @maggiehays in #3873
- doc(kafka): add permissions required for confluent cloud by @anshbansal in #3850
- feat(ingest): ingestion-specific telemetry by @kevinhu in #3881
- Add AWS MSK Iam Auth Jar to GMS by @arunvasudevan in #3872
- docs(ingestion) azure: specify required permission type by @iasoon in #3886
- feat(ingestion) dbt: support spark sql types by @iasoon in #3880
- update dependency for bigquery. by @varunbharill in #3874
- fix(field-extraction): Fix extraction for unions by @dexter-mh-lee in #3892
- fix(ingest): sqlparser - Not lowercasing looker source's special table name by @treff7es in #3891
- feat(ingest): Support for spectrum external array types by @treff7es in #3890
- feat(Ingestion): Add Elasticsearch Source by @rslanka in #3893
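One of the changes listed above (#3860) makes the datahub-rest client more robust through configurable retries. As a rough illustration of that general technique only, and not the code from that PR, a retry wrapper with exponential backoff might look like the sketch below. The name `call_with_retries` and its parameters and defaults are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    max_retries: int = 3,          # hypothetical default
    backoff_seconds: float = 0.5,  # hypothetical default
) -> T:
    """Call fn, retrying on any exception up to max_retries extra times."""
    attempt = 0
    while True:
        try:
            return fn()
        except Exception:
            if attempt >= max_retries:
                # Retries exhausted: surface the last error to the caller.
                raise
            # Wait longer after each failure: b, 2*b, 4*b, ...
            time.sleep(backoff_seconds * (2 ** attempt))
            attempt += 1
```

A caller would wrap the request in a zero-argument function, for example `call_with_retries(lambda: send(event), max_retries=5)`, where `send` stands in for whatever performs the HTTP call.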
Full Changelog: v0.8.22...v0.8.23