DataHub v0.8.24
Release Highlights
- Adding support for nested Glue schemas
- Adding Data Lake Files ingestion source to support data profiling for local files and files stored in AWS S3; supported file types are CSV, TSV, Parquet, and JSON
- Improving readability of large numbers in the UI: adding thousands separators and rounding large numbers to millions, with the raw value available via tooltip
- Miscellaneous bug fixes & improvements
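
As a rough illustration of the number-formatting highlight above, the sketch below shows one way to produce a short display string plus the raw value for a tooltip. It is not the DataHub UI code: the function name `format_count`, the one-decimal rounding, and the 1,000,000 threshold are assumptions made for this example.

```python
from typing import Tuple

# Hypothetical threshold: values at or above this are shown in millions.
MILLION = 1_000_000


def format_count(value: int) -> Tuple[str, str]:
    """Return (display_text, tooltip_text) for a count shown in a UI.

    display_text uses thousands separators, or a rounded "M" suffix for
    values of one million or more; tooltip_text always carries the raw
    value with thousands separators.
    """
    raw = f"{value:,}"  # e.g. 1234567 -> "1,234,567"
    if abs(value) >= MILLION:
        display = f"{value / MILLION:.1f}M"  # rounded to one decimal place
    else:
        display = raw
    return display, raw


if __name__ == "__main__":
    for n in (999, 1234, 2_500_000):
        display, tooltip = format_count(n)
        print(f"{display}  (tooltip: {tooltip})")
```

Returning both strings together keeps the rounded label and the exact value in sync, so a tooltip can always show what the rounded figure stands for.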
What's Changed
- fix(workflow) docker-ingestion is failing bc of an invalid sed command by @dexter-mh-lee in #3896
- refactor(graphql): Migrating Datasets, Charts, Dashboards, Jobs, Flows to Entity V2 endpoint by @jjoyce0510 in #3897
- fix(ingest): populate system metadata for all metadata events (mcp, mcpw) by @swaroopjagadish in #3900
- perf: add/change scripts for tests by @anshbansal in #3840
- fix(glossary): owner should be optional as per docs by @anshbansal in #3858
- feat(ingestion): Support for nested glue schemas by @rslanka in #3895
- docs: change roadmap link by @jeffmerrick in #3904
- feat(kafka): support confluent references by @anshbansal in #3862
- docs (elasticsearch): config error by @JIWEI0 in #3901
- feat(ingestion): Data lake profiling by @kevinhu in #3656
- refactor(search): refactor NUM_RETRIES in esindexbuilder to be configurable by @senni0418 in #3870
- fix(ingest): nifi - replace hardcode password with config variable by @lhvubtqn in #3902
- feat(authentication): propagate expired token exceptions to end user by @gabe-lyons in #3894
- fix(docs): update data lake docs with path_spec details by @kevinhu in #3905
- ci(smoke-test): make tags&terms smoke test wait for ingestion to complete by @gabe-lyons in #3812
- Revert "fix(glossary): owner should be optional as per docs (#3858)" by @anshbansal in #3910
- fix(ingest): operational stats - check if optional fields are present by @aditya-radhakrishnan in #3911
- fix(typo): fix typo in docs by @anshbansal in #3908
- refactor(gql/ui): Misc refactorings by @jjoyce0510 in #3921
- feat(config): make check for frontend instead of gms more robust by @anshbansal in #3919
- feat(spark-lineage): simplified jars, config, auto publish to maven by @swaroopjagadish in #3924
- Bugfix/telemetry soft fail by @RyanHolstien in #3934
- fix(log): fix log levels and formats by @anshbansal in #3943
- docs(metadata-ingestion): fix command for running fast unit tests by @anshbansal in #3942
- fix(ui): update login title css to fit on one line by @aditya-radhakrishnan in #3922
- fix(docs): Clarify available no-code rendering formats in DataQualityRules.pdl by @gabe-lyons in #3912
- docs(links): add links to some recent case studies and blog posts by @anshbansal in #3941
- fix(docs): fix openapi docs by @anshbansal in #3940
- Adding Snappy Lib and JKS File by @arunvasudevan in #3898
- Feature/Issue resolved- Improve table stats readability in UI by @ShubhamThakre in #3889
- refactor(ui): Allow DocumentationTab to optionally use updateDescription mutation by @jjoyce0510 in #3935
- (docs)add moloco logo by @maggiehays in #3945
- refactor(bootstrap data): Add usage and profiles to bootstrap_mce.json by @jjoyce0510 in #3947
- docs(metadata): update relationship query in docs by @gabe-lyons in #3951
- fix(ingestion): Snowflake Usage should continue to emit usage workunits with include_operational_stats enabled. by @rslanka in #3949
- feat(ingestion): Add support for extracting S3->Snowflake and S3->Glue lineages. by @rslanka in #3946
- fix(graphQL): Fixing set ordering in batchGet of entities by @jjoyce0510 in #3950
- feat(elastic-search): changing default bulk index request batch to 1000 by @swaroopjagadish in #3957
- docs (metadata modeling): Fix broken links and doc fixes by @arunvasudevan in #3954
New Contributors
- @JIWEI0 made their first contribution in #3901
- @senni0418 made their first contribution in #3870
- @lhvubtqn made their first contribution in #3902
- @ShubhamThakre made their first contribution in #3889
Full Changelog: v0.8.23...v0.8.24