Merge branch 'datahub-project:master' into master
anshbansal authored May 3, 2024
2 parents a1e9c07 + c00ddb2 commit 9508c47
Showing 111 changed files with 4,712 additions and 2,702 deletions.
8 changes: 4 additions & 4 deletions README.md
@@ -22,13 +22,13 @@ HOSTED_DOCS_ONLY-->
</p>
<!-- -->

# DataHub: The Metadata Platform for the Modern Data Stack
# DataHub: The Data Discovery Platform for the Modern Data Stack
## Built with ❤️ by <img src="https://datahubproject.io/img/acryl-logo-light-mark.png" width="25"/> [Acryl Data](https://acryldata.io) and <img src="https://datahubproject.io/img/LI-In-Bug.png" width="25"/> [LinkedIn](https://engineering.linkedin.com)
[![Version](https://img.shields.io/github/v/release/datahub-project/datahub?include_prereleases)](https://github.com/datahub-project/datahub/releases/latest)
[![PyPI version](https://badge.fury.io/py/acryl-datahub.svg)](https://badge.fury.io/py/acryl-datahub)
[![build & test](https://github.com/datahub-project/datahub/workflows/build%20&%20test/badge.svg?branch=master&event=push)](https://github.com/datahub-project/datahub/actions?query=workflow%3A%22build+%26+test%22+branch%3Amaster+event%3Apush)
[![Docker Pulls](https://img.shields.io/docker/pulls/acryldata/datahub-gms.svg)](https://hub.docker.com/r/acryldata/datahub-gms)
[![Slack](https://img.shields.io/badge/slack-join_chat-white.svg?logo=slack&style=social)](https://slack.datahubproject.io)
[![Slack](https://img.shields.io/badge/slack-join_chat-white.svg?logo=slack&style=social)](https://datahubproject.io/slack?utm_source=docs&utm_medium=docs&utm_campaign=docs_page_link)
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](https://github.com/datahub-project/datahub/blob/master/docs/CONTRIBUTING.md)
[![GitHub commit activity](https://img.shields.io/github/commit-activity/m/datahub-project/datahub)](https://github.com/datahub-project/datahub/pulls?q=is%3Apr)
[![License](https://img.shields.io/github/license/datahub-project/datahub)](https://github.com/datahub-project/datahub/blob/master/LICENSE)
@@ -61,7 +61,7 @@ HOSTED_DOCS_ONLY-->
## Introduction

DataHub is an open-source metadata platform for the modern data stack. Read about the architectures of different metadata systems and why DataHub excels [here](https://engineering.linkedin.com/blog/2020/datahub-popular-metadata-architectures-explained). Also read our
DataHub is an open-source data catalog for the modern data stack. Read about the architectures of different metadata systems and why DataHub excels [here](https://engineering.linkedin.com/blog/2020/datahub-popular-metadata-architectures-explained). Also read our
[LinkedIn Engineering blog post](https://engineering.linkedin.com/blog/2019/data-hub), check out our [Strata presentation](https://speakerdeck.com/shirshanka/the-evolution-of-metadata-linkedins-journey-strata-nyc-2019) and watch our [Crunch Conference Talk](https://www.youtube.com/watch?v=OB-O0Y6OYDE). You should also visit [DataHub Architecture](docs/architecture/architecture.md) to get a better understanding of how DataHub is implemented.

## Features & Roadmap
@@ -106,7 +106,7 @@ We welcome contributions from the community. Please refer to our [Contributing G

## Community

Join our [Slack workspace](https://slack.datahubproject.io) for discussions and important announcements. You can also find out more about our upcoming [town hall meetings](docs/townhalls.md) and view past recordings.
Join our [Slack workspace](https://datahubproject.io/slack?utm_source=docs&utm_medium=docs&utm_campaign=docs_page_link) for discussions and important announcements. You can also find out more about our upcoming [town hall meetings](docs/townhalls.md) and view past recordings.

## Adoption

Original file line number Diff line number Diff line change
@@ -20,6 +20,7 @@
import com.linkedin.view.DataHubViewInfo;
import graphql.schema.DataFetcher;
import graphql.schema.DataFetchingEnvironment;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.stream.Collectors;
@@ -68,15 +69,21 @@ public CompletableFuture<AggregateResults> get(DataFetchingEnvironment environme
final List<String> facets =
input.getFacets() != null && input.getFacets().size() > 0 ? input.getFacets() : null;

List<String> finalEntities =
maybeResolvedView != null
? SearchUtils.intersectEntityTypes(
entityNames, maybeResolvedView.getDefinition().getEntityTypes())
: entityNames;
if (finalEntities.size() == 0) {
return createEmptyAggregateResults();
}

try {
return mapAggregateResults(
context,
_entityClient.searchAcrossEntities(
context.getOperationContext().withSearchFlags(flags -> searchFlags),
maybeResolvedView != null
? SearchUtils.intersectEntityTypes(
entityNames, maybeResolvedView.getDefinition().getEntityTypes())
: entityNames,
finalEntities,
sanitizedQuery,
maybeResolvedView != null
? SearchUtils.combineFilters(
@@ -112,4 +119,10 @@ static AggregateResults mapAggregateResults(

return results;
}

AggregateResults createEmptyAggregateResults() {
final AggregateResults result = new AggregateResults();
result.setFacets(new ArrayList<>());
return result;
}
}
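
The guard added in this hunk can be read as a small control-flow pattern: intersect the requested entity types with the view's entity types and, if nothing remains, return an empty result instead of calling the search backend. The following is a minimal JavaScript sketch of that flow, not the resolver's Java code. `intersectEntityTypes` is a hypothetical stand-in for `SearchUtils.intersectEntityTypes` (its exact semantics are not shown in this diff), and `search` is an assumed callback standing in for `_entityClient.searchAcrossEntities`.

```javascript
// Hypothetical stand-in for SearchUtils.intersectEntityTypes:
// keeps only the requested names that the view also allows.
function intersectEntityTypes(requested, viewTypes) {
  const allowed = new Set(viewTypes);
  return requested.filter((name) => allowed.has(name));
}

// Mirrors createEmptyAggregateResults(): a result with an empty facet list.
function createEmptyAggregateResults() {
  return { facets: [] };
}

// `view` is null when no view was resolved; `search` is an assumed callback.
function aggregateAcrossEntities(entityNames, view, search) {
  const finalEntities =
    view != null ? intersectEntityTypes(entityNames, view.entityTypes) : entityNames;
  if (finalEntities.length === 0) {
    // Nothing to search: skip the backend call entirely.
    return createEmptyAggregateResults();
  }
  return search(finalEntities);
}
```

Computing the intersection once, before the call, is what makes the early return possible; the same list is then reused as the search argument.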
Original file line number Diff line number Diff line change
@@ -75,14 +75,20 @@ public CompletableFuture<SearchResults> get(DataFetchingEnvironment environment)
start,
count);

List<String> finalEntities =
maybeResolvedView != null
? SearchUtils.intersectEntityTypes(
entityNames, maybeResolvedView.getDefinition().getEntityTypes())
: entityNames;
if (finalEntities.size() == 0) {
return SearchUtils.createEmptySearchResults(start, count);
}

return UrnSearchResultsMapper.map(
context,
_entityClient.searchAcrossEntities(
context.getOperationContext().withSearchFlags(flags -> searchFlags),
maybeResolvedView != null
? SearchUtils.intersectEntityTypes(
entityNames, maybeResolvedView.getDefinition().getEntityTypes())
: entityNames,
finalEntities,
sanitizedQuery,
maybeResolvedView != null
? SearchUtils.combineFilters(
Original file line number Diff line number Diff line change
@@ -21,6 +21,7 @@
import com.linkedin.datahub.graphql.QueryContext;
import com.linkedin.datahub.graphql.generated.EntityType;
import com.linkedin.datahub.graphql.generated.FacetFilterInput;
import com.linkedin.datahub.graphql.generated.SearchResults;
import com.linkedin.datahub.graphql.types.common.mappers.SearchFlagsInputMapper;
import com.linkedin.datahub.graphql.types.entitytype.EntityTypeMapper;
import com.linkedin.metadata.query.SearchFlags;
@@ -314,4 +315,15 @@ public static List<String> getEntityNames(List<EntityType> inputTypes) {
(inputTypes == null || inputTypes.isEmpty()) ? SEARCHABLE_ENTITY_TYPES : inputTypes;
return entityTypes.stream().map(EntityTypeMapper::getName).collect(Collectors.toList());
}

public static SearchResults createEmptySearchResults(final int start, final int count) {
final SearchResults result = new SearchResults();
result.setStart(start);
result.setCount(count);
result.setTotal(0);
result.setSearchResults(new ArrayList<>());
result.setSuggestions(new ArrayList<>());
result.setFacets(new ArrayList<>());
return result;
}
}
4 changes: 4 additions & 0 deletions docs-website/docusaurus.config.js
@@ -320,6 +320,10 @@ module.exports = {
if (existingPath.includes('/docs')) {
return [
existingPath.replace('/docs', '/docs/next'),
existingPath.replace('/docs', '/docs/0.13.0'),
existingPath.replace('/docs', '/docs/0.12.1'),
existingPath.replace('/docs', '/docs/0.11.0'),
existingPath.replace('/docs', '/docs/0.10.5'),
];
}
return undefined; // Return a falsy value: no redirect created
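
To see what the added lines produce, here is a standalone sketch of the redirect logic. It assumes the hunk sits inside a `createRedirects`-style callback that receives an existing path and returns the extra paths that should redirect to it; the enclosing function is not visible in the hunk, so that name and the `REDIRECT_VERSIONS` constant are assumptions for illustration.

```javascript
// Version path segments taken from the hunk above.
const REDIRECT_VERSIONS = ["next", "0.13.0", "0.12.1", "0.11.0", "0.10.5"];

// Assumed shape of the callback: given an existing path, return the
// extra paths that should redirect to it, or undefined for none.
function createRedirects(existingPath) {
  if (existingPath.includes("/docs")) {
    // replace() with a string pattern rewrites only the first match.
    return REDIRECT_VERSIONS.map((version) =>
      existingPath.replace("/docs", `/docs/${version}`)
    );
  }
  return undefined; // falsy value: no redirect created
}
```

For a path such as `/docs/quickstart`, this yields one versioned path per entry in the list, starting with `/docs/next/quickstart`.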
17 changes: 17 additions & 0 deletions docs-website/src/components/SlackUtm/index.js
@@ -0,0 +1,17 @@
import React, { useState, useMemo } from "react";
import styles from "./styles.module.scss";
import { LikeOutlined, DislikeOutlined, CheckCircleOutlined } from "@ant-design/icons";
import { v4 as uuidv4 } from "uuid";

const SlackUtm = ({ pageId }) => {
return (
<div className={styles.slackUtm}>
<div className={styles.slackUtm}>
<hr />
Need more help? Join the conversation in <a href={`https://datahubproject.io/slack?utm_source=docs&utm_medium=footer&utm_campaign=docs_footer&utm_content=${pageId}`}>Slack!</a>
</div>
</div>
);
};

export default SlackUtm;
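
The component above interpolates `pageId` straight into the query string. As a hedged alternative, the same link could be built with the standard `URL` and `URLSearchParams` APIs, which percent-encode the value. `buildSlackUtmUrl` is a hypothetical helper name and is not part of this commit.

```javascript
// Hypothetical helper: builds the footer link used by SlackUtm,
// letting URLSearchParams percent-encode the page id.
function buildSlackUtmUrl(pageId) {
  const url = new URL("https://datahubproject.io/slack");
  url.searchParams.set("utm_source", "docs");
  url.searchParams.set("utm_medium", "footer");
  url.searchParams.set("utm_campaign", "docs_footer");
  url.searchParams.set("utm_content", pageId);
  return url.toString();
}
```

This only matters if a page id can contain characters such as `&` or spaces; for plain path-like ids the template literal in the component produces an equivalent link.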
3 changes: 3 additions & 0 deletions docs-website/src/components/SlackUtm/styles.module.scss
@@ -0,0 +1,3 @@
.slackUtm {
padding: 0.5rem 0rem;
}
4 changes: 2 additions & 2 deletions docs-website/src/pages/_components/CardCTAs/index.js
@@ -12,12 +12,12 @@ const cardsContent = [
},
{
label: "Data Contracts",
title: "End-to-end Reliability in Data",
title: "Data Contracts: End-to-end Reliability in Data",
url: "https://www.acryldata.io/blog/data-contracts-in-datahub-combining-verifiability-with-holistic-data-management?utm_source=datahub&utm_medium=referral&utm_content=blog",
},
{
label: "Shift Left",
title: "Developer-friendly Data Governance",
title: "Data Governance and Lineage Impact Analysis",
url: "https://www.acryldata.io/blog/the-3-must-haves-of-metadata-management-part-2?utm_source=datahub&utm_medium=referral&utm_content=blog",
},
];
2 changes: 1 addition & 1 deletion docs-website/src/pages/_components/Hero/index.js
@@ -31,7 +31,7 @@ const Hero = ({}) => {
<div>
<h1 className="hero__title">The #1 Open Source Metadata Platform</h1>
<p className="hero__subtitle">
DataHub is an extensible metadata platform that enables data discovery, data observability and federated governance to help tame the
DataHub is an extensible data catalog that enables data discovery, data observability and federated governance to help tame the
complexity of your data ecosystem.
</p>
<p className="hero__subtitle">
13 changes: 7 additions & 6 deletions docs-website/src/pages/index.js
@@ -35,7 +35,7 @@ function Home() {
return !siteConfig.customFields.isSaas ? (
<Layout
title={siteConfig.tagline}
description="DataHub is a data discovery application built on an extensible metadata platform that helps you tame the complexity of diverse data ecosystems."
description="DataHub is a data discovery application built on an extensible data catalog that helps you tame the complexity of diverse data ecosystems."
>
<Hero />
<Features />
@@ -70,9 +70,10 @@ function Home() {
</h1>
{/* <hr style={{ border: "2px solid black", width: "20rem" }}></hr> */}
<p style={{ fontSize: "18px" }}>
Explore DataHub's journey from search and discovery tool at
LinkedIn to the #1 open source metadata platform, through the
lens of its founder and some amazing community members.
Explore DataHub's journey from search and data discovery tool at
LinkedIn to the #1 open source metadata management platform,
through the lens of its founder and some amazing community
members.
</p>
</div>
</div>
@@ -143,8 +144,8 @@ function Home() {
</h2>
<p>
DataHub is the one-stop shop for documentation, schemas,
ownership, lineage, pipelines, data quality, usage information,
and more.
ownership, data lineage, pipelines, data quality, usage
information, and more.
</p>
</div>
<div className="col col--6 col--offset-1">
2 changes: 2 additions & 0 deletions docs-website/src/theme/DocItem/Footer/index.js
@@ -7,6 +7,7 @@ import EditThisPage from "@theme/EditThisPage";
import TagsListInline from "@theme/TagsListInline";
import styles from "./styles.module.css";
import Feedback from "../../../components/Feedback";
import SlackUtm from "../../../components/SlackUtm";

function TagsRow(props) {
return (
@@ -42,6 +43,7 @@ export default function DocItemFooter() {
return (
<>
<footer className={clsx(ThemeClassNames.docs.docFooter, "docusaurus-mt-lg")}>
<SlackUtm pageId={unversionedId}/>
{canDisplayTagsRow && <TagsRow tags={tags} />}
{canDisplayEditMetaRow && (
<EditMetaRow
3 changes: 1 addition & 2 deletions docs-website/versions.json
@@ -1,4 +1,3 @@
[
"0.13.1",
"0.13.0"
"0.13.1"
]
2 changes: 1 addition & 1 deletion docs/_feature-guide-template.md
@@ -76,7 +76,7 @@ Response in plain text
-->

*Need more help? Join the conversation in [Slack](http://slack.datahubproject.io)!*


### Related Features

4 changes: 2 additions & 2 deletions docs/act-on-metadata/impact-analysis.md
@@ -71,7 +71,7 @@ Follow these simple steps to understand the full dependency chain of your data e
* [searchAcrossLineage](../../graphql/queries.md#searchacrosslineage)
* [searchAcrossLineageInput](../../graphql/inputObjects.md#searchacrosslineageinput)

Looking for an example of how to use `searchAcrossLineage` to read lineage? Look [here](../api/tutorials/lineage.md#read-lineage)
Looking for an example of how to use `searchAcrossLineage` to read data lineage? Look [here](../api/tutorials/lineage.md#read-lineage)

### DataHub Blog

@@ -88,7 +88,7 @@ This means you have not yet ingested Lineage metadata for that entity. Please se

We currently limit the list of dependencies to 10,000 records; we suggest applying filters to narrow the result set if you hit that limit.

*Need more help? Join the conversation in [Slack](http://slack.datahubproject.io)!*


### Related Features

2 changes: 1 addition & 1 deletion docs/actions/sources/kafka-event-source.md
@@ -90,4 +90,4 @@ messages that are received while the Action is running.

2. Is there a way to asynchronously commit offsets back to Kafka?

Currently, all consumer offset commits are made synchronously for each message received. For now we've optimized for correctness over performance. If this commit policy does not accommodate your organization's needs, certainly reach out on [Slack](https://slack.datahubproject.io/).
Currently, all consumer offset commits are made synchronously for each message received. For now we've optimized for correctness over performance.
4 changes: 2 additions & 2 deletions docs/api/datahub-apis.md
@@ -66,7 +66,7 @@ Here's an overview of what each API can do.
| Create a Dataset | 🚫 |[[Guide]](/docs/api/tutorials/datasets.md) ||
| Delete a Dataset (Soft Delete) |[[Guide]](/docs/api/tutorials/datasets.md#delete-dataset) |[[Guide]](/docs/api/tutorials/datasets.md#delete-dataset) ||
| Delete a Dataset (Hard Delete) | 🚫 |[[Guide]](/docs/api/tutorials/datasets.md#delete-dataset) ||
| Search a Dataset | |||
| Search a Dataset |[[Guide]](/docs/how/search.md#graphql) |||
| Read a Dataset Deprecation ||||
| Read Dataset Entities (V2) ||||
| Create a Tag |[[Guide]](/docs/api/tutorials/tags.md#create-tags) |[[Guide]](/docs/api/tutorials/tags.md#create-tags) ||
@@ -116,4 +116,4 @@ Here's an overview of what each API can do.
| Create Dataset Lineage with MCPW & Rest Emitter | 🚫 |[[Code]](https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/library/lineage_emitter_mcpw_rest.py) ||
| Create Dataset Lineage with Rest Emitter | 🚫 |[[Code]](https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/library/lineage_emitter_rest.py) ||
| Create DataJob with Dataflow | 🚫 |[[Code]](https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/library/lineage_job_dataflow.py) [[Simple]](https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/library/lineage_job_dataflow_new_api_simple.py) [[Verbose]](https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/library/lineage_job_dataflow_new_api_verbose.py) ||
| Create Programmatic Pipeline | 🚫 |[[Code]](https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/library/programatic_pipeline.py) ||
| Create Programmatic Pipeline | 🚫 |[[Code]](https://github.com/datahub-project/datahub/blob/master/metadata-ingestion/examples/library/programatic_pipeline.py) ||
2 changes: 0 additions & 2 deletions docs/api/graphql/getting-started.md
@@ -153,5 +153,3 @@ With the following error codes officially supported:
| 404 | NOT_FOUND | The resource is not found. |
| 500 | SERVER_ERROR | An internal error has occurred. Check your server logs or contact your DataHub administrator. |

> Visit our [Slack channel](https://slack.datahubproject.io) to ask questions, tell us what we can do better, & make requests for what you'd like to see in the future. Or just
> stop by to say 'Hi'.
3 changes: 0 additions & 3 deletions docs/api/graphql/overview.md
@@ -38,6 +38,3 @@ that may be performed using the API.

- Available Operations: [Queries](/graphql/queries.md) (Reads) & [Mutations](/graphql/mutations.md) (Writes)
- Schema Types: [Objects](/graphql/objects.md), [Input Objects](/graphql/inputObjects.md), [Interfaces](/graphql/interfaces.md), [Unions](/graphql/unions.md), [Enums](/graphql/enums.md), [Scalars](/graphql/scalars.md)

> Visit our [Slack channel](https://slack.datahubproject.io) to ask questions, tell us what we can do better, & make requests for what you'd like to see in the future. Or just
> stop by to say 'Hi'.
3 changes: 0 additions & 3 deletions docs/api/graphql/token-management.md
@@ -131,6 +131,3 @@ curl --location --request POST 'http://localhost:8080/api/graphql' \
```

This endpoint will return a boolean detailing whether the operation was successful. In case of failure, an error message will appear explaining what went wrong.

> Visit our [Slack channel](https://slack.datahubproject.io) to ask questions, tell us what we can do better, & make requests for what you'd like to see in the future. Or just
> stop by to say 'Hi'.
6 changes: 3 additions & 3 deletions docs/api/tutorials/lineage.md
@@ -1,13 +1,13 @@
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';

# Lineage
# Data Lineage

## Why Would You Use Lineage?

Lineage is used to capture data dependencies within an organization. It allows you to track the inputs from which a data asset is derived, along with the data assets that depend on it downstream.
Data lineage is used to capture data dependencies within an organization. It allows you to track the inputs from which a data asset is derived, along with the data assets that depend on it downstream.

For more information about lineage, refer to [About DataHub Lineage](/docs/generated/lineage/lineage-feature-guide.md).
For more information about data lineage, refer to [About DataHub Lineage](/docs/generated/lineage/lineage-feature-guide.md).

### Goal Of This Guide
