Commit

Merge branch 'master' into feat(ingestion/neo4j)

k-bartlett authored Oct 10, 2024
2 parents 53c2463 + b5e0833 commit bbccdca
Showing 79 changed files with 2,786 additions and 119 deletions.
3 changes: 3 additions & 0 deletions .github/workflows/airflow-plugin.yml
@@ -54,6 +54,9 @@ jobs:
- python-version: "3.11"
extra_pip_requirements: "apache-airflow~=2.9.3 -c https://raw.githubusercontent.com/apache/airflow/constraints-2.9.3/constraints-3.11.txt"
extra_pip_extras: plugin-v2
- python-version: "3.11"
extra_pip_requirements: "apache-airflow~=2.10.2 -c https://raw.githubusercontent.com/apache/airflow/constraints-2.10.2/constraints-3.11.txt"
extra_pip_extras: plugin-v2
fail-fast: false
steps:
- name: Set up JDK 17
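The new matrix leg above pins Airflow 2.10.2 against the official constraints file for Python 3.11. As a rough local reproduction — a sketch only, assuming a Python 3.11 virtualenv, with the plugin itself (and its `plugin-v2` extra) installed separately — the entry resolves to a pip invocation like this:

```python
# Sketch of the pip invocation implied by the new CI matrix entry,
# assuming a local Python 3.11 environment.
import subprocess
import sys

AIRFLOW_SPEC = "apache-airflow~=2.10.2"
CONSTRAINTS = (
    "https://raw.githubusercontent.com/apache/airflow/"
    "constraints-2.10.2/constraints-3.11.txt"
)

# Equivalent to: pip install "apache-airflow~=2.10.2" -c <constraints-url>
subprocess.run(
    [sys.executable, "-m", "pip", "install", AIRFLOW_SPEC, "-c", CONSTRAINTS],
    check=True,
)
```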
99 changes: 99 additions & 0 deletions .github/workflows/docker-unified.yml
@@ -480,6 +480,39 @@ jobs:
context: .
file: ./docker/kafka-setup/Dockerfile
platforms: linux/amd64,linux/arm64/v8
kafka_setup_scan:
permissions:
contents: read # for actions/checkout to fetch code
security-events: write # for github/codeql-action/upload-sarif to upload SARIF results
actions: read # only required for a private repository by github/codeql-action/upload-sarif to get the Action run status
name: "[Monitoring] Scan Kafka Setup images for vulnerabilities"
runs-on: ubuntu-latest
needs: [ setup, kafka_setup_build ]
if: ${{ needs.setup.outputs.kafka_setup_change == 'true' || (needs.setup.outputs.publish == 'true' || needs.setup.outputs.pr-publish == 'true') }}
steps:
- name: Checkout # adding checkout step just to make trivy upload happy
uses: acryldata/sane-checkout-action@v3
- name: Download image
uses: ishworkh/docker-image-artifact-download@v1
if: ${{ needs.setup.outputs.publish != 'true' && needs.setup.outputs.pr-publish != 'true' }}
with:
image: ${{ env.DATAHUB_KAFKA_SETUP_IMAGE }}:${{ needs.setup.outputs.unique_tag }}
- name: Run Trivy vulnerability scanner
uses: aquasecurity/trivy-action@0.25.0
env:
TRIVY_OFFLINE_SCAN: true
with:
image-ref: ${{ env.DATAHUB_KAFKA_SETUP_IMAGE }}:${{ needs.setup.outputs.unique_tag }}
format: "template"
template: "@/contrib/sarif.tpl"
output: "trivy-results.sarif"
severity: "CRITICAL,HIGH"
ignore-unfixed: true
vuln-type: "os,library"
- name: Upload Trivy scan results to GitHub Security tab
uses: github/codeql-action/upload-sarif@v2
with:
sarif_file: "trivy-results.sarif"

mysql_setup_build:
name: Build and Push DataHub MySQL Setup Docker Image
@@ -501,6 +534,39 @@ jobs:
context: .
file: ./docker/mysql-setup/Dockerfile
platforms: linux/amd64,linux/arm64/v8
mysql_setup_scan:
permissions:
contents: read # for actions/checkout to fetch code
security-events: write # for github/codeql-action/upload-sarif to upload SARIF results
actions: read # only required for a private repository by github/codeql-action/upload-sarif to get the Action run status
name: "[Monitoring] Scan MySQL Setup images for vulnerabilities"
runs-on: ubuntu-latest
needs: [ setup, mysql_setup_build ]
if: ${{ needs.setup.outputs.mysql_setup_change == 'true' || (needs.setup.outputs.publish == 'true' || needs.setup.outputs.pr-publish == 'true') }}
steps:
- name: Checkout # adding checkout step just to make trivy upload happy
uses: acryldata/sane-checkout-action@v3
- name: Download image
uses: ishworkh/docker-image-artifact-download@v1
if: ${{ needs.setup.outputs.publish != 'true' && needs.setup.outputs.pr-publish != 'true' }}
with:
image: ${{ env.DATAHUB_MYSQL_SETUP_IMAGE }}:${{ needs.setup.outputs.unique_tag }}
- name: Run Trivy vulnerability scanner
uses: aquasecurity/trivy-action@0.25.0
env:
TRIVY_OFFLINE_SCAN: true
with:
image-ref: ${{ env.DATAHUB_MYSQL_SETUP_IMAGE }}:${{ needs.setup.outputs.unique_tag }}
format: "template"
template: "@/contrib/sarif.tpl"
output: "trivy-results.sarif"
severity: "CRITICAL,HIGH"
ignore-unfixed: true
vuln-type: "os,library"
- name: Upload Trivy scan results to GitHub Security tab
uses: github/codeql-action/upload-sarif@v2
with:
sarif_file: "trivy-results.sarif"

elasticsearch_setup_build:
name: Build and Push DataHub Elasticsearch Setup Docker Image
@@ -522,6 +588,39 @@ jobs:
context: .
file: ./docker/elasticsearch-setup/Dockerfile
platforms: linux/amd64,linux/arm64/v8
elasticsearch_setup_scan:
permissions:
contents: read # for actions/checkout to fetch code
security-events: write # for github/codeql-action/upload-sarif to upload SARIF results
actions: read # only required for a private repository by github/codeql-action/upload-sarif to get the Action run status
name: "[Monitoring] Scan ElasticSearch setup images for vulnerabilities"
runs-on: ubuntu-latest
needs: [ setup, elasticsearch_setup_build ]
if: ${{ needs.setup.outputs.elasticsearch_setup_change == 'true' || (needs.setup.outputs.publish == 'true' || needs.setup.outputs.pr-publish == 'true' ) }}
steps:
- name: Checkout # adding checkout step just to make trivy upload happy
uses: acryldata/sane-checkout-action@v3
- name: Download image
uses: ishworkh/docker-image-artifact-download@v1
if: ${{ needs.setup.outputs.publish != 'true' && needs.setup.outputs.pr-publish != 'true' }}
with:
image: ${{ env.DATAHUB_ELASTIC_SETUP_IMAGE }}:${{ needs.setup.outputs.unique_tag }}
- name: Run Trivy vulnerability scanner
uses: aquasecurity/trivy-action@0.25.0
env:
TRIVY_OFFLINE_SCAN: true
with:
image-ref: ${{ env.DATAHUB_ELASTIC_SETUP_IMAGE }}:${{ needs.setup.outputs.unique_tag }}
format: "template"
template: "@/contrib/sarif.tpl"
output: "trivy-results.sarif"
severity: "CRITICAL,HIGH"
ignore-unfixed: true
vuln-type: "os,library"
- name: Upload Trivy scan results to GitHub Security tab
uses: github/codeql-action/upload-sarif@v2
with:
sarif_file: "trivy-results.sarif"

datahub_ingestion_base_build:
name: Build and Push DataHub Ingestion (Base) Docker Image
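The three new `*_setup_scan` jobs share one pattern: fetch the image artifact (unless it was published), run Trivy with the SARIF template, and upload the report to the GitHub Security tab. A minimal local sketch of the scan step, assuming the `trivy` CLI is installed and the image is available in the local Docker daemon (the image reference below is a placeholder for the workflow's env/outputs):

```python
# Local sketch of the Trivy scan the *_setup_scan jobs perform.
# The image reference is a placeholder, not the workflow's real tag.
import os
import subprocess

image_ref = "acryldata/datahub-kafka-setup:local-test"  # placeholder

# The workflow sets TRIVY_OFFLINE_SCAN=true in the step environment.
env = dict(os.environ, TRIVY_OFFLINE_SCAN="true")

subprocess.run(
    [
        "trivy", "image",
        "--format", "template",
        "--template", "@/contrib/sarif.tpl",
        "--output", "trivy-results.sarif",
        "--severity", "CRITICAL,HIGH",
        "--ignore-unfixed",
        "--vuln-type", "os,library",
        image_ref,
    ],
    env=env,
    check=True,
)
```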
4 changes: 2 additions & 2 deletions build.gradle
@@ -107,8 +107,8 @@ project.ext.externalDependency = [
'antlr4Runtime': 'org.antlr:antlr4-runtime:4.9.3',
'antlr4': 'org.antlr:antlr4:4.9.3',
'assertJ': 'org.assertj:assertj-core:3.11.1',
'avro': 'org.apache.avro:avro:1.11.3',
'avroCompiler': 'org.apache.avro:avro-compiler:1.11.3',
'avro': 'org.apache.avro:avro:1.11.4',
'avroCompiler': 'org.apache.avro:avro-compiler:1.11.4',
'awsGlueSchemaRegistrySerde': 'software.amazon.glue:schema-registry-serde:1.1.17',
'awsMskIamAuth': 'software.amazon.msk:aws-msk-iam-auth:2.0.3',
'awsS3': 'software.amazon.awssdk:s3:2.26.21',
@@ -323,7 +323,13 @@ export const EntityProfile = <T, U>({
{showBrowseBar && <EntityProfileNavBar urn={urn} entityType={entityType} />}
{entityData?.status?.removed === true && (
<Alert
message="This entity is not discoverable via search or lineage graph. Contact your DataHub admin for more information."
message={
<>
This entity is marked as soft-deleted, likely due to stateful ingestion or a manual
deletion command, and will not appear in search or lineage graphs. Contact your DataHub
admin for more information.
</>
}
banner
/>
)}
1 change: 0 additions & 1 deletion docker/airflow/docker-compose.yaml
@@ -41,7 +41,6 @@
#
# Feel free to modify this file to suit your needs.
---
version: '3'
x-airflow-common:
&airflow-common
image: ${AIRFLOW_IMAGE_NAME:-acryldata/airflow-datahub:latest}
1 change: 0 additions & 1 deletion docker/cassandra/docker-compose.cassandra.yml
@@ -1,6 +1,5 @@
# Override to use Cassandra as a backing store for datahub-gms.
---
version: '3.8'
services:
cassandra:
hostname: cassandra
1 change: 0 additions & 1 deletion docker/docker-compose-with-cassandra.yml
@@ -4,7 +4,6 @@

# NOTE: This file does not build! No dockerfiles are set. See the README.md in this directory.
---
version: '3.9'
services:
datahub-frontend-react:
hostname: datahub-frontend-react
1 change: 0 additions & 1 deletion docker/docker-compose-without-neo4j.override.yml
@@ -1,5 +1,4 @@
---
version: '3.9'
services:
datahub-gms:
env_file: datahub-gms/env/docker-without-neo4j.env
1 change: 0 additions & 1 deletion docker/docker-compose-without-neo4j.postgres.override.yml
@@ -1,6 +1,5 @@
# Override to use PostgreSQL as a backing store for datahub-gms.
---
version: '3.9'
services:
datahub-gms:
env_file:
1 change: 0 additions & 1 deletion docker/docker-compose-without-neo4j.yml
@@ -4,7 +4,6 @@

# NOTE: This file does not build! No dockerfiles are set. See the README.md in this directory.
---
version: '3.9'
services:
datahub-frontend-react:
hostname: datahub-frontend-react
1 change: 0 additions & 1 deletion docker/docker-compose.consumers-without-neo4j.yml
@@ -1,5 +1,4 @@
# Service definitions for standalone Kafka consumer containers.
version: '3.9'
services:
datahub-gms:
environment:
1 change: 0 additions & 1 deletion docker/docker-compose.consumers.dev.yml
@@ -1,4 +1,3 @@
version: '3.9'
services:
datahub-mae-consumer:
image: acryldata/datahub-mae-consumer:debug
1 change: 0 additions & 1 deletion docker/docker-compose.consumers.yml
@@ -1,5 +1,4 @@
# Service definitions for standalone Kafka consumer containers.
version: '3.9'
services:
datahub-gms:
environment:
1 change: 0 additions & 1 deletion docker/docker-compose.dev.yml
@@ -8,7 +8,6 @@
# To make a JVM app debuggable via IntelliJ, go to its env file and add JVM debug flags, and then add the JVM debug
# port to this file.
---
version: '3.9'
services:
datahub-frontend-react:
image: acryldata/datahub-frontend-react:head
1 change: 0 additions & 1 deletion docker/docker-compose.kafka-setup.yml
@@ -1,3 +1,2 @@
# Empty docker compose for kafka-setup as we have moved kafka-setup back into the main compose
version: '3.9'
services:
1 change: 0 additions & 1 deletion docker/docker-compose.override.yml
@@ -1,6 +1,5 @@
# Default override to use MySQL as a backing store for datahub-gms (same as docker-compose.mysql.yml).
---
version: '3.9'
services:
datahub-gms:
env_file: datahub-gms/env/docker.env
1 change: 0 additions & 1 deletion docker/docker-compose.tools.yml
@@ -1,6 +1,5 @@
# Tools useful for operating & debugging DataHub.
---
version: '3.8'
services:
kafka-rest-proxy:
image: confluentinc/cp-kafka-rest:7.4.0
1 change: 0 additions & 1 deletion docker/docker-compose.yml
@@ -4,7 +4,6 @@

# NOTE: This file does not build! No dockerfiles are set. See the README.md in this directory.
---
version: '3.9'
services:
datahub-frontend-react:
hostname: datahub-frontend-react
1 change: 0 additions & 1 deletion docker/ingestion/docker-compose.yml
@@ -1,5 +1,4 @@
---
version: '3.5'
services:
ingestion:
build:
1 change: 0 additions & 1 deletion docker/mariadb/docker-compose.mariadb.yml
@@ -1,6 +1,5 @@
# Override to use MariaDB as a backing store for datahub-gms.
---
version: '3.8'
services:
mariadb:
hostname: mariadb
1 change: 0 additions & 1 deletion docker/monitoring/docker-compose.consumers.monitoring.yml
@@ -1,5 +1,4 @@
---
version: '3.8'
services:
datahub-mae-consumer:
environment:
1 change: 0 additions & 1 deletion docker/monitoring/docker-compose.monitoring.yml
@@ -1,5 +1,4 @@
---
version: '3.9'
services:
datahub-frontend-react:
environment:
1 change: 0 additions & 1 deletion docker/mysql/docker-compose.mysql.yml
@@ -1,6 +1,5 @@
# Override to use MySQL as a backing store for datahub-gms.
---
version: '3.8'
services:
mysql:
hostname: mysql
1 change: 0 additions & 1 deletion docker/quickstart/docker-compose-m1.quickstart.yml
@@ -291,7 +291,6 @@ services:
volumes:
- zkdata:/var/lib/zookeeper/data
- zklogs:/var/lib/zookeeper/log
version: '3.9'
volumes:
broker: null
esdata: null
@@ -266,7 +266,6 @@ services:
volumes:
- zkdata:/var/lib/zookeeper/data
- zklogs:/var/lib/zookeeper/log
version: '3.9'
volumes:
broker: null
esdata: null
@@ -266,7 +266,6 @@ services:
volumes:
- zkdata:/var/lib/zookeeper/data
- zklogs:/var/lib/zookeeper/log
version: '3.9'
volumes:
broker: null
esdata: null
@@ -55,4 +55,3 @@ services:
image: ${DATAHUB_MCE_CONSUMER_IMAGE:-acryldata/datahub-mce-consumer}:${DATAHUB_VERSION:-head}
ports:
- 9090:9090
version: '3.9'
1 change: 0 additions & 1 deletion docker/quickstart/docker-compose.consumers.quickstart.yml
@@ -69,4 +69,3 @@ services:
image: ${DATAHUB_MCE_CONSUMER_IMAGE:-acryldata/datahub-mce-consumer}:${DATAHUB_VERSION:-head}
ports:
- 9090:9090
version: '3.9'
@@ -1,2 +1 @@
services: {}
version: '3.9'
1 change: 0 additions & 1 deletion docker/quickstart/docker-compose.monitoring.quickstart.yml
@@ -41,6 +41,5 @@ services:
- 9089:9090
volumes:
- ../monitoring/prometheus.yaml:/etc/prometheus/prometheus.yml
version: '3.9'
volumes:
grafana-storage: null
1 change: 0 additions & 1 deletion docker/quickstart/docker-compose.quickstart.yml
@@ -291,7 +291,6 @@ services:
volumes:
- zkdata:/var/lib/zookeeper/data
- zklogs:/var/lib/zookeeper/log
version: '3.9'
volumes:
broker: null
esdata: null
5 changes: 0 additions & 5 deletions docker/quickstart/generate_docker_quickstart.py
@@ -120,11 +120,6 @@ def modify_docker_config(base_path, docker_yaml_config):
elif volumes[i].startswith("./"):
volumes[i] = "." + volumes[i]

# 10. Set docker compose version to 3.
# We need at least this version, since we use features like start_period for
# healthchecks (with services dependencies based on them) and shell-like variable interpolation.
docker_yaml_config["version"] = "3.9"


def dedup_env_vars(merged_docker_config):
for service in merged_docker_config["services"]:
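Compose V2 treats the top-level `version` key as obsolete and warns on it, which is why the generator stops injecting it and the commit strips it from the checked-in compose files. A rough sketch of automating that cleanup, assuming PyYAML is available (note that round-tripping through `safe_load`/`safe_dump` discards comments and formatting, so the hand edits above are safer for files with header comments):

```python
# Sketch: strip the obsolete top-level `version` key from compose files,
# mirroring what this commit does across docker/*.yml by hand.
import sys
import yaml  # assumes PyYAML is installed

def strip_version_key(path: str) -> None:
    with open(path) as f:
        config = yaml.safe_load(f)
    # Rewrite the file only if a `version` key was actually present.
    if config and config.pop("version", None) is not None:
        with open(path, "w") as f:
            yaml.safe_dump(config, f, default_flow_style=False, sort_keys=False)

if __name__ == "__main__":
    for path in sys.argv[1:]:
        strip_version_key(path)
```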
14 changes: 14 additions & 0 deletions docs/how/search.md
@@ -105,6 +105,20 @@ If you want to:
- ```/q customProperties: encoding*``` [Sample results](https://demo.datahubproject.io/search?page=1&query=%2Fq%20customProperties%3A%20encoding%2A)
- Dataset properties are indexed in Elasticsearch in the manner of key=value, so if you know the precise key-value pair, you can search using ```"key=value"```. If you only know the key, you can use a wildcard in place of the value, as done here.

- Find an entity with an **unversioned** structured property
- ```/q structuredProperties.io_acryl_privacy_retentionTime01:60```
- This returns results where the **unversioned** structured property with qualified name `io.acryl.privacy.retentionTime01` has the value `60`.
- ```/q _exists_:structuredProperties.io_acryl_privacy_retentionTime01```
- In this example, the query returns any entity that has any value for the **unversioned** structured property with qualified name `io.acryl.privacy.retentionTime01`.

- Find an entity with a **versioned** structured property
- ```/q structuredProperties._versioned.io_acryl_privacy_retentionTime.20240614080000.number:365```
- This query will return results for a **versioned** structured property with qualified name `io.acryl.privacy.retentionTime`, version `20240614080000`, type `number` and value `365`.
- ```/q _exists_:structuredProperties._versioned.io_acryl_privacy_retentionTime.20240614080000.number```
- Returns results for a **versioned** structured property with qualified name `io.acryl.privacy.retentionTime`, version `20240614080000` and type `number`.
- ```/q structuredProperties._versioned.io_acryl_privacy_retentionTime.\*.\*:365```
- Returns results for a **versioned** structured property with any version and type, with a value of `365`.

- Find a dataset with a column name, **latitude**
- ```/q fieldPaths: latitude``` [Sample results](https://demo.datahubproject.io/search?page=1&query=%2Fq%20fieldPaths%3A%20latitude)
- fieldPaths is the name of the attribute that holds the column name in Datasets.
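As the examples above suggest, the indexed field name is derived from the structured property's qualified name by replacing dots with underscores, with versioned properties nesting under `_versioned.<name>.<version>.<type>`. The hypothetical helpers below just illustrate that mapping:

```python
# Hypothetical helpers showing how a structured property's qualified name
# maps onto the indexed field names used in the /q examples above.
def unversioned_field(qualified_name: str) -> str:
    # io.acryl.privacy.retentionTime01
    #   -> structuredProperties.io_acryl_privacy_retentionTime01
    return "structuredProperties." + qualified_name.replace(".", "_")

def versioned_field(qualified_name: str, version: str, value_type: str) -> str:
    # io.acryl.privacy.retentionTime, 20240614080000, number ->
    # structuredProperties._versioned.io_acryl_privacy_retentionTime.20240614080000.number
    name = qualified_name.replace(".", "_")
    return f"structuredProperties._versioned.{name}.{version}.{value_type}"

# Reconstruct two of the documented queries:
print(unversioned_field("io.acryl.privacy.retentionTime01") + ":60")
print("_exists_:" + versioned_field("io.acryl.privacy.retentionTime",
                                    "20240614080000", "number"))
```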
1 change: 1 addition & 0 deletions docs/lineage/airflow.md
@@ -141,6 +141,7 @@ conn_id = datahub_rest_default # or datahub_kafka_default
| capture_tags_info | true | If true, the tags field of the DAG will be captured as DataHub tags. |
| capture_executions | true | If true, we'll capture task runs in DataHub in addition to DAG definitions. |
| materialize_iolets | true | Create or un-soft-delete all entities referenced in lineage. |
| render_templates | true | If true, jinja-templated fields will be automatically rendered to improve the accuracy of SQL statement extraction. |
| datajob_url_link | taskinstance | If `taskinstance`, the DataJob URL will link to the task instance page in Airflow. It can also be set to `grid`. |
| graceful_exceptions | true | If set to true, most runtime errors in the lineage backend will be suppressed and will not cause the overall task to fail. Note that configuration issues will still throw exceptions. |
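These options live alongside `conn_id` in the plugin's Airflow configuration. As a hedged example, Airflow config keys can generally also be set through the `AIRFLOW__{SECTION}__{KEY}` environment-variable convention, so — assuming the plugin reads its options from the `[datahub]` section — disabling the new `render_templates` option might look like:

```python
# Assumption: the plugin's options live in the [datahub] config section,
# so Airflow's AIRFLOW__{SECTION}__{KEY} env-var convention applies.
# Set these before the scheduler/worker process starts.
import os

os.environ["AIRFLOW__DATAHUB__RENDER_TEMPLATES"] = "false"
os.environ["AIRFLOW__DATAHUB__CONN_ID"] = "datahub_rest_default"
```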
2 changes: 1 addition & 1 deletion metadata-ingestion-modules/airflow-plugin/setup.py
@@ -44,7 +44,7 @@ def get_long_description():
# We remain restrictive on the versions allowed here to prevent
# us from being broken by backwards-incompatible changes in the
# underlying package.
"openlineage-airflow>=1.2.0,<=1.18.0",
"openlineage-airflow>=1.2.0,<=1.22.0",
},
}
