Ingestion failed to complete, or completed with errors. postgres #11596

Open
pricingblock-project opened this issue Oct 11, 2024 · 4 comments
Labels: bug (Bug report)


Describe the bug

Ingestion failed to complete, or completed with errors.

To Reproduce
Steps to reproduce the behavior:

  1. Go to Manage Data Sources -> Create new source
  2. Add a postgres recipe
  3. Configure the parameters
  4. Click the finish and run button
  5. See the error

Expected behavior
The ingestion runs successfully.

Screenshots
(screenshot not preserved)

Desktop:

OS: Ubuntu 20.04.1 LTS
Browser: Chrome 120.0.6099.216
Docker: Docker version 24.0.7, build afdd53b

Additional context

  • Docker image versions:
$ docker ps
CONTAINER ID   IMAGE                                   COMMAND                   CREATED          STATUS                    PORTS                                                           NAMES
825442db2545   acryldata/datahub-actions:head          "/bin/sh -c 'dockeri…"   27 minutes ago   Up 24 minutes                                                                             datahub-datahub-actions-1
0a506f30bae8   acryldata/datahub-frontend-react:head   "/bin/sh -c ./start.…"   27 minutes ago   Up 24 minutes (healthy)   0.0.0.0:9002->9002/tcp, :::9002->9002/tcp                       datahub-datahub-frontend-react-1
082308b01892   acryldata/datahub-gms:head              "/bin/sh -c /datahub…"   27 minutes ago   Up 25 minutes (healthy)   0.0.0.0:8080->8080/tcp, :::8080->8080/tcp                       datahub-datahub-gms-1
dcc0724722b1   confluentinc/cp-schema-registry:7.4.0   "/etc/confluent/dock…"   27 minutes ago   Up 26 minutes (healthy)   0.0.0.0:8081->8081/tcp, :::8081->8081/tcp                       datahub-schema-registry-1
9776819c3c96   confluentinc/cp-kafka:7.4.0             "/etc/confluent/dock…"   27 minutes ago   Up 27 minutes (healthy)   0.0.0.0:9092->9092/tcp, :::9092->9092/tcp                       datahub-broker-1
5f54e55bb366   confluentinc/cp-zookeeper:7.4.0         "/etc/confluent/dock…"   27 minutes ago   Up 27 minutes (healthy)   2888/tcp, 0.0.0.0:2181->2181/tcp, :::2181->2181/tcp, 3888/tcp   datahub-zookeeper-1
4e1de58297e5   elasticsearch:7.10.1                    "/tini -- /usr/local…"   27 minutes ago   Up 27 minutes (healthy)   0.0.0.0:9200->9200/tcp, :::9200->9200/tcp, 9300/tcp             datahub-elasticsearch-1
13332a162d11   mysql:8.2                               "docker-entrypoint.s…"   27 minutes ago   Up 27 minutes (healthy)   0.0.0.0:3306->3306/tcp, :::3306->3306/tcp, 33060/tcp            datahub-mysql-1


**Logs**

~~~~ Execution Summary - RUN_INGEST ~~~~
Execution finished with errors.
{'exec_id': 'c2584159-d0ad-483f-bceb-d2f664ea1fc5',
 'infos': ['2024-10-11 09:46:32.792952 INFO: Starting execution for task with name=RUN_INGEST',
           "2024-10-11 09:46:38.929017 INFO: Failed to execute 'datahub ingest', exit code 1",
           '2024-10-11 09:46:38.929279 INFO: Caught exception EXECUTING task_id=c2584159-d0ad-483f-bceb-d2f664ea1fc5, name=RUN_INGEST, '
           'stacktrace=Traceback (most recent call last):\n'
           '  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 139, in execute_task\n'
           '    task_event_loop.run_until_complete(task_future)\n'
           '  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete\n'
           '    return future.result()\n'
           '  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 400, in '
           'execute\n'
           '    raise TaskError("Failed to execute \'datahub ingest\'")\n'
           "acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"],
 'errors': []}

~~~~ Ingestion Logs ~~~~
Obtaining venv creation lock...
Acquired venv creation lock
venv is already set up
venv setup time = 0 sec
This version of datahub supports report-to functionality
+ exec datahub ingest run -c /tmp/datahub/ingest/c2584159-d0ad-483f-bceb-d2f664ea1fc5/recipe.yml --report-to /tmp/datahub/logs/c2584159-d0ad-483f-bceb-d2f664ea1fc5/artifacts/ingestion_report.json
[2024-10-11 09:46:37,997] INFO     {datahub.cli.ingest_cli:149} - DataHub CLI version: 0.14.0.4
[2024-10-11 09:46:38,007] INFO     {datahub.ingestion.run.pipeline:255} - No sink configured, attempting to use the default datahub-rest sink.
[2024-10-11 09:46:38,032] INFO     {datahub.ingestion.run.pipeline:272} - Sink configured successfully. DataHubRestEmitter: configured to talk to http://datahub-gms:8080
[2024-10-11 09:46:38,424] ERROR    {datahub.entrypoints:218} - Command failed: Failed to find a registered source for type postgres: postgres is disabled; try running: pip install 'acryl-datahub[postgres]'
Traceback (most recent call last):
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/ingestion/api/registry.py", line 126, in _ensure_not_lazy
    plugin_class = import_path(path)
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/ingestion/api/registry.py", line 56, in import_path
    item = importlib.import_module(module_name)
  File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 883, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/ingestion/source/sql/postgres.py", line 6, in <module>
    import psycopg2  # noqa: F401
ModuleNotFoundError: No module named 'psycopg2'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 136, in _add_init_error_context
    yield
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 287, in __init__
    source_class = source_registry.get(self.source_type)
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/ingestion/api/registry.py", line 176, in get
    raise ConfigurationError(
datahub.configuration.common.ConfigurationError: postgres is disabled; try running: pip install 'acryl-datahub[postgres]'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/entrypoints.py", line 205, in main
    sys.exit(datahub(standalone_mode=False, **kwargs))
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 462, in wrapper
    raise e
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/telemetry/telemetry.py", line 411, in wrapper
    res = func(*args, **kwargs)
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 203, in run
    ret = loop.run_until_complete(run_ingestion_and_check_upgrade())
  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/cli/ingest_cli.py", line 172, in run_ingestion_and_check_upgrade
    pipeline = Pipeline.create(
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 417, in create
    return cls(
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 284, in __init__
    with _add_init_error_context(
  File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/datahub/ingestion/run/pipeline.py", line 140, in _add_init_error_context
    raise PipelineInitError(f"Failed to {step}: {e}") from e
datahub.ingestion.run.pipeline.PipelineInitError: Failed to find a registered source for type postgres: postgres is disabled; try running: pip install 'acryl-datahub[postgres]'
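The root cause is the `ModuleNotFoundError` for `psycopg2` near the top of the traceback: the source registry imports the postgres plugin lazily (`_ensure_not_lazy` -> `import_path` -> `importlib.import_module`), and when the driver is missing the CLI reports the source as disabled. A minimal sketch of an equivalent check, assuming you run it with the ingestion venv's interpreter; the helper name `missing_plugin_hint` is hypothetical and is not part of the DataHub API.

```python
import importlib
from typing import Optional


def missing_plugin_hint(module_name: str, extra: str) -> Optional[str]:
    """Return None if module_name imports cleanly, else an install hint.

    Hypothetical helper: it mimics, but is not, DataHub's registry check.
    """
    try:
        importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # Same remedy the CLI suggests in its own error message.
        return f"{exc}; try running: pip install 'acryl-datahub[{extra}]'"
    return None


if __name__ == "__main__":
    # psycopg2 is the driver imported by datahub/ingestion/source/sql/postgres.py.
    hint = missing_plugin_hint("psycopg2", "postgres")
    print(hint or "psycopg2 is importable; the postgres source should load")
```

Going by the paths in the traceback, the interpreter to use is presumably `/datahub-ingestion/.venv/bin/python` inside the actions container; a non-None result reproduces the `postgres is disabled` condition.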

**Recipe**

run_id: 'urn:li:dataHubExecutionRequest:c2584159-d0ad-483f-bceb-d2f664ea1fc5'
source:
  type: postgres
  config:
    include_tables: true
    database: timeseries
    password: '${timeseries_188}'
    profiling:
      enabled: true
      profile_table_level_only: true
    host_port: '192.168.50.188:5432'
    include_views: true
    stateful_ingestion:
      enabled: true
    username: postgres
pipeline_name: 'urn:li:dataHubIngestionSource:89372b09-06a3-482f-8724-b0188479d56b'
pricingblock-project added the bug (Bug report) label on Oct 11, 2024
pricingblock-project (author) commented Oct 11, 2024

Another log:

~~~~ Execution Summary - RUN_INGEST ~~~~
Execution finished with errors.
{'exec_id': 'cbfb6c8f-082b-49c5-9235-0487a0b7075d',
 'infos': ['2024-10-11 10:13:43.339194 INFO: Starting execution for task with name=RUN_INGEST',
           "2024-10-11 10:21:58.047478 INFO: Failed to execute 'datahub ingest', exit code 2",
           '2024-10-11 10:21:58.048656 INFO: Caught exception EXECUTING task_id=cbfb6c8f-082b-49c5-9235-0487a0b7075d, name=RUN_INGEST, '
           'stacktrace=Traceback (most recent call last):\n'
           '  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/acryl/executor/execution/default_executor.py", line 139, in execute_task\n'
           '    task_event_loop.run_until_complete(task_future)\n'
           '  File "/usr/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete\n'
           '    return future.result()\n'
           '  File "/datahub-ingestion/.venv/lib/python3.10/site-packages/acryl/executor/execution/sub_process_ingestion_task.py", line 400, in '
           'execute\n'
           '    raise TaskError("Failed to execute \'datahub ingest\'")\n'
           "acryl.executor.execution.task.TaskError: Failed to execute 'datahub ingest'\n"],
 'errors': []}

~~~~ Ingestion Logs ~~~~
Obtaining venv creation lock...
Acquired venv creation lock
venv doesn't exist.. minting..
Using Python 3.10.12 interpreter at: /usr/bin/python3
Creating virtualenv at: /tmp/datahub/ingest/venv-postgres-3cbb1ad0ed8a0388
Resolved 3 packages in 3.87s
Prepared 3 packages in 42.25s
Installed 3 packages in 774ms
 + pip==24.2
 + setuptools==75.1.0
 + wheel==0.44.0
+ uv pip install 'acryl-datahub[datahub-rest,datahub-kafka,postgres]==0.14.1'
Resolved 170 packages in 1m 14s
error: Failed to prepare distributions
  Caused by: Failed to fetch wheel: pandas==2.2.3
  Caused by: Failed to extract archive
  Caused by: Failed to download distribution due to network timeout. Try increasing UV_HTTP_TIMEOUT (current value: 30s).
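This second failure is different: the venv was created, but `uv` timed out downloading `pandas==2.2.3`, and the error itself suggests raising `UV_HTTP_TIMEOUT` above its 30 s value. A minimal sketch that rebuilds the install command from the log with a longer timeout and prints it rather than running it; the 120-second value and the helper names `uv_install_command` and `as_shell_line` are assumptions. `uv pip install` expects an active virtual environment, so the printed line is meant to be pasted into a shell where the ingestion venv (for example the `venv-postgres-3cbb1ad0ed8a0388` path from the log) is activated.

```python
import shlex
from typing import Dict, List, Tuple

# Package spec copied from the ingestion log above.
PACKAGE_SPEC = "acryl-datahub[datahub-rest,datahub-kafka,postgres]==0.14.1"


def uv_install_command(timeout_s: int = 120) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, extra_env) for re-running the failed install.

    uv reads UV_HTTP_TIMEOUT in seconds; the log shows 30s was too short.
    The 120 s default here is an arbitrary assumption; tune it for your network.
    """
    if timeout_s <= 0:
        raise ValueError("timeout_s must be positive")
    extra_env = {"UV_HTTP_TIMEOUT": str(timeout_s)}
    argv = ["uv", "pip", "install", PACKAGE_SPEC]
    return argv, extra_env


def as_shell_line(argv: List[str], extra_env: Dict[str, str]) -> str:
    """Render VAR=value prefixes plus a safely quoted command line."""
    prefix = " ".join(f"{k}={shlex.quote(v)}" for k, v in extra_env.items())
    return f"{prefix} {shlex.join(argv)}".strip()


if __name__ == "__main__":
    # Printed, not executed: paste it into a shell inside the ingestion venv.
    print(as_shell_line(*uv_install_command()))
```

Whether the actions executor forwards `UV_HTTP_TIMEOUT` to its own `uv` call is not confirmed by this thread; fixing network access to the package index, as suggested below, addresses the underlying cause.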

$ docker logs -f datahub-datahub-gms-1 | grep postgres

2024-10-11 10:52:15,221 [ForkJoinPool.commonPool-worker-5] INFO  c.l.m.entity.EntityServiceImpl:947 - Ingesting aspects batch to database: AspectsBatchImpl{items=[ChangeMCP{changeType=UPSERT, urn=urn:li:dataHubIngestionSource:ebb4e10e-6006-4790-9998-2ee55ff30a62, aspectName='dataHubIngestionSourceInfo', recordTemplate={name=timeseries_188, schedule={timezone=Asia/Shanghai, interval=0 0 * * *}, type=postgres, config={recipe={"source":{"type":"postgres","config":{"host_port":"192.168.50.188:5432","database":"timeseries","username":"postgres","include_tables":true,"incl..., systemMetadata={lastObserved=1728643935210, version=1, properties={appSource=ui}}}, ChangeMCP{changeType=CREATE, urn=urn:li:dataHubIngestionSource:ebb4e10e-6006-4790-9998-2ee55ff30a62, aspectName='dataHubIngestionSourceKey', recordTemplate={id=ebb4e10e-6006-4790-9998-2ee55ff30a62}, systemMetadata={lastObserved=1728643935210, version=1, properties={appSource=ui}}}]}
2024-10-11 10:52:17,397 [ForkJoinPool.commonPool-worker-13] INFO  c.l.m.entity.EntityServiceImpl:947 - Ingesting aspects batch to database: AspectsBatchImpl{items=[ChangeMCP{changeType=UPSERT, urn=urn:li:dataHubExecutionRequest:1c634b5b-0111-4328-a7fc-4f74dfe62ff7, aspectName='dataHubExecutionRequestInput', recordTemplate={args={recipe={"run_id":"urn:li:dataHubExecutionRequest:1c634b5b-0111-4328-a7fc-4f74dfe62ff7","source":{"type":"postgres","config":{"include_tables":true,"database":"timeseries","password":"${timeseries_188}","profiling":{"enabled":true,"profile_table_l..., systemMetadata={lastObserved=1728643937386, version=1, properties={appSource=ui}}}, ChangeMCP{changeType=CREATE, urn=urn:li:dataHubExecutionRequest:1c634b5b-0111-4328-a7fc-4f74dfe62ff7, aspectName='dataHubExecutionRequestKey', recordTemplate={id=1c634b5b-0111-4328-a7fc-4f74dfe62ff7}, systemMetadata={lastObserved=1728643937386, version=1, properties={appSource=ui}}}]}
Creating virtualenv at: /tmp/datahub/ingest/venv-postgres-3cbb1ad0ed8a0388
Creating virtualenv at: /tmp/datahub/ingest/venv-postgres-3cbb1ad0ed8a0388
Creating virtualenv at: /tmp/datahub/ingest/venv-postgres-3cbb1ad0ed8a0388
Creating virtualenv at: /tmp/datahub/ingest/venv-postgres-3cbb1ad0ed8a0388
Creating virtualenv at: /tmp/datahub/ingest/venv-postgres-3cbb1ad0ed8a0388
Creating virtualenv at: /tmp/datahub/ingest/venv-postgres-3cbb1ad0ed8a0388
Creating virtualenv at: /tmp/datahub/ingest/venv-postgres-3cbb1ad0ed8a0388

pricingblock-project changed the title from "Ingestion failed to complete, or completed with errors." to "Ingestion failed to complete, or completed with errors. postgres" on Oct 11, 2024
david-leifker (collaborator) commented:

Please restart the actions pod and retry after checking that your network can reach the PyPI repository:

> Caused by: Failed to download distribution due to network timeout.
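One way to do that network check is from inside the `datahub-actions` container, using only the standard library. A minimal sketch; the helper name `can_reach` and the probed URL `https://pypi.org/simple/pip/` are assumptions, and any index URL (including a mirror) can be passed instead.

```python
import urllib.request

# Assumed probe target: a small page on PyPI's simple index.
PYPI_SIMPLE = "https://pypi.org/simple/pip/"


def can_reach(url: str = PYPI_SIMPLE, timeout_s: float = 10.0) -> bool:
    """Return True if an HTTP GET to url completes with a 2xx/3xx status.

    Hypothetical helper for diagnosing the download timeout above.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return 200 <= resp.status < 400
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False
    except ValueError:
        # Raised by urllib for malformed or unsupported URLs.
        return False
```

For example, `can_reach()` returning `False` inside the container while it returns `True` on the host would point at the container's network or proxy settings rather than at DataHub.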

sprybee commented Oct 25, 2024

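This is caused by the Docker container being unable to reach the PyPI repository. You can work around it as follows:
1. Enter the container: docker exec -it (container ID of datahub-actions:head) /bin/bash
2. Run manually: pip3 install psycopg2-binary -i https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple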
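The mirror workaround above can be scripted so that the default index is tried first and the mirror only on failure. A minimal sketch under stated assumptions: the helper names `pip_install_argv` and `install_with_fallback` are hypothetical, `-i` is pip's short form of `--index-url`, and the command uses `sys.executable -m pip` so the package lands in the environment of whichever interpreter runs the script (choose that interpreter deliberately).

```python
import subprocess
import sys
from typing import Callable, List, Optional, Sequence

# Mirror URL from the comment above.
TUNA_MIRROR = "https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple"


def pip_install_argv(package: str, index_url: Optional[str] = None) -> List[str]:
    """Build a `python -m pip install` command, optionally with -i/--index-url."""
    argv = [sys.executable, "-m", "pip", "install", package]
    if index_url:
        argv += ["-i", index_url]
    return argv


def install_with_fallback(
    package: str,
    indexes: Sequence[Optional[str]] = (None, TUNA_MIRROR),
    run: Callable[[List[str]], int] = subprocess.call,
) -> Optional[List[str]]:
    """Try each index in order (None = pip's default index).

    Returns the command that succeeded, or None if every attempt failed.
    `run` is injectable so the logic can be tested without installing.
    """
    for index_url in indexes:
        argv = pip_install_argv(package, index_url)
        if run(argv) == 0:  # pip exits with status 0 on success
            return argv
    return None
```

For example, `install_with_fallback("psycopg2-binary")` falls back to the mirror only if the default index fails.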

jjoyce0510 (collaborator) commented:

Hi folks - did this end up working? If yes I'll go ahead and close the ticket.
