OOMKilled for BigQuery ingestion from UI #11597

Closed
edulodgify opened this issue Oct 11, 2024 · 4 comments
Labels: bug Bug report

edulodgify commented Oct 11, 2024

Describe the bug
After running some tests with the "default" DataHub setup for Kubernetes, everything installed with the Helm chart, we have observed issues with BigQuery ingestion from the UI. All of these tests were done with a small dataset.
We also tried running the ingestion from the CLI, since we will have to run it from third-party tools like Mage. In that case we have not had any problems; it is sometimes a little slow, but it has always finished successfully. Note that we observed some warnings with the message:

Cannot traverse scope _u_12.data_source with type '<class 'sqlglot.expressions.Column'>'

but this has never affected the ingestion, and it has always finished.
When we try the ingestion from the UI, we have had no problems with Tableau and dbt, but with BigQuery we see that it gets "stuck" shortly after starting and never finishes. Moreover, no matter how we try to kill the process manually, we cannot get it to die; we see memory usage keep increasing until it reaches the container limit, at which point the container restarts.

[screenshot: UI ingestion run stuck, container memory climbing]

The process gets stuck almost from the beginning. In the screenshot above, the container got stuck at 08:16; in fact, the last execution log entry we see is the following:

[eef09359-6c70-4abf-942a-3131df168b88 logs] [2024-10-10 08:16:02,521] WARNING {sqlglot.optimizer.scope:548} - Cannot traverse scope _u_12.data_source with type '<class 'sqlglot.expressions.Column'>'

It seems that there is a memory leak or a task that is not configured correctly.
We don't think it is due to a lack of resources, because when we ran that same recipe manually from inside the container with the command

datahub ingest -c ./bigquery.yaml

(the DataHub UI uses the command

datahub ingest run -c /tmp/datahub/ingest/513534ba-0e6a-4d1c-a71a-84efd17d50a1/recipe.yml --report-to /tmp/datahub/ingest/513534ba-0e6a-4d1c-a71a-84efd17d50a1/ingestion_report.json)

the pipeline finished without problems. The following screenshot shows the resources consumed:

[screenshot: resources consumed by the manual CLI ingestion]

Sink (datahub-rest) report:
{'total_records_written': 43,
'records_written_per_second': 0,
'warnings': [],
'failures': [],
'start_time': '2024-10-10 08:04:46.719552 (6 minutes and 5.01 seconds ago)',
'current_time': '2024-10-10 08:10:51.729373 (now)',
'total_duration_in_seconds': 365.01,
'max_threads': 15,
'gms_version': 'v0.14.0.2',
'pending_requests': 0,
'main_thread_blocking_timer': '0.063 seconds'}
Pipeline finished successfully; produced 43 events in 5 minutes and 59.12 seconds.
datahub@datahub-acryl-datahub-actions-6bc87bfd9b-d78vl:~$ exit

[screenshot: manual ingest vs UI ingest resource comparison]
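
For reference, two standard kubectl checks that can confirm the restart reason really is OOMKilled and show the pod's live memory (pod name taken from the shell prompt above; kubectl top assumes metrics-server is installed):

# Last termination reason of the actions container (expected to show OOMKilled)
kubectl describe pod datahub-acryl-datahub-actions-6bc87bfd9b-d78vl | grep -A 5 'Last State'

# Live memory usage of the pod while the UI ingestion is stuck
kubectl top pod datahub-acryl-datahub-actions-6bc87bfd9b-d78vl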

To Reproduce
Steps to reproduce the behavior:

  1. Go to Ingestion
  2. Click on "Create new source"
  3. Use the following YAML (it has no explicit sink block; a sink example for CLI runs is sketched after these steps):

     source:
       type: bigquery
       config:
         include_table_lineage: true
         include_usage_statistics: true
         include_tables: true
         include_views: true
         profiling:
           enabled: true
           profile_table_level_only: true
         stateful_ingestion:
           enabled: false
         credential:
           project_id: lodgify-datalab-1
           private_key: "-----BEGIN PRIVATE KEY-----\nmysupersecurekey\n-----END PRIVATE KEY-----\n"
           private_key_id: privatekey
           client_email: datahub@random-project.iam.gserviceaccount.com
           client_id: '1111111111111111111111111'
         dataset_pattern:
           allow:
             - ^personio
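
For a CLI run of the same recipe (as in the datahub ingest -c ./bigquery.yaml test above), the sink may need to be stated explicitly. A minimal sketch, assuming the datahub-rest sink seen in the report above; the server host is an assumption and depends on the Helm release name:

sink:
  type: datahub-rest
  config:
    # GMS endpoint; "datahub-datahub-gms" is the default service name for a release called "datahub"
    server: http://datahub-datahub-gms:8080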

Expected behavior
The task completes without getting stuck; memory usage should not keep increasing until it hits the container limit and restarts the container.

Environment:

  • OS: v1.30.2-gke.1587003 (GKE)
  • Browser: all
  • Version: v0.14.0.2
edulodgify added the bug (Bug report) label on Oct 11, 2024
edulodgify changed the title from "A short description of the bug" to "OOMKilled for BigQuery ingestion from UI" on Oct 11, 2024
david-leifker (Collaborator) commented:

One thing to try: please run the ingestion via the CLI in the Docker container, from the same venv that the UI process uses. This venv would be under /tmp/datahub/ingest/venv-<name of the source>-<other stuff>. Does the memory leak occur in that case?
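
A minimal sketch of how to do that, assuming the Deployment name matches the pod prompt shown above and that the venv directory for this source starts with venv-bigquery (the exact suffix will differ per install):

# Open a shell in the actions container (deployment name inferred from the pod name above)
kubectl exec -it deploy/datahub-acryl-datahub-actions -- bash

# List the per-run directories and venvs created by UI-triggered ingestion
ls /tmp/datahub/ingest/

# Activate the venv the UI built for the BigQuery source (exact directory name will differ)
source /tmp/datahub/ingest/venv-bigquery-<other stuff>/bin/activate

# Re-run the recipe the UI generated, using the same run directory as in the report above
datahub ingest run -c /tmp/datahub/ingest/513534ba-0e6a-4d1c-a71a-84efd17d50a1/recipe.yml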

david-leifker (Collaborator) commented:

Possibly related #11147

edulodgify (Author) commented:

Hi, I've read issue #11147 and yes, it is probably the same issue. I'll add my anonymized logs to both tickets, in case it helps to identify the memory issue.
acryl.log

jjoyce0510 (Collaborator) commented:

Please retry on v0.14.1.6 of DataHub CLI. There should be a fix for this in there!
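
A minimal sketch of pulling that CLI version into a local environment, assuming the standard acryl-datahub package on PyPI with the bigquery plugin extra:

# Upgrade the DataHub CLI (and the BigQuery source plugin) to the suggested release
python3 -m pip install --upgrade 'acryl-datahub[bigquery]==0.14.1.6'

# Confirm the installed version
datahub version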

Closing this ticket, feel free to reopen if you have additional questions or concerns!

Cheers
John
