
fix(ingest): run sqllineage in process by default #11650

Merged: hsheth2 merged 2 commits on Oct 17, 2024

@hsheth2 (Collaborator) commented Oct 16, 2024

I'm not sure what the root cause was, but our use of multiprocessing appears to have caused the main thread to hang on exit. Because test order randomization is enabled for unit tests, the hang occurred inconsistently: it required a specific test order, which I reproduced with --random-order-seed=598371. The bug also only manifests on Linux, which is why my attempts to reproduce it on macOS failed.

Thread 5918 (idle): "MainThread"
    poll (multiprocessing/popen_fork.py:27)
    wait (multiprocessing/popen_fork.py:43)
    join (multiprocessing/process.py:149)
    _exit_function (multiprocessing/util.py:357)
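For context on the trace above: Python's multiprocessing module registers an exit handler that terminates daemon children and then joins any remaining non-daemon children, so a child that never finishes keeps the main thread waiting in join. The following is a minimal sketch, not code from this PR, of a cleanup helper that bounds that wait. The name terminate_lingering_children and its join_timeout parameter are hypothetical and exist only for illustration.

```python
import multiprocessing


def terminate_lingering_children(join_timeout: float = 1.0) -> int:
    """Join each live child for at most join_timeout seconds, then terminate it.

    Hypothetical helper (not part of DataHub). Returns the number of
    children that had to be terminated because they were still running.
    """
    terminated = 0
    for child in multiprocessing.active_children():
        # Give the child a bounded chance to finish on its own.
        child.join(timeout=join_timeout)
        if child.is_alive():
            # Still running: stop it so the interpreter's exit handler
            # does not block on an unbounded join().
            child.terminate()
            child.join()  # Reap the terminated process.
            terminated += 1
    return terminated
```

Calling a helper like this at the end of a test session would be one way to keep interpreter shutdown from blocking on a stuck worker. The PR itself takes the simpler route of not using multiprocessing by default.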

Main changes:

  • Avoids all multiprocessing-related code by default in both src and tests
  • Defaults to in-process SQL parsing for the legacy sqllineage parser. This only impacts the redash source
  • Refactors redash to use ThreadedIteratorExecutor instead of multiprocessing
  • Removes the sql_parser config option from lookml, since it is no longer supported
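The move from processes to threads described in the list above can be sketched as follows. This is an illustration under stated assumptions, not the PR's implementation: it uses the standard library's ThreadPoolExecutor as a stand-in for DataHub's ThreadedIteratorExecutor, and extract_tables_naive is a hypothetical toy parser standing in for sqllineage.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, Set


def extract_tables_naive(sql: str) -> Set[str]:
    """Hypothetical toy parser: collect the token after each FROM or JOIN."""
    tokens = sql.replace(",", " ").split()
    tables: Set[str] = set()
    for prev, cur in zip(tokens, tokens[1:]):
        if prev.lower() in ("from", "join"):
            tables.add(cur.strip(";"))
    return tables


def parse_queries_in_threads(
    queries: Iterable[str],
    parse_fn: Callable[[str], Set[str]] = extract_tables_naive,
    max_workers: int = 4,
) -> List[Set[str]]:
    """Run parse_fn over queries in worker threads of the current process.

    No child processes are created, so nothing is left for
    multiprocessing's exit handler to join at interpreter shutdown.
    Results are returned in the same order as the input queries.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(parse_fn, queries))
```

Threads share the parent's memory and are shut down when the executor's with block exits, which avoids the child-process lifecycle seen in the stack trace. The trade-off is that pure-Python parsing in threads is still serialized by the GIL.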

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

This only impacts the redash source, but also seemed to have impacted
the tests.
@github-actions bot added the ingestion label (PR or Issue related to the ingestion of metadata) Oct 16, 2024
@hsheth2 added the merge-pending-ci label (A PR that has passed review and should be merged once CI is green.) Oct 17, 2024
@hsheth2 merged commit 8b42ac8 into master Oct 17, 2024
99 of 101 checks passed
@hsheth2 deleted the redash-sql-parse-in-process branch October 17, 2024 03:47
aviv-julienjehannet pushed a commit to aviv-julienjehannet/datahub that referenced this pull request Oct 21, 2024
epatotski pushed a commit to acryldata/datahub that referenced this pull request Oct 22, 2024