fix(data-warehouse): Use a separate thread pool executor for warehouse pipeline #25831

Open

Gilbert09 wants to merge 2 commits into master
Conversation

Gilbert09 (Member)

Problem

  • When we copy files from the local file system to S3 at the end of the DLT pipeline, the copy blocks the main event loop and therefore stops the Temporal heartbeater from beating, causing the pipeline to fail on large jobs
  • It turns out that asyncio.to_thread uses the same default thread pool as the heartbeater (see the sketch below)
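For context, asyncio.to_thread submits its callable to the event loop's default ThreadPoolExecutor, so any other work that relies on that same default pool has to queue behind the blocking upload. A minimal sketch of the equivalence; copy_files_to_s3 is a placeholder, not the actual function in the PR:

```python
import asyncio


def copy_files_to_s3() -> None:
    """Placeholder for the blocking local-filesystem -> S3 copy."""
    ...


async def upload() -> None:
    # asyncio.to_thread(copy_files_to_s3) is roughly equivalent to running
    # the callable on the event loop's *default* executor:
    loop = asyncio.get_running_loop()
    await loop.run_in_executor(None, copy_files_to_s3)  # None => default pool
    # If that default pool is saturated by long-running uploads, anything
    # else scheduled on it (such as heartbeats) waits until a thread frees up.
```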

Changes

  • Use a dedicated thread pool for the DLT pipeline so that the S3 file uploads don't block the heartbeater (see the sketch below)
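A minimal sketch of the approach, assuming hypothetical names; the executor name, worker count, and wrapper function here are illustrative and may differ from the actual PR code:

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# Hypothetical module-level pool dedicated to warehouse pipeline work;
# the real PR may size and name it differently.
warehouse_executor = ThreadPoolExecutor(max_workers=4, thread_name_prefix="warehouse-pipeline")


def run_dlt_pipeline() -> None:
    """Placeholder for the blocking DLT pipeline run, including the S3 uploads."""
    ...


async def run_pipeline() -> None:
    loop = asyncio.get_running_loop()
    # Passing an explicit executor keeps the event loop's default pool free,
    # so the Temporal heartbeater is not starved while files are uploaded.
    await loop.run_in_executor(warehouse_executor, run_dlt_pipeline)
```

With this, long-running pipeline work is confined to its own threads, and heartbeats scheduled on the default pool can keep firing while the upload runs.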

Gilbert09 requested a review from a team on October 25, 2024 at 16:54
rossgray (Contributor) left a comment:

Looks good. To be honest, I would have expected it to work fine before too, especially since this is one of the suggested approaches in the Python docs. Though I have to admit a lot of asyncio stuff is a bit of a mystery, and there's no substitute for testing it out in reality. So as long as you've tested it and it seems to fix the underlying issue, I'm happy with this change.
