feat: Use Psycopg3 COPY #451

Open
wants to merge 2 commits into
base: main

SpaceCondor
Contributor

This PR is aimed at using the Psycopg3 COPY functionality to speed up batch inserts.

Psycopg3 has the benefit of not needing to be file-based for COPY operations, so it avoids many of the issues with escaping values mentioned here:

#370

It also drastically reduces memory usage, since we no longer need to build an in-memory file to send to the database. Parameter binding happens server-side (unlike psycopg2, which binds client-side).

This would require that we switch to psycopg3 as mentioned here:

#433
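
To illustrate the approach described above, here is a minimal sketch of streaming rows through psycopg 3's `cursor.copy()` context manager and `Copy.write_row()`. The helper names `build_copy_statement` and `copy_rows` are hypothetical and are not taken from this PR; the identifier quoting is a simplified stand-in for whatever the target actually uses.

```python
from typing import Any, Iterable, Sequence


def build_copy_statement(table: str, columns: Sequence[str]) -> str:
    """Build a COPY ... FROM STDIN statement (hypothetical helper)."""

    def quote(identifier: str) -> str:
        # Standard SQL identifier quoting: double embedded quotes, then wrap.
        return '"' + identifier.replace('"', '""') + '"'

    column_list = ", ".join(quote(column) for column in columns)
    return f"COPY {quote(table)} ({column_list}) FROM STDIN"


def copy_rows(cursor: Any, statement: str, rows: Iterable[Sequence[Any]]) -> int:
    """Stream rows one at a time through cursor.copy().

    `cursor` is assumed to be a psycopg 3 cursor. No intermediate file or
    CSV buffer is built; each row is handed to write_row() as Python values.
    Returns the number of rows written.
    """
    count = 0
    with cursor.copy(statement) as copy:
        for row in rows:
            copy.write_row(row)
            count += 1
    return count
```

With psycopg 3 installed, the cursor would typically come from `psycopg.connect(...).cursor()`. Because `write_row()` receives Python values rather than pre-formatted text, the caller does not have to escape delimiters or quotes by hand.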

@@ -33,10 +33,11 @@ packages = [

[tool.poetry.dependencies]
python = ">=3.8"
faker = {version = "~=30.0", optional = true}
psycopg2-binary = "2.9.9"
faker = {version = "~=29.0", optional = true}
Member

Any reason to downgrade this?

Contributor Author

Nope! My branch was slightly outdated 😅

pyproject.toml (outdated)
Comment on lines +39 to +40
psycopg = "^3.2.3"
psycopg-binary = "^3.2.3"
Member

Is there a reason we need both the source and binary packages?

Contributor Author

@edgarrmondragon This can be changed to just psycopg. That said, if a user wants to use psycopg[c] or psycopg[binary], what would we suggest?

Member

dbt-labs/dbt-postgres#96 is probably a good case study. Most users in the data space can't or don't want to build C extensions, so we'll probably prefer psycopg[binary].


# Use copy to run the copy statement.
# https://www.psycopg.org/psycopg3/docs/basic/copy.html
with connection.connection.cursor().copy(copy_statement) as copy: # type: ignore[attr-defined]
Member

What happens at this point if someone sets postgresql+psycopg2 for dialect+driver?

Contributor Author

@edgarrmondragon It would raise an exception. In the current main branch I don't think using anything aside from postgresql+psycopg2 would work anyway, so this being configurable doesn't add much.

Member

I'm not 100% sure, but I don't think we use driver-specific APIs and rely on SQLAlchemy DDL/DML in all places, so I would expect most drivers to work. Maybe I'm wrong.
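
One way to make the failure mode discussed in this thread explicit is to check the configured dialect+driver before attempting COPY. The sketch below is only an illustration: `driver_from_url` and `require_psycopg3` are hypothetical helpers, not part of this PR, and it assumes the setting is a SQLAlchemy-style URL scheme such as `postgresql+psycopg`.

```python
from typing import Optional


def driver_from_url(url: str) -> Optional[str]:
    """Return the driver part of a 'dialect+driver://...' URL, or None."""
    scheme = url.split("://", 1)[0]
    _dialect, separator, driver = scheme.partition("+")
    return driver if separator else None


def require_psycopg3(url: str) -> None:
    """Raise a clear error if the URL does not select the psycopg 3 driver.

    A URL without an explicit driver is rejected as well, since SQLAlchemy
    falls back to psycopg2 for plain 'postgresql://' URLs.
    """
    driver = driver_from_url(url)
    if driver != "psycopg":
        raise ValueError(
            f"COPY-based inserts need 'postgresql+psycopg', got driver {driver!r}"
        )
```

Failing early with a message like this would be easier to diagnose than an attribute error raised from inside the batch insert, though whether the driver should stay configurable at all is the open question above.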

Co-authored-by: Edgar Ramírez Mondragón <16805946+edgarrmondragon@users.noreply.github.com>