query :: conversion to dataframe is really slow #85

Open
mpo-vliz opened this issue Jul 24, 2024 · 1 comment
Labels
components.query query related issues linked to pykg2tbl enhancement New feature or request


@mpo-vliz
Contributor

mpo-vliz commented Jul 24, 2024

When trying out sema.query (previously pykg2tbl) with some larger result sets, dumping them to CSV (via pandas.DataFrame) runs forever.

Detailed logging shows that the time is spent converting the query result into a DataFrame.

We should look into making that more efficient (probably by doing less in-memory copying?).
Apparently there has been some work in this area:

@mpo-vliz mpo-vliz added enhancement New feature or request components.query query related issues linked to pykg2tbl labels Jul 24, 2024
@mpo-vliz mpo-vliz self-assigned this Jul 24, 2024
@mpo-vliz
Contributor Author

Applied in 99263fd.

The positive effect on performance is nothing less than dramatic: for a large result set (150k rows), this cut the conversion to a DataFrame from >800k ms to a mere 4 ms!
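Why can avoiding copies matter this much? A common slow pattern is growing a DataFrame one row at a time (for example with repeated `pd.concat`). Every append copies all rows collected so far, so the total work grows roughly quadratically with the row count. The sketch below shows the usual alternative: collect plain Python lists per column and call the DataFrame constructor once. It assumes the query result arrives in the W3C SPARQL 1.1 JSON results format. `bindings_to_dataframe` is a hypothetical helper, not the sema.query API, and this is not claimed to be what 99263fd actually changed.

```python
import pandas as pd


def bindings_to_dataframe(result: dict) -> pd.DataFrame:
    """Convert a SPARQL 1.1 JSON result dict into a DataFrame in one pass.

    Hypothetical helper (assumes the W3C SPARQL JSON results layout):
    {"head": {"vars": [...]}, "results": {"bindings": [{var: {"value": ...}}]}}
    """
    variables = result["head"]["vars"]
    bindings = result["results"]["bindings"]

    # One plain Python list per column: no DataFrame exists yet, so
    # appending a row never copies the rows collected before it.
    columns = {var: [] for var in variables}
    for binding in bindings:
        for var in variables:
            cell = binding.get(var)  # unbound variables are absent from a binding
            columns[var].append(cell["value"] if cell is not None else None)

    # A single constructor call allocates the final frame once.
    return pd.DataFrame(columns, columns=variables)


# Usage with a tiny hand-written result (illustrative data only).
result = {
    "head": {"vars": ["s", "label"]},
    "results": {
        "bindings": [
            {
                "s": {"type": "uri", "value": "http://example.org/a"},
                "label": {"type": "literal", "value": "A"},
            },
            {"s": {"type": "uri", "value": "http://example.org/b"}},
        ]
    },
}
df = bindings_to_dataframe(result)
print(df)
```

The resulting frame can then be written with `df.to_csv(...)` as before; only the construction step changes.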
