Summary and count causes performance issues on large datasets #37

Open
markbrough opened this issue Nov 20, 2022 · 3 comments


@markbrough

With very large datasets (e.g. 13m rows), summary and count appear to significantly slow down the response:

babbage/babbage/cube.py

Lines 89 to 96 in 9416105

    # Count
    count = count_results(self, prep(cuts,
                                     drilldowns=drilldowns,
                                     columns=[1])[0])
    # Summary
    summary = first_result(self, prep(cuts,
                                      aggregates=aggregates)[0].limit(1))

Without generating summary and count, it's 2-3 times faster to return the response.

It would be useful to make returning these properties optional, e.g. by adding an optional &simple parameter to the request.
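
A minimal sketch of the idea, not the actual babbage code: the three `fetch_*` callables are hypothetical stand-ins for the cells, count and summary queries, and the `simple` flag and the response key names are assumptions made for illustration. It only shows that the two extra queries are never run when the flag is set.

```python
def build_response(fetch_cells, fetch_count, fetch_summary, simple=False):
    """Assemble a response, skipping count and summary when simple is True.

    All three arguments are hypothetical zero-argument callables that
    stand in for the real database queries.
    """
    # The cells are always needed, so this query always runs.
    response = {"cells": fetch_cells()}
    if not simple:
        # Only pay for the two extra queries when the caller wants them.
        response["count"] = fetch_count()
        response["summary"] = fetch_summary()
    return response


# Fake queries that record when they are called.
calls = []


def fake_cells():
    calls.append("cells")
    return [{"amount": 10}, {"amount": 5}]


def fake_count():
    calls.append("count")
    return 2


def fake_summary():
    calls.append("summary")
    return {"amount": 15}


full = build_response(fake_cells, fake_count, fake_summary)
simple = build_response(fake_cells, fake_count, fake_summary, simple=True)
print(sorted(full))    # ['cells', 'count', 'summary']
print(sorted(simple))  # ['cells']
```

Keeping the default as `simple=False` means existing clients would still get the full response unless they opt out.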

@jbothma

jbothma commented Nov 20, 2022 via email

@markbrough

Maybe include_fields would be confusing, e.g. alongside the different dimensions etc.? Presuming that the default should be to include all properties, how about exclude with an optional list, e.g. &exclude="count";"summary"?
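
As a rough sketch of how such a value could be read, assuming the semicolon-separated, optionally quoted format from the example above: `parse_exclude` and `OPTIONAL_PROPERTIES` are hypothetical names for illustration, not existing babbage functions.

```python
# Properties a client is allowed to leave out (assumed for this sketch).
OPTIONAL_PROPERTIES = frozenset({"count", "summary"})


def parse_exclude(raw):
    """Turn a value such as '"count";"summary"' into a set of names.

    A missing or empty value means nothing is excluded, so the default
    stays "include all properties".
    """
    if not raw:
        return set()
    # Split on ';', then drop surrounding whitespace and double quotes.
    names = {part.strip().strip('"') for part in raw.split(";")}
    names.discard("")
    unknown = names - OPTIONAL_PROPERTIES
    if unknown:
        raise ValueError("cannot exclude: " + ", ".join(sorted(unknown)))
    return names


print(parse_exclude('"count";"summary"') == {"count", "summary"})  # True
print(parse_exclude(None))  # set()
```

Rejecting unknown names means a typo in the parameter raises an error rather than being silently ignored.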

@markbrough

Because I need this quickly, I have already implemented it the way I described above, but I'm happy to hear your feedback and can then adjust:
https://github.com/markbrough/babbage/tree/37-simple-request
