Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix an N^2 in batch hydration #172

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

bshepherdson
Copy link

This hadn't been noticed previously since N is generally small, eg. 50 or 60 at most. Working with a Metabase dashboard that loaded 10000 fields (100 each from 100 tables) in a single select + hydrate, I found this was consuming about 3 seconds.

The slow path was effectively a very slow conj: taking the acc vector, and then for each annotated-instances doing:

(recur (vec (concat acc nil [(first hydrated-instances)]))
       ...)

which is pouring the whole vector into a lazy sequence and back into a vector, once for each instance in the list.

The new version uses a custom transducer to do the same process in O(n) time (and without any seq overhead, as a bonus).

Copy link
Owner

@camsaul camsaul left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update the failing tests?

@alexander-yakushev alexander-yakushev force-pushed the batch-hydrate-perf branch 3 times, most recently from 80aa97d to ad2791e Compare September 12, 2024 20:34
Copy link

codecov bot commented Sep 12, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 83.58%. Comparing base (3729d20) to head (4748350).

Additional details and impacted files
@@           Coverage Diff           @@
##           master     #172   +/-   ##
=======================================
  Coverage   83.58%   83.58%           
=======================================
  Files          37       37           
  Lines        2498     2498           
  Branches      212      212           
=======================================
  Hits         2088     2088           
  Misses        198      198           
  Partials      212      212           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@alexander-yakushev
Copy link
Contributor

alexander-yakushev commented Sep 12, 2024

Bump. I've rewritten the implementation of merging original instances with batch-hydrated instances again. I'm building an index that serves both as the indicator which instances should be taken into the batch to be hydrated and which not; and also it enumerates the instances-to-by-hydrated, so that on the second pass of this index, we can extract the hydrated instances by their position in the batch.

The tests are now fixed.

This hadn't been noticed previously since N is generally small, eg. 50
or 60 at most. Working with a Metabase dashboard that loaded 10000
fields (100 each from 100 tables) in a single `select` + `hydrate`, I
found this was consuming about 3 seconds.

The slow path was effectively a very slow `conj`: taking the `acc`
vector, and then for each `annotated-instances` doing:

```clojure
(recur (vec (concat acc nil [(first hydrated-instances)]))
       ...)
```

which is pouring the whole vector into a lazy sequence and back into a
vector, once for each instance in the list.

The new version first creates an "index" of instances that need hydration, and
later uses that index to map the instances in the hydrated instances batch to
their places in the initial collection.

Co-authored-by: Braden Shepherdson <braden@metabase.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants