Create index on download/data object association #1411

Merged
merged 2 commits into from
Oct 10, 2024

Conversation

naglepuff
Collaborator

Fix #1410

Problem

Biosample search is really slow. Its response contains not only biosample-level information but also related data generations, workflow runs, and data objects. Several improvements are possible, including speeding up existing queries and reducing the total number of queries.

Changes

This PR adds an index on the data_object_id column of the bulk_download_data_object table. The index speeds up the subqueries that compute download statistics for data objects during biosample search.
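
For reviewers who want to see why the index helps, here is a minimal sketch that uses an in-memory SQLite database as a stand-in for our Postgres database. It is not the migration in this PR. Only the table name and the data_object_id column come from this PR; the bulk_download_id column, the index name ix_demo_data_object_id, and the sample data are made up for illustration. It compares the query plan for a lookup by data_object_id before and after the index is created.

```python
import sqlite3


def plan_for_lookup(conn, data_object_id):
    """Return the plan detail strings for a count-by-data_object_id lookup."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT COUNT(*) FROM bulk_download_data_object WHERE data_object_id = ?",
        (data_object_id,),
    ).fetchall()
    # The human-readable plan description is the fourth column of each row.
    return [row[3] for row in rows]


conn = sqlite3.connect(":memory:")

# Hypothetical two-column version of the association table.
conn.execute(
    "CREATE TABLE bulk_download_data_object ("
    "bulk_download_id TEXT, data_object_id TEXT)"
)
conn.executemany(
    "INSERT INTO bulk_download_data_object VALUES (?, ?)",
    [(f"dl-{i}", f"obj-{i % 100}") for i in range(10_000)],
)

# Without an index, the lookup has to scan the whole table.
before = plan_for_lookup(conn, "obj-7")

# Hypothetical index name; the real one is defined by the migration.
conn.execute(
    "CREATE INDEX ix_demo_data_object_id "
    "ON bulk_download_data_object (data_object_id)"
)

# With the index, the planner can search by data_object_id directly.
after = plan_for_lookup(conn, "obj-7")

print("before:", before)
print("after: ", after)
```

Postgres has its own planner and may choose differently on small tables, so treat this only as an illustration of the general effect.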

Testing

You can verify that the migration created the correct index by connecting to your local database and running \d bulk_download_data_object (a psql meta-command that shows information about a table, including its indexes). If you don't see the new index, try running docker compose run backend nmdc-server migrate and docker compose run backend nmdc-server migrate --ingest-db.

In your local development environment, run some biosample searches (from Swagger, the data portal itself, cURL, or any other client). After switching to this branch and running the migration, you will likely see these queries run faster, depending on how large your bulk_download_data_object table is.
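
If you want numbers rather than a feel for the speedup, a small timing helper like the sketch below may be useful. It is only an assumption about how one might measure this and is not part of the PR. median_seconds and fake_search are hypothetical names, and fake_search is a placeholder that you would replace with a function that issues a real biosample search against your local server.

```python
import statistics
import time


def median_seconds(run_search, repeats=5):
    """Call run_search() `repeats` times and return the median wall-clock time."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_search()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def fake_search():
    """Placeholder for a real biosample search request (hypothetical)."""
    time.sleep(0.01)


print(f"median: {median_seconds(fake_search):.3f}s")
```

Run it once on main and once on this branch after the migration, using the same search, and compare the two medians.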

This should speed up biosample search, since it returns download
statistics for each data object for each biosample.
@eecavanna
Collaborator

eecavanna commented Oct 10, 2024

Thank you for implementing this!

Adding @pkalita-lbl, @shreddd, and @sierra-moxon as reviewers as I will be OOO until 12pm PT.

Once reviewed, I am comfortable with this PR being merged into main even though we have a release scheduled for early next week, given that team members have discussed the creation of this index over the past day or so (and have experimented with such an index, although they didn't create it via Alembic).

@naglepuff naglepuff merged commit b6574ef into main Oct 10, 2024
2 checks passed
@naglepuff naglepuff deleted the 1410-bulk-download-association-index branch October 10, 2024 17:48
Development

Successfully merging this pull request may close these issues.

Create index on column data_object_id for table bulk_download_data_object
3 participants