Create index on download/data object association #1411

naglepuff · 2024-10-10T17:15:35Z

Problem

Biosample search is really slow. Its response contains not only biosample-level information but also related data generations, workflow runs, and data objects. Several improvements are possible, including speeding up existing queries and reducing the total number of queries.

Changes

This PR adds an index on the data_object_id column of the bulk_download_data_object table. This speeds up the subqueries that fetch download statistics for data objects during biosample search.
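To illustrate why an index on data_object_id helps a per-data-object download count, here is a minimal sketch. It is not the PR's migration: it uses Python's built-in sqlite3 module as a stand-in for the project's PostgreSQL database, and the bulk_download_id column, the index name, and the sample rows are all assumptions made for illustration. Only the table name and the data_object_id column come from the description above.

```python
import sqlite3

# Assumption: a simplified stand-in schema. The real table lives in
# PostgreSQL; only the table name and data_object_id come from the PR.
INDEX_NAME = "ix_bulk_download_data_object_data_object_id"  # hypothetical name

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bulk_download_data_object ("
    "bulk_download_id TEXT NOT NULL, data_object_id TEXT NOT NULL)"
)
# 10,000 made-up association rows spread evenly over 100 data objects.
conn.executemany(
    "INSERT INTO bulk_download_data_object VALUES (?, ?)",
    [(f"dl-{i}", f"obj-{i % 100}") for i in range(10_000)],
)

# A download-count lookup for a single data object.
QUERY = "SELECT COUNT(*) FROM bulk_download_data_object WHERE data_object_id = ?"


def query_plan(connection, sql, params):
    """Return SQLite's query plan for `sql` as a single string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " | ".join(row[-1] for row in rows)


plan_before = query_plan(conn, QUERY, ("obj-7",))
conn.execute(
    f"CREATE INDEX {INDEX_NAME} ON bulk_download_data_object (data_object_id)"
)
plan_after = query_plan(conn, QUERY, ("obj-7",))
count = conn.execute(QUERY, ("obj-7",)).fetchone()[0]

print("before:", plan_before)  # a scan over the whole table
print("after: ", plan_after)   # a search that names the new index
print("downloads for obj-7:", count)
```

Without the index, each lookup has to scan the whole association table; with it, the lookup goes through the index. PostgreSQL's planner differs in detail, so treat this only as an illustration of the general effect.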

Testing

You can verify that the migration created the correct index by connecting to your local database and running
\d bulk_download_data_object
This psql command shows information about a table, including its indexes. If you don't see the new index, try running docker compose run backend nmdc-server migrate and docker compose run backend nmdc-server migrate --ingest-db.

In your local development environment, run some biosample searches (from Swagger, from the data portal itself, from cURL, whatever). After switching to this branch and running the migration, you will likely see these queries get faster, depending on how big your bulk_download_data_object table is.
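If you want a rough number rather than an impression, a small timing helper like the sketch below can be run before and after the migration. The helper name and its parameters are hypothetical; it simply times any callable you give it, so you supply the function that actually issues the biosample search against your local server.

```python
import statistics
import time


def median_seconds(run_query, repeats=5):
    """Call run_query() `repeats` times and return the median wall-clock seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return statistics.median(durations)
```

Using the median instead of the mean keeps one slow, cold-cache request from dominating the result. Compare the value measured on main with the value measured on this branch, using the same search and the same database contents.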

This should speed up biosample search, since it returns download statistics for each data object for each biosample.

eecavanna · 2024-10-10T17:21:09Z

Thank you for implementing this!

Adding @pkalita-lbl, @shreddd, and @sierra-moxon as reviewers as I will be OOO until 12pm PT.

Once reviewed, I am comfortable with this PR being merged into main even though we have a release scheduled for early next week, given that team members have discussed the creation of this index over the past day or so (and have experimented with such an index, although they didn't create it via Alembic).

Create index on download/data object association

bc3b108

This should speed up biosample search, since it returns download statistics for each data object for each biosample.

naglepuff requested a review from eecavanna October 10, 2024 17:15

eecavanna requested review from shreddd, sierra-moxon and pkalita-lbl October 10, 2024 17:21

eecavanna assigned naglepuff Oct 10, 2024

pkalita-lbl approved these changes Oct 10, 2024


Remove unused import

2292ee2

naglepuff merged commit b6574ef into main Oct 10, 2024
2 checks passed

naglepuff deleted the 1410-bulk-download-association-index branch October 10, 2024 17:48
