Create index on download/data object association #1411
Merged
Fix #1410
Problem
Biosample search is really slow. Its response contains not only biosample-level information, but also related data generations, workflow runs, and data objects. Several improvements are possible, including speeding up existing queries and reducing the total number of queries.
Changes
This PR adds an index on the `data_object_id` column of the `bulk_download_data_object` table. This speeds up the subqueries that get download statistics for data objects during biosample search.
Testing
You can verify that the migration created the correct index by connecting to your local database and running `\d bulk_download_data_object` (this shows information about a table, including its indices). If you don't see the new index, try running `docker compose run backend nmdc-server migrate` and `docker compose run backend nmdc-server migrate --ingest-db`.
In your local development environment, run some biosample searches (from Swagger, from the data portal proper, from cURL, whatever). After switching to this branch and running the migration, you will likely see an increase in speed for these queries, depending on how big your `bulk_download_data_object` table is.
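To get a feel for why the index helps, here is a minimal, self-contained sketch that compares query plans before and after indexing `data_object_id`. It uses an in-memory SQLite database purely as a stand-in for the real PostgreSQL database, so the planner output will not match what you see in `psql`. The `bulk_download_id` column, the index name, the sample rows, and the count query are all assumptions made up for illustration; they are not taken from the actual schema or migration.

```python
import sqlite3


def query_plan(conn, sql):
    """Return the human-readable detail of each EXPLAIN QUERY PLAN row."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each row holds the plan description.
    return " | ".join(row[-1] for row in rows)


conn = sqlite3.connect(":memory:")

# Hypothetical, simplified stand-in for the association table.
conn.execute(
    "CREATE TABLE bulk_download_data_object ("
    "bulk_download_id TEXT NOT NULL, data_object_id TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO bulk_download_data_object VALUES (?, ?)",
    [(f"dl-{i % 50}", f"obj-{i % 200}") for i in range(5000)],
)

# Assumed shape of a per-data-object download count lookup.
count_sql = (
    "SELECT count(*) FROM bulk_download_data_object "
    "WHERE data_object_id = 'obj-7'"
)

# Without an index, the filter requires scanning the whole table.
plan_before = query_plan(conn, count_sql)

# Hypothetical index name; the real migration may name it differently.
conn.execute(
    "CREATE INDEX ix_bulk_download_data_object_data_object_id "
    "ON bulk_download_data_object (data_object_id)"
)

# With the index, the planner can look up matching rows directly.
plan_after = query_plan(conn, count_sql)
count = conn.execute(count_sql).fetchone()[0]

print("before:", plan_before)
print("after: ", plan_after)
print("rows matched:", count)
```

Against the real database, the closest equivalent is probably running `EXPLAIN ANALYZE` on one of the slow subqueries in `psql` before and after the migration.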