Implement queries for cmr and hydrocron #198

Closed
frankinspace opened this issue Jun 27, 2024 · 2 comments · Fixed by #207
Labels
enhancement New feature or request

Comments

@frankinspace
Member

frankinspace commented Jun 27, 2024

@frankinspace frankinspace added the enhancement New feature or request label Jun 27, 2024
@frankinspace frankinspace changed the title from "Implement track status lambda" to "Implement queries for cmr and hydrocron" Jun 27, 2024
@nikki-t nikki-t self-assigned this Jul 2, 2024
@nikki-t
Collaborator

nikki-t commented Jul 18, 2024

I was able to implement the CMR query which returns a list of granule URs present in CMR for a specified time range. I started to implement the Hydrocron query but ran into some confusion around DynamoDB queries.
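
One way the CMR side could be done is sketched below, using only the Python standard library against the CMR granule search API. This is not the code from the linked PR. It assumes the collection is identified by its short name, and `build_cmr_params`, `extract_granule_urs` and `fetch_granule_urs` are hypothetical helpers. Results are paged with the `CMR-Search-After` header:

```python
import json
import urllib.parse
import urllib.request

# CMR granule search endpoint that returns UMM-G JSON records.
CMR_GRANULE_SEARCH = "https://cmr.earthdata.nasa.gov/search/granules.umm_json"


def build_cmr_params(short_name, start_time, end_time, page_size=2000):
    """Query parameters selecting a collection's granules within a temporal range."""
    return {
        "short_name": short_name,  # assumption: collection identified by short name
        "temporal": f"{start_time},{end_time}",  # ISO 8601, e.g. 2024-07-01T00:00:00Z
        "page_size": page_size,  # 2000 is the largest page size CMR allows
    }


def extract_granule_urs(umm_json):
    """Return the GranuleUR of every item in a parsed granules.umm_json page."""
    return [item["umm"]["GranuleUR"] for item in umm_json.get("items", [])]


def fetch_granule_urs(short_name, start_time, end_time):
    """Collect all granule URs in the range, paging with CMR-Search-After."""
    query = urllib.parse.urlencode(build_cmr_params(short_name, start_time, end_time))
    url = f"{CMR_GRANULE_SEARCH}?{query}"
    granule_urs, search_after = [], None
    while True:
        request = urllib.request.Request(url)
        if search_after:
            request.add_header("CMR-Search-After", search_after)
        with urllib.request.urlopen(request) as response:
            page = extract_granule_urs(json.load(response))
            search_after = response.headers.get("CMR-Search-After")
        granule_urs.extend(page)
        # Stop on an empty page or when CMR no longer returns a search-after token.
        if not page or not search_after:
            return granule_urs
```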

I would like to query the Hydrocron tables in a way that returns every granuleUR whose range_start_time falls within a time range, which I can determine from the CMR query. Something like this:

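from boto3.dynamodb.conditions import Key

# Fails: the condition names only range_start_time, not the index's partition key
items = hydrocron_table.query(
    IndexName="GranuleURIndex",
    KeyConditionExpression=Key("range_start_time").between(start_time, end_time),
)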

When I try this, the query fails with a validation error saying that a "key schema element" is missing from the condition:

botocore.exceptions.ClientError: An error occurred (ValidationException) when calling the Query operation: Query condition missed key schema element

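From my understanding, a DynamoDB Query must include an equality condition on the partition key of the table or index, and a range condition such as between can only be applied to the sort key. You therefore cannot query on a date range alone; you would need a Scan instead, which reads the whole table and is costly.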
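
To illustrate why the Scan route is costly: a Scan can filter on range_start_time, but it still reads every item in the table, and the filter only trims what is returned. Results also have to be paged with `LastEvaluatedKey`. The sketch below assumes `table` is a boto3 DynamoDB Table resource, which accepts expression strings with plain Python values, and that range_start_time is stored as ISO 8601 strings so that BETWEEN compares chronologically. `scan_granule_urs_between` is a hypothetical helper:

```python
def scan_granule_urs_between(table, start_time, end_time):
    """Collect granuleUR values whose range_start_time is in [start_time, end_time].

    Every item in the table is read (and billed) during the Scan; the
    FilterExpression only decides which items come back in the response.
    """
    granule_urs = set()
    kwargs = {
        "FilterExpression": "range_start_time BETWEEN :start AND :end",
        "ProjectionExpression": "granuleUR",  # return only the attribute we need
        "ExpressionAttributeValues": {":start": start_time, ":end": end_time},
    }
    while True:
        response = table.scan(**kwargs)
        granule_urs.update(item["granuleUR"] for item in response.get("Items", []))
        last_key = response.get("LastEvaluatedKey")
        if last_key is None:  # no more pages to read
            return granule_urs
        kwargs["ExclusiveStartKey"] = last_key
```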

For now, I am going to query Hydrocron by granuleUR, iterating over the granule URs returned by the CMR query and running one query per granule. This will let me automatically identify what is in CMR but not in Hydrocron.
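
The per-granule lookup can be sketched as follows. It assumes `table` is a boto3 DynamoDB Table resource and that the existing GranuleURIndex uses granuleUR as its partition key. That key is inferred from the index name and is not confirmed in this issue. An equality condition on the partition key is exactly what the range-only query lacked. Both helper names are hypothetical:

```python
def granule_in_hydrocron(table, granule_ur, index_name="GranuleURIndex"):
    """True if the index holds at least one item for this granuleUR.

    Assumes granuleUR is the index's partition key, so an equality
    condition on it satisfies the Query key-schema requirement.
    """
    response = table.query(
        IndexName=index_name,
        KeyConditionExpression="granuleUR = :granule_ur",
        ExpressionAttributeValues={":granule_ur": granule_ur},
        Limit=1,  # one matching item is enough to show the granule was loaded
    )
    return len(response.get("Items", [])) > 0


def granules_missing_from_hydrocron(table, cmr_granule_urs):
    """Granule URs reported by CMR that have no items in Hydrocron."""
    return [ur for ur in cmr_granule_urs if not granule_in_hydrocron(table, ur)]
```

This issues one Query per CMR granule, which is cheap per call but scales linearly with the number of granules in the time range.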

I won't be able to easily determine what is in Hydrocron but not in CMR (or the overlap between the two). I think we would need to set up another global secondary index on "range_start_time" with a range key of "granuleUR" to support a different query, although I am not sure that quite fits the DynamoDB use case.
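
Suppose both sides could be listed for the same time window; obtaining the Hydrocron-side list is the hard part discussed above. The three categories (in CMR only, in Hydrocron only, and the overlap) then reduce to set operations. In this minimal sketch, `compare_granules` is a hypothetical helper:

```python
def compare_granules(cmr_granule_urs, hydrocron_granule_urs):
    """Split granule URs from one time window into CMR-only, Hydrocron-only, and shared."""
    cmr = set(cmr_granule_urs)
    hydrocron = set(hydrocron_granule_urs)
    return {
        "cmr_only": sorted(cmr - hydrocron),  # in CMR but not yet in Hydrocron
        "hydrocron_only": sorted(hydrocron - cmr),  # in Hydrocron but not in CMR
        "both": sorted(cmr & hydrocron),  # overlap between the two
    }
```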

@torimcd - Does this fit with your understanding of DynamoDB queries?

Any objections to deprioritizing the check for what is in Hydrocron but not in CMR, since we were only going to log those results for now?

@nikki-t
Collaborator

nikki-t commented Jul 18, 2024

We will de-scope determining the overlap and what exists in Hydrocron but not in CMR for now. Created #206 to track future work.

@nikki-t nikki-t mentioned this issue Jul 19, 2024
@nikki-t nikki-t linked a pull request Jul 19, 2024 that will close this issue
Projects
Status: ✅ Done

2 participants