Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enchancing Metadata Support for SOAR #118

Merged
merged 24 commits into from
Jul 6, 2024
Merged

Conversation

NucleonGodX
Copy link
Contributor

This Pull Request is part of Google Summer of Code (GSoC) 2024, focused on enhancing metadata support for SOAR in sunpy-soar (as discussed in issue #46 ). This initial phase involves integrating attributes from the existing sunpy.net attribute system. Specifically, join methods have been developed, and the Detector attribute has been successfully implemented for remote-sensing instruments.

@nabobalis
Copy link
Contributor

I'll review this in a bit more detail but can you add unit tests to ensure that detector works in a query as well as the construct methods create the expected queries?

@NucleonGodX
Copy link
Contributor Author

I've added tests, let me know if its fine.

sunpy_soar/client.py Outdated Show resolved Hide resolved
Instrument table name.
"""
instrument_name = query_item.split("=")[1][1:-1].split("-")[0].upper()
if instrument_name == "STIX":
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a dict that maps the full name to the "table" name would be "cleaner" python code.

sunpy_soar/client.py Outdated Show resolved Hide resolved
@@ -20,6 +20,8 @@ class SOARClient(BaseClient):
Client to access the Solar Orbiter Archive (SOAR).
"""

join_needed = False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I like the idea of this being part of the client, can this not be moved else where?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could achieve a solution by making a basic function for it, let me know if its better!

@NucleonGodX NucleonGodX marked this pull request as draft May 31, 2024 18:09
sunpy_soar/client.py Outdated Show resolved Hide resolved
@NucleonGodX NucleonGodX marked this pull request as ready for review May 31, 2024 19:14
sunpy_soar/client.py Outdated Show resolved Hide resolved
sunpy_soar/client.py Outdated Show resolved Hide resolved
sunpy_soar/client.py Show resolved Hide resolved
sunpy_soar/client.py Show resolved Hide resolved
sunpy_soar/client.py Show resolved Hide resolved
sunpy_soar/tests/test_sunpy_soar.py Outdated Show resolved Hide resolved
level = a.Level(1)
res = Fido.search(instrument & time & level)
assert "Detector" in res[0].columns
assert res.file_num == 35
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Testing the number of results is maybe not very robust, this number might change in SOAR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just followed how tests are already written in test cases. But I do understand your point!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can check for at least 35 files?

@NucleonGodX
Copy link
Contributor Author

Based on my research and the insights provided by Eric, I have implemented the wavelength functionality. I believe we should discuss the details of this implementation and further changes needed in this week's meeting.

@nabobalis
Copy link
Contributor

Can you add a changelog entry.

@NucleonGodX
Copy link
Contributor Author

NucleonGodX commented Jun 12, 2024

time = a.Time("2022-03-01 00:00", "2022-03-02 00:00")
level = a.Level(1)
product = a.soar.Product("EUI-FSI174-IMAGE")
detector= a.Detector("FSI")
result = Fido.search(instrument & time & level & product & detector)
print(result)

For some reason using detector= "FSI" in query results in an API error

raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 408 Client Error:  for url: http://soar.esac.esa.int/soar-sl-tap/tap/sync?REQUEST=doQuery&LANG=ADQL&FORMAT=json&QUERY=SELECT+h1.instrument,%20h1.descriptor,%20h1.level,%20h1.begin_time,%20h1.end_time,%20h1.data_item_id,%20h1.filesize,%20h1.filename,%20h1.soop_name,%20h2.detector,%20h2.wavelength,%20h2.dimension_index+FROM+v_sc_data_item%20AS%20h1%20JOIN%20v_eui_sc_fits%20AS%20h2%20USING%20(data_item_oid)+WHERE+h1.instrument='EUI'+AND+h1.begin_time%3E='2022-03-01+00:00:00'+AND+h1.begin_time%3C='2022-03-02+00:00:00'+AND+h2.dimension_index='1'+AND+h1.level='L1'+AND+h1.descriptor='eui-fsi174-image'+AND+h2.Detector='FSI'

if the query is done without taking detector in it, it gives normal response
like for,
result = Fido.search(instrument & time & level & product)

Even with an invalid detector in the query we get the normal 0 results output.

And for other detectors also it works perfectly fine for example

time = a.Time("2022-03-01 00:00", "2022-05-02 00:00")
detector= a.Detector("HRI_EUV")
result = Fido.search(instrument & time & level & detector)
22706 Results from the SOARClient:

Instrument     Data product    Level ... Detector Wavelength
                                     ...
---------- ------------------- ----- ... -------- ----------
       EUI eui-hrieuv174-image    L1 ...  HRI_EUV      174.0
       EUI eui-hrieuv174-image    L1 ...  HRI_EUV      174.0
       EUI eui-hrieuv174-image    L1 ...  HRI_EUV      174.0
       EUI eui-hrieuv174-image    L1 ...  HRI_EUV      174.0
       EUI eui-hrieuv174-image    L1 ...  HRI_EUV      174.0
       EUI eui-hrieuv174-image    L1 ...  HRI_EUV      174.0
       EUI eui-hrieuv174-image    L1 ...  HRI_EUV      174.0
       EUI eui-hrieuv174-image    L1 ...  HRI_EUV      174.0
       EUI eui-hrieuv174-image    L1 ...  HRI_EUV      174.0
       EUI eui-hrieuv174-image    L1 ...  HRI_EUV      174.0
       ...                 ...   ... ...      ...        ...
       EUI eui-hrieuv174-image    L1 ...  HRI_EUV      174.0
       EUI eui-hrieuv174-image    L1 ...  HRI_EUV      174.0
       EUI eui-hrieuv174-image    L1 ...  HRI_EUV      174.0
       EUI eui-hrieuv174-image    L1 ...  HRI_EUV      174.0
       EUI eui-hrieuv174-image    L1 ...  HRI_EUV      174.0
       EUI eui-hrieuv174-image    L1 ...  HRI_EUV      174.0
       EUI eui-hrieuv174-image    L1 ...  HRI_EUV      174.0
       EUI eui-hrieuv174-image    L1 ...  HRI_EUV      174.0
       EUI eui-hrieuv174-image    L1 ...  HRI_EUV      174.0
       EUI eui-hrieuv174-image    L1 ...  HRI_EUV      174.0

assert "Detector" in res[0].columns
assert res.file_num >= 35

# test for invalid detector..
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# test for invalid detector..
# Invalid detector

Copy link
Contributor

@nabobalis nabobalis Jun 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be duplicated?

I also think it should be its own test.

assert res.file_num == 0


def test_search_wavelength_column_wavelength():
Copy link
Contributor

@nabobalis nabobalis Jun 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like 3 separate tests in one? I think this is the same for the test below as well?

I would prefer if we can keep each unit test as focused as possible.
It makes fixing one easier if one part breaks but not another.

@ebuchlin
Copy link

Hi again, I should have mentioned that I'm the Solar Orbiter Archive Scientist. The v_sc_data_item.sensor is indexed, and so works better than v_eui_sc_fits. Also, there is a ticket to fix the lack of end time in the v_eui_sc_fits view. I can prioritise it now I know you need it.

Thanks for the advice to use this indexed column. Maybe indexed columns could be indicated in the Tables, Views, and Columns help page? I understand that timeouts can be expected when using WHERE conditions on non-indexed columns.

sunpy_soar/client.py Outdated Show resolved Hide resolved
sunpy_soar/tests/test_sunpy_soar.py Show resolved Hide resolved
@hayesla
Copy link
Member

hayesla commented Jul 4, 2024

@NucleonGodX I think the last thing to do is to not return the PHI wavelengths now until we resolve the issue with the A vs nm. So I think we should just not return it in this PR. And similarly we should not return this for SPICE. Like we discussed on our call yesterday/

So for these both, we should maybe not even return the wavelength column. Let me know if this isnt clear.

Apart from these this PR is good to go!

def _(wlk, attr, params): # NOQA: ARG001
wavemin = attr.min.value
wavemax = attr.max.value
params.append(f"Wavemin='{wavemin}'+AND+h2.Wavemax='{wavemax}'")

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm missing something, but isn't wavemin from h2 too?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even if it is just parsed as a pattern in client.py, shouldn't it be consistent with wavemax?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think as we are giving both wavemin and wavemax as a single parameter for query, I think its one of the simpler ways to implement it. If its still necessary to try something else, let me know :)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it was just a thought about consistency between Wavemin and Wavemax.

where_part, from_part, select_part = SOARClient.add_join_to_query(query, data_table, instrument_table)
else:
from_part = data_table
select_part = "*"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need * here, as we select not all columns when a join is needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is for instruments where we don't need the join.

def test_search_wavelength_detector_column():
instrument = a.Instrument("EUI")
time = a.Time("2021-02-01", "2021-02-02")
level = a.Level(1)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also a test for LL data?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually there is test_search_low_latency() (for MAG data; executes the query), and the new test_join_low_latency_query() (for EUI; only tests the query string, without executing it). So maybe this is enough?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, will reverse this change!

Comment on lines 66 to 71
# For PHI, we specify the wavemin as a parameter.
# This enables us to retrieve columns containing wavemin and their corresponding wavemax values.
# For SPICE we get data in form of wavemin/wavemax columns, but only for the first spectral window.
# Therefore, to make sure this data is not misleading to the user we do not return any values.
if "phi" in instrument_table:
parameter = f"Wavemin='{wavemin_match.group(1)}'"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At the moment we don't consider wavelength for PHI and SPICE (see this comment).

@NucleonGodX NucleonGodX requested a review from ebuchlin July 6, 2024 18:43
@nabobalis
Copy link
Contributor

Thanks for the reviews @ebuchlin and @hayesla.

I will run the CI and merge if it passes.

@nabobalis nabobalis merged commit f92ed86 into sunpy:main Jul 6, 2024
13 checks passed
@nabobalis
Copy link
Contributor

nabobalis commented Jul 6, 2024

Thanks for the first stage of the project @NucleonGodX, you can now rebase the other two PRs to remove the commits that came from this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants