Minimum change to reflect new bucket locations #18

Merged: 3 commits into main from move_buckets, Aug 8, 2024

metazool (Collaborator) commented Jul 30, 2024

See #12 for context (changes to the layout of the object store).

  • Updates the bucket locations (separate collections for FlowCam and flow cytometer images)
  • Moves the intake metadata and catalog (just a file listing) into the image buckets, alongside the cyto_app metadata
  • Adds a basic test that checks files can be read from the new buckets; it only runs if connection details are set in .env
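
For illustration only, a catalog that is "just a file listing" could be produced along these lines. This is a minimal sketch, not the code in this PR: the function name `build_catalog`, the `filename` column, the image suffixes and the bucket name in the usage note are all assumptions.

```python
import csv
import io


def build_catalog(bucket: str, keys: list[str]) -> str:
    """Return a CSV catalog with one row per image object in the bucket."""
    # Hypothetical set of image suffixes; adjust to whatever the buckets hold.
    image_suffixes = (".tif", ".tiff", ".png", ".jpg", ".jpeg")
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["filename"])  # hypothetical column name
    for key in sorted(keys):
        # Skip metadata and catalog files that sit alongside the images.
        if key.lower().endswith(image_suffixes):
            writer.writerow([f"{bucket}/{key}"])
    return buffer.getvalue()
```

Calling `build_catalog("flowcam-images", ["b.tif", "metadata.csv", "a.png"])` with that made-up bucket name returns a header row followed by the two image paths in sorted order.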

This is a small and quite blunt change. I think any further work on making it generic should be deferred until we:

  • Revisit how the "decollage" process works in cyto_app, so that the spatio-temporal metadata available with the sources is preserved along with the individual images
  • Take a much closer look at the flow cytometer data: if in theory we can get spectrograms out of it as well, then just pushing the images through a classifier model is losing potential...

To test

pip install -e .
pytest

The tests for the new bucket locations are skipped unless ENDPOINT and the JASMIN access tokens are set in .env. They should be skipped in the pipeline, but that's OK imo, since they only do a slow file listing.
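
As a rough sketch of the skip condition described above (not the actual test code in this PR), a helper like the one below reports which settings are missing. The names `JASMIN_TOKEN` and `JASMIN_SECRET` are hypothetical stand-ins for the JASMIN access tokens, and loading .env into the environment is assumed to happen elsewhere.

```python
import os
from typing import Mapping

# Hypothetical names: the PR only mentions "ENDPOINT and the JASMIN access tokens".
REQUIRED_VARS = ("ENDPOINT", "JASMIN_TOKEN", "JASMIN_SECRET")


def missing_object_store_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required settings that are unset or blank."""
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]
```

In a pytest suite this could feed a marker such as `pytest.mark.skipif(bool(missing_object_store_vars()), reason="object store not configured")`, so the slow file listing only runs where credentials exist.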

@Kzra

metazool (Collaborator, Author) commented Jul 30, 2024

I considered adding documentation for the intake approach here (expected before #12 is really complete), but it's better to keep PRs small and actually finish them :D

☂️ Python Coverage

current status: ✅

Overall Coverage

Lines | Covered | Coverage | Threshold | Status
123 | 105 | 85% | 0% | 🟢

New Files

File | Coverage | Status
cyto_ml/tests/test_object_store.py | 60% | 🟢
TOTAL | 60% | 🟢

Modified Files

File | Coverage | Status
cyto_ml/data/s3.py | 64% | 🟢
cyto_ml/tests/conftest.py | 100% | 🟢
TOTAL | 82% | 🟢

updated for commit: 73d5c1e by action🐍

metazool (Collaborator, Author) commented

Updated to fix a test failure that occurred when .env is present but contains an arbitrary string instead of the ENDPOINT location.
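
One way to guard against that case, sketched here under the assumption that ENDPOINT is meant to be an http(s) URL (the actual fix in this PR may differ, and `looks_like_endpoint` is a hypothetical name):

```python
from urllib.parse import urlparse


def looks_like_endpoint(value: str) -> bool:
    """True if value parses as an http(s) URL with a host part."""
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)
```

A test could then skip, rather than fail, whenever `looks_like_endpoint(os.environ.get("ENDPOINT", ""))` is false.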

This should bring things back in sync with the state of the cyto-ML project. @Kzra, are you happy to add an approving review? Please also see an issue there about preserving more of the metadata during the decollaging process :D

metazool (Collaborator, Author) commented Aug 8, 2024

Much appreciated, @albags!

metazool merged commit 33764e3 into main on Aug 8, 2024
4 checks passed
metazool deleted the move_buckets branch on October 2, 2024 08:50