Packaged way of adding a detritus classifier to image processing #32

Open
5 of 6 tasks
metazool opened this issue Sep 11, 2024 · 2 comments
Labels
demonstrate feature that we need to be able to show

Comments

@metazool
Collaborator

metazool commented Sep 11, 2024

  • The Streamlit demo shows clustering (k-means and similar) on image embeddings, with a reasonably clear "this is detritus" cluster
  • This gives an outline of how far these images could be discarded before they ever go to S3 storage

Workflow for generating a classifier: S3 image collection -> extract and store embeddings -> fit a clustering model -> save the resulting artifact for reuse in the annotation workflow
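
A minimal sketch of the last two steps of that workflow, assuming the embeddings have already been extracted into an `(n_images, n_features)` NumPy array and that scikit-learn's `KMeans` is the clustering model. The synthetic embeddings, the cluster count, the `fit_and_save_kmeans` helper and the output path are all placeholders for illustration, not the project's actual code.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.cluster import KMeans


def fit_and_save_kmeans(embeddings: np.ndarray, n_clusters: int, out_path: Path) -> KMeans:
    """Fit K-means on an (n_images, n_features) array and pickle the fitted model."""
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=42)
    model.fit(embeddings)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with out_path.open("wb") as f:
        pickle.dump(model, f)
    return model


# Stand-in for embeddings that would really be read back from the embedding store:
# two well-separated blobs of 50 points each, in 8 dimensions.
rng = np.random.default_rng(0)
embeddings = np.vstack([
    rng.normal(loc=0.0, scale=0.1, size=(50, 8)),
    rng.normal(loc=5.0, scale=0.1, size=(50, 8)),
])

# Hypothetical output location; a real pipeline stage would write to a tracked path.
out_path = Path(tempfile.mkdtemp()) / "models" / "kmeans.pkl"
model = fit_and_save_kmeans(embeddings, n_clusters=2, out_path=out_path)
print(f"Saved model with {model.n_clusters} clusters to {out_path}")
```

The pickled file is the "resulting artifact": anything downstream only needs to unpickle it and call `predict` on new embeddings.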

This could be done with Luigi, or it could be an opportunity to get started with pyorderly, or to test this walkthrough of DVC and work with CML

Outline:

  • Set up DVC for the training data as an external source
  • Use that instead of intake to drive the script that does embedding extraction
  • Try DVC pipeline stages to run the embedding script as an alternative to learning Luigi
  • Then step back, make sure the Streamlit demo runs, stop worrying about deploying it in a structured way, and just look at the clusters again
  • Pick a cluster label out of the air, on the basis of what the demo shows, and add a pipeline stage to pickle a K-means model, which can then be used to add detritus labels in "Simple automation for the image decollage process" #31
  • Decide whether it's worth adding more metadata to chromadb (labels, image sizes)
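
A rough sketch of how a saved K-means model plus a hand-picked cluster id could be turned into detritus labels, assuming scikit-learn's `KMeans`. The `label_detritus` helper and the synthetic "plankton" and "detritus" embeddings are hypothetical; in practice the cluster id would come from looking at the clusters in the demo.

```python
import pickle

import numpy as np
from sklearn.cluster import KMeans


def label_detritus(model: KMeans, embeddings: np.ndarray, detritus_cluster: int) -> np.ndarray:
    """Return a boolean array: True where an embedding lands in the detritus cluster."""
    return model.predict(embeddings) == detritus_cluster


# Synthetic stand-in: "plankton-like" embeddings near 0, "detritus-like" near 5.
rng = np.random.default_rng(1)
plankton = rng.normal(0.0, 0.1, size=(40, 4))
detritus = rng.normal(5.0, 0.1, size=(40, 4))
fitted = KMeans(n_clusters=2, n_init=10, random_state=0).fit(np.vstack([plankton, detritus]))

# Round-trip through pickle, as a pipeline stage would when saving and reloading the model.
model = pickle.loads(pickle.dumps(fitted))

# Cluster ids are arbitrary for each fit, so the detritus id has to be chosen after
# inspecting the clusters. Here a known detritus example stands in for that judgement.
detritus_cluster = int(model.predict(detritus[:1])[0])

new_images = np.array([[5.0, 5.0, 5.0, 5.0], [0.0, 0.0, 0.0, 0.0]])
flags = label_detritus(model, new_images, detritus_cluster)
print(flags.tolist())
```

Because the ids change whenever the model is refit, the chosen id would need to be stored alongside the pickled model rather than hard-coded.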
@metazool metazool added the demonstrate feature that we need to be able to show label Sep 11, 2024
@metazool
Collaborator Author

Taking intake out involves changing a few places where intake_xarray.ImageSource was being used to load images for the scivision model, but it looks worth doing: the results will be much more readable

@metazool
Collaborator Author

metazool commented Oct 8, 2024

This is partly completed in #36: the simplest possible DVC pipeline that fits a K-means model for an image collection and saves it for reuse, with a web interface for exploring the contents of the different clusters to judge by eye which one is primarily detritus

You can see there's still an open question about where the metadata goes. I thought about adding a tag right into the EXIF headers, or into the detailed per-image metadata that the microscope exports. It depends on what is most useful to the ongoing application! And also on how this will be used: is the tagging an extra stage in a Luigi pipeline that's processing and uploading images to an object store, or is it a distinct pipeline that's indexing and analysing images once they've been uploaded?

So I've left it open for now. It probably needs another use case, like the phenocam images, to show the wider picture

cc @albags @Kzra
