Drop scivision #1

Closed
wants to merge 14 commits into from
Changes from 9 commits
2 changes: 2 additions & 0 deletions .gitignore
@@ -3,3 +3,5 @@
**/__pycache__/
vectors/
*.ipynb
*.egg-info/
venv/
1 change: 1 addition & 0 deletions .python-version
@@ -0,0 +1 @@
3.12
40 changes: 30 additions & 10 deletions README.md
@@ -6,30 +6,50 @@ It's a companion project to an R-shiny based image annotation app that is not ye

## Installation

### Python environment setup
### Environment and package installation

Use anaconda or miniconda to create a python environment using the included `environment.yml`
#### Using pip

Create a fresh virtual environment in the repository root using Python >=3.12 and (e.g.) `venv`:

```
conda env create -f environment.yml
python -m venv venv
```

Please note that this is specifically pinned to python 3.9 due to dependency versions; we make experimental use of the [CEFAS plankton model available through SciVision](https://sci.vision/#/model/resnet50-plankton), which in turn uses an older version of pytorch that isn't packaged above python 3.9.
Next, install the package using `pip`:

### Object store connection
```
python -m pip install .
```

`.env` contains environment variable names for S3 connection details for the [JASMIN object store](https://github.com/NERC-CEH/object_store_tutorial/). Fill these in with your own credentials. If you're not sure what the `ENDPOINT` should be, please reach out to one of the project contributors listed below.
Most likely you are interested in developing and/or experimenting, so you will probably want to install the package in 'editable' mode (`-e`), along with dev tools and Jupyter notebook functionality:

```
python -m pip install -e .[all]
```

### Package installation
#### Using conda

Get started by cloning this repository and running
Use anaconda or miniconda to create a python environment using the included `environment.yml`

`pip install -e .`
```
conda env create -f environment.yml
conda activate cyto_ml
```

Next install this package _without dependencies_:

```
python -m pip install --no-deps -e .
```

### Object store connection

`.env` contains environment variable names for S3 connection details for the [JASMIN object store](https://github.com/NERC-CEH/object_store_tutorial/). Fill these in with your own credentials. If you're not sure what the `ENDPOINT` should be, please reach out to one of the project contributors listed below.

### Running tests

`python -m pytest` or `py.test`
`pytest` or `py.test`

## Contents

57 changes: 0 additions & 57 deletions cyto_ml/models/scivision.py

This file was deleted.

25 changes: 14 additions & 11 deletions environment.yml
@@ -1,22 +1,25 @@
name: cyto_39
name: cyto_ml
channels:
- pytorch
- conda-forge
- defaults
channel_priority: flexible
Review comment (owner, author):

Because it fails for me with strict priority

dependencies:
- python=3.9
- pytorch=1.10.0
- mkl=2024.0
- chromadb=0.5.3
- python=3.12
- pytorch
- black
- chromadb
- flake8
- intake-xarray
- scikit-image
- intake=0.7
- isort
- jupyterlab
- jupytext
- pandas
- pytest
- python-dotenv
- s3fs
- jupyterlab
- jupytext
- scikit-image
- xarray
- pip
- pip:
- scivision
- git+https://github.com/alan-turing-institute/plankton-cefas-scivision@main
- git+https://github.com/jmarshrossney/resnet50-cefas
35 changes: 31 additions & 4 deletions pyproject.toml
@@ -1,12 +1,39 @@
[build-system]
requires = ["setuptools >= 61.0"]
build-backend = "setuptools.build_meta"

[project]
name = "cyto_ml"
version = "0.1"
version = "0.2.0"
requires-python = ">=3.12"
description = "This package supports the processing and analysis of plankton sample data"
readme = "README.md"
requires-python = "==3.9.*"
dependencies = [
"chromadb",
"intake==0.7.0",
"intake-xarray",
"pandas",
"python-dotenv",
"s3fs",
"scikit-image", # secretly required by intake-xarray as default reader
"torch",
"xarray",
"resnet50-cefas@git+https://github.com/jmarshrossney/resnet50-cefas",
]

[tool.setuptools]
py-modules = []
[project.optional-dependencies]
jupyter = ["jupyterlab", "jupytext"]
dev = ["pytest", "black", "flake8", "isort"]
all = ["cyto_ml[jupyter,dev]"]

[tool.jupytext]
formats = "ipynb,md"

[tool.pytest.ini_options]
filterwarnings = [
"ignore::DeprecationWarning",
]

[tool.black]
target-version = ["py312"]
line-length = 88
1 change: 1 addition & 0 deletions scripts/intake_metadata.py
@@ -7,6 +7,7 @@
Via https://gallery.pangeo.io/repos/pangeo-data/pangeo-tutorial-gallery/intake.html#Build-an-intake-catalog

"""

import os
from cyto_ml.data.intake import intake_yaml
from cyto_ml.data.s3 import s3_endpoint, image_index
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
@@ -9,7 +9,7 @@

logging.basicConfig(level=logging.INFO)
# TODO make this sensibly configurable, not confusingly hardcoded
STORE = os.path.join(os.path.abspath(os.path.dirname(__file__)), "../../vectors")
STORE = os.path.join(os.path.abspath(os.path.dirname(__file__)), "../../../vectors")

client = chromadb.PersistentClient(
path=STORE,
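
One way to address the TODO in this hunk might be to read the store location from an environment variable and fall back to the repository-level `vectors/` directory. This is only a sketch under stated assumptions: the variable name `CYTO_ML_VECTOR_STORE`, the helper `resolve_store`, and the example module path are all hypothetical and not part of this PR.

```python
import os
from pathlib import Path


def resolve_store(module_file: str, env_var: str = "CYTO_ML_VECTOR_STORE") -> Path:
    """Return the vector store directory (hypothetical helper, not in this PR).

    Prefers the environment variable if it is set; otherwise falls back to
    `vectors/` three levels above the module's directory, mirroring the
    hardcoded "../../../vectors" default shown in the diff.
    """
    override = os.environ.get(env_var)
    if override:
        return Path(override).expanduser().resolve()
    return (Path(module_file).resolve().parent / "../../../vectors").resolve()


# Example with a made-up module path under src/cyto_ml/<subpackage>/
STORE = resolve_store("/repo/src/cyto_ml/data/vectorstore.py")
```

The resulting path could then be passed as `path=str(STORE)` to the existing `chromadb.PersistentClient` call, so the default behaviour would stay the same when the variable is unset.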
File renamed without changes.
35 changes: 35 additions & 0 deletions src/cyto_ml/models/scivision.py
@@ -0,0 +1,35 @@
import torch
from torchvision.transforms.v2.functional import to_image, to_dtype
from xarray import DataArray

def prepare_image(image: DataArray):
    """
    Take an xarray of image data and prepare it to pass through the model
    a) Converts the image data to a PyTorch tensor
    b) Accepts a single image or batch (no need for torch.stack)
    """
    # Computes the DataArray and returns a numpy array
    image_numpy = image.to_numpy()

    # Convert the image data to a PyTorch tensor
    tensor_image = to_dtype(
        to_image(image_numpy),  # permutes HWC -> CHW
        torch.float32,
        scale=True,  # rescales [0, 255] -> [0, 1]
    )
Review comment on lines +16 to +20 (owner, author):

`ToTensor` is deprecated, and I prefer the explicitness of this! I also think it was always just a composition of these two transformations anyway.
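
The two steps named in this comment can be illustrated without torchvision. What follows is a NumPy-only sketch of what `to_image` followed by `to_dtype(..., scale=True)` does to a single `uint8` image in height-width-channel layout. The helper name `to_chw_float` is hypothetical, and the real functions return tensors rather than NumPy arrays.

```python
import numpy as np


def to_chw_float(image_hwc: np.ndarray) -> np.ndarray:
    """NumPy stand-in for the two-step conversion (hypothetical helper).

    1. Move the channel axis first: (H, W, C) -> (C, H, W).
    2. Cast uint8 to float32 and rescale [0, 255] -> [0, 1].
    """
    if image_hwc.dtype != np.uint8 or image_hwc.ndim != 3:
        raise ValueError("expected a uint8 array with shape (H, W, C)")
    chw = np.transpose(image_hwc, (2, 0, 1))
    return chw.astype(np.float32) / 255.0


# A tiny 2x3 RGB image covering the full range of pixel values
image = np.array(
    [[[0, 128, 255]] * 3, [[255, 0, 64]] * 3],
    dtype=np.uint8,
)
converted = to_chw_float(image)
print(converted.shape)  # (3, 2, 3)
```

Keeping the permutation and the rescaling as separate, named steps is what makes the range check in `prepare_image` easy to reason about.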

    assert torch.all((tensor_image >= 0.0) & (tensor_image <= 1.0))

    if tensor_image.dim() == 3:
        # Single image, add a batch dimension
        tensor_image = tensor_image.unsqueeze(0)

    assert tensor_image.dim() == 4

    return tensor_image


def flat_embeddings(features: torch.Tensor):
    """Utility function that takes the features returned by the model in truncate_model
    And flattens them into a list suitable for storing in a vector database"""
    # TODO: this only returns the 0th tensor in the batch...why?
    return features[0].detach().tolist()
9 changes: 3 additions & 6 deletions cyto_ml/tests/conftest.py → tests/conftest.py
@@ -1,10 +1,7 @@
import os
import pytest
from cyto_ml.models.scivision import (
load_model,
truncate_model,
SCIVISION_URL,
)

from resnet50_cefas import load_model


@pytest.fixture
@@ -30,7 +27,7 @@ def image_batch(image_dir):

@pytest.fixture
def scivision_model():
return truncate_model(load_model(SCIVISION_URL))
return load_model(strip_final_layer=True)


@pytest.fixture
@@ -2,7 +2,6 @@
from torch import Tensor
from cyto_ml.models.scivision import prepare_image, flat_embeddings


def test_embeddings(scivision_model, single_image):
features = scivision_model(prepare_image(ImageSource(single_image).to_dask()))

File renamed without changes.
@@ -9,8 +9,8 @@


def test_single_image(single_image):

image_data = ImageSource(single_image).to_dask()

# Tensorise the image (potentially normalise if we have useful values)
prepared_image = prepare_image(image_data)

@@ -25,7 +25,6 @@ def test_image_batch(image_batch):
We either pad them (and process a lot of blank space) or stick to single image input
"""
# Load a batch of plankton images

image_data = ImageSource(image_batch).to_dask()

with pytest.raises(ValueError) as err:
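
The trade-off described in the `test_image_batch` docstring can be sketched in NumPy: images of different sizes cannot be stacked into one array, and zero-padding them to a common size makes stacking possible at the cost of blank pixels. The helper `pad_to_common_size` is hypothetical and not part of this PR, and the `ValueError` here comes from `np.stack`, which may not be where the error arises in the actual test.

```python
import numpy as np


def pad_to_common_size(images: list[np.ndarray]) -> np.ndarray:
    """Zero-pad (H, W, C) images to the largest H and W, then stack them.

    Hypothetical helper for illustration only.
    """
    max_h = max(img.shape[0] for img in images)
    max_w = max(img.shape[1] for img in images)
    padded = [
        np.pad(
            img,
            # Pad only at the bottom and right; leave channels untouched
            ((0, max_h - img.shape[0]), (0, max_w - img.shape[1]), (0, 0)),
            mode="constant",
            constant_values=0,
        )
        for img in images
    ]
    return np.stack(padded)


small = np.ones((2, 3, 3), dtype=np.uint8)
large = np.ones((4, 5, 3), dtype=np.uint8)

stack_error = None
try:
    np.stack([small, large])  # shapes differ, so this raises
except ValueError as err:
    stack_error = err

batch = pad_to_common_size([small, large])
print(batch.shape)  # (2, 4, 5, 3)
```

In this toy case the smaller image occupies 18 of the 60 values in its padded slot, which is the "blank space" cost the docstring mentions.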
File renamed without changes.