
Drop scivision #1

Closed
wants to merge 14 commits into from

Conversation

jmarshrossney
Owner

I am proposing to drop scivision as a dependency, but to keep an eye on how it develops in case we want to adopt their model-loading protocol in future.

Why?

Bluntly, scivision offers us no functional benefit at this point in time.

As far as I can tell, the only reason to keep using it is that we support what they are trying to do and want to inject momentum into the project rather than remove it. That is a good reason, but given that we are in the very early stages of a project and don't yet know quite what direction we want to take it in, I think it's reasonable to defer that decision until later.

Another reason for waiting is that intake (which scivision uses heavily) is currently going through a full-scale rewrite, and the maintainers say (see docs)

All old “sources”, whether still working or not, should be considered deprecated.

My instinct is to wait and see how this plays out and if/when scivision gets updated.

Finally, the single model in the scivision catalogue that we are interested in locks us into a very old version of Python (3.9).

What changes

I forked the cefas/turing model, updated it to Python 3.12 and stripped out all the stuff that either didn't work or that we didn't need.

So at first approximation the only changes in the code should be that instead of loading the pretrained model from the original repo via scivision.load_pretrained_model, we instead just pip install the fork.
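At the packaging level the swap is then just a dependency change, along these lines in a requirements file (the project name and fork URL below are illustrative, not the actual ones):

```text
# requirements.txt
# removed: scivision
# added: the updated fork, installed straight from GitHub (name/URL illustrative)
plankton-model @ git+https://github.com/jmarshrossney/plankton-cefas-scivision
```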

@jmarshrossney
Owner Author

Note to explain why a bunch of things now appear broken with intake-xarray: it is not yet compatible with intake version 2.x, but because this dependency is given as intake >=0.6.6 in setup.py it nevertheless installs 2.x. The 'fix' is to manually pin intake==0.7.0 here.
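Until that is fixed upstream, the pin can sit alongside intake-xarray in the environment spec, something like:

```text
# intake-xarray declares intake >=0.6.6, which admits the incompatible 2.x series,
# so pin intake explicitly until intake-xarray supports intake 2:
intake==0.7.0
intake-xarray
```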

This, among some other issues, motivates me to drop intake-xarray from this project until (a) we have a concrete use for the xarray functionality in the first place, and (b) both intake and intake-xarray are updated to v2 and compatible with each other again.

@jmarshrossney
Owner Author

More notes to self: Trying to get things working within a conda environment, and learning about even more ways that this can break for non-intuitive reasons...

Apparently something as simple as

```
conda create -n test python=3.12
conda activate test
conda install pytorch
conda install scikit-image
```

fails for me - libmamba cannot find a solution.

I have conda 24.5.0 and my .condarc is:

```yaml
solver: libmamba
channels:
  - pytorch
  - conda-forge
channel_priority: strict
```

It turns out that if I switch the channel_priority back to flexible then it works. I'm just surprised it's so easy to break things when strict channel priority is recommended, partly because it can supposedly "reduce package incompatibility problems".

```yaml
channels:
  - pytorch
  - conda-forge
  - defaults
channel_priority: flexible
```
Owner Author


Because it fails for me with strict priority

Comment on lines +15 to +19
```python
tensor_image = to_dtype(
    to_image(image_numpy),  # permutes HWC -> CHW
    torch.float32,
    scale=True,  # rescales [0, 255] -> [0, 1]
)
```
Owner Author


ToTensor is deprecated and I prefer the explicitness of this! I also think ToTensor was always just a composition of these two transformations anyway.
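For reference, that composition boils down to a permute plus a rescale; a minimal NumPy-only sketch of the same operation (the array shape here is illustrative):

```python
import numpy as np

# HWC uint8 image, as it would come off disk (shape is illustrative)
image_numpy = np.random.randint(0, 256, size=(32, 48, 3), dtype=np.uint8)

# to_image: permute HWC -> CHW; to_dtype(..., scale=True): uint8 [0, 255] -> float32 [0, 1]
tensor_image = np.transpose(image_numpy, (2, 0, 1)).astype(np.float32) / 255.0

assert tensor_image.shape == (3, 32, 48)
assert tensor_image.min() >= 0.0 and tensor_image.max() <= 1.0
```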

@jmarshrossney
Owner Author

I'm now turning to the notebooks, and I'm reading @metazool 's observations about scivision in ImageEmbeddings.ipynb

The scivision wrapper depends on this being an xarray Dataset with settable attributes, rather than a DataArray

Setting exif_tags: True (Dataset) or False (DataArray) is what controls this [...]

[...] But now we're dependent on image height and width metadata being set in the EXIF tags to use the predict interface, this is set in the model description through scivision, this is brittle [...]

[...] In this case we don't want to use the predict interface anyway (one of N class labels) - we want the features that go into the last fully-connected layer

Kind of seems like you're way ahead of me!

So it's clear we want to bypass the scivision wrapper entirely to play with the penultimate layer 'embeddings', but I was curious to see what predict is doing under the hood in case it could be useful in future.

Basically I don't fully understand what the vision is, but it doesn't really matter: it's not finished, and we probably shouldn't use it unless we are intentionally trying to contribute to the project (which, to repeat, I'm not against in principle, just not right now!)

I really don't want to be unfair about anyone else's work (my github is entirely full of terrible unfinished work) - and from all I've gathered scivision seems like a great idea in principle - but this doesn't quite seem ready for downstream use.

To give just one example, PretrainedModel.predict takes arbitrary positional arguments and silently drops all but the first. According to the git blame, the entire PretrainedModel class was added in a single commit in 2021 and has never been updated, so make of that what you will.
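A toy reproduction of that signature (emphatically not the real scivision code) shows why it's a footgun:

```python
class ToyPretrainedModel:
    """Illustrative stand-in for the pattern described above, NOT the real class."""

    def predict(self, *image):
        # Everything after the first positional argument is silently discarded.
        X = image[0]
        return X

model = ToyPretrainedModel()
# A caller passing extra arguments gets no error and no warning:
result = model.predict("first", "second", "third")
assert result == "first"  # "second" and "third" simply vanished
```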

@jmarshrossney
Owner Author

So now we come to the load_dataset part of scivision, which is also being used in the notebooks.

This is a thin wrapper around (the old) intake.open_catalog - see source.

For now at least we only need to think about intake configurations as filepaths, not github repository URLs with a .scivision directory, so we can cut out almost everything...

```python
# Before (scivision):
dataset = load_dataset(f"{os.environ.get('ENDPOINT', '')}/metadata/intake.yml")
model = load_pretrained_model("https://github.com/alan-turing-institute/plankton-cefas-scivision")

# After (intake directly):
dataset = open_catalog(f"{os.environ.get('ENDPOINT', '')}/metadata/intake.yml")
```
Owner Author


huh.. it looks like intake.open_catalog was the only useful thing scivision.load_dataset was doing for us..
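For reference, the kind of classic (intake < 2) catalog file being opened here looks roughly like this. The entry name and path below are illustrative; `xarray_image` is the driver name registered by intake-xarray, and `exif_tags` is the flag discussed earlier (True yields a Dataset, False a DataArray):

```yaml
# illustrative v1-style intake catalog (the real intake.yml lives at $ENDPOINT/metadata/)
sources:
  plankton_images:
    driver: xarray_image
    description: Plankton images loaded as an xarray object
    args:
      urlpath: "{{ CATALOG_DIR }}/images/*.tif"
      exif_tags: true
```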

@jmarshrossney changed the title from "Drop scivision (for now)" to "Drop scivision" on Aug 24, 2024
@metazool

I've found my way back here from the subsequent PR into the main project NERC-CEH#24

I'm impressed with the mix of attention to microdetail and critical abstraction that you've put into working through this! I'll write more about possible future choices of model and of fork location in the discussion there, but broadly agree with the analysis here; scivision is one to keep an eye on, but not to cause ourselves trouble in the short term by a direct integration...
