-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Drop scivision #1
Conversation
Note to explain why a bunch of things now appear broken with This, among some other issues, motivates me to drop |
More notes to self: Trying to get things working within a conda environment, and learning about even more ways that this can break for non-intuitive reasons... Apparently something as simple as conda create -n test python=3.12
conda activate test
conda install pytorch
conda install scikit-image fails for me - libmamba cannot find a solution. I have solver: libmamba
channels:
- pytorch
- conda-forge
channel_priority: strict It turns out that if I switch the |
channels: | ||
- pytorch | ||
- conda-forge | ||
- defaults | ||
channel_priority: flexible |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because it fails for me with strict
priority
tensor_image = to_dtype( | ||
to_image(image_numpy), # permutes HWC -> CHW | ||
torch.float32, | ||
scale=True, # rescales [0, 255] -> [0, 1] | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ToTensor
is deprecated and I prefer the explicitness of this! I also think was always just a composition of these two transformations anyway.
I'm now turning to the notebooks, and I'm reading @metazool 's observations about scivision in
Kind of seems like you're way ahead of me! So it's clear we want to bypass the scivision wrapper entirely to play with the penultimate layer 'embeddings', but I was curious to see what Basically I don't fully understand what the vision is, but it doesn't really matter because it's not finished and we probably shouldn't use this unless we are intentionally trying to contribute to the project (which, to repeat, I'm not against in principle, just not right now!) I really don't want to be unfair about anyone else's work (my github is entirely full of terrible unfinished work) - and from all I've gathered scivision seems like a great idea in principle - but this doesn't quite seem ready for downstream use. To give just one example |
So now we come to the This is a thin wrapper around (the old) For now at least we only need to think about intake configurations as filepaths, not github repository URLs with a |
``` | ||
|
||
```python | ||
dataset = load_dataset(f"{os.environ.get('ENDPOINT', '')}/metadata/intake.yml") | ||
model = load_pretrained_model("https://github.com/alan-turing-institute/plankton-cefas-scivision") | ||
dataset = open_catalog(f"{os.environ.get('ENDPOINT', '')}/metadata/intake.yml") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
huh.. it looks like intake.open_catalog
was the only useful thing scivision.load_dataset
was doing for us..
I've found my way back here from the subsequent PR into the main project NERC-CEH#24 I'm impressed with mix of attention to microdetail and critical abstraction that you've put into working through this! I'll write more about possible future choices of model and of fork location in the discussion there, but broadly agree with the analysis here; |
I am proposing to drop
scivision
as a dependency, but to keep an eye on how it develops in case we want to adopt their model-loading protocol in future.Why?
Bluntly, there are no functional benefits to using scivision, for us, at this point in time.
As far as I can tell the only reason to keep using it is because we support what they are trying to do and want to inject momentum into the project rather than remove it. That is a good reason, but given that we are in the very early stages of a project and don't know quite the direction we want to take it, I think it's reasonable for us to make this decision down the line.
Another reason for waiting is that
intake
(whichscivision
uses heavily) is currently going through a full-scale rewrite, and the maintainers say (see docs)My instinct is to wait and see how this plays out and if/when
scivision
gets updated.Finally, the single model in the scivision catalogue that we are interested in locks us into a very old version of Python (3.9).
What changes
I forked the cefas/turing model, updated it to Python 3.12 and stripped out all the stuff that either didn't work or that we didn't need.
So at first approximation the only changes in the code should be that instead of loading the pretrained model from the original repo via
scivision.load_pretrained_model
, we instead justpip install
the fork.