
Pull ollama model while creating new instances #113

Closed
destrex271 wants to merge 4 commits

Conversation

destrex271 (Contributor) opened this pull request:

No description provided.

Comment on lines +235 to +242
// Await the pull inside the async block so `res` is the pull result.
let res = runtime.block_on(async {
    instance.pull_model().await
});

match res {
    Err(e) => error!("ERROR: {:?}", e.to_string()),
    _ => info!("Model pulled successfully!"),
}
ChuckHend (Member) commented:

How much overhead is it going to add, making a call to pull the model on every chat completion request? Would it be possible to only try pulling the model if a chat completion request first fails because the model has not been pulled? That might even be something better configured server side instead of in the extension 🤔
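A minimal sketch of that "pull only on failure" flow, assuming an async client; every name below (OllamaClient, ChatError, chat_completion, chat_with_pull_fallback) is a hypothetical placeholder rather than the extension's or ollama-rs's actual API:

#[derive(Debug)]
enum ChatError {
    ModelNotFound(String),
    Other(String),
}

struct OllamaClient;

impl OllamaClient {
    async fn chat_completion(&self, model: &str, _prompt: &str) -> Result<String, ChatError> {
        // Placeholder body: a real client would call the Ollama chat API here.
        Err(ChatError::ModelNotFound(model.to_string()))
    }

    async fn pull_model(&self, _model: &str) -> Result<(), ChatError> {
        // Placeholder body: a real client would ask the server to download the model.
        Ok(())
    }
}

// Try the completion first; pull the model and retry once only when the error
// says the model is missing, so the common path never pays the pull overhead.
async fn chat_with_pull_fallback(
    client: &OllamaClient,
    model: &str,
    prompt: &str,
) -> Result<String, ChatError> {
    match client.chat_completion(model, prompt).await {
        Err(ChatError::ModelNotFound(_)) => {
            client.pull_model(model).await?;
            client.chat_completion(model, prompt).await
        }
        other => other,
    }
}

With this shape, only requests that fail with a missing-model error pay the pull cost; the usual path stays a single chat completion call.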

destrex271 (Contributor, Author) commented on Jun 25, 2024:

The pull_model call will only account for a single request, but yes, it can introduce some overhead.
We can go with the flow you suggested: first call the model, and pull it only if that request fails.

That might even be something better configured server side instead of in the extension

In my opinion, it's better if this stays within the extension, just for ease of use.

ChuckHend (Member) commented:

I agree the convenience is nice...

The server in ./vector-serve always pulls a model if it does not already exist (unless it is disabled via env var). So the model, if it does not exist, is typically downloaded during the vectorize.table call, when that function calls the server to get the dim of the embedding model. Calling the vector-serve endpoint for model info ends up triggering the download. Maybe there is something similar we can do here, where the call to get the model info for Ollama models is what triggers the pull?
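A hedged sketch of that idea, mirroring the vector-serve behavior described above; the types and methods here (OllamaInstance, model_exists, model_info) are hypothetical placeholders, not the project's real code:

struct ModelInfo {
    name: String,
    embedding_dim: usize,
}

struct OllamaInstance;

impl OllamaInstance {
    async fn model_exists(&self, _model: &str) -> bool {
        // Placeholder: would check the Ollama server's list of local models.
        false
    }

    async fn pull_model(&self, _model: &str) -> Result<(), String> {
        // Placeholder: would ask the Ollama server to download the model.
        Ok(())
    }

    // Fetching model info becomes the single place that guarantees the model
    // is present, so per-request chat completions never trigger a pull.
    async fn model_info(&self, model: &str) -> Result<ModelInfo, String> {
        if !self.model_exists(model).await {
            self.pull_model(model).await?;
        }
        Ok(ModelInfo {
            name: model.to_string(),
            embedding_dim: 0, // placeholder; real code would read this from the model
        })
    }
}

Since vectorize.table already asks the server for model info, routing the Ollama pull through the same model-info call would keep the download out of the per-request chat completion path.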

destrex271 (Contributor, Author) commented:

Okay, I'll try to implement something similar. We already have the function; I just need to find the right place to put it.

@ChuckHend (Member) commented:

@destrex271 - are you ok if we close this and re-open it later if it is still something we want to implement?

@destrex271 (Contributor, Author) commented:

@destrex271 - are you ok if we close this and re-open it later if it is still something we want to implement?

Oops! I forgot to close it earlier!
Yep, it's probably a bad idea to actually have this step.

@destrex271 (Contributor, Author) commented:

Closing #113 also

destrex271 closed this on Oct 4, 2024