Pull ollama model while creating new instances #113
Conversation
```rust
let res = runtime.block_on(async { instance.pull_model().await });

match res {
    Err(e) => error!("ERROR: {:?}", e.to_string()),
    _ => info!("Model pulled successfully!"),
};
```
How much overhead would calling pull_model on every chat completion request add? Would it be possible to only try pulling the model if a chat completion request first fails because the model has not been pulled? That might even be something better configured server side instead of in the extension 🤔
The pull_model call will only account for a single request, but yes, this can introduce overhead.
We can go with the flow you suggested: first call the model, then pull it if that call fails.

> That might even be something better configured server side instead of in the extension

In my opinion, it's better if this stays within the extension, just for ease of use.
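A minimal sketch of that lazy-pull flow. All names here (`Instance`, `chat_completion`, `pull_model`, the "model not found" error text) are hypothetical stand-ins for the extension's real types and errors, not the actual API:

```rust
// Hypothetical mock of an Ollama instance; the real type would make
// HTTP calls to the Ollama server instead of tracking a bool.
#[derive(Default)]
struct Instance {
    pulled: bool,
}

impl Instance {
    fn chat_completion(&self, prompt: &str) -> Result<String, String> {
        if self.pulled {
            Ok(format!("response to: {prompt}"))
        } else {
            Err("model not found".to_string())
        }
    }

    fn pull_model(&mut self) -> Result<(), String> {
        self.pulled = true;
        Ok(())
    }
}

/// Try the completion first; pull the model and retry only when the
/// failure indicates a missing model. The common path (model already
/// pulled) pays no pull overhead.
fn complete_with_lazy_pull(instance: &mut Instance, prompt: &str) -> Result<String, String> {
    match instance.chat_completion(prompt) {
        Ok(resp) => Ok(resp),
        Err(e) if e.contains("model not found") => {
            instance.pull_model()?;
            instance.chat_completion(prompt)
        }
        Err(e) => Err(e),
    }
}

fn main() {
    let mut instance = Instance::default();
    let resp = complete_with_lazy_pull(&mut instance, "hello").unwrap();
    println!("{resp}");
}
```

The retry-on-specific-error shape keeps the pull off the hot path while still making first use of a model work without manual setup.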
I agree the convenience is nice...
The server in ./vector-serve always pulls a model if it does not already exist (unless that is disabled via an env var). So if the model does not exist, it is typically downloaded during the vectorize.table call, when that function calls the server to get the embedding model's dimension. Calling the vector-serve endpoint for model info ends up triggering the download. Maybe there is something similar we can do here, where the call to get the model info for Ollama models is what triggers the pull?
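A sketch of that alternative, where fetching model info is what triggers the pull. `OllamaInstance`, `model_info`, and `pull_model` are hypothetical names for illustration, not the extension's actual API:

```rust
// Hypothetical mock: the real pull_model would hit the Ollama pull
// endpoint; here we just record the model as available.
#[derive(Default)]
struct OllamaInstance {
    available_models: Vec<String>,
}

impl OllamaInstance {
    fn pull_model(&mut self, model: &str) -> Result<(), String> {
        self.available_models.push(model.to_string());
        Ok(())
    }

    /// Fetching model info pulls the model as a side effect when it is
    /// missing, mirroring how ./vector-serve downloads a model during
    /// the model-info call. Callers never pull explicitly.
    fn model_info(&mut self, model: &str) -> Result<String, String> {
        if !self.available_models.iter().any(|m| m == model) {
            self.pull_model(model)?;
        }
        Ok(format!("metadata for {model}"))
    }
}

fn main() {
    let mut inst = OllamaInstance::default();
    // The first info call triggers the pull; later calls find the
    // model already available and skip it.
    let info = inst.model_info("some-model").unwrap();
    println!("{info}");
}
```

This keeps the pull tied to the one place that already has to contact the server about the model, so no per-completion check is needed.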
Okay, I'll try to implement something similar. We already have the function; I just need to find the right place to call it.
@destrex271 - are you ok if we close this and re-open it later if it is still something we want to implement?
Oops! Forgot to close it earlier!
Closing #113 also