From 7b8c9d8dcc1f604137041c86fb9f0a845dc5fa85 Mon Sep 17 00:00:00 2001
From: Nick Stogner
Date: Thu, 29 Aug 2024 23:11:52 -0400
Subject: [PATCH] Add bare bones doc about managing models (#149)

Helps clarify #145
---
 README.md                |  1 +
 docs/model-management.md | 72 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 73 insertions(+)
 create mode 100644 docs/model-management.md

diff --git a/README.md b/README.md
index 7d8959a1..7ce3675e 100644
--- a/README.md
+++ b/README.md
@@ -108,6 +108,7 @@ Any vLLM or Ollama model can be served by KubeAI. Some examples of popular model
 ## Guides
 
 * [Cloud Installation](./docs/cloud-install.md) - Deploy on Kubernetes clusters in the cloud
+* [Model Management](./docs/model-management.md) - Manage ML models
 
 ## OpenAI API Compatibility
 
diff --git a/docs/model-management.md b/docs/model-management.md
new file mode 100644
index 00000000..47c12655
--- /dev/null
+++ b/docs/model-management.md
@@ -0,0 +1,72 @@
+# Model Management
+
+KubeAI uses Model [Custom Resources](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) to configure which ML models are available in the system.
+
+Example:
+
+```yaml
+apiVersion: kubeai.org/v1
+kind: Model
+metadata:
+  name: llama-3.1-8b-instruct-fp8-l4
+spec:
+  features: ["TextGeneration"]
+  owner: neuralmagic
+  url: hf://neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8
+  engine: VLLM
+  args:
+    - --max-model-len=16384
+    - --max-num-batched-tokens=16384
+    - --gpu-memory-utilization=0.9
+  minReplicas: 0
+  maxReplicas: 3
+  resourceProfile: L4:1
+```
+
+### Listing Models
+
+You can view all installed models through the Kubernetes API using `kubectl get models` (use the `-o yaml` flag for more details).
+
+You can also list all models via the OpenAI-compatible `/v1/models` endpoint:
+
+```bash
+curl http://your-deployed-kubeai-endpoint/openai/v1/models
+```
+
+### Installing a predefined Model using Helm
+
+When defining your Helm values, you can install a predefined Model from the catalog by setting `enabled: true`:
+
+```yaml
+models:
+  catalog:
+    llama-3.1-8b-instruct-fp8-l4:
+      enabled: true
+```
+
+You can also override settings for a given model:
+
+```yaml
+models:
+  catalog:
+    llama-3.1-8b-instruct-fp8-l4:
+      enabled: true
+      env:
+        MY_CUSTOM_ENV_VAR: "some-value"
+```
+
+### Adding Custom Models
+
+You can add your own model by defining a Model manifest and applying it with `kubectl apply -f model.yaml`.
+
+If you have a running cluster with KubeAI installed, you can inspect the schema for a Model using `kubectl explain`:
+
+```bash
+kubectl explain models
+kubectl explain models.spec
+kubectl explain models.spec.engine
+```
+
+### Model Management UI
+
+We are considering adding a UI for managing models in a running KubeAI instance. Give the [GitHub Issue](https://github.com/substratusai/kubeai/issues/148) a thumbs up if you would be interested in this feature.
\ No newline at end of file
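
The Helm catalog snippets in the new doc show the values but not how to apply them. A minimal sketch of applying an updated values file, assuming KubeAI is managed as a Helm release named `kubeai` installed from a chart reference `kubeai/kubeai` (the release name, chart reference, and namespace are assumptions, not taken from the patch):

```bash
# Re-apply the release with a values file that enables catalog models.
# The release name, chart reference, and namespace below are placeholders.
helm upgrade --install kubeai kubeai/kubeai \
  --namespace kubeai \
  -f values.yaml

# Confirm the enabled Model resources were created.
kubectl get models
```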
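
For the "Adding Custom Models" step, here is a hypothetical `model.yaml` sketch that reuses only the fields shown in the example at the top of the new doc; the metadata name and `hf://` URL are placeholders, and the appropriate `resourceProfile`, `args`, and replica counts depend on your cluster:

```yaml
# model.yaml -- hypothetical custom Model; the name and url are placeholders.
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: my-custom-model
spec:
  features: ["TextGeneration"]
  url: hf://my-org/my-model        # placeholder Hugging Face repo
  engine: VLLM
  minReplicas: 0
  maxReplicas: 1
  resourceProfile: L4:1            # pick a profile available in your install
```

Apply it with `kubectl apply -f model.yaml` and verify it appears with `kubectl get models`, as the doc describes.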