From 7b8c9d8dcc1f604137041c86fb9f0a845dc5fa85 Mon Sep 17 00:00:00 2001
From: Nick Stogner
Date: Thu, 29 Aug 2024 23:11:52 -0400
Subject: [PATCH] Add bare bones doc about managing models (#149)

Helps clarify #145
---
 README.md                |  1 +
 docs/model-management.md | 72 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 73 insertions(+)
 create mode 100644 docs/model-management.md

diff --git a/README.md b/README.md
index 7d8959a1..7ce3675e 100644
--- a/README.md
+++ b/README.md
@@ -108,6 +108,7 @@ Any vLLM or Ollama model can be served by KubeAI. Some examples of popular model
 ## Guides
 
 * [Cloud Installation](./docs/cloud-install.md) - Deploy on Kubernetes clusters in the cloud
+* [Model Management](./docs/model-management.md) - Manage ML models
 
 ## OpenAI API Compatibility
 
diff --git a/docs/model-management.md b/docs/model-management.md
new file mode 100644
index 00000000..47c12655
--- /dev/null
+++ b/docs/model-management.md
@@ -0,0 +1,72 @@
+# Model Management
+
+KubeAI uses Model [Custom Resources](https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/) to configure which ML models are available in the system.
+
+Example:
+
+```yaml
+apiVersion: kubeai.org/v1
+kind: Model
+metadata:
+  name: llama-3.1-8b-instruct-fp8-l4
+spec:
+  features: ["TextGeneration"]
+  owner: neuralmagic
+  url: hf://neuralmagic/Meta-Llama-3.1-8B-Instruct-FP8
+  engine: VLLM
+  args:
+    - --max-model-len=16384
+    - --max-num-batched-tokens=16384
+    - --gpu-memory-utilization=0.9
+  minReplicas: 0
+  maxReplicas: 3
+  resourceProfile: L4:1
+```
+
+### Listing Models
+
+You can view all installed models through the Kubernetes API using `kubectl get models` (use the `-o yaml` flag for more details).
+
+You can also list all models via the OpenAI-compatible `/v1/models` endpoint:
+
+```bash
+curl http://your-deployed-kubeai-endpoint/openai/v1/models
+```
+
+### Installing a predefined Model using Helm
+
+When defining your Helm values, you can install a predefined Model from the catalog by setting `enabled: true`:
+
+```yaml
+models:
+  catalog:
+    llama-3.1-8b-instruct-fp8-l4:
+      enabled: true
+```
+
+You can also override settings for a given model:
+
+```yaml
+models:
+  catalog:
+    llama-3.1-8b-instruct-fp8-l4:
+      enabled: true
+      env:
+        MY_CUSTOM_ENV_VAR: "some-value"
+```
+
+### Adding Custom Models
+
+You can add your own model by defining a Model manifest and applying it with `kubectl apply -f model.yaml`.
+
+If you have a running cluster with KubeAI installed, you can inspect the schema for a Model using `kubectl explain`:
+
+```bash
+kubectl explain models
+kubectl explain models.spec
+kubectl explain models.spec.engine
+```
+
+### Model Management UI
+
+We are considering adding a UI for managing models in a running KubeAI instance. Give the [GitHub Issue](https://github.com/substratusai/kubeai/issues/148) a thumbs up if you would be interested in this feature.
\ No newline at end of file
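
The Helm catalog snippets in the new doc show the values but not how to apply them. A minimal sketch of applying an updated values file, assuming KubeAI is managed as a Helm release named `kubeai` installed from a chart reference `kubeai/kubeai` (the release name, chart reference, and namespace are assumptions, not taken from the patch):

```bash
# Re-apply the release with a values file that enables catalog models.
# The release name, chart reference, and namespace below are placeholders.
helm upgrade --install kubeai kubeai/kubeai \
  --namespace kubeai \
  -f values.yaml

# Confirm the enabled Model resources were created.
kubectl get models
```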
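
For the "Adding Custom Models" step, here is a hypothetical `model.yaml` sketch that reuses only the fields shown in the example at the top of the new doc; the metadata name and `hf://` URL are placeholders, and the appropriate `resourceProfile`, `args`, and replica counts depend on your cluster:

```yaml
# model.yaml -- hypothetical custom Model; the name and url are placeholders.
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: my-custom-model
spec:
  features: ["TextGeneration"]
  url: hf://my-org/my-model        # placeholder Hugging Face repo
  engine: VLLM
  minReplicas: 0
  maxReplicas: 1
  resourceProfile: L4:1            # pick a profile available in your install
```

Apply it with `kubectl apply -f model.yaml` and verify it appears with `kubectl get models`, as the doc describes.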