Update Filestore docs

substratusai · Oct 17, 2024 · 06873fc
1 parent 3645588
commit 06873fc

Showing 4 changed files with 61 additions and 8 deletions.
diff --git a/charts/kubeai/templates/role.yaml b/charts/kubeai/templates/role.yaml
@@ -9,10 +9,10 @@ rules:
   - ""
   resources:
   - pods
-  - persistentvolumeclaims
   verbs:
   - create
   - delete
+  - deletecollection
   - get
   - list
   - patch
@@ -25,6 +25,19 @@ rules:
   verbs:
   - create
   - delete
+  - deletecollection
+  - get
+  - list
+  - patch
+  - update
+  - watch
+- apiGroups:
+  - ""
+  resources:
+  - persistentvolumeclaims
+  verbs:
+  - create
+  - delete
   - get
   - list
   - patch
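
Note on the RBAC change above: `deletecollection` is added for `pods`, while `persistentvolumeclaims` moves into its own rule whose visible verbs do not include it. A minimal sketch for confirming the effective permissions after a `helm upgrade` is shown below. It relies on `kubectl auth can-i` with service account impersonation. The namespace `default` and the service account name `kubeai` are assumptions, not taken from this commit, and `check_verbs` is a hypothetical helper name.

```bash
# Assumed names: adjust to match your Helm release (not taken from this commit).
KUBECTL="${KUBECTL:-kubectl}"
NAMESPACE="${NAMESPACE:-default}"
SERVICE_ACCOUNT="${SERVICE_ACCOUNT:-kubeai}"

# Print "<verb> <resource>: <answer>" for each verb, impersonating the service account.
# `kubectl auth can-i` exits non-zero on "no", so the exit status is ignored here.
check_verbs() {
  local resource="$1" verb answer
  shift
  for verb in "$@"; do
    answer="$("$KUBECTL" auth can-i "$verb" "$resource" \
      --namespace "$NAMESPACE" \
      --as "system:serviceaccount:${NAMESPACE}:${SERVICE_ACCOUNT}" 2>/dev/null || true)"
    echo "${verb} ${resource}: ${answer:-unknown}"
  done
}
```

For example, `check_verbs pods deletecollection` followed by `check_verbs persistentvolumeclaims create delete deletecollection` would print one line per verb. Whether `deletecollection` on PVCs is reported as allowed depends on any other rules bound to the same service account.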

diff --git a/docs/how-to/cache-models-with-gcp-filestore.md b/docs/how-to/cache-models-with-gcp-filestore.md
@@ -15,21 +15,34 @@ gcloud services enable file.googleapis.com
 
 Apply a Model with the cache profile set to `standard-filestore` (defined in the reference [GKE Helm values file](https://github.com/substratusai/kubeai/blob/main/charts/kubeai/values-gke.yaml)).
 
+<details markdown="1">
+<summary>TIP: If you want to use `premium-filestore` you will need to ensure you have quota.</summary>
+Open the cloud console quotas page: https://console.cloud.google.com/iam-admin/quotas. Make sure your project is selected in the top left.
+
+Ensure that you have at least 2.5 TB of `PremiumStorageGbPerRegion` quota in the region where your cluster is deployed.
+
+![Premium Storage Quota Screenshot](../screenshots/gcp-quota-premium-storage-gb-per-region.png)
+
+</details>
+<br>
+
 NOTE: If you already installed the models chart, you will need to edit your values file and run `helm upgrade`.
 
 ```bash
-helm install kubeai-models $REPO_DIR/charts/models -f - <<EOF
+helm install kubeai-models kubeai/models -f - <<EOF
 catalog:
-  opt-125m-cpu:
+  llama-3.1-8b-instruct-fp8-l4:
     enabled: true
     cacheProfile: standard-filestore
+  llama-3.1-8b-instruct-fp8-l4-nocache:
+    enabled: true
 EOF
 ```
 
-Wait for the Model to be fully cached.
+Wait for the Model to be fully cached. This may take a while if the Filestore instance needs to be created.
 
 ```bash
-kubectl wait --timeout 10m --for=jsonpath='{.status.cache.loaded}'=true model/opt-125m-cpu
+kubectl wait --timeout 10m --for=jsonpath='{.status.cache.loaded}'=true model/llama-3.1-8b-instruct-fp8-l4
 ```
 
 This model will now be loaded from Filestore when it is served.
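
Note on the wait step above: because creating a Filestore instance can take a while, the 10 minute `kubectl wait` timeout may expire before the cache is ready. One possible alternative is a polling loop over the same `.status.cache.loaded` field, sketched below. The helper name `wait_for_cache` and the default of 60 attempts at 30 second intervals are assumptions chosen for illustration, not values from the docs.

```bash
KUBECTL="${KUBECTL:-kubectl}"

# Poll .status.cache.loaded on a Model until it reads "true" or attempts run out.
# Usage: wait_for_cache <model-name> [attempts] [interval-seconds]
wait_for_cache() {
  local model="$1" attempts="${2:-60}" interval="${3:-30}" i=1 loaded
  while [ "$i" -le "$attempts" ]; do
    # An empty result means the field is not set yet (or the lookup failed).
    loaded="$("$KUBECTL" get model "$model" -o jsonpath='{.status.cache.loaded}' 2>/dev/null || true)"
    if [ "$loaded" = "true" ]; then
      echo "cache loaded for ${model}"
      return 0
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  echo "timed out waiting for ${model}" >&2
  return 1
}
```

Calling `wait_for_cache llama-3.1-8b-instruct-fp8-l4` would then wait up to roughly 30 minutes under these assumed defaults.
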
@@ -44,14 +57,41 @@ Ensure that the Filestore CSI driver is enabled by checking for the existence of
 kubectl get storageclass standard-rwx premium-rwx
 ```
 
-### PersistentVolumeClaim
+### PersistentVolumes
 
 Check the PersistentVolumeClaim (that should be created by KubeAI).
 
 ```bash
 kubectl describe pvc shared-model-cache-
 ```
 
+<details markdown="1">
+<summary>Example: Out-of-quota error</summary>
+```
+  Warning  ProvisioningFailed    11m (x26 over 21m)  filestore.csi.storage.gke.io_gke-50826743a27a4d52bf5b-7fac-9607-vm_b4bdb2ec-b58b-4363-adec-15c270a14066  failed to provision volume with StorageClass "premium-rwx": rpc error: code = ResourceExhausted desc = googleapi: Error 429: Quota limit 'PremiumStorageGbPerRegion' has been exceeded. Limit: 0 in region us-central1.
+Details:
+[
+  {
+    "@type": "type.googleapis.com/google.rpc.QuotaFailure",
+    "violations": [
+      {
+        "description": "Quota 'PremiumStorageGbPerRegion' exhausted. Limit 0 in region us-central1",
+        "subject": "project:819220466562"
+      }
+    ]
+  }
+]
+```
+</details>
+
+Check to see if the PersistentVolume has been fully provisioned.
+
+```bash
+kubectl get pv
+# Find name of corresponding pv...
+kubectl describe pv <name>
+```
+
 ### Model Loading Job
 
 Check to see if there is an ongoing model loader Job.
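
Note on the troubleshooting steps above: the out-of-quota example shows the pattern to look for in the PVC events. A small filter like the one below could pull the exceeded quota name out of `kubectl describe pvc` output. This is a sketch that assumes the event message keeps the `Quota limit '<name>' has been exceeded` wording shown in the example. `exceeded_quotas` is a hypothetical helper name.

```bash
# Read `kubectl describe pvc` output on stdin and print each exceeded quota name once.
# Assumes the "Quota limit '<name>' has been exceeded" wording from the example event.
exceeded_quotas() {
  grep -o "Quota limit '[^']*' has been exceeded" \
    | sed "s/^Quota limit '\([^']*\)'.*/\1/" \
    | sort -u
}
```

It can be used as `kubectl describe pvc <name> | exceeded_quotas`. Empty output means no matching quota event was found, which does not by itself prove that provisioning succeeded.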

diff --git a/docs/screenshots/gcp-quota-premium-storage-gb-per-region.png b/docs/screenshots/gcp-quota-premium-storage-gb-per-region.png
diff --git a/skaffold.yaml b/skaffold.yaml
@@ -43,8 +43,8 @@ profiles:
 
 - name: kubeai-only-gke
   build:
-    artifacts:
-    - image: substratusai/kubeai
+    local:
+      push: true
   deploy:
     helm:
       releases: