Finish resource profile functionality and add doc
Showing 7 changed files with 129 additions and 41 deletions.
@@ -0,0 +1,51 @@
# Resource Profiles

A resource profile maps a type of compute resource (e.g. NVIDIA L4 GPU) to a collection of Kubernetes settings that are applied to inference server Pods. Profiles are defined in the KubeAI `config.yaml` file (via a ConfigMap). Each model specifies the resource profile that it requires.

Kubernetes Model resources specify the resource profile they require and the count of that resource:

```yaml
# model.yaml
apiVersion: kubeai.org/v1
kind: Model
metadata:
  name: llama-3.1-8b-instruct-fp8-l4
spec:
  engine: VLLM
  resourceProfile: NVIDIA_GPU_L4:1 # Specified as <profile>:<count>
  # ...
```
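
The `<profile>:<count>` value can be split on its last colon. The following is a minimal Python sketch for illustration only, not KubeAI's own parsing code; the helper name `parse_resource_profile` and its validation rules (a count is required and must be at least 1) are assumptions.

```python
def parse_resource_profile(value: str) -> tuple:
    """Split a '<profile>:<count>' string into (profile, count).

    Hypothetical helper: the name and the validation rules are assumptions.
    """
    # rpartition splits on the last colon, so the count is always the tail.
    name, sep, count_text = value.rpartition(":")
    if not sep or not name:
        raise ValueError(f"expected '<profile>:<count>', got {value!r}")
    count = int(count_text)  # raises ValueError for a non-numeric count
    if count < 1:
        raise ValueError(f"count must be at least 1, got {count}")
    return name, count


profile, count = parse_resource_profile("NVIDIA_GPU_L4:1")
print(profile, count)
```
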
The settings in a given profile may differ slightly depending on the cluster or cloud that KubeAI is deployed on.

Example: A resource profile named `NVIDIA_GPU_L4` might contain the following settings on a GKE Kubernetes cluster:

```yaml
# KubeAI config.yaml
resourceProfiles:
  NVIDIA_GPU_L4:
    limits:
      # Typical across most Kubernetes clusters:
      nvidia.com/gpu: "1"
    requests:
      nvidia.com/gpu: "1"
    nodeSelector:
      # Specific to GKE:
      cloud.google.com/gke-accelerator: "nvidia-l4"
      cloud.google.com/gke-spot: "true"
    imageName: "nvidia-gpu"
```
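
To make the role of the count concrete, here is a hedged Python sketch of how a profile and a count might be combined into Pod settings. It assumes the count multiplies each quantity under `limits` and `requests`, which the text above does not spell out, and it handles only plain integer quantities such as `"1"`. The function names `scale_quantities` and `pod_settings` are hypothetical and do not come from KubeAI.

```python
def scale_quantities(quantities: dict, count: int) -> dict:
    # Assumption: each quantity is multiplied by the requested count.
    # Only plain integers such as "1" are handled; real Kubernetes
    # quantities may carry suffixes such as "500m" or "4Gi".
    return {name: str(int(value) * count) for name, value in quantities.items()}


def pod_settings(profile: dict, count: int) -> dict:
    """Build Pod-level settings from a resource profile (illustrative only)."""
    return {
        "limits": scale_quantities(profile.get("limits", {}), count),
        "requests": scale_quantities(profile.get("requests", {}), count),
        # Node selectors do not depend on the count, so they are copied as-is.
        "nodeSelector": dict(profile.get("nodeSelector", {})),
    }


# Mirrors the NVIDIA_GPU_L4 profile shown above.
NVIDIA_GPU_L4 = {
    "limits": {"nvidia.com/gpu": "1"},
    "requests": {"nvidia.com/gpu": "1"},
    "nodeSelector": {
        "cloud.google.com/gke-accelerator": "nvidia-l4",
        "cloud.google.com/gke-spot": "true",
    },
    "imageName": "nvidia-gpu",
}

print(pod_settings(NVIDIA_GPU_L4, count=2))
```
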
In addition to node selectors and resource requirements, a resource profile may specify an image name. This name maps to the container image that will be selected when serving a model on that resource:

```yaml
# KubeAI config.yaml
modelServers:
  VLLM:
    images:
      default: "vllm/vllm-openai:v0.5.5"
      nvidia-gpu: "vllm/vllm-openai:v0.5.5" # <--
      cpu: "vllm/vllm-openai-cpu:v0.5.5"
  OLlama:
    images:
      # ...
```
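
The image lookup described above can be sketched in Python as a dictionary lookup keyed by the profile's `imageName`. This is an illustration rather than KubeAI's implementation: the function name `select_image` is hypothetical, and falling back to the `default` entry when a profile sets no image name is an assumption.

```python
from typing import Optional

# Mirrors the VLLM entry under modelServers in the config above.
VLLM_IMAGES = {
    "default": "vllm/vllm-openai:v0.5.5",
    "nvidia-gpu": "vllm/vllm-openai:v0.5.5",
    "cpu": "vllm/vllm-openai-cpu:v0.5.5",
}


def select_image(images: dict, image_name: Optional[str] = None) -> str:
    """Pick the container image for a model server (illustrative only)."""
    # Assumption: a profile without an imageName uses the "default" image.
    key = image_name or "default"
    if key not in images:
        raise KeyError(f"no image named {key!r} is configured for this engine")
    return images[key]


print(select_image(VLLM_IMAGES, "nvidia-gpu"))
print(select_image(VLLM_IMAGES, "cpu"))
```
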
@@ -0,0 +1,10 @@
models:
  catalog:
    llama-3.1-8b-instruct-fp8-l4:
      enabled: true

resourceProfiles:
  NVIDIA_GPU_L4:
    nodeSelector:
      cloud.google.com/gke-accelerator: "nvidia-l4"
      cloud.google.com/gke-spot: "true"