Skip to content

Commit

Permalink
Autoscale based on KubeAI OpenTelemetry active requests metrics (#261)
Browse files Browse the repository at this point in the history
* Replace existing controller metrics with OpenTelemetric metrics using
Prometheus `/metrics` endpoint
* Include common http otel metrics
* Update autoscaling loop to scrape KubeAI instance instead of backend
instances (all the info is contains in KubeAI instances)
* Update docs and diagrams
* Refactor `modelresolver` package to be named `endpoints` b/c it now
also tracks KubeAI server endpoints
* Modify integration test cases to run with unique system configurations
* Make leader election durations configurable at the system level to
facilitate expedient tests
* Add integration test that simulates autoscaling calcs with multiple
KubeAI replicas
* Run `go mod tidy`
* Fixes #123 
* Fixes #237
  • Loading branch information
nstogner authored Oct 4, 2024
1 parent d160077 commit 884aa62
Show file tree
Hide file tree
Showing 38 changed files with 993 additions and 435 deletions.
4 changes: 0 additions & 4 deletions docs/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -101,10 +101,6 @@ Now open your browser to [localhost:8000](http://localhost:8000) and select the

If you go back to the browser and start a chat with Qwen2, you will notice that it will take a while to respond at first. This is because we set `minReplicas: 0` for this model and KubeAI needs to spin up a new Pod (you can verify with `kubectl get models -oyaml qwen2-500m-cpu`).

NOTE: Autoscaling after initial scale-from-zero is not yet supported for the Ollama backend which we use in this local quickstart. KubeAI relies upon backend-specific metrics and the Ollama project has an open issue: https://github.com/ollama/ollama/issues/3144. To see autoscaling in action, checkout the [GKE install guide](./installation/gke.md) which uses the vLLM backend and autoscales across GPU resources.



## Documentation

Checkout our documentation on [kubeai.org](https://www.kubeai.org) to find info on:
Expand Down
2 changes: 1 addition & 1 deletion docs/concepts/autoscaling.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# Autoscaling

KubeAI proxies HTTP and messaging (i.e. Kafka, etc) requests and messages to models. It will adjust the number Pods serving a given model based on metrics reported by those servers. If no Pods are running when a request comes in, KubeAI will hold the request, scale up a Pod and forward the request when the Pod is ready. This process happens in a manner that is transparent to the end client (other than the added delay from a cold-start).
KubeAI proxies HTTP and messaging (i.e. Kafka, etc) requests and messages to models. It will adjust the number Pods serving a given model based on the average active number of requests. If no Pods are running when a request comes in, KubeAI will hold the request, scale up a Pod and forward the request when the Pod is ready. This process happens in a manner that is transparent to the end client (other than the added delay from a cold-start).

<br>
<img src="/diagrams/autoscaling.excalidraw.png" width="90%"></img>
Expand Down
Binary file modified docs/diagrams/autoscaling.excalidraw.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
42 changes: 18 additions & 24 deletions go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -3,17 +3,28 @@ module github.com/substratusai/kubeai
go 1.22.0

require (
github.com/go-playground/validator/v10 v10.22.0
github.com/google/uuid v1.6.0
github.com/onsi/ginkgo/v2 v2.17.1
github.com/onsi/gomega v1.32.0
github.com/prometheus/client_golang v1.19.1
github.com/prometheus/client_model v0.5.0
github.com/prometheus/common v0.48.0
github.com/prometheus/client_golang v1.20.3
github.com/prometheus/client_model v0.6.1
github.com/prometheus/common v0.59.1
github.com/stretchr/testify v1.9.0
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.55.0
go.opentelemetry.io/otel v1.30.0
go.opentelemetry.io/otel/exporters/prometheus v0.52.0
go.opentelemetry.io/otel/exporters/stdout/stdoutlog v0.6.0
go.opentelemetry.io/otel/exporters/stdout/stdouttrace v1.30.0
go.opentelemetry.io/otel/metric v1.30.0
go.opentelemetry.io/otel/sdk v1.30.0
go.opentelemetry.io/otel/sdk/log v0.6.0
go.opentelemetry.io/otel/sdk/metric v1.30.0
gocloud.dev v0.39.0
gocloud.dev/pubsub/kafkapubsub v0.39.0
gocloud.dev/pubsub/natspubsub v0.39.0
gocloud.dev/pubsub/rabbitpubsub v0.39.0
golang.org/x/exp v0.0.0-20220722155223-a9213eeb770e
k8s.io/api v0.30.1
k8s.io/apimachinery v0.30.1
k8s.io/client-go v0.30.1
Expand All @@ -36,7 +47,6 @@ require (
github.com/Azure/go-amqp v1.0.5 // indirect
github.com/AzureAD/microsoft-authentication-library-for-go v1.2.2 // indirect
github.com/IBM/sarama v1.43.3 // indirect
github.com/antlr/antlr4/runtime/Go/antlr/v4 v4.0.0-20230305170008-8188dc5388df // indirect
github.com/aws/aws-sdk-go v1.55.5 // indirect
github.com/aws/aws-sdk-go-v2 v1.30.3 // indirect
github.com/aws/aws-sdk-go-v2/config v1.27.27 // indirect
Expand All @@ -54,8 +64,6 @@ require (
github.com/aws/aws-sdk-go-v2/service/sts v1.30.3 // indirect
github.com/aws/smithy-go v1.20.3 // indirect
github.com/beorn7/perks v1.0.1 // indirect
github.com/blang/semver/v4 v4.0.0 // indirect
github.com/cenkalti/backoff/v4 v4.2.1 // indirect
github.com/cespare/xxhash/v2 v2.3.0 // indirect
github.com/davecgh/go-spew v1.1.1 // indirect
github.com/eapache/go-resiliency v1.7.0 // indirect
Expand All @@ -74,14 +82,12 @@ require (
github.com/go-openapi/swag v0.22.3 // indirect
github.com/go-playground/locales v0.14.1 // indirect
github.com/go-playground/universal-translator v0.18.1 // indirect
github.com/go-playground/validator/v10 v10.22.0 // indirect
github.com/go-task/slim-sprig v0.0.0-20230315185526-52ccab3ef572 // indirect
github.com/gogo/protobuf v1.3.2 // indirect
github.com/golang-jwt/jwt/v5 v5.2.1 // indirect
github.com/golang/groupcache v0.0.0-20210331224755-41bb18bfe9da // indirect
github.com/golang/protobuf v1.5.4 // indirect
github.com/golang/snappy v0.0.4 // indirect
github.com/google/cel-go v0.17.8 // indirect
github.com/google/gnostic-models v0.6.8 // indirect
github.com/google/go-cmp v0.6.0 // indirect
github.com/google/gofuzz v1.2.0 // indirect
Expand All @@ -90,7 +96,6 @@ require (
github.com/google/wire v0.6.0 // indirect
github.com/googleapis/enterprise-certificate-proxy v0.3.2 // indirect
github.com/googleapis/gax-go/v2 v2.13.0 // indirect
github.com/grpc-ecosystem/grpc-gateway/v2 v2.16.0 // indirect
github.com/hashicorp/errwrap v1.1.0 // indirect
github.com/hashicorp/go-multierror v1.1.1 // indirect
github.com/hashicorp/go-uuid v1.0.3 // indirect
Expand All @@ -117,29 +122,21 @@ require (
github.com/pkg/browser v0.0.0-20240102092130-5ac0b6a4141c // indirect
github.com/pkg/errors v0.9.1 // indirect
github.com/pmezard/go-difflib v1.0.0 // indirect
github.com/prometheus/procfs v0.12.0 // indirect
github.com/prometheus/procfs v0.15.1 // indirect
github.com/rabbitmq/amqp091-go v1.10.0 // indirect
github.com/rcrowley/go-metrics v0.0.0-20201227073835-cf1acfcdf475 // indirect
github.com/spf13/pflag v1.0.5 // indirect
github.com/stoewer/go-strcase v1.2.0 // indirect
go.opencensus.io v0.24.0 // indirect
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc v0.53.0 // indirect
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.53.0 // indirect
go.opentelemetry.io/otel v1.29.0 // indirect
go.opentelemetry.io/otel/exporters/otlp/otlptrace v1.19.0 // indirect
go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracegrpc v1.19.0 // indirect
go.opentelemetry.io/otel/metric v1.29.0 // indirect
go.opentelemetry.io/otel/sdk v1.29.0 // indirect
go.opentelemetry.io/otel/trace v1.29.0 // indirect
go.opentelemetry.io/proto/otlp v1.0.0 // indirect
go.opentelemetry.io/otel/log v0.6.0 // indirect
go.opentelemetry.io/otel/trace v1.30.0 // indirect
go.uber.org/multierr v1.11.0 // indirect
go.uber.org/zap v1.27.0 // indirect
golang.org/x/crypto v0.26.0 // indirect
golang.org/x/exp v0.0.0-20220722155223-a9213eeb770e // indirect
golang.org/x/net v0.28.0 // indirect
golang.org/x/oauth2 v0.22.0 // indirect
golang.org/x/sync v0.8.0 // indirect
golang.org/x/sys v0.24.0 // indirect
golang.org/x/sys v0.25.0 // indirect
golang.org/x/term v0.23.0 // indirect
golang.org/x/text v0.17.0 // indirect
golang.org/x/time v0.6.0 // indirect
Expand All @@ -156,11 +153,8 @@ require (
gopkg.in/yaml.v2 v2.4.0 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
k8s.io/apiextensions-apiserver v0.30.1 // indirect
k8s.io/apiserver v0.30.1 // indirect
k8s.io/component-base v0.30.1 // indirect
k8s.io/klog/v2 v2.120.1 // indirect
k8s.io/kube-openapi v0.0.0-20240228011516-70dd3763d340 // indirect
sigs.k8s.io/apiserver-network-proxy/konnectivity-client v0.29.0 // indirect
sigs.k8s.io/json v0.0.0-20221116044647-bc3834ca7abd // indirect
sigs.k8s.io/structured-merge-diff/v4 v4.4.1 // indirect
)
Loading

0 comments on commit 884aa62

Please sign in to comment.