Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Showing 1 changed file with 14 additions and 2 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,3 +1,15 @@ | ||
Docs for setting up a demo cluster can be found under the demo directory | ||
Docs for setting up a demo cluster can be found ![here](https://github.com/opendatahub-io/caikit-tgis-serving/tree/main/demo/kserve) | ||
|
||
The cluster was set up using a single GPU, but could be set up in other configurations | ||
Caikit-tgis-serving is a combined image that allows users to perform LLM inference | ||
|
||
The architecture is shown here: | ||
|
||
![KServe+Knative+Istio+Caikit_TGIS Diagram](https://github.com/opendatahub-io/caikit-tgis-serving/assets/8479010/7009b95d-0f6f-4f18-b0e6-355f360a5ad1) | ||
|
||
There are several components: | ||
TGIS: Serving backend, loads the models and provides the inference engine | ||
Caikit: Wrapper layer, handles the lifecycle of the TGIS process, provides the inference endpoints, and has modules to handle different model types | ||
Caikit-nlp: Caikit module that handles NLP style models | ||
KServe: Orchestrates model serving for all types of models, servingruntimes implement loading given types of model servers. KServe handles the lifecycle of the deployment object, storage access, networking setup, etc | ||
Service Mesh (istio): Service mesh networking layer, manages traffic flows, enforces access policies, etc | ||
Serverless (knative): Allows for serverless deployments of models |
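
The component list above says that KServe manages the deployment object and storage access while a serving runtime supplies the model server. A minimal sketch of how those pieces could be tied together in a KServe `InferenceService` manifest follows, built as a plain Python dict. The resource name, runtime name, model format name, and storage URI are all hypothetical placeholders and are not taken from this commit; only the `serving.kserve.io/v1beta1` field layout and the `serving.kserve.io/deploymentMode` annotation come from KServe itself.

```python
import json


def build_inference_service(name, runtime, storage_uri, model_format="caikit"):
    """Return a KServe InferenceService manifest as a plain dict.

    All argument values used below are illustrative assumptions; substitute
    the names that exist in your own cluster.
    """
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {
            "name": name,
            # Ask KServe for a Knative-backed (serverless) deployment.
            "annotations": {"serving.kserve.io/deploymentMode": "Serverless"},
        },
        "spec": {
            "predictor": {
                "model": {
                    # Model format that the chosen ServingRuntime must support.
                    "modelFormat": {"name": model_format},
                    # ServingRuntime that provides the Caikit+TGIS container
                    # (hypothetical name).
                    "runtime": runtime,
                    # Location KServe pulls the model from (hypothetical).
                    "storageUri": storage_uri,
                }
            }
        },
    }


manifest = build_inference_service(
    name="example-llm",
    runtime="example-caikit-tgis-runtime",
    storage_uri="s3://example-bucket/models/example-llm",
)
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied with `kubectl apply -f`, assuming a ServingRuntime with the matching name and model format is already installed in the namespace.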