Skip to content

Commit

Permalink
Update TGI CPU image to latest official release 2.4.0
Browse files Browse the repository at this point in the history
Signed-off-by: lvliang-intel <liang1.lv@intel.com>
  • Loading branch information
lvliang-intel committed Oct 28, 2024
1 parent bc47930 commit a6d08fb
Show file tree
Hide file tree
Showing 39 changed files with 42 additions and 42 deletions.
2 changes: 1 addition & 1 deletion AudioQnA/docker_compose/intel/cpu/xeon/compose.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,7 @@ services:
environment:
TTS_ENDPOINT: ${TTS_ENDPOINT}
tgi-service:
image: ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu
image: ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
container_name: tgi-service
ports:
- "3006:80"
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@ services:
https_proxy: ${https_proxy}
restart: unless-stopped
tgi-service:
image: ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu
image: ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
container_name: tgi-service
ports:
- "3006:80"
Expand Down
2 changes: 1 addition & 1 deletion AudioQnA/kubernetes/intel/cpu/xeon/manifest/audioqna.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -247,7 +247,7 @@ spec:
- envFrom:
- configMapRef:
name: audio-qna-config
image: "ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu"
image: "ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu"
name: llm-dependency-deploy-demo
securityContext:
capabilities:
Expand Down
2 changes: 1 addition & 1 deletion AvatarChatbot/docker_compose/intel/cpu/xeon/compose.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -42,7 +42,7 @@ services:
environment:
TTS_ENDPOINT: ${TTS_ENDPOINT}
tgi-service:
image: ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu
image: ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
container_name: tgi-service
ports:
- "3006:80"
Expand Down
4 changes: 2 additions & 2 deletions ChatQnA/docker_compose/intel/cpu/xeon/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -195,7 +195,7 @@ For users in China who are unable to download models directly from Huggingface,
export HF_TOKEN=${your_hf_token}
export HF_ENDPOINT="https://hf-mirror.com"
model_name="Intel/neural-chat-7b-v3-3"
docker run -p 8008:80 -v ./data:/data --name tgi-service -e HF_ENDPOINT=$HF_ENDPOINT -e http_proxy=$http_proxy -e https_proxy=$https_proxy --shm-size 1g ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu --model-id $model_name
docker run -p 8008:80 -v ./data:/data --name tgi-service -e HF_ENDPOINT=$HF_ENDPOINT -e http_proxy=$http_proxy -e https_proxy=$https_proxy --shm-size 1g ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu --model-id $model_name
```

2. Offline
Expand All @@ -209,7 +209,7 @@ For users in China who are unable to download models directly from Huggingface,
```bash
export HF_TOKEN=${your_hf_token}
export model_path="/path/to/model"
docker run -p 8008:80 -v $model_path:/data --name tgi_service --shm-size 1g ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu --model-id /data
docker run -p 8008:80 -v $model_path:/data --name tgi_service --shm-size 1g ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu --model-id /data
```

### Setup Environment Variables
Expand Down
2 changes: 1 addition & 1 deletion ChatQnA/docker_compose/intel/cpu/xeon/compose.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -73,7 +73,7 @@ services:
HF_HUB_ENABLE_HF_TRANSFER: 0
command: --model-id ${RERANK_MODEL_ID} --auto-truncate
tgi-service:
image: ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu
image: ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
container_name: tgi-service
ports:
- "9009:80"
Expand Down
2 changes: 1 addition & 1 deletion ChatQnA/docker_compose/intel/cpu/xeon/compose_qdrant.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -72,7 +72,7 @@ services:
HF_HUB_ENABLE_HF_TRANSFER: 0
command: --model-id ${RERANK_MODEL_ID} --auto-truncate
tgi-service:
image: ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu
image: ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
container_name: tgi-service
ports:
- "6042:80"
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -57,7 +57,7 @@ services:
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
restart: unless-stopped
tgi-service:
image: ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu
image: ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
container_name: tgi-service
ports:
- "9009:80"
Expand Down
2 changes: 1 addition & 1 deletion ChatQnA/kubernetes/intel/README_gmc.md
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,7 @@ The ChatQnA uses the below prebuilt images if you choose a Xeon deployment
- tei_embedding_service: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
- retriever: opea/retriever-redis:latest
- tei_xeon_service: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
- tgi-service: ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu
- tgi-service: ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
- chaqna-xeon-backend-server: opea/chatqna:latest

Should you desire to use the Gaudi accelerator, two alternate images are used for the embedding and llm services.
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -1100,7 +1100,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu"
image: "ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /data
Expand Down Expand Up @@ -1180,7 +1180,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu"
image: "ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /data
Expand Down
2 changes: 1 addition & 1 deletion ChatQnA/kubernetes/intel/cpu/xeon/manifest/chatqna.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -922,7 +922,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu"
image: "ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /data
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -925,7 +925,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu"
image: "ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /data
Expand Down
2 changes: 1 addition & 1 deletion ChatQnA/tests/test_compose_on_xeon.sh
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@ function build_docker_images() {
service_list="chatqna chatqna-ui chatqna-conversation-ui dataprep-redis retriever-redis nginx"
docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log

docker pull ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu
docker pull ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
docker pull ghcr.io/huggingface/text-embeddings-inference:cpu-1.5

docker images && sleep 1s
Expand Down
2 changes: 1 addition & 1 deletion CodeGen/docker_compose/intel/cpu/xeon/compose.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@

services:
tgi-service:
image: ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu
image: ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
container_name: tgi-service
ports:
- "8028:80"
Expand Down
2 changes: 1 addition & 1 deletion CodeGen/kubernetes/intel/cpu/xeon/manifest/codegen.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -404,7 +404,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu"
image: "ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /data
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -126,7 +126,7 @@ spec:
- name: no_proxy
value:
securityContext: {}
image: "ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu"
image: "ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /data
Expand Down
2 changes: 1 addition & 1 deletion CodeGen/tests/test_compose_on_xeon.sh
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@ function build_docker_images() {
service_list="codegen codegen-ui llm-tgi"
docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log

docker pull ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu
docker pull ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
docker images && sleep 1s
}

Expand Down
2 changes: 1 addition & 1 deletion CodeTrans/docker_compose/intel/cpu/xeon/compose.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@

services:
tgi-service:
image: ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu
image: ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
container_name: codetrans-tgi-service
ports:
- "8008:80"
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -404,7 +404,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu"
image: "ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /data
Expand Down
2 changes: 1 addition & 1 deletion CodeTrans/tests/test_compose_on_xeon.sh
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@ function build_docker_images() {
service_list="codetrans codetrans-ui llm-tgi nginx"
docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log

docker pull ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu
docker pull ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
docker images && sleep 1s
}

Expand Down
2 changes: 1 addition & 1 deletion DocSum/docker_compose/intel/cpu/xeon/compose.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@

services:
tgi-service:
image: ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu
image: ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
container_name: tgi-service
ports:
- "8008:80"
Expand Down
2 changes: 1 addition & 1 deletion DocSum/kubernetes/intel/README_gmc.md
Original file line number Diff line number Diff line change
Expand Up @@ -8,7 +8,7 @@ Install GMC in your Kubernetes cluster, if you have not already done so, by foll
The DocSum application is defined as a Custom Resource (CR) file that the above GMC operator acts upon. It first checks if the microservices listed in the CR yaml file are running, if not it starts them and then proceeds to connect them. When the DocSum RAG pipeline is ready, the service endpoint details are returned, letting you use the application. Should you use "kubectl get pods" commands you will see all the component microservices, in particular embedding, retriever, rerank, and llm.

The DocSum pipeline uses prebuilt images. The Xeon version uses the prebuilt image `llm-docsum-tgi:latest` which internally leverages the
the image `ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu`. The service is called tgi-svc. Meanwhile, the Gaudi version launches the
the image `ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu`. The service is called tgi-svc. Meanwhile, the Gaudi version launches the
service tgi-gaudi-svc, which uses the image `ghcr.io/huggingface/tgi-gaudi:2.0.5`. Both TGI model services serve the model specified in the LLM_MODEL_ID variable that is exported by you. In the below example we use `Intel/neural-chat-7b-v3-3`.

[NOTE]
Expand Down
2 changes: 1 addition & 1 deletion DocSum/kubernetes/intel/cpu/xeon/manifest/docsum.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -404,7 +404,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu"
image: "ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /data
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -126,7 +126,7 @@ spec:
- name: no_proxy
value:
securityContext: {}
image: "ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu"
image: "ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /data
Expand Down
2 changes: 1 addition & 1 deletion FaqGen/docker_compose/intel/cpu/xeon/compose.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@

services:
tgi-service:
image: ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu
image: ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
container_name: tgi-xeon-server
ports:
- "8008:80"
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -126,7 +126,7 @@ spec:
- name: no_proxy
value:
securityContext: {}
image: "ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu"
image: "ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /data
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -993,7 +993,7 @@ spec:
name: chatqna-tgi-config
securityContext:
{}
image: "ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu"
image: "ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /data
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -229,7 +229,7 @@ spec:
name: codegen-tgi-config
securityContext:
{}
image: "ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu"
image: "ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /data
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -229,7 +229,7 @@ spec:
name: docsum-tgi-config
securityContext:
{}
image: "ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu"
image: "ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /data
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -138,7 +138,7 @@ spec:
- configMapRef:
name: faqgen-tgi-config
securityContext: {}
image: "ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu"
image: "ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /data
Expand Down
2 changes: 1 addition & 1 deletion SearchQnA/docker_compose/intel/cpu/xeon/compose.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -73,7 +73,7 @@ services:
HUGGINGFACEHUB_API_TOKEN: ${HUGGINGFACEHUB_API_TOKEN}
restart: unless-stopped
tgi-service:
image: ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu
image: ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
container_name: tgi-service
ports:
- "3006:80"
Expand Down
2 changes: 1 addition & 1 deletion SearchQnA/tests/test_compose_on_xeon.sh
Original file line number Diff line number Diff line change
Expand Up @@ -23,7 +23,7 @@ function build_docker_images() {
docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log

docker pull ghcr.io/huggingface/text-embeddings-inference:cpu-1.5
docker pull ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu
docker pull ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
docker images && sleep 1s
}

Expand Down
2 changes: 1 addition & 1 deletion Translation/docker_compose/intel/cpu/xeon/compose.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@

services:
tgi-service:
image: ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu
image: ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
container_name: tgi-service
ports:
- "8008:80"
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -361,7 +361,7 @@ spec:
runAsUser: 1000
seccompProfile:
type: RuntimeDefault
image: "ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu"
image: "ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /data
Expand Down
2 changes: 1 addition & 1 deletion Translation/tests/test_compose_on_xeon.sh
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@ function build_docker_images() {
service_list="translation translation-ui llm-tgi nginx"
docker compose -f build.yaml build ${service_list} --no-cache > ${LOG_PATH}/docker_image_build.log

docker pull ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu
docker pull ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
docker images && sleep 1s
}

Expand Down
4 changes: 2 additions & 2 deletions VisualQnA/docker_compose/intel/cpu/xeon/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -67,12 +67,12 @@ docker build --no-cache -t opea/visualqna-ui:latest --build-arg https_proxy=$htt
### 4. Pull TGI Xeon Image

```bash
docker pull ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu
docker pull ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
```

Then run the command `docker images`, you will have the following 5 Docker Images:

1. `ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu`
1. `ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu`
2. `opea/lvm-tgi:latest`
3. `opea/visualqna:latest`
4. `opea/visualqna-ui:latest`
Expand Down
2 changes: 1 addition & 1 deletion VisualQnA/docker_compose/intel/cpu/xeon/compose.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@

services:
llava-tgi-service:
image: ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu
image: ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
container_name: tgi-llava-xeon-server
ports:
- "8399:80"
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -216,7 +216,7 @@ spec:
name: visualqna-tgi-config
securityContext:
{}
image: "ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu"
image: "ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu"
imagePullPolicy: IfNotPresent
volumeMounts:
- mountPath: /data
Expand Down
2 changes: 1 addition & 1 deletion VisualQnA/tests/test_compose_on_xeon.sh
Original file line number Diff line number Diff line change
Expand Up @@ -21,7 +21,7 @@ function build_docker_images() {
echo "Build all the images with --no-cache, check docker_image_build.log for details..."
docker compose -f build.yaml build --no-cache > ${LOG_PATH}/docker_image_build.log

docker pull ghcr.io/huggingface/text-generation-inference:sha-e4201f4-intel-cpu
docker pull ghcr.io/huggingface/text-generation-inference:2.4.0-intel-cpu
docker images && sleep 1s
}

Expand Down

0 comments on commit a6d08fb

Please sign in to comment.