[LLM] better format of vllm openai api README (#2440)
* [LLM] better format of vllm openai api README

* Fix the way to fetch IP

* install fschat for chat completion

* Add missing dependency

* Update readme
Michaelvll authored Aug 22, 2023
1 parent 0249308 commit f890269
Showing 3 changed files with 66 additions and 18 deletions.
llm/vllm/README.md: 64 changes (53 additions & 11 deletions)
@@ -51,25 +51,27 @@ sky launch -c vllm-llama2 serving-openai-api.yaml
2. Check the IP for the cluster with:
```
sky status -a
# Or get the IP with the Python API:
IP=$(python -c "import sky; print(sky.status('vllm-llama2')[0]['handle'].head_ip)")
```
3. You can now use the OpenAI API to interact with the model.
- Query the models hosted on the cluster:
```bash
curl http://<IP>:8000/v1/models
curl http://$IP:8000/v1/models
```
- Query a model with input prompts:
- Query a model with input prompts for text completion:
```bash
curl http://<IP>:8000/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "meta-llama/Llama-2-7b-chat-hf",
"prompt": "San Francisco is a",
"max_tokens": 7,
"temperature": 0
}'
curl http://$IP:8000/v1/completions \
-H "Content-Type: application/json" \
-d '{
"model": "meta-llama/Llama-2-7b-chat-hf",
"prompt": "San Francisco is a",
"max_tokens": 7,
"temperature": 0
}'
```
You should get a response similar to the following:
```
```console
{
"id":"cmpl-50a231f7f06a4115a1e4bd38c589cd8f",
"object":"text_completion","created":1692427390,
@@ -81,4 +83,44 @@
}],
"usage":{"prompt_tokens":5,"total_tokens":12,"completion_tokens":7}
}
```
- Query a model with input prompts for chat completion:
```bash
curl http://$IP:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "meta-llama/Llama-2-7b-chat-hf",
"messages": [
{
"role": "system",
"content": "You are a helpful assistant."
},
{
"role": "user",
"content": "Who are you?"
}
]
}'
```
You should get a response similar to the following:
```console
{
"id": "cmpl-879a58992d704caf80771b4651ff8cb6",
"object": "chat.completion",
"created": 1692650569,
"model": "meta-llama/Llama-2-7b-chat-hf",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": " Hello! I'm just an AI assistant, here to help you"
},
"finish_reason": "length"
}],
"usage": {
"prompt_tokens": 31,
"total_tokens": 47,
"completion_tokens": 16
}
}
```
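
The same endpoints can also be queried from Python with the OpenAI client library. A minimal sketch, assuming the `openai` 0.x package is installed (`pip install openai`) and that the cluster IP from step 2 is substituted for the placeholder:

```python
# Minimal sketch: query vLLM's OpenAI-compatible server with the `openai`
# 0.x Python package. IP is a placeholder for the cluster IP from step 2.
import openai

IP = "1.2.3.4"  # replace with the cluster IP
openai.api_base = f"http://{IP}:8000/v1"
openai.api_key = "EMPTY"  # placeholder; the endpoint is not authenticated by default

# Text completion, mirroring the /v1/completions curl command above.
completion = openai.Completion.create(
    model="meta-llama/Llama-2-7b-chat-hf",
    prompt="San Francisco is a",
    max_tokens=7,
    temperature=0,
)
print(completion.choices[0].text)

# Chat completion, mirroring the /v1/chat/completions curl command above.
chat = openai.ChatCompletion.create(
    model="meta-llama/Llama-2-7b-chat-hf",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Who are you?"},
    ],
)
print(chat.choices[0].message.content)
```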
llm/vllm/serve-openai-api.yaml: 11 changes (7 additions & 4 deletions)
@@ -1,12 +1,12 @@
envs:
MODEL_NAME: meta-llama/Llama-2-7b-chat-hf
HF_TOKEN: <your-huggingface-token> # Change to your own huggingface token

resources:
accelerators: L4:1
ports:
- 8000

envs:
MODEL_NAME: meta-llama/Llama-2-7b-chat-hf
HF_TOKEN: <your-huggingface-token> # Change to your own huggingface token

setup: |
conda activate vllm
if [ $? -ne 0 ]; then
@@ -15,6 +15,9 @@ setup: |
fi
git clone https://github.com/vllm-project/vllm.git || true
# Install fschat and accelerate for chat completion
pip install fschat
pip install accelerate
cd vllm
pip list | grep vllm || pip install .
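
Rather than editing the `HF_TOKEN` placeholder in the YAML, the token can be injected at launch time. A minimal sketch with SkyPilot's Python API, assuming `sky.Task.from_yaml` and `Task.update_envs` are available in the installed SkyPilot version:

```python
# Minimal sketch: launch serve-openai-api.yaml and pass the HuggingFace token
# through envs instead of hard-coding it in the YAML (assumes SkyPilot's
# Python API; the path and env-var name are illustrative).
import os
import sky

task = sky.Task.from_yaml("llm/vllm/serve-openai-api.yaml")
task.update_envs({"HF_TOKEN": os.environ["HF_TOKEN"]})  # token read from the local environment
sky.launch(task, cluster_name="vllm-llama2")
```

The CLI equivalent is passing the token with `sky launch -c vllm-llama2 llm/vllm/serve-openai-api.yaml --env HF_TOKEN=<your-token>`.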
llm/vllm/serve.yaml: 9 changes (6 additions & 3 deletions)
@@ -1,9 +1,9 @@
resources:
accelerators: A100-80GB:8

envs:
MODEL_NAME: decapoda-research/llama-65b-hf

resources:
accelerators: A100-80GB:8

setup: |
conda activate vllm
if [ $? -ne 0 ]; then
@@ -12,6 +12,9 @@ setup: |
fi
git clone https://github.com/vllm-project/vllm.git || true
# Install fschat and accelerate for chat completion
pip install fschat
pip install accelerate
cd vllm
pip list | grep vllm || pip install .
