[LLM] Add Qwen2-VL multimodal example (#3961)
Add multimodal example
Michaelvll authored Sep 19, 2024
1 parent 3871de9 commit d602225
Showing 2 changed files with 65 additions and 0 deletions.
29 changes: 29 additions & 0 deletions llm/qwen/README.md
@@ -67,6 +67,35 @@ curl http://$ENDPOINT/v1/chat/completions \
}' | jq -r '.choices[0].message.content'
```

## Running Multimodal Qwen2-VL


1. Start serving Qwen2-VL:

```console
sky launch -c qwen2-vl qwen2-vl-7b.yaml
```
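   While the replica is provisioning, you can follow the setup and serving logs with a standard SkyPilot command (the cluster name matches the `-c` flag above); a minimal example:

   ```console
   sky logs qwen2-vl
   ```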
2. Send a multimodal request to the endpoint for completion:
```bash
ENDPOINT=$(sky status --endpoint 8000 qwen2-vl)

curl http://$ENDPOINT/v1/chat/completions \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer token' \
--data '{
"model": "Qwen/Qwen2-VL-7B-Instruct",
"messages": [
{
"role": "user",
"content": [
{"type" : "text", "text": "Covert this logo to ASCII art"},
{"type": "image_url", "image_url": {"url": "https://pbs.twimg.com/profile_images/1584596138635632640/HWexMoH5_400x400.jpg"}}
]
}],
"max_tokens": 1024
}' | jq .
```
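To print only the generated text instead of the full JSON response, swap `jq .` for the extraction filter used in the text-only example above; the request body is unchanged:

```console
curl http://$ENDPOINT/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer token' \
  --data '{
    "model": "Qwen/Qwen2-VL-7B-Instruct",
    "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": "Convert this logo to ASCII art"},
        {"type": "image_url", "image_url": {"url": "https://pbs.twimg.com/profile_images/1584596138635632640/HWexMoH5_400x400.jpg"}}
      ]
    }],
    "max_tokens": 1024
  }' | jq -r '.choices[0].message.content'
```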

## Scale up the service with SkyServe

1. With [SkyPilot Serving](https://skypilot.readthedocs.io/en/latest/serving/sky-serve.html), a serving library built on top of SkyPilot, scaling up the Qwen service is as simple as running:
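   The concrete command is collapsed in this diff view; a plausible invocation, assuming SkyServe's standard `sky serve up` CLI (`-n` names the service) and reusing the YAML added in this commit:

   ```console
   sky serve up -n qwen2-vl qwen2-vl-7b.yaml
   ```

   SkyServe then provisions the `replicas: 2` declared in the YAML's `service` section and load-balances requests across them.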
36 changes: 36 additions & 0 deletions llm/qwen/qwen2-vl-7b.yaml
@@ -0,0 +1,36 @@
envs:
MODEL_NAME: Qwen/Qwen2-VL-7B-Instruct

service:
# The endpoint path used to check the readiness of the replicas.
readiness_probe:
path: /v1/chat/completions
post_data:
model: $MODEL_NAME
messages:
- role: user
content: Hello! What is your name?
max_tokens: 1
initial_delay_seconds: 1200
# How many replicas to manage.
replicas: 2


resources:
accelerators: {L4, A10g, A10, L40, A40, A100, A100-80GB}
disk_tier: best
ports: 8000

setup: |
# Install a newer transformers version for Qwen2-VL support.
pip install git+https://github.com/huggingface/transformers@21fac7abba2a37fae86106f87fcf9974fd1e3830
pip install vllm==0.6.1.post2
pip install vllm-flash-attn
run: |
export PATH=$PATH:/sbin
vllm serve $MODEL_NAME \
--host 0.0.0.0 \
--tensor-parallel-size $SKYPILOT_NUM_GPUS_PER_NODE \
--max-model-len 2048 | tee ~/openai_api_server.log
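
The `readiness_probe` in the `service` section above amounts to posting a one-token chat completion. A manual equivalent you could run against a live endpoint looks like this sketch (`$ENDPOINT` is obtained via `sky status --endpoint` as in the README; the probe itself substitutes `$MODEL_NAME`):

```console
curl http://$ENDPOINT/v1/chat/completions \
  -H 'Content-Type: application/json' \
  --data '{
    "model": "Qwen/Qwen2-VL-7B-Instruct",
    "messages": [{"role": "user", "content": "Hello! What is your name?"}],
    "max_tokens": 1
  }'
```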
