[Usage]: Are prompts processed sequentially? #9695

Open
1 task done
nishadsinghi opened this issue Oct 25, 2024 · 1 comment
Labels: usage (How to use vllm)

Comments

@nishadsinghi

Your current environment

Running `python collect_env.py` throws this error:

```
Collecting environment information...
Traceback (most recent call last):
  File "path/collect_env.py", line 743, in <module>
    main()
  File "path/collect_env.py", line 722, in main
    output = get_pretty_env_info()
             ^^^^^^^^^^^^^^^^^^^^^
  File "path/collect_env.py", line 717, in get_pretty_env_info
    return pretty_str(get_env_info())
                      ^^^^^^^^^^^^^^
  File "path/collect_env.py", line 549, in get_env_info
    vllm_version = get_vllm_version()
                   ^^^^^^^^^^^^^^^^^^
  File "path/collect_env.py", line 270, in get_vllm_version
    from vllm import __version__, __version_tuple__
ImportError: cannot import name '__version_tuple__' from 'vllm' (/path/lib/python3.11/site-packages/vllm/__init__.py)
```
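
As a fallback, the installed version can presumably still be read directly; a minimal sketch, assuming `vllm.__version__` is present even though `__version_tuple__` is not:

```python
# Fallback version check, assuming vllm.__version__ exists in this install
# even though __version_tuple__ does not (as the traceback suggests).
import vllm

print(getattr(vllm, "__version__", "unknown"))
```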

How would you like to use vllm

I am running Llama 3 8B Instruct using vLLM as follows:

```python
from vllm import LLM, SamplingParams

llm = LLM(model=config.model, max_logprobs=1000)
sampling_params = SamplingParams(temperature=config.temperature,
                                 max_tokens=config.max_tokens,
                                 n=1,
                                 stop=config.stop_strings,
                                 logprobs=config.logprobs,
                                 skip_special_tokens=config.skip_special_tokens,
                                 top_k=config.top_k)
outputs = llm.generate(prompts, sampling_params)
```
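
For reference, a minimal sketch of how the results from `llm.generate` can be read back (field names follow vLLM's `RequestOutput` / `CompletionOutput` objects; the printing is just illustrative):

```python
# Each element of `outputs` is a RequestOutput; each completion is a CompletionOutput.
for request_output in outputs:
    completion = request_output.outputs[0]  # n=1, so a single completion per prompt
    print("prompt:", request_output.prompt)
    print("completion:", completion.text)
    print("tokens generated:", len(completion.token_ids))
```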

In the output, I notice this:
```
Processed prompts:   0%|          | 0/64 [00:00<?, ?it/s, est. speed input: 0.00 toks/s, output: 0.00 toks/s]
Processed prompts:   2%|▏         | 1/64 [00:09<10:29, 10.00s/it, est. speed input: 157.27 toks/s, output: 8.70 toks/s]
Processed prompts:   3%|▎         | 2/64 [00:10<04:31,  4.37s/it, est. speed input: 253.25 toks/s, output: 17.35 toks/s]
Processed prompts:   5%|▍         | 3/64 [00:11<02:54,  2.86s/it, est. speed input: 357.92 toks/s, output: 24.91 toks/s]
```

This is a bit confusing to me: my batch size is 64, yet the output seems to suggest that prompts are being processed sequentially. Shouldn't the entire batch be processed at once? And if so, why does the progress bar advance one prompt at a time?
In general, it would be great if someone could explain what this progress bar actually tracks. Is it tokenizing the prompts one at a time?

I have also noticed that when I increase the batch size from 64 to 128, the time per batch roughly doubles, which again is confusing: as long as the batch fits into memory, the time per batch shouldn't change much, right?
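
For reference, a rough sketch of how the two batch sizes could be timed against each other (the prompt list here is just a placeholder; `llm` and `sampling_params` are as above):

```python
import time

# Placeholder prompts; in practice these come from the actual dataset.
base_prompts = ["Explain the rules of chess."] * 128

for batch_size in (64, 128):
    prompts = base_prompts[:batch_size]
    start = time.perf_counter()
    outputs = llm.generate(prompts, sampling_params)
    elapsed = time.perf_counter() - start
    total_output_tokens = sum(len(o.outputs[0].token_ids) for o in outputs)
    print(f"batch={batch_size}: {elapsed:.1f}s total, "
          f"{total_output_tokens / elapsed:.1f} output toks/s")
```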

Both of these observations make me wonder whether there is a performance bottleneck I could resolve to get a speedup. Any help/pointers would be greatly appreciated! :)

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
@zymy-chen

I had the same problem and would like to know how to run static batches.
