[Bug]: Function calling with stream vs without stream, arguments=None when stream option is enabled #9693
@K-Mistele can you take a look into this?
I've been debugging the issue on my own and think I've identified the solution. After testing the API, I noticed that it currently generates tool_calls where the function name and the arguments arrive in separate yield statements, which causes issues. Here's an example of the current output:

Current Output:

In this example, the function name is yielded separately from its arguments. However, for use cases like chatbot integration and API calls, where many frameworks expect the tool_call to be complete in a single field, it would be more convenient if both the name and the arguments were generated in the same yield statement.

Expected Behavior: The API should generate tool_calls with the function name and arguments combined, so the function can be used directly without additional processing. Here's an example of the ideal output:
hi @ankush13r! You are correct that the function name and the function arguments are handled in separate deltas. Here's an example request you can make with Postman or something similar to illustrate what the streamed server-sent events will look like according to OpenAI's standard:

```json
{
  "model": "gpt-4o",
  "messages": [
    {
      "role": "user",
      "content": "Can you tell me the weather in dallas in fahrenheit?"
    }
  ],
  "stream": true,
  "temperature": 0.7,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {
              "type": "string",
              "description": "The city to find the weather for, e.g. 'San Francisco'"
            },
            "state": {
              "type": "string",
              "description": "the two-letter abbreviation for the state that the city is in, e.g. 'CA' which would mean 'California'"
            },
            "unit": {
              "type": "string",
              "description": "The unit to fetch the temperature in",
              "enum": [
                "celsius",
                "fahrenheit"
              ]
            }
          }
        }
      }
    }
  ]
}
```

Here is what this request generates from OpenAI using streaming:

Long list of server-sent events from OpenAI (collapsed)
There are a couple of important things to observe here:

- This is the OpenAI standard for server-sent events for tool streaming, and it is the standard that vLLM follows.
- A function's name is always streamed before argument deltas arrive, and argument deltas will never be streamed in the same event as the function's name.
- Multiple argument deltas will be received and must be concatenated; the entire arguments string should never arrive all at once.

When you're receiving deltas from vLLM, are these (below) the only deltas that you are receiving before the stream ends, or are you receiving additional deltas with argument diffs like shown above?

```
ChoiceDelta(content='', function_call=None, refusal=None, role='assistant', tool_calls=None)
ChoiceDeltaToolCall(index=0, id='chatcmpl-tool-ac7886c6cea04451b439d4e24b21ab7a', function=ChoiceDeltaToolCallFunction(arguments=None, name='sum'), type='function')
ChoiceDelta(content='', function_call=None, refusal=None, role=None, tool_calls=None)
```

If these are the only deltas you receive, that probably indicates a bug, since you should receive argument deltas as well. If you do receive additional deltas, you just need to handle concatenating and parsing them as described above and in the docs example that I linked to. Can you please share your entire vLLM start command, the entire request, and all received deltas so that I can help you debug it? You should be able to see an example of how this works, including delta processing for arguments, in this example from the vLLM docs. I actually created that demo with Hermes, so it should work for testing your purposes.
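To make the concatenation step concrete, here is a minimal, self-contained sketch of accumulating a streamed tool call's name and argument fragments before parsing. The delta dicts below are mocked for illustration only; they are not actual vLLM output, just the shape described above (name first, then argument fragments to concatenate):

```python
import json

# Mocked deltas in the OpenAI streaming shape described above: the function
# name arrives first, then argument fragments that must be concatenated.
mock_deltas = [
    {"index": 0, "id": "chatcmpl-tool-abc123",
     "function": {"name": "get_current_weather", "arguments": None}},
    {"index": 0, "function": {"arguments": '{"city": "Dallas", '}},
    {"index": 0, "function": {"arguments": '"state": "TX", '}},
    {"index": 0, "function": {"arguments": '"unit": "fahrenheit"}'}},
]

# Accumulate per tool-call index: set the name once, append argument deltas.
tool_calls = {}
for delta in mock_deltas:
    entry = tool_calls.setdefault(delta["index"], {"name": None, "arguments": ""})
    fn = delta.get("function") or {}
    if fn.get("name"):
        entry["name"] = fn["name"]
    if fn.get("arguments"):
        entry["arguments"] += fn["arguments"]

# Only after the stream ends is the arguments string complete and parseable.
call = tool_calls[0]
args = json.loads(call["arguments"])
print(call["name"], args)
```

The key point is that `json.loads` is only valid on the concatenation of all fragments; parsing any single delta's `arguments` in isolation will fail.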
Now I see that the arguments are indeed yielded separately. However, while debugging I found a bug in the Hermes parser that causes it to return a response without arguments. Below is an example of the output received:

Debug Findings:
Proposed Solution: The fix that mitigates this bug is to add a check that delta_text actually occurs within cur_arguments_json before attempting to find its index, and to check that current_tool_call is not None. Here's the current code:

```python
function_name: Union[str, None] = current_tool_call.get("name")
cur_arguments = current_tool_call.get("arguments")

# get the location where previous args differ from current
args_delta_start_loc = cur_arguments_json.index(delta_text) \
    + len(delta_text)
arguments_delta = cur_arguments_json[:args_delta_start_loc]
```

https://github.com/vllm-project/vllm/blob/main/vllm/entrypoints/openai/tool_parsers/hermes_tool_parser.py#L227C51-L227C72

Updated Code:
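(The author's actual updated code is not shown above. Purely as an illustration of the guard described, with a hypothetical helper name and hypothetical inputs, the check might look like this:)

```python
from typing import Optional

def compute_arguments_delta(current_tool_call: Optional[dict],
                            cur_arguments_json: str,
                            delta_text: str) -> Optional[str]:
    """Hypothetical sketch: guard the str.index() lookup so that a
    delta_text not (yet) present in the serialized arguments cannot
    raise ValueError, and a missing tool call cannot raise AttributeError."""
    if current_tool_call is None:
        # no tool call in progress yet; nothing to diff against
        return None
    if delta_text not in cur_arguments_json:
        # delta not reflected in the arguments JSON; skip this pass
        return None
    args_delta_start_loc = cur_arguments_json.index(delta_text) + len(delta_text)
    return cur_arguments_json[:args_delta_start_loc]
```

Without the membership check, `cur_arguments_json.index(delta_text)` raises `ValueError` whenever the delta text has not yet appeared in the serialized arguments, which matches the failure mode described.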
This fixes both bugs. However, it still produces responses with
To prevent empty responses, the solution is to check if
Let me know whether you think this should fix the bug or whether the issue lies with the model's response generation. I'm open to collaborating to resolve the bug and can make a pull request.
Can you please share the request you're using (messages, tools, vLLM config) so that I can try to reproduce the issue? It's not impossible that there's a bug in the Hermes tool parser, but it has been used and tested pretty robustly, so I'm curious what's different about this case, and I'd like to be able to step through the streaming parsing.
Your current environment
Docker image: vllm/vllm-openai:v0.6.3
Parameters:
--enable-auto-tool-choice --tool-call-parser hermes
Model Input Dumps
No response
🐛 Describe the bug
I'm using the vLLM library with a Docker container as a REST API, specifically the `/v1/chat/completions` endpoint with the OpenAI client. When I run chat completions without streaming, it returns `tool_calls` with the tool name and its arguments as expected. However, when I enable the streaming option, it only returns the tool name, with arguments set to `None`. I'm not sure why this is happening. I've tried searching for related issues but haven't found anything helpful.

I have also tried `stream_options={"include_usage": True}` and it gives the same output.

The model generates this output:
Output: