Correctly format followup messages in turn-based (chat) inference #299
Conversation
Overall this looks good. Thank you for the contribution!
@@ -159,15 +159,15 @@ public async IAsyncEnumerable<string> ChatAsync(string prompt, IInferenceParams?
InteractiveExecutorState state = (InteractiveExecutorState)executor.GetStateData();
prompt = state.IsPromptRun
    ? HistoryTransform.HistoryToText(History)
There's an unexpected behaviour here. When the user inputs the prompt for the first run, they want it treated as raw text rather than transformed text. You could run the ChatSessionWithRoleName example and set a breakpoint to check it.
It's not your fault; the abstraction here makes the boundary vague. Some of the logic in the executors and chat session needs to be refactored to fix this completely. Still, a quick fix for now would be better. :)
Yeah, I see what you're saying. Do you think the change to the overlay that uses the ChatHistory is OK to go in by itself? The followup messages a user sends need the template from the IHistoryTransform implementation applied so that the model has the appropriate tokens to continue the conversation.
I think it's better to check whether the history is empty before processing the prompt here. If the history is empty, the current prompt is the first input, so we don't modify it; otherwise we can resume from history.
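The empty-history check described above can be sketched as follows. This is a language-neutral illustration of the decision logic, not LLamaSharp's actual API; `prepare_prompt` and `history_to_text` are hypothetical names.

```python
# Hypothetical sketch: decide whether to transform the prompt based on
# whether any chat history exists yet (names are illustrative only).
def prepare_prompt(prompt, history, history_to_text):
    if not history:
        # First input of the session: pass the user's prompt through as raw text.
        return prompt
    # Resuming a session: rebuild the full conversation text from the history.
    return history_to_text(history)

# First run: the prompt is left untouched.
first = prepare_prompt("Hello", [], lambda h: "\n".join(h))
# Later run: the prompt is reconstructed from the stored history.
resumed = prepare_prompt("Hello", ["<|user|> Hi", "<|assistant|> Hey"],
                         lambda h: "\n".join(h))
```

The key design point is that the branch depends only on whether history exists, so the first prompt never gets the turn template applied twice.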
If a session was loaded, wouldn't the history be duplicated in the state? I think this problem arises because the actual text is not saved when a session is saved.
I'm working on a proposal for the ChatSession class that will communicate the behavior of the class better through its interface, and will have more guardrails in place so it can't be used in unexpected ways. I'll send that in as a separate PR and close this one. I don't think we'll get this right without breaking changes.
Actually, we may also refactor some parts soon. If you'd like to make some aggressive breaking changes, you could open a PR against the preview branch instead, so that you don't feel limited. :)
While working with a different model than in my previous tests, I noticed that it always started its responses to the second, third, etc. question with the token that indicates the start of a turn (e.g. <|assistant|>).
I investigated and found that sending in the raw text is not enough for this model, as it happened to be for the model I initially tested with.
I changed the code so that the correct turn template, as defined by the HistoryTransform class, is applied before the prompt is sent into the InternalChatAsync method. This resolved the issue, and turn-based inference started working as expected.
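To make the failure mode concrete, here is a hedged, language-neutral sketch of why followup messages need the turn template. The marker strings and the `format_turn` helper are hypothetical; they only illustrate the pattern, not LLamaSharp's HistoryTransform implementation.

```python
# Hypothetical turn-template helper (illustrative names, not LLamaSharp's API).
# Chat-tuned models expect each message wrapped in turn markers; if a followup
# message is sent as raw text, the model may emit the missing marker itself
# (e.g. starting its reply with "<|assistant|>").
def format_turn(role, message):
    # Wrap the message in the role's start marker and cue the assistant's turn.
    return f"<|{role}|>\n{message}\n<|assistant|>\n"

raw_followup = "How are you?"            # what the buggy path sent
formatted = format_turn("user", raw_followup)  # what the fix sends
```

With the template applied, the model sees the turn boundary in its input and no longer needs to emit the start-of-turn token in its output.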