
Return TokenData in Inference calls #199

Closed

Conversation

saddam213 (Collaborator) commented Oct 19, 2023

This PR is NOT intended to be merged, but to make the idea I had easier to explain.

Adding an overload to ILLamaExecutor that returns the full TokenData in the Infer call instead of a string:

IAsyncEnumerable<TokenData> InferTokensAsync(string text, IInferenceParams? inferenceParams = null, CancellationToken cancellationToken = default)

public record TokenData(int Id)
{
    public float Logit { get; set; }
    public float Probability { get; set; }
    public string Content { get; set; }
}
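
For illustration, consuming the proposed overload might look something like this (the executor variable, prompt, and inferenceParams are hypothetical placeholders, not part of this PR):

// Hypothetical consumption of the proposed overload.
await foreach (TokenData token in executor.InferTokensAsync(prompt, inferenceParams))
{
    Console.Write(token.Content);

    // Per-token metadata that the current string-based Infer call discards:
    Console.WriteLine($"  (id={token.Id}, logit={token.Logit}, p={token.Probability})");
}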

We could also possibly include the n tokens that had the next highest probability:

public record TokenData(int Id)
{
    public float Logit { get; set; }
    public float Probability { get; set; }
    public string Content { get; set; }
    public TokenData[] LastN { get; set; }
}
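
That would let a caller surface the runner-up candidates at each step, for example (hypothetical usage; assumes System.Linq is in scope and that LastN holds the next-most-probable tokens):

// Hypothetical: print each chosen token alongside its nearest alternatives.
await foreach (var token in executor.InferTokensAsync(prompt))
{
    var alternatives = string.Join(", ",
        token.LastN.Select(t => $"{t.Content} ({t.Probability:P1})"));
    Console.WriteLine($"{token.Content}  [also likely: {alternatives}]");
}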

This extra data would be super helpful to a lot of end users.

I have a working example similar to this, and the code in this PR was shoehorned out of it, but I am not sure of the best way to implement this properly, so I thought I would open this draft :)

martindevans (Member) commented:

That's an interesting idea.

Are you thinking of something that allows the user to modify the tokens before they're selected (to influence token selection) or just returning the data?

I've been playing with ideas to split executors into "stages" which get chained together into an inference pipeline (e.g. Infer -> LogitBias -> Temperature -> TopK), but I haven't come up with anything good yet. That may be similar to what you're looking for?
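
A minimal sketch of what one of those stages could look like (every name here is hypothetical, not part of LLamaSharp's API):

// Hypothetical pipeline stage: mutates raw logits in place before sampling.
public interface ILogitStage
{
    void Process(Span<float> logits);
}

public sealed class TemperatureStage : ILogitStage
{
    private readonly float _temperature;

    public TemperatureStage(float temperature) => _temperature = temperature;

    // Dividing logits by the temperature sharpens (<1) or flattens (>1)
    // the distribution the sampler draws from.
    public void Process(Span<float> logits)
    {
        for (var i = 0; i < logits.Length; i++)
            logits[i] /= _temperature;
    }
}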

saddam213 (Collaborator, Author) commented:

Just returning the data at this point. Having the logits will make using LogitBias a lot easier, and token IDs would be far better for use in output filters than strings.

But you are right, this could also be used for many other things in the pipeline.

There is currently no nice way for third-party libraries to get this data without jumping through extra hoops.
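
As a hypothetical illustration of the filtering point, an output filter keyed on token IDs avoids string comparisons entirely (the set contents below are placeholders):

// Hypothetical output filter keyed on token IDs instead of strings.
var banned = new HashSet<int> { /* token ids to suppress */ };
await foreach (var token in executor.InferTokensAsync(prompt))
{
    if (banned.Contains(token.Id))
        continue; // drop the token without any string matching

    Console.Write(token.Content);
}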

saddam213 (Collaborator, Author) commented Oct 29, 2023

Found a workaround that doesn't require maintaining my custom executors:

Run inference, then re-run to get the embeddings, then use Tokenize to match the token data with the token strings, and zip it all together.

It is a lot slower, but there is no need to change LLamaSharp for the data I need :)
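
A rough sketch of that workaround, with executor/context method names that are assumptions for illustration rather than exact LLamaSharp signatures:

// Rough sketch of the workaround; method names are assumed, not exact.
var builder = new StringBuilder();
await foreach (var piece in executor.InferAsync(prompt, inferenceParams))
    builder.Append(piece);                   // 1. run inference normally

// (the second pass for embeddings mentioned above is elided here)

var tokenIds = context.Tokenize(builder.ToString()); // 2. re-tokenize the output

// 3. zip ids and strings back together into TokenData records
var tokens = tokenIds
    .Select(id => new TokenData(id) { Content = context.TokenToString(id) })
    .ToList();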

saddam213 closed this Oct 29, 2023