
feat: topk/topp sampling #105

Merged (4 commits) on Jul 31, 2024

Conversation

chenghuaWang (Contributor)

Adds greedy search, top-k sampling, and top-p sampling for language generation. See reference: https://huggingface.co/blog/how-to-generate

Note: the tensor passed to the top-p generator must sum to 1, i.e. a softmax should be applied first.

```cpp
// Construct a generator that uses top-k sampling (k = 50, temperature = 0.3, p = 0.92).
LlmTextGenerator gen(LlmTextGeneratorType::kTopkSampling, /*k*/ 50, /*temperature*/ 0.3, /*p*/ 0.92);
auto result = model(...);
auto out_token = gen.generate(result[0]);
auto out_string = tokenizer.detokenize({out_token});
```
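
For readers unfamiliar with the technique, here is a minimal standalone sketch of the softmax-then-top-p pipeline described in the note above. It illustrates the idea from the linked Hugging Face post, not this PR's actual implementation; the function names and the raw `logits` vector are hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <numeric>
#include <random>
#include <vector>

// Numerically stable softmax with temperature: subtract the max logit
// before exponentiating so exp() cannot overflow.
std::vector<float> softmax(const std::vector<float>& logits, float temperature) {
    std::vector<float> probs(logits.size());
    float max_logit = *std::max_element(logits.begin(), logits.end());
    float sum = 0.f;
    for (size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp((logits[i] - max_logit) / temperature);
        sum += probs[i];
    }
    for (auto& p : probs) p /= sum;  // probabilities now sum to 1
    return probs;
}

// Top-p (nucleus) sampling: keep the smallest set of tokens whose
// cumulative probability reaches p, then sample from that set.
unsigned int sample_top_p(const std::vector<float>& probs, float p, std::mt19937& rng) {
    std::vector<unsigned int> idx(probs.size());
    std::iota(idx.begin(), idx.end(), 0u);
    std::sort(idx.begin(), idx.end(),
              [&](unsigned int a, unsigned int b) { return probs[a] > probs[b]; });

    float cum = 0.f;
    size_t cutoff = idx.size();
    for (size_t i = 0; i < idx.size(); ++i) {
        cum += probs[idx[i]];
        if (cum >= p) { cutoff = i + 1; break; }
    }

    // Renormalize over the nucleus and draw one token from it.
    std::uniform_real_distribution<float> dist(0.f, cum);
    float r = dist(rng), acc = 0.f;
    for (size_t i = 0; i < cutoff; ++i) {
        acc += probs[idx[i]];
        if (r <= acc) return idx[i];
    }
    return idx[cutoff - 1];
}
```

Top-k truncation works the same way, except the cutoff is a fixed count of tokens rather than a probability mass.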

@chenghuaWang (Contributor, Author)

To avoid copying the entire token vector, please use the callback function when you want to consume all tokens at once. Here are two examples.

Chat:

```cpp
for (int i = 0; i < in_strs.size(); ++i) {
    auto in_str = in_strs[i];
    auto input_tensor = tokenizer.tokenize(in_str, i);
    std::cout << "[Q] " << in_str << std::endl;
    std::cout << "[A] " << std::flush;

    LlmTextGeneratorOpts opt{
        .max_new_tokens = 100,
        .do_sample = true,
        .temperature = 0.3f,
        .top_k = 50,
        .top_p = 0.f,
    };
    // The callback is invoked once per generated token; returning false
    // stops generation early.
    model.generate(input_tensor, opt, [&](unsigned int out_token) -> bool {
        auto out_string = tokenizer.detokenize({out_token});
        auto [isOk, print_string] = processOutput(out_string);
        if (isOk) {
            std::cout << print_string << std::flush;
        } else {
            return false;
        }
        return true;
    });
    printf("\n");
}
```

Get all Tokens:

```cpp
for (int i = 0; i < in_strs.size(); ++i) {
    auto in_str = in_strs[i];
    auto input_tensor = tokenizer.tokenize(in_str, i);

    LlmTextGeneratorOpts opt{
        .max_new_tokens = 100,
        .do_sample = true,
        .temperature = 0.3f,
        .top_k = 50,
        .top_p = 0.f,
    };
    // Accumulate every generated token; returning true keeps generation going.
    std::vector<unsigned int> tokens;
    model.generate(input_tensor, opt, [&](unsigned int out_token) -> bool {
        tokens.emplace_back(out_token);
        return true;
    });
    // Detokenize the accumulated tokens in one call.
    auto out_string = tokenizer.detokenize(tokens);
}
```
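
Although both examples set `.do_sample = true`, the PR also adds greedy search; with sampling disabled the pick is presumably just the argmax of the distribution. A minimal standalone sketch of that idea (not this PR's code):

```cpp
#include <algorithm>
#include <vector>

// Greedy search: always pick the highest-probability token.
unsigned int sample_greedy(const std::vector<float>& probs) {
    return static_cast<unsigned int>(
        std::max_element(probs.begin(), probs.end()) - probs.begin());
}
```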

@yirongjie merged commit c5c33de into UbiquitousLearning:main on Jul 31, 2024. 1 check passed.