Use continuous batching by default #882

Open
wants to merge 58 commits into
base: master
Commits
ec5f305
Use continuous batching by default
Wovchena Sep 19, 2024
dd7a5cf
Merge branch 'master' into use-continuos-batching-by-default
andrei-kochin Sep 19, 2024
229b7c5
Merge branch 'master' into use-continuos-batching-by-default
andrei-kochin Sep 20, 2024
41d1fe7
Update src/cpp/src/llm_pipeline.cpp
andrei-kochin Sep 20, 2024
36150c4
Update src/cpp/src/llm_pipeline.cpp
andrei-kochin Sep 20, 2024
4a4a09e
Update src/cpp/src/llm_pipeline.cpp
andrei-kochin Sep 20, 2024
1a58b5e
Update src/cpp/src/llm_pipeline.cpp
andrei-kochin Sep 20, 2024
90d81e6
Reorder cout
Wovchena Sep 20, 2024
6dc43a3
Update src/cpp/src/llm_pipeline.cpp
andrei-kochin Sep 20, 2024
03e2f32
Update src/cpp/src/llm_pipeline.cpp
andrei-kochin Sep 20, 2024
e561e93
Update src/cpp/src/llm_pipeline.cpp
andrei-kochin Sep 20, 2024
37ea2ad
Update src/cpp/src/llm_pipeline.cpp
andrei-kochin Sep 20, 2024
b62aee9
Update src/cpp/src/llm_pipeline.cpp
andrei-kochin Sep 23, 2024
07505b3
Update src/cpp/src/llm_pipeline.cpp
andrei-kochin Sep 23, 2024
e078818
Update src/cpp/src/llm_pipeline.cpp
andrei-kochin Sep 23, 2024
001d3a0
Update src/cpp/src/llm_pipeline.cpp
andrei-kochin Sep 23, 2024
a0a964f
Update src/cpp/src/llm_pipeline.cpp
andrei-kochin Sep 25, 2024
3cb2105
Update src/cpp/src/llm_pipeline.cpp
andrei-kochin Sep 25, 2024
40ea516
Limit max new tokens.
popovaan Sep 25, 2024
193df7e
Fixed error
popovaan Sep 25, 2024
1704548
Clean up
Wovchena Sep 30, 2024
086c7b8
Default destructors
Wovchena Sep 30, 2024
607d90d
Merge branch 'master' into use-continuos-batching-by-default
Wovchena Sep 30, 2024
741c13b
Default ~PerfTime
Wovchena Sep 30, 2024
06d1b1e
Merge branch 'master' into use-continuos-batching-by-default
ilya-lavrenov Oct 10, 2024
8d7d39d
Update src/cpp/src/llm_pipeline.cpp
andrei-kochin Oct 11, 2024
c4e8e05
Update src/cpp/src/llm_pipeline.cpp
andrei-kochin Oct 11, 2024
8116342
Update src/cpp/src/llm_pipeline.cpp
andrei-kochin Oct 11, 2024
b87d0f6
Update src/cpp/src/llm_pipeline.cpp
andrei-kochin Oct 11, 2024
1806fa0
CB: fix deadlock (#71)
Wovchena Oct 11, 2024
c9dc107
Merge branch 'master' into use-continuos-batching-by-default
ilya-lavrenov Oct 12, 2024
4bbcd0e
Increase timeouts for tests
ilya-lavrenov Oct 12, 2024
743e018
Update causal_lm_cpp.yml
ilya-lavrenov Oct 12, 2024
cfccefa
Use split_core_complile_config for CB
ilya-lavrenov Oct 12, 2024
03965d6
Update causal_lm_cpp.yml
ilya-lavrenov Oct 12, 2024
784c331
Drop request if it's aborted by streamer
ilya-lavrenov Oct 13, 2024
93b8c38
Update src/cpp/src/continuous_batching_impl.cpp
ilya-lavrenov Oct 13, 2024
043d842
Drop request in case of exceptions, etc
ilya-lavrenov Oct 14, 2024
fdad63c
Turned off prefix caching
ilya-lavrenov Oct 14, 2024
a21f725
Apply suggestions from code review
ilya-lavrenov Oct 14, 2024
a66be9e
Apply suggestions from code review
ilya-lavrenov Oct 14, 2024
82fceb5
Update continuous_batching_impl.cpp
ilya-lavrenov Oct 14, 2024
a246c1c
Merge branch 'master' into use-continuos-batching-by-default
ilya-lavrenov Oct 14, 2024
4ee8f12
Merge branch 'master' into use-continuos-batching-by-default
ilya-lavrenov Oct 14, 2024
73a8872
Apply suggestions from code review
ilya-lavrenov Oct 14, 2024
4019678
Apply suggestions from code review
ilya-lavrenov Oct 14, 2024
ed7668e
Merge branch 'master' into use-continuos-batching-by-default
ilya-lavrenov Oct 14, 2024
feae546
Update causal_lm_cpp.yml
ilya-lavrenov Oct 14, 2024
5bdf779
Apply suggestions from code review
ilya-lavrenov Oct 14, 2024
e3f2949
Merge branch 'master' into use-continuos-batching-by-default
ilya-lavrenov Oct 16, 2024
f1a9ab5
Merge branch 'master' into use-continuos-batching-by-default
andrei-kochin Oct 17, 2024
debbdd4
Merge branch 'master' into use-continuos-batching-by-default
ilya-lavrenov Oct 18, 2024
7827199
Apply suggestions from code review
ilya-lavrenov Oct 21, 2024
3de57d3
Merge branch 'master' into use-continuos-batching-by-default
ilya-lavrenov Oct 21, 2024
42d26df
Merge branch 'master' into use-continuos-batching-by-default
ilya-lavrenov Oct 22, 2024
467ab86
Apply suggestions from code review
ilya-lavrenov Oct 22, 2024
5b7f94a
Merge branch 'master' into use-continuos-batching-by-default
andrei-kochin Oct 24, 2024
5a391a8
Merge branch 'master' into use-continuos-batching-by-default
andrei-kochin Oct 30, 2024
4 changes: 2 additions & 2 deletions .github/workflows/causal_lm_cpp.yml
@@ -510,7 +510,7 @@ jobs:
predicted_greedy = f.readline()
with open('predictions_prompt_lookup.txt', 'r') as f:
predicted_prompt_lookup = f.readline()
- assert predicted_greedy == predicted_prompt_lookup
+ assert predicted_greedy == predicted_prompt_lookup, f'Expected {predicted_greedy}, actual {predicted_prompt_lookup}'
"
echo "Prompt lookup" passed
- name: run and compare (model with seq_length_axis = 1)
@@ -531,7 +531,7 @@ jobs:
predicted_greedy = f.readline()
with open('predictions_prompt_lookup.txt', 'r') as f:
predicted_prompt_lookup = f.readline()
- assert predicted_greedy == predicted_prompt_lookup
+ assert predicted_greedy == predicted_prompt_lookup, f'Expected {predicted_greedy}, actual {predicted_prompt_lookup}'
"
echo "Prompt lookup" passed

2 changes: 1 addition & 1 deletion src/cpp/src/continuous_batching_impl.cpp
@@ -287,7 +287,7 @@ ContinuousBatchingPipeline::ContinuousBatchingImpl::generate(const std::vector<o
m_requests.clear();
};

- bool continue_generation = true, step_throws_exception = false;
+ bool continue_generation = true;
while (has_non_finished_requests() && continue_generation) {
try {
step();
29 changes: 29 additions & 0 deletions src/cpp/src/llm_pipeline.cpp
@@ -7,6 +7,7 @@
#include <algorithm>
#include <nlohmann/json.hpp>
#include <openvino/openvino.hpp>
+ #include <limits>
#include "openvino/genai/continuous_batching_pipeline.hpp"
#include "openvino/genai/generation_config.hpp"
#include "openvino/genai/llm_pipeline.hpp"
@@ -548,6 +549,7 @@ ov::genai::LLMPipeline::LLMPipeline(
const ov::genai::Tokenizer& tokenizer,
OptionalGenerationConfig generation_config
) {
+ OPENVINO_THROW("Not supported");
auto start_time = std::chrono::steady_clock::now();
m_pimpl = std::make_unique<StatefulLLMPipeline>(request, tokenizer, generation_config);
auto stop_time = std::chrono::steady_clock::now();
@@ -560,12 +562,25 @@ ov::genai::LLMPipeline::LLMPipeline(
const std::string& device,
const ov::AnyMap& properties
){
+ // std::cout << "Using continuous batching backend.\n";
auto start_time = std::chrono::steady_clock::now();
if (properties.find(ov::genai::scheduler_config.name()) != properties.end()) {
auto config_without_scheduler_config = properties;
config_without_scheduler_config.erase(ov::genai::scheduler_config.name());
auto& scheduler_config = properties.at(ov::genai::scheduler_config.name()).as<SchedulerConfig>();
m_pimpl = std::make_unique<ContinuousBatchingAdapter>(models_path, tokenizer, scheduler_config, device, config_without_scheduler_config);
+ // std::cout << "Found custom SchedulerConfig.\n";
+ } else if (true) {
+ SchedulerConfig scheduler_config;
+ scheduler_config.cache_size = 1;
+ scheduler_config.enable_prefix_caching = false;
+ m_pimpl = std::make_unique<ContinuousBatchingAdapter>(
+ models_path,
+ tokenizer,
+ scheduler_config,
+ device,
+ properties
+ );
} else if ("NPU" == device) {
m_pimpl = std::make_unique<StaticLLMPipeline>(models_path, tokenizer, device, properties);
} else {
@@ -580,12 +595,23 @@ ov::genai::LLMPipeline::LLMPipeline(
const std::string& device,
const ov::AnyMap& config
){
+ // std::cout << "Using continuous batching backend.\n";
auto start_time = std::chrono::steady_clock::now();
if (config.find(ov::genai::scheduler_config.name()) != config.end()) {
auto config_without_scheduler_config = config;
config_without_scheduler_config.erase(ov::genai::scheduler_config.name());
auto& scheduler_config = config.at(ov::genai::scheduler_config.name()).as<SchedulerConfig>();
m_pimpl = std::make_unique<ContinuousBatchingAdapter>(models_path, scheduler_config, device, config_without_scheduler_config);
+ } else if (true) {
+ SchedulerConfig scheduler_config;
+ scheduler_config.cache_size = 1;
+ scheduler_config.enable_prefix_caching = false;
+ m_pimpl = std::make_unique<ContinuousBatchingAdapter>(
+ models_path,
+ scheduler_config,
+ device,
+ config
+ );
} else if ("NPU" == device) {
m_pimpl = std::make_unique<StaticLLMPipeline>(models_path, device, config);
} else {
@@ -618,6 +644,9 @@ void ov::genai::LLMPipeline::set_generation_config(const GenerationConfig& confi
if (config.eos_token_id == -1)
m_pimpl->m_generation_config.eos_token_id = default_eos_token_id;

+ if (config.max_new_tokens == SIZE_MAX)
+ m_pimpl->m_generation_config.max_new_tokens = 100;

m_pimpl->m_generation_config.validate();
}

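The branch order in the two path-based constructor hunks of src/cpp/src/llm_pipeline.cpp can be sketched as a plain selection function. This is a simplified model, not the PR's code: `Backend`, `PropertiesSketch`, `kSchedulerConfigKey`, and `select_backend` are hypothetical, the map stands in for `ov::AnyMap`, and the `cb_by_default` flag models the `else if (true)` condition. The final `else` body is cut off in the diff, so it is labeled `Fallback` here rather than guessed.

```cpp
#include <map>
#include <string>

// Hypothetical labels for the branches visible in the diff.
enum class Backend { ContinuousBatching, Static, Fallback };

// Simplified stand-in for ov::AnyMap: only the presence of a key matters here.
using PropertiesSketch = std::map<std::string, std::string>;

// Hypothetical key; the real code looks up ov::genai::scheduler_config.name().
const std::string kSchedulerConfigKey = "scheduler_config";

Backend select_backend(const std::string& device,
                       const PropertiesSketch& properties,
                       bool cb_by_default) {
    if (properties.count(kSchedulerConfigKey) != 0) {
        return Backend::ContinuousBatching;  // caller supplied a SchedulerConfig
    }
    if (cb_by_default) {
        return Backend::ContinuousBatching;  // default SchedulerConfig path added by the PR
    }
    if (device == "NPU") {
        return Backend::Static;              // reachable only when cb_by_default is false
    }
    return Backend::Fallback;                // body not shown in the diff
}
```

With `cb_by_default` fixed to `true`, as in the diff, the `"NPU"` branch and the final `else` are never reached, so every device takes the continuous batching path.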