Max throughput for llama3.2 1B model #9680

Open · JunhaoWang opened this issue Oct 25, 2024 · 0 comments
JunhaoWang commented Oct 25, 2024

Looking for a benchmark that targets the high-throughput, small-LLM use case.

My use case does not care about latency, but it needs extremely high throughput at minimum price (a cheap T4 GPU, or even CPU only), i.e. serving 1 billion requests every week (avg 100k input tokens, 20k output tokens per request). I didn't find any benchmark targeting this type of use case; would love some pointers.
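For scale, 1B requests/week at ~120k total tokens each is on the order of 10^14 tokens per week, i.e. hundreds of millions of tokens per second sustained. The kind of offline throughput probe I have in mind looks roughly like the sketch below, using vLLM's Python API (model id, prompt shape, and batch size are placeholders, not an official benchmark); I understand the repo's benchmarks/benchmark_throughput.py script covers similar ground.

```python
# Rough offline throughput probe with vLLM's Python API.
# Model id, prompts, and batch size are illustrative assumptions only.
import time

from vllm import LLM, SamplingParams

llm = LLM(model="meta-llama/Llama-3.2-1B-Instruct")  # assumed model id
sampling = SamplingParams(temperature=0.0, max_tokens=256)

# Synthetic batch; a real run should mirror the target ~100k-in/20k-out
# request shape (within the model's context window).
prompts = ["Summarize the following document: ..." for _ in range(512)]

start = time.perf_counter()
outputs = llm.generate(prompts, sampling)
elapsed = time.perf_counter() - start

# Count only generated tokens to get an output-token throughput figure.
generated = sum(len(o.outputs[0].token_ids) for o in outputs)
print(f"{generated} output tokens in {elapsed:.1f}s "
      f"-> {generated / elapsed:.0f} tok/s")
```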
