Benchmark that targets high-throughput small LLM use case
My use case does not care about latency, but needs extremely high throughput at minimum cost (a cheap T4 GPU or just CPU), i.e. running 1 billion requests every week (avg. 100k input tokens and 20k output tokens per request). I didn't find any benchmark targeting this type of use case and would love some pointers.
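For scale, here is a rough back-of-envelope sketch of the sustained rates those numbers imply. It assumes the load is spread evenly over the whole week, and the function name `required_rates` is only illustrative, not part of any benchmark tool.

```python
# Back-of-envelope: sustained throughput needed for the workload above.
# Assumption: requests are spread evenly across the week (no peaks, no downtime).

SECONDS_PER_WEEK = 7 * 24 * 3600  # 604,800 seconds


def required_rates(requests_per_week, input_tokens, output_tokens):
    """Return the average per-second rates needed to finish within one week."""
    requests_per_s = requests_per_week / SECONDS_PER_WEEK
    return {
        "requests_per_s": requests_per_s,
        "input_tokens_per_s": requests_per_s * input_tokens,
        "output_tokens_per_s": requests_per_s * output_tokens,
    }


rates = required_rates(
    requests_per_week=1_000_000_000,  # 1 billion requests per week
    input_tokens=100_000,             # avg. input tokens per request
    output_tokens=20_000,             # avg. output tokens per request
)

for name, value in rates.items():
    print(f"{name}: {value:,.0f}")
```

That works out to roughly 1,650 requests/s, about 1.65e8 input (prefill) tokens/s and about 3.3e7 output (decode) tokens/s, so the number I am really after from a benchmark is tokens per second per dollar on each hardware option.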