75 manager sampler add minimum tokens per second accepted #94

Merged

AguirreNicolas
Contributor

A TimeoutHandler was created to assign timeouts.
For LLMs, the timeout depends on three variables:

  • Prefill Time a.k.a Time-To-First-Token (TTFT)
  • Decode Time a.k.a Time-Per-Output-Token (TPOT)
  • Queue Time

Prefill time (TTFT)

It is modeled as a quadratic function [1]. Besides (0, 0), two more points are needed to determine the curve. These two points act as an SLA. For instance, the points are (8192, 2) and (32768, 10).
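As a rough sketch of how such a curve could be derived, the snippet below fits `a*n^2 + b*n` through the origin and the two example SLA points. The function names, and the reading of each point as (prompt tokens, seconds), are assumptions made for illustration and are not taken from the PR's code.

```python
def fit_quadratic_through_origin(p1, p2):
    """Solve a*x^2 + b*x = y for two points; (0, 0) is satisfied by construction."""
    (x1, y1), (x2, y2) = p1, p2
    # Cramer's rule on the 2x2 system [[x1^2, x1], [x2^2, x2]] @ [a, b] = [y1, y2].
    det = x1**2 * x2 - x2**2 * x1
    a = (y1 * x2 - y2 * x1) / det
    b = (x1**2 * y2 - x2**2 * y1) / det
    return a, b


def ttft(n_tokens, a, b):
    """Hypothetical prefill-time model, evaluated at a given prompt length."""
    return a * n_tokens**2 + b * n_tokens


# Example SLA points from the description (units assumed: tokens, seconds).
a, b = fit_quadratic_through_origin((8192, 2), (32768, 10))
```

With the example points this gives `a` of about 2.48e-9 and `b` of about 2.24e-4, so the curve passes through all three points.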

Decode time (TPOT)

The decode time is modeled as a linear function of the number of output tokens, with the per-token rate derived from the average silent reading speed in English. According to [2], the average silent reading speed is 238 words per minute. Given that there are approximately 0.75 tokens per word, the TPOT can be calculated as follows:

$TPOT~(ms/tok) = \frac{1000~ms}{1~s \cdot \frac{238~words}{min} \cdot \frac{0.75~tok}{word} \cdot \frac{1~min}{60~s}} = \frac{1000~ms}{2.975~tok} \approx 336~\frac{ms}{tok}$
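The same arithmetic can be checked in a few lines. The constant names are illustrative only; the values are the ones stated above.

```python
WORDS_PER_MINUTE = 238   # average silent reading speed, from [2]
TOKENS_PER_WORD = 0.75   # approximation used in this description
SECONDS_PER_MINUTE = 60

# Reading speed expressed in tokens per second.
tokens_per_second = WORDS_PER_MINUTE * TOKENS_PER_WORD / SECONDS_PER_MINUTE

# Time-Per-Output-Token in milliseconds.
tpot_ms = 1000 / tokens_per_second
```

Here `tokens_per_second` comes out to 2.975 and `tpot_ms` rounds to 336.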

Queue time

The queue time has been defined as 30 seconds.
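To illustrate how the three terms might combine, here is a minimal sketch that simply adds them. The summation, the function name, and the (tokens, seconds) units are all assumptions; the description does not say how the TimeoutHandler actually combines the values.

```python
# Coefficients of the quadratic through (0, 0), (8192, 2) and (32768, 10),
# assuming the points are (prompt tokens, seconds).
A = 1 / 402653184
B = 11 / 49152

TPOT_S = 0.336        # 336 ms per output token, from the reading-speed estimate
QUEUE_TIME_S = 30.0   # fixed queue allowance


def total_timeout_s(prompt_tokens, output_tokens):
    """Hypothetical overall timeout: queue + prefill + decode (assumed additive)."""
    prefill = A * prompt_tokens**2 + B * prompt_tokens
    decode = TPOT_S * output_tokens
    return QUEUE_TIME_S + prefill + decode
```

Under these assumptions, a request with an 8192-token prompt and 256 output tokens would be allowed roughly 30 + 2 + 86 seconds.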

Close #75

[1] https://arxiv.org/abs/2407.07000
[2] https://www.sciencedirect.com/science/article/abs/pii/S0749596X19300786


@AguirreNicolas AguirreNicolas linked an issue Jul 31, 2024 that may be closed by this pull request
Collaborator

@RawthiL RawthiL left a comment

Some minor changes required.

docker-compose/dev/apps/config/sampler.json (resolved)
docker-compose/dev/apps/config/sampler.json (resolved)
apps/python/sampler/activities/lmeh/sample.py (outdated, resolved)
packages/python/protocol/protocol.py (outdated, resolved)
* Enhanced TimeoutHandler construction.
* Added a type to the timeout.
@RawthiL RawthiL merged commit 0efbae8 into main Aug 1, 2024
4 checks passed
@RawthiL RawthiL deleted the 75-manager-sampler-add-minimum-tokens-per-second-accepted branch August 1, 2024 17:02
Linked issue: Manager / Sampler : Add minimum tokens per second accepted