
Fused ROPE and reshape cache kernel #229

Open · wants to merge 14 commits into main
Conversation

@maleksan85 commented Oct 11, 2024

Added support to the RoPE kernel to store values directly into the KV cache.
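For context, here is a minimal plain-PyTorch sketch of the two operations being fused: rotary embedding followed by the reshape-and-cache scatter. This is a hedged reference illustration, not the PR's HIP kernel; the tensor names and the simplified cache layout are assumptions.

```python
# Reference sketch (assumed shapes, simplified cache layout): the unfused
# path runs these two steps as separate kernels; the PR fuses them into one.
import torch

def rope_then_cache_reference(
    positions,      # [num_tokens] token position ids
    query,          # [num_tokens, num_heads * head_size]
    key,            # [num_tokens, num_kv_heads * head_size]
    value,          # [num_tokens, num_kv_heads * head_size]
    cos_sin_cache,  # [max_position, rot_dim]; cos half then sin half
    key_cache,      # [num_blocks, block_size, num_kv_heads, head_size]
    value_cache,    # same layout as key_cache
    slot_mapping,   # [num_tokens] flat cache slot per token
    head_size,
):
    num_tokens = positions.numel()
    rot_dim = cos_sin_cache.size(1)
    cos, sin = cos_sin_cache[positions].chunk(2, dim=-1)

    def rotate(x):
        # NeoX-style RoPE on the first rot_dim elements of each head.
        x = x.clone().view(num_tokens, -1, head_size)
        x1 = x[..., : rot_dim // 2]
        x2 = x[..., rot_dim // 2 : rot_dim]
        c, s = cos.unsqueeze(1), sin.unsqueeze(1)
        o1 = x1 * c - x2 * s
        o2 = x2 * c + x1 * s
        x[..., : rot_dim // 2] = o1
        x[..., rot_dim // 2 : rot_dim] = o2
        return x.view(num_tokens, -1)

    # Step 1: rotary embedding on q and k (v is left unrotated).
    query, key = rotate(query), rotate(key)

    # Step 2: reshape-and-cache -- scatter k/v into the paged KV cache at
    # the slots given by slot_mapping. The fused kernel writes the rotated
    # k here directly, saving a launch and a round trip through memory.
    block_size = key_cache.size(1)
    block_idx = slot_mapping // block_size
    block_off = slot_mapping % block_size
    key_cache[block_idx, block_off] = key.view(num_tokens, -1, head_size)
    value_cache[block_idx, block_off] = value.view(num_tokens, -1, head_size)
    return query, key
```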

Llama 3.1 8B fp16 latency test without the fused RoPE kernel:

root@banff-cyxtera-s82-5:~/workspace/vllm# HIP_VISIBLE_DEVICES=5 VLLM_FUSED_ROPE_W_KV_CACHE=0 python benchmarks/benchmark_latency.py --model /data/models/Meta-Llama-3.1-8B --input_len 1024 --output_len 1024 --batch-size=16

Avg latency: 9.835484731818239 seconds

With the fused kernel:

root@banff-cyxtera-s82-5:~/workspace/vllm# HIP_VISIBLE_DEVICES=5 VLLM_FUSED_ROPE_W_KV_CACHE=1 python benchmarks/benchmark_latency.py --model /data/models/Meta-Llama-3.1-8B --input_len 1024 --output_len 1024 --batch-size=16

Avg latency: 9.86210185677434 seconds

The kernel test also reports per-kernel timings, though the comparison is not entirely fair since each kernel runs only once. Taking one test from the end of the test-suite run:

tests/kernels/test_fused_rope_and_reshape_cache.py::test_fused_rotary_embedding_with_reshape_cache[5-0-False-dtype1-None-128-8-32-16-8-auto-1024-16] Non fused call 0.22136466577649117 ms
Fused run 0.0467388890683651 ms

The shapes are quite small, so kernel launch overhead is likely significant in these numbers.
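Since a single invocation per kernel makes the comparison noisy, here is a hedged sketch of a fairer micro-timing loop with warmup, using torch.cuda events (which map to HIP events on ROCm). This is a generic pattern, not code from the PR:

```python
import torch

def time_kernel_ms(fn, warmup=10, iters=100):
    """Mean milliseconds per call of fn(), with warmup iterations to
    amortize one-time launch and caching costs."""
    for _ in range(warmup):
        fn()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    torch.cuda.synchronize()
    start.record()
    for _ in range(iters):
        fn()
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters
```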
