Hi, vLLM community,

I want to make vLLM support new hardware: Tenstorrent's Grayskull (a general-purpose DLA, similar in role to a CUDA GPU, but not CUDA). After reading the documentation and the code, I have some understanding and some questions, and I need the community's help to clarify my thoughts and check my understanding. Please correct me if I have any misunderstandings.
My understandings
The essential part of vLLM is PagedAttention, a highly optimized "memory paging mechanism" implemented in CUDA.
The kernel is exposed to Python via the bindings in torch_bindings.cpp.
To utilize the Tenstorrent Grayskull, I have to:
Implement PagedAttention with a Tenstorrent Grayskull kernel. (That will be a huge amount of work.)
Expose the kernel to Python with bindings.
What I DON'T have to do:
Modify the implementations of models that vLLM already supports, because they already use vLLM's interface.
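To make sure I understand the "memory paging" idea correctly, here is my mental model as a pure-Python sketch (this is only an illustration of the concept, not vLLM's actual data structures): each sequence's KV cache is split into fixed-size physical blocks, and a per-sequence block table maps logical token positions to physical blocks, much like OS virtual-memory paging.

```python
# Conceptual sketch of a paged KV cache (illustration only, not vLLM code).
BLOCK_SIZE = 4  # tokens per physical block (chosen arbitrarily here)

class PagedKVCache:
    def __init__(self):
        self.blocks = []        # physical block storage (non-contiguous per sequence)
        self.block_tables = {}  # seq_id -> list of physical block indices
        self.seq_lens = {}      # seq_id -> number of tokens stored

    def append(self, seq_id, kv):
        """Append one token's KV entry to a sequence, allocating blocks on demand."""
        table = self.block_tables.setdefault(seq_id, [])
        n = self.seq_lens.get(seq_id, 0)
        if n % BLOCK_SIZE == 0:  # current block is full -> allocate a fresh one
            self.blocks.append([None] * BLOCK_SIZE)
            table.append(len(self.blocks) - 1)
        self.blocks[table[n // BLOCK_SIZE]][n % BLOCK_SIZE] = kv
        self.seq_lens[seq_id] = n + 1

    def gather(self, seq_id):
        """Read a sequence's KV entries back in logical order via the block table."""
        table = self.block_tables[seq_id]
        return [self.blocks[table[i // BLOCK_SIZE]][i % BLOCK_SIZE]
                for i in range(self.seq_lens[seq_id])]
```

If this model is right, the attention kernel for a new backend mainly has to read K/V through such a block table instead of from contiguous tensors.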
My questions
I saw there are 2 versions of the kernel, v1 and v2. Do I need to implement v1, or can I just go with v2?
Where can I find a list of the APIs that I have to implement? I am afraid I may have missed something. In torch_bindings.cpp I saw a lot of operations being bound, but do I need to implement them all, or just paged_attention_v2()?
Can I first modify only the forward() function to adapt to vLLM's interface, without implementing PagedAttention? Would it work, just with worse performance?
Are there any special considerations caused by quantization?
Is there anything else I missed that I should know?
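For question 3 above, the fallback I have in mind is plain softmax attention over a contiguous KV list, without any paging. A pure-Python sketch of what I mean (hypothetical, just to make the question concrete):

```python
import math

def naive_attention(q, keys, values):
    """Single-query attention over contiguous K/V lists:
    softmax(q . K / sqrt(d)) . V. A paged kernel computes the same result,
    but reads K/V through a block table instead of a contiguous buffer."""
    d = len(q)
    scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
    m = max(scores)                           # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    return [sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))]
```

If something like this behind forward() is enough to get correct outputs (only slower, and wasting memory on contiguous KV), I could bring the backend up incrementally.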
Thank you for reading my long list of questions, and thanks in advance for any help :D