Issues: vllm-project/vllm
#9729 [Bug]: Bfloat16 or Half are not compatible with HF float16/bfloat16 result. (bug) · opened Oct 27, 2024 by jason9693
#9728 [Bug]: Jetson support regression (bug) · opened Oct 27, 2024 by conroy-cheers
#9727 [Bug]: vllm.LLM does not seem to re-initialize for distributed inference with subsequent models with Offline Inference (bug) · opened Oct 27, 2024 by lhl
#9723 [Bug]: Incoherent Offline Inference Single Video with Qwen2-VL (bug) · opened Oct 26, 2024 by hector-gr
#9722 [Performance]: How to Improve Performance Under Concurrency (performance) · opened Oct 26, 2024 by ljwps
#9714 [Feature]: AttributeError: Model MllamaForConditionalGeneration does not support BitsAndBytes quantization yet (feature request) · opened Oct 26, 2024 by CyrusCY
#9711 [Usage]: how to get average prompt token length and output token length (usage) · opened Oct 26, 2024 by starrlee356
#9707 [New Model]: Tarsier (new model) · opened Oct 25, 2024 by Fangzhou-Ai
#9706 [Bug]: Inconsistent evaluations when enabling / disabling chunked_prefill? (bug) · opened Oct 25, 2024 by Jingyu6
#9702 [Usage]: Using a model for inference and embedding (usage) · opened Oct 25, 2024 by micuentadecasa
#9701 [Installation] pip install vllm (0.6.3) will force a reinstallation of the CPU version torch and replace cuda torch on windows (installation) · opened Oct 25, 2024 by xiezhipeng-git
#9699 [Performance]: Empirical Measurement of NVLS (performance) · opened Oct 25, 2024 by youkaichao
#9695 [Usage]: Are prompts processed sequentially? (usage) · opened Oct 25, 2024 by nishadsinghi
#9694 [Usage]: GetTimeoutError when run distributed inference on ray with tensor parallel size > 1 (usage) · opened Oct 25, 2024 by sharlynxy
#9693 [Bug]: Function calling with stream vs without stream, arguments=None when stream option is enabled (bug) · opened Oct 25, 2024 by ankush13r
#9692 [Usage]: How do I use langchain for tool calls? (usage) · opened Oct 25, 2024 by 2500035435
#9688 [Bug]: No error report when passing wrong lora path using num_scheduler_steps=8 (bug) · opened Oct 25, 2024 by sleepwalker2017
#9684 [Usage]: How to improve throughput with multi-card inference? (usage) · opened Oct 25, 2024 by tensorflowt
#9683 [Bug]: "gettid" was not declared error when build from source for cpu with version after v0.6.1 (bug) · opened Oct 25, 2024 by smallccn
#9681 [Bug]: Worker timeout using TP=1 with ray concurrency (bug) · opened Oct 25, 2024 by nathan-az
#9678 [Bug]: An EXTREMELY WEIRD bug when I import evaluate before vllm (bug) · opened Oct 25, 2024 by cafeii
#9670 [Bug]: Input length greater than 32K in nvidia/Llama-3.1-Nemotron-70B-Instruct-HF generate garbage on v0.6.3 (issue is not seen in v0.6.2) (bug) · opened Oct 24, 2024 by source-ram