Issues: vllm-project/vllm
#9729 [Bug]: Bfloat16 or Half are not compatible with HF float16/bfloat16 result. (bug) · opened Oct 27, 2024 by jason9693
#9728 [Bug]: Jetson support regression (bug) · opened Oct 27, 2024 by conroy-cheers
#9727 [Bug]: vllm.LLM does not seem to re-initialize for distributed inference with subsequent models with Offline Inference (bug) · opened Oct 27, 2024 by lhl
#9723 [Bug]: Incoherent Offline Inference Single Video with Qwen2-VL (bug) · opened Oct 26, 2024 by hector-gr
#9722 [Performance]: How to Improve Performance Under Concurrency (performance) · opened Oct 26, 2024 by ljwps
#9714 [Feature]: AttributeError: Model MllamaForConditionalGeneration does not support BitsAndBytes quantization yet (feature request) · opened Oct 26, 2024 by CyrusCY
#9711 [Usage]: how to get average prompt token length and output token length (usage) · opened Oct 26, 2024 by starrlee356
#9707 [New Model]: Tarsier (new model) · opened Oct 25, 2024 by Fangzhou-Ai
#9706 [Bug]: Inconsistent evaluations when enabling / disabling chunked_prefill? (bug) · opened Oct 25, 2024 by Jingyu6
#9702 [Usage]: Using a model for inference and embedding (usage) · opened Oct 25, 2024 by micuentadecasa
#9701 [Installation] pip install vllm (0.6.3) will force a reinstallation of the CPU version torch and replace cuda torch on windows (installation) · opened Oct 25, 2024 by xiezhipeng-git
#9699 [Performance]: Empirical Measurement of NVLS (performance) · opened Oct 25, 2024 by youkaichao
#9695 [Usage]: Are prompts processed sequentially? (usage) · opened Oct 25, 2024 by nishadsinghi
#9694 [Usage]: GetTimeoutError when run distributed inference on ray with tensor parallel size > 1 (usage) · opened Oct 25, 2024 by sharlynxy
#9693 [Bug]: Function calling with stream vs without stream, arguments=None when stream option is enabled (bug) · opened Oct 25, 2024 by ankush13r
#9692 [Usage]: How do I use langchain for tool calls? (usage) · opened Oct 25, 2024 by 2500035435
#9688 [Bug]: No error report when passing wrong lora path using num_scheduler_steps=8 (bug) · opened Oct 25, 2024 by sleepwalker2017
#9684 [Usage]: How to improve throughput with multi-card inference? (usage) · opened Oct 25, 2024 by tensorflowt
#9683 [Bug]: "gettid" was not declared error when build from source for cpu with version after v0.6.1 (bug) · opened Oct 25, 2024 by smallccn
#9681 [Bug]: Worker timeout using TP=1 with ray concurrency (bug) · opened Oct 25, 2024 by nathan-az
#9678 [Bug]: An EXTREMELY WEIRD bug when I import evaluate before vllm (bug) · opened Oct 25, 2024 by cafeii
#9670 [Bug]: Input length greater than 32K in nvidia/Llama-3.1-Nemotron-70B-Instruct-HF generate garbage on v0.6.3 (issue is not seen in v0.6.2) (bug) · opened Oct 24, 2024 by source-ram