A high-throughput and memory-efficient inference and serving engine for LLMs (a minimal usage sketch follows this list).
Python · 66.2k stars · 12.2k forks
A Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM (see the quantization sketch after this list).
Python · 2.5k stars · 337 forks
Common recipes for running vLLM.
Jupyter Notebook · 306 stars · 112 forks
A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM.
Python · 175 stars · 22 forks
A system-level intelligent router for Mixture-of-Models serving.
Go · 2.6k stars · 368 forks
TPU inference for vLLM, with unified JAX and PyTorch support.
Code for the vLLM CI and performance benchmark infrastructure.
Community-maintained hardware plugin for vLLM on Ascend.
A framework for efficient inference with omni-modal models.
vLLM XPU kernels for Intel GPUs.
Daily summarization of merged vLLM PRs.
Community-maintained hardware plugin for vLLM on Spyre.
A high-performance, lightweight router for large-scale vLLM deployments.
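
For the inference and serving engine at the top of this list, here is a minimal offline-inference sketch. It is illustrative only: the model ID "facebook/opt-125m" is an arbitrary small example, and default sampling values may differ across vLLM versions.

```python
from vllm import LLM, SamplingParams

# Load a model into the engine; any Hugging Face model ID can be used
# ("facebook/opt-125m" is just a small example model, not a recommendation).
llm = LLM(model="facebook/opt-125m")

# Sampling settings for generation.
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

# Batched generation: vLLM schedules all prompts together for throughput.
outputs = llm.generate(["Hello, my name is", "The capital of France is"], params)

for output in outputs:
    print(output.prompt, "->", output.outputs[0].text)
```

For online serving, the same engine exposes an OpenAI-compatible HTTP server, started with `vllm serve <model>`.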
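
For the compression library (second entry), the sketch below shows one-shot post-training quantization in the style of that library's examples. Treat it as an assumption-laden sketch: the import path for `oneshot` has moved between releases (`llmcompressor.transformers` vs. the top-level `llmcompressor` package), and the model, dataset, and scheme names are example values, not prescriptions.

```python
from llmcompressor import oneshot  # older releases: from llmcompressor.transformers import oneshot
from llmcompressor.modifiers.quantization import GPTQModifier

# One-shot GPTQ quantization of all Linear layers to 4-bit weights,
# keeping the output head (lm_head) in full precision.
recipe = GPTQModifier(targets="Linear", scheme="W4A16", ignore=["lm_head"])

oneshot(
    model="TinyLlama/TinyLlama-1.1B-Chat-v1.0",  # example model only
    dataset="open_platypus",                     # small calibration dataset
    recipe=recipe,
    output_dir="TinyLlama-1.1B-Chat-v1.0-W4A16",
    max_seq_length=2048,
    num_calibration_samples=512,
)
```

The resulting checkpoint is saved in a format that vLLM can load directly for serving.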