    Repositories list

    • vllm

      Public
      A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      Updated Dec 6, 2025
    • router

      Public
      A high-performance, lightweight router for large-scale vLLM deployments
      Rust
      Updated Dec 6, 2025
    • aibrix

      Public
      Cost-efficient and pluggable infrastructure components for GenAI inference
      Go
      Updated Dec 6, 2025
    • vllm-ascend

      Public
      Community maintained hardware plugin for vLLM on Ascend
      Python
      Updated Dec 6, 2025
    • vllm-omni

      Public
      A framework for efficient model inference with omni-modality models
      Python
      Updated Dec 6, 2025
    • tpu-inference

      Public
      TPU inference for vLLM, with unified JAX and PyTorch support.
      Python
      Updated Dec 6, 2025
    • ci-infra

      Public
      This repo hosts the code for vLLM's CI and performance-benchmark infrastructure.
      HCL
      Updated Dec 6, 2025
    • guidellm

      Public
      Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
      Python
      Updated Dec 6, 2025
    • llm-compressor

      Public
      Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
      Python
      Updated Dec 6, 2025
    • vllm-neuron

      Public
      Community maintained hardware plugin for vLLM on AWS Neuron
      Python
      Updated Dec 6, 2025
    • vllm-spyre

      Public
      Community maintained hardware plugin for vLLM on Spyre
      Python
      Updated Dec 5, 2025
    • speculators

      Public
      A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM
      Python
      Updated Dec 5, 2025
    • vllm-gaudi

      Public
      Community maintained hardware plugin for vLLM on Intel Gaudi
      Python
      Updated Dec 5, 2025
    • recipes

      Public
      Common recipes to run vLLM
      Jupyter Notebook
      Updated Dec 5, 2025
    • vllm-xpu-kernels

      Public
      vLLM XPU kernels for Intel GPUs
      C++
      Updated Dec 5, 2025
    • semantic-router

      Public
      Intelligent Router for Mixture-of-Models
      Go
      Updated Dec 5, 2025
    • compressed-tensors

      Public
      A safetensors extension to efficiently store sparse quantized tensors on disk
      Python
      Updated Dec 4, 2025
    • vllm-openvino

      Public
      Python
      Updated Dec 4, 2025
    • vllm-project.github.io

      Public
      JavaScript
      Updated Dec 3, 2025
    • production-stack

      Public
      vLLM's reference stack for Kubernetes-native, cluster-wide deployment with community-driven performance optimization
      Python
      Updated Nov 30, 2025
    • flash-attention

      Public
      Fast and memory-efficient exact attention
      Python
      Updated Nov 21, 2025
    • FlashMLA

      Public
      C++
      Updated Oct 22, 2025
    • media-kit

      Public
      vLLM Logo Assets
      Updated Oct 22, 2025
    • DeepGEMM

      Public
      DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
      Cuda
      Updated Sep 29, 2025
    • rfcs

      Public
      Updated Jun 3, 2025
    • vllm-project.github.io-static

      Public archive
      HTML
      Updated Feb 7, 2025
    • vllm-nccl

      Public archive
      Manages the vllm-nccl dependency
      Python
      Updated Jun 3, 2024
    • dashboard

      Public
      vLLM performance dashboard
      Python
      Updated Apr 26, 2024