    Repositories list

    • vllm

      Public
      A high-throughput and memory-efficient inference and serving engine for LLMs
      Python
      Updated Dec 6, 2025
    • router

      Public
      A high-performance, lightweight router for large-scale vLLM deployments
      Rust
      Updated Dec 6, 2025
    • aibrix

      Public
      Cost-efficient and pluggable infrastructure components for GenAI inference
      Go
      Updated Dec 6, 2025
    • vllm-ascend

      Public
      Community maintained hardware plugin for vLLM on Ascend
      Python
      Updated Dec 6, 2025
    • vllm-omni

      Public
      A framework for efficient model inference with omni-modality models
      Python
      Updated Dec 6, 2025
    • tpu-inference

      Public
      TPU inference for vLLM, with unified JAX and PyTorch support.
      Python
      Updated Dec 6, 2025
    • ci-infra

      Public
      This repo hosts the code for vLLM's CI and performance-benchmark infrastructure.
      HCL
      Updated Dec 6, 2025
    • guidellm

      Public
      Evaluate and Enhance Your LLM Deployments for Real-World Inference Needs
      Python
      Updated Dec 6, 2025
    • llm-compressor

      Public
      Transformers-compatible library for applying various compression algorithms to LLMs for optimized deployment with vLLM
      Python
      Updated Dec 6, 2025
    • vllm-neuron

      Public
      Community maintained hardware plugin for vLLM on AWS Neuron
      Python
      Updated Dec 6, 2025
    • vllm-spyre

      Public
      Community maintained hardware plugin for vLLM on Spyre
      Python
      Updated Dec 5, 2025
    • speculators

      Public
      A unified library for building, evaluating, and storing speculative decoding algorithms for LLM inference in vLLM
      Python
      Updated Dec 5, 2025
    • vllm-gaudi

      Public
      Community maintained hardware plugin for vLLM on Intel Gaudi
      Python
      Updated Dec 5, 2025
    • recipes

      Public
      Common recipes to run vLLM
      Jupyter Notebook
      Updated Dec 5, 2025
    • vllm-xpu-kernels

      Public
      vLLM XPU kernels for Intel GPUs
      C++
      Updated Dec 5, 2025
    • semantic-router

      Public
      Intelligent Router for Mixture-of-Models
      Go
      Updated Dec 5, 2025
    • compressed-tensors

      Public
      A safetensors extension to efficiently store sparse quantized tensors on disk
      Python
      Updated Dec 4, 2025
    • vllm-openvino

      Public
      Python
      Updated Dec 4, 2025
    • vllm-project.github.io

      Public
      JavaScript
      Updated Dec 3, 2025
    • production-stack

      Public
      vLLM's reference stack for Kubernetes-native, cluster-wide deployment with community-driven performance optimization
      Python
      Updated Nov 30, 2025
    • flash-attention

      Public
      Fast and memory-efficient exact attention
      Python
      Updated Nov 21, 2025
    • FlashMLA

      Public
      C++
      Updated Oct 22, 2025
    • media-kit

      Public
      vLLM Logo Assets
      Updated Oct 22, 2025
    • DeepGEMM

      Public
      DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
      Cuda
      Updated Sep 29, 2025
    • rfcs

      Public
      Updated Jun 3, 2025
    • vllm-project.github.io-static

      Public archive
      HTML
      Updated Feb 7, 2025
    • vllm-nccl

      Public archive
      Manages the vllm-nccl dependency
      Python
      Updated Jun 3, 2024
    • dashboard

      Public
      vLLM performance dashboard
      Python
      Updated Apr 26, 2024