Skip to content

v0.5.0

Latest

Choose a tag to compare

@github-actions github-actions released this 09 Nov 23:06
· 39 commits to main since this release
45ada88

🚀 New Features Highlights

Batch API & Multimodal and other OpenAI compatible API Surface

  • Batch API Support: Add OpenAI-style Batch API with simple LLM workers, Envoy/Gateway integration, JSONL & File List support, job pool sizing, and robust validation to safely offload large asynchronous workloads. (#1298, #1617, #1671, #1698, #1700, #1701)
  • Embeddings API and Moltimodal API: Introduce OpenAI-compatible embeddings endpoint so online inference, search, and RAG traffic can share the same AIBrix control plane and routing. (#1570) Support multimodality deployments and for image/video generation for other engines. (#1678, #1679, #1603, #1584)
  • Files API & Unified Storage: Implement OpenAI Files API plus a pluggable storage layer (local, S3, TOS, Redis metadata) to standardize artifact and batch job management across backends. (#1583, #1571)

AIBrix KVCache Offloading frameworks & Connectors:

  • High-Performance KVCache: Adds GDR support, optimized collective communications, configurable max sequence length and batched tokens, multi-threading for higher concurrency, and block-hash based APIs plus external cache handles for flexible distributed deployments. (#1411, #1446, #1453, #1451, #1627, #1628, #1545, #1531, #1542)
  • Deep Engine Integrations: Provide official AIBrix KVCache Dockerfiles and integration paths for vLLM and SGLang plus correctness fixes (head size, metrics, types) to make KV offloading a first-class option. (#1641, #1696, #1705, #1473, #1450, #1689)

Production-Grade Prefill/Decode (P/D) Orchestration Support:

  • New StormService Primitives: Add PodSet API, PodGroup support, FullRecreate strategy, role upgrade sequences, roleStatuses, and richer RoleSet/PodSet fields to model multi-pod workers, shard groups, and safer rollout/rollback for complex topologies. (#1475, #1506, #1511, #1432, #1599, #1560)
  • P/D-Aware & Topology-Aware Routing: Prefer P/D workers in the same RoleSet in replication mode, score candidates by locality/load, and harden PD routing behavior for Nixl-based setups. (#1409, #1634, #1429, #1601, #1703, #1693)
  • Role-Level Autoscaling for StormService: Introduced the "subTargetSelector" field in the PodAutoscaler API, allowing independent autoscaling of specific roles (e.g., prefill, decode) within a StormService resource, particularly in pooled mode. (#1625)

📊 Feature Enhancements

  • Unified Runtime & Metadata: Migrate metadata server from golang to Python for a simpler, lighter control path. Add liveness/readiness probes and shrink runtime image sizes. Improve downloader reliability and recursive object-store fetch support. (#1391, #1639, #1548, #1702, #1571)
  • LoRA & Model Adapter Reliability: Support adapter scaling to desired replicas, refactor replica management, add wrappers, and enable LoRA downloading via the runtime to stabilize multi-adapter hosting.
    (#1132, #1472, #1670, #1680, #1537, #1541)
  • Autoscaling: Unify and harden metrics fetching by adding retryable RestMetricsFetcher, shared client/aggregator and fixing race-condition for configuration updates (#1466, #1487, #1620, #1621, #1709), Tune KPA defaults, support metric label selectors, and ensure PodAutoscaler emits events only when replica counts actually change. (#1624, #1629, #1630) scaling history decision has been supported in the status spec (#1618)
  • AIBrixRuntime Injection: Deployment & StormService webhooks and wrapper libraries to auto-inject the runtime sidecar, standardizing metrics, downloads, and admin controls across engines.
    (#1403, #1457, #1543, #1681, #1561)

📦 Installation & Tooling & CI

  • Helm & Installation: Strengthen the AIBrix Helm chart as the recommended deployment path by adding dedicated chart CI and fixes (#1370, #1424), enriching Chart.yaml metadata (#1414), introducing values.schema.json for input validation (#1415), supporting imagePullSecrets configuration (#1522), and resolving duplicate label issues for Flux Helm Controller compatibility (#1615). Made KubeRay optional for AIBrix installations if you do not use RayclusterFleet API(#1724)

🐞 Critical Bug Fixes

  • Fixes StormService headless Service ownership and DNS behavior by setting proper ownerReferences and PublishNotReadyAddresses. (#1441, #1442)
  • Fixes incorrect naming for AIBRIX_MODEL_GPU_PROFILE_CACHING_FLAG to ensure configuration consistency. (#1427)
  • Fixes KVCache stability issues by preventing panic when watcher or metadata are not set in kvcache.spec. (#1526)
  • Fixes PodAutoscaler and metrics correctness by emitting events only on replica changes, aggregating resources across all containers, handling optional MetricSource fields, validating multiple PodAutoscalers targeting the same workload, and ensuring PodSet autoscaler collects metrics from rank0. (#1630, #1643, #1648, #1662, #1704)

New Contributors

What's Changed

Full Changelog: v0.4.0...v0.5.0

New Contributors

Full Changelog: v0.4.0...v0.5.0