v1.0.1
1.0.1 Bugfix Release
This Release:
- Fixes a bug where cancelling multiple in-flight requests could crash the vllm server
- Fixes a bug where granite-3.x-8b models were not detected correctly, leading to VLLM_SPYRE_REQUIRE_PRECOMPILED_DECODERS not functioning properly
- Fixes a bug where the number of processors was not detected correctly for setting threading configs. VLLM_SPYRE_NUM_CPUS is now available as a manual override to set the number of cpu cores available to vllm
- Fixes a bug where attempting to run pooling models in continuous batching mode would crash, instead of defaulting to static batching
- Fixes a bug where the lower bound of FMS was not properly specified
- Disables prompt logprobs completely because it's still broken
- Updates the "simple compile backend" to inductor to align with vLLM
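
The VLLM_SPYRE_NUM_CPUS override mentioned above follows a common pattern: prefer an explicit environment variable and fall back to the detected core count. The snippet below is only a rough sketch of that pattern, not the plugin's actual code. The helper name `resolve_num_cpus`, the use of `os.cpu_count()` for detection, and the assumption that the variable holds a positive integer are all illustrative assumptions.

```python
import os


def resolve_num_cpus(env=None, detected=None):
    """Return the CPU count to use, preferring a manual override.

    Hypothetical helper for illustration only; the real plugin may parse
    and validate VLLM_SPYRE_NUM_CPUS differently.
    """
    env = os.environ if env is None else env
    override = env.get("VLLM_SPYRE_NUM_CPUS")
    if override:
        # Assumption: the override is a positive integer.
        try:
            value = int(override)
        except ValueError:
            raise ValueError(
                f"VLLM_SPYRE_NUM_CPUS must be an integer, got {override!r}"
            ) from None
        if value < 1:
            raise ValueError("VLLM_SPYRE_NUM_CPUS must be at least 1")
        return value
    # No override set: fall back to the detected core count.
    if detected is None:
        detected = os.cpu_count() or 1
    return detected


if __name__ == "__main__":
    # The manual override takes precedence over detection.
    print(resolve_num_cpus({"VLLM_SPYRE_NUM_CPUS": "8"}, detected=64))
    # Without the variable, the detected value is used.
    print(resolve_num_cpus({}, detected=4))
```

In practice the variable would be exported in the environment before starting the vllm server, for example when automatic detection reports more cores than the container is actually allowed to use.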
What's Changed
- disable prompt logprobs by @yannicks1 in #486
- [docs] update docs continuous batching by @yannicks1 in #485
- 🐛 correct fms lower bound by @joerunde in #493
- 🎨 scheduler: make holdback queue a local variable by @yannicks1 in #465
- [CB] 🐛 fix padding of position ids by @yannicks1 in #495
- [s390x] Update s390x dependencies by @nikheal2 in #494
- fix: logits processors for CB by @wallashss in #484
- [fp8] fix cb scheduler step tests by @yannicks1 in #491
- 🔥 remove auto-marked xfail for fp8, include fp8 tests by default, add xfail manually by @prashantgupta24 in #490
- feat: add VLLM_SPYRE_NUM_CPUS and psutil to help with cpu checks by @tjohnson31415 in #487
- 🐛 implement better checking for granite by @joerunde in #500
- Better Error Handling for attempts to run CB with pooling models by @gmarinho2 in #476
- 🔥 remove unused test parametrizations by @joerunde in #505
- 🔧 Update default simple compile backend by @joerunde in #506
New Contributors
Full Changelog: v1.0.0...v1.0.1