Release v1.0.1 · vllm-project/vllm-spyre

1.0.1 Bugfix Release

This Release:

Fixes a bug where cancelling multiple in-flight requests could crash the vLLM server
Fixes a bug where granite-3.x-8b models were not detected correctly, which prevented VLLM_SPYRE_REQUIRE_PRECOMPILED_DECODERS from working properly
Fixes a bug where the number of processors was not detected correctly when setting threading configs
1. VLLM_SPYRE_NUM_CPUS is now available as a manual override for the number of CPU cores available to vLLM
Fixes a bug where attempting to run pooling models in continuous batching mode would crash instead of falling back to static batching
Fixes a bug where the lower bound on the FMS dependency version was not specified correctly
Disables prompt logprobs entirely, since the feature is still broken
Updates the default "simple compile backend" to inductor to align with vLLM
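The CPU-count fix and the new VLLM_SPYRE_NUM_CPUS override follow a common pattern: an explicit environment variable wins, and automatic detection is only a fallback. The snippet below is a minimal, hypothetical sketch of that pattern using only the Python standard library. It is not the vllm-spyre implementation (PR #487 also brings in psutil), and the function name `detect_num_cpus` is made up for illustration. One plausible case for the override is a container whose CPU limit is enforced by a cgroup quota instead of CPU affinity. There, neither `os.sched_getaffinity` nor `os.cpu_count` reflects the limit, so automatic detection can over-count.

```python
import os


def detect_num_cpus() -> int:
    """Hypothetical sketch: pick a CPU count for threading configs.

    Order of precedence (an assumption, not vllm-spyre's actual logic):
      1. the VLLM_SPYRE_NUM_CPUS environment variable, if set
      2. the process's CPU affinity mask (where the platform supports it)
      3. os.cpu_count(), falling back to 1 if it cannot be determined
    """
    override = os.environ.get("VLLM_SPYRE_NUM_CPUS")
    if override:
        value = int(override)
        if value < 1:
            raise ValueError(f"VLLM_SPYRE_NUM_CPUS must be >= 1, got {value}")
        return value

    # sched_getaffinity(0) lists the CPUs this process may run on
    # (e.g. narrowed by taskset); it does not see cgroup CPU quotas.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))

    # os.cpu_count() returns None when the count is undeterminable.
    return os.cpu_count() or 1


if __name__ == "__main__":
    os.environ["VLLM_SPYRE_NUM_CPUS"] = "4"
    print(detect_num_cpus())  # prints 4
```

In a real deployment the variable would be set in the server's environment before startup, for example `VLLM_SPYRE_NUM_CPUS=8 vllm serve ...`.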

disable prompt logprobs by @yannicks1 in #486
[docs] update docs continuous batching by @yannicks1 in #485
🐛 correct fms lower bound by @joerunde in #493
🎨 scheduler: make holdback queue a local variable by @yannicks1 in #465
[CB] 🐛 fix padding of position ids by @yannicks1 in #495
[s390x] Update s390x dependencies by @nikheal2 in #494
fix: logits processors for CB by @wallashss in #484
[fp8] fix cb scheduler step tests by @yannicks1 in #491
🔥 remove auto-marked xfail for fp8, include fp8 tests by default, add xfail manually by @prashantgupta24 in #490
feat: add VLLM_SPYRE_NUM_CPUS and psutil to help with cpu checks by @tjohnson31415 in #487
🐛 implement better checking for granite by @joerunde in #500
Better Error Handling for attempts to run CB with pooling models by @gmarinho2 in #476
🔥 remove unused test parametrizations by @joerunde in #505
🔧 Update default simple compile backend by @joerunde in #506

Full Changelog: v1.0.0...v1.0.1