v1.1.0
- ⬆️ Adds support for vllm v0.11.0
- 🔥 Drops support for vllm v0.10.1.1
- ✨ Writes performance metrics to file when `VLLM_SPYRE_PERF_METRIC_LOGGING_ENABLED` is set (see the sketch after this list)
- 🐛 Fixes a bug where incorrect logits processors were applied to requests under load
- 🐛 Fixes a bug where `/chat/completions` required a user-specified `max_tokens` param to function
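A minimal sketch of enabling the new perf-metric logging. Only the variable name `VLLM_SPYRE_PERF_METRIC_LOGGING_ENABLED` comes from these notes; the `"1"` value, the `vllm serve` invocation, and the model placeholder are illustrative assumptions.

```python
import os
import subprocess

# Sketch only: export the flag before launching the server so the Spyre
# plugin writes performance metrics to file. Treating "1" as the enabling
# value is an assumption; the model name below is a placeholder.
env = dict(os.environ, VLLM_SPYRE_PERF_METRIC_LOGGING_ENABLED="1")
subprocess.run(["vllm", "serve", "<your-model>"], env=env)
```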
What's Changed
- fix: unbatch removals of requests from input_batch by @tjohnson31415 in #511
- 🐛 fixup more tests to use the default max model length by @joerunde in #512
- ✨ Add vLLM 0.11.0 support by @joerunde in #513
- [CB] consistent max context length by @yannicks1 in #514
- [docs] rephrase comment about continuous batching configuration by @yannicks1 in #518
- [CB] set new_tokens to max value given the constraints by @yannicks1 in #516
- ✨ add debug perf logger by @joerunde in #515
Full Changelog: v1.0.2...v1.1.0