v1.0.1

@joerunde joerunde released this 06 Oct 20:41
0ae7872

1.0.1 Bugfix Release

This Release:

  1. Fixes a bug where cancelling multiple in-flight requests could crash the vLLM server
  2. Fixes a bug where granite-3.x-8b models were not detected correctly, which caused VLLM_SPYRE_REQUIRE_PRECOMPILED_DECODERS to not function properly
  3. Fixes a bug where the number of processors was not detected correctly when setting threading configs
    1. VLLM_SPYRE_NUM_CPUS is now available as a manual override to set the number of CPU cores available to vLLM
  4. Fixes a bug where attempting to run pooling models in continuous batching mode would crash, instead of defaulting to static batching
  5. Fixes a bug where the lower bound of FMS was not properly specified
  6. Disables prompt logprobs completely because it's still broken
  7. Updates the "simple compile backend" to inductor to align with vLLM
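
As a rough illustration of item 3, the sketch below shows how a manual override such as VLLM_SPYRE_NUM_CPUS might take precedence over automatic CPU detection. This is not the plugin's actual code: the function name `resolve_num_cpus`, the integer parsing of the variable, and the fallback order are all assumptions made for the example.

```python
import os
from typing import Mapping, Optional


def resolve_num_cpus(env: Optional[Mapping[str, str]] = None) -> int:
    """Hypothetical helper: pick the CPU count used for threading configs.

    Assumption: VLLM_SPYRE_NUM_CPUS holds a positive integer. The real
    plugin may parse or validate the value differently.
    """
    env = os.environ if env is None else env

    # A manual override wins over any automatic detection.
    override = env.get("VLLM_SPYRE_NUM_CPUS")
    if override:
        value = int(override)
        if value < 1:
            raise ValueError("VLLM_SPYRE_NUM_CPUS must be >= 1")
        return value

    # On Linux, the affinity mask reflects the cores this process may
    # actually run on, which can be fewer than the host's total.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))

    # Portable fallback; os.cpu_count() can return None.
    return os.cpu_count() or 1


if __name__ == "__main__":
    print(resolve_num_cpus())
```

In practice the override would be set in the environment before the server starts, for example by exporting VLLM_SPYRE_NUM_CPUS with the number of cores assigned to the container.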

What's Changed

New Contributors

Full Changelog: v1.0.0...v1.0.1