Open
Labels
bug (Something isn't working)
Description
🐛 Describe the bug
The engine crashes at startup when FP8 KV cache quantization is enabled with:
--kv-cache-dtype=fp8
Crash log
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843] EngineCore failed to start.
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843] Traceback (most recent call last):
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843] File "/home/hning_google_com/vllm-test/vllm/vllm/v1/engine/core.py", line 834, in run_engine_core
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843] engine_core = EngineCoreProc(*args, **kwargs)
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843] File "/home/hning_google_com/vllm-test/vllm/vllm/v1/engine/core.py", line 610, in __init__
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843] super().__init__(
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843] File "/home/hning_google_com/vllm-test/vllm/vllm/v1/engine/core.py", line 102, in __init__
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843] self.model_executor = executor_class(vllm_config)
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843] File "/home/hning_google_com/vllm-test/vllm/vllm/v1/executor/abstract.py", line 101, in __init__
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843] self._init_executor()
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843] File "/home/hning_google_com/vllm-test/vllm/vllm/v1/executor/uniproc_executor.py", line 47, in _init_executor
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843] self.driver_worker.init_device()
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843] File "/home/hning_google_com/vllm-test/vllm/vllm/v1/worker/worker_base.py", line 326, in init_device
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843] self.worker.init_device() # type: ignore
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843] ^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843] File "/home/hning_google_com/vllm-test/tpu-inference/tpu_inference/worker/tpu_worker.py", line 253, in init_device
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843] self.model_runner = TPUModelRunner(
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843] ^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843] File "/home/hning_google_com/vllm-test/tpu-inference/tpu_inference/runner/tpu_runner.py", line 264, in __init__
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843] self.kv_cache_dtype = to_torch_dtype(cache_dtype)
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843] File "/home/hning_google_com/vllm-test/tpu-inference/tpu_inference/utils.py", line 56, in to_torch_dtype
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843] return j2t_dtype(dtype)
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843] ^^^^^^^^^^^^^^^^
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843] File "/home/hning_google_com/vllm-test/.venv/lib/python3.12/site-packages/torchax/ops/mappings.py", line 145, in j2t_dtype
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843] raise RuntimeError(
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843] RuntimeError: Attempting to convert unknown type: <class 'jax.numpy.float8_e4m3fn'> to torch type,
(EngineCore_DP0 pid=619982) Process EngineCore_DP0:
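The traceback ends in j2t_dtype rejecting jax.numpy.float8_e4m3fn, so one possible workaround is to wrap the converter and consult a small fallback table when it raises. The following is only a sketch of that idea, not the project's fix: convert_with_fallback, strict_convert, and FP8_FALLBACK are hypothetical names, and strings stand in for the real dtype objects so the example runs without JAX or PyTorch installed.

```python
from typing import Any, Callable, Mapping


def convert_with_fallback(
    dtype: Any,
    convert: Callable[[Any], Any],
    fallback: Mapping[Any, Any],
) -> Any:
    """Try the primary converter; use `fallback` only if it rejects `dtype`."""
    try:
        return convert(dtype)
    except RuntimeError as err:
        if dtype in fallback:
            return fallback[dtype]
        raise RuntimeError(
            f"No torch dtype known for {dtype!r}; extend the fallback table"
        ) from err


# Stand-in for a strict converter such as the j2t_dtype call in the traceback.
# It only knows bfloat16 here, so the FP8 type triggers the same kind of error.
def strict_convert(dtype: str) -> str:
    known = {"jax.bfloat16": "torch.bfloat16"}
    if dtype not in known:
        raise RuntimeError(f"Attempting to convert unknown type: {dtype}")
    return known[dtype]


# Assumption: with real libraries this table would map jnp.float8_e4m3fn to
# torch.float8_e4m3fn, provided the installed versions expose both dtypes.
FP8_FALLBACK = {"jax.float8_e4m3fn": "torch.float8_e4m3fn"}

print(convert_with_fallback("jax.float8_e4m3fn", strict_convert, FP8_FALLBACK))
```

In the real code path, the wrapper would presumably be applied where to_torch_dtype calls j2t_dtype. Whether the rest of the TPU runner then handles an FP8 KV cache correctly is a separate question that this sketch does not address.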