[Bug]: Crash when using fp8 kv quant #1234

@helloworld1

🐛 Describe the bug

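The engine core fails to start when fp8 KV cache quantization is enabled with the following flag:

--kv-cache-dtype=fp8

The traceback below ends in torchax's `j2t_dtype`, which raises a RuntimeError because it cannot convert `jax.numpy.float8_e4m3fn` to a torch dtype.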

Crash log

(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843] EngineCore failed to start.                                                                                                                                                                                                 
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843] Traceback (most recent call last):                                                                                                                                                                                          
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843]   File "/home/hning_google_com/vllm-test/vllm/vllm/v1/engine/core.py", line 834, in run_engine_core                                                                                                                         
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843]     engine_core = EngineCoreProc(*args, **kwargs)                                                                                                                                                                           
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843]                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                                                           
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843]   File "/home/hning_google_com/vllm-test/vllm/vllm/v1/engine/core.py", line 610, in __init__                                                                                            
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843]     super().__init__(                                                                                                                                                                                                       
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843]   File "/home/hning_google_com/vllm-test/vllm/vllm/v1/engine/core.py", line 102, in __init__                                                                                            
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843]     self.model_executor = executor_class(vllm_config)                                                                                                                                                                       
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                                                       
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843]   File "/home/hning_google_com/vllm-test/vllm/vllm/v1/executor/abstract.py", line 101, in __init__                                                                                                                          
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843]     self._init_executor()                                                                                                                                                                                                   
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843]   File "/home/hning_google_com/vllm-test/vllm/vllm/v1/executor/uniproc_executor.py", line 47, in _init_executor                                                                                                             
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843]     self.driver_worker.init_device()                                                                                                                                                                                        
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843]   File "/home/hning_google_com/vllm-test/vllm/vllm/v1/worker/worker_base.py", line 326, in init_device                                                                                                                      
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843]     self.worker.init_device()  # type: ignore                                                                                                                                                                               
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843]     ^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                                                                               
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843]   File "/home/hning_google_com/vllm-test/tpu-inference/tpu_inference/worker/tpu_worker.py", line 253, in init_device                                                                                                        
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843]     self.model_runner = TPUModelRunner(                                                                                                                                                                                     
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843]                         ^^^^^^^^^^^^^^^                                                                                                                                                                                     
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843]   File "/home/hning_google_com/vllm-test/tpu-inference/tpu_inference/runner/tpu_runner.py", line 264, in __init__                                                                                                           
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843]     self.kv_cache_dtype = to_torch_dtype(cache_dtype)                                                                                                                                                                       
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843]                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                                                                                       
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843]   File "/home/hning_google_com/vllm-test/tpu-inference/tpu_inference/utils.py", line 56, in to_torch_dtype                                                                                                                  
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843]     return j2t_dtype(dtype)                                                                                                                                                                                                 
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843]            ^^^^^^^^^^^^^^^^                                                                                                                                                                                                 
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843]   File "/home/hning_google_com/vllm-test/.venv/lib/python3.12/site-packages/torchax/ops/mappings.py", line 145, in j2t_dtype                                                                                                
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843]     raise RuntimeError(                                                                                                                                                                                                     
(EngineCore_DP0 pid=619982) ERROR 12-03 03:19:14 [core.py:843] RuntimeError: Attempting to convert unknown type: <class 'jax.numpy.float8_e4m3fn'> to torch type,                                                                                                                          
(EngineCore_DP0 pid=619982) Process EngineCore_DP0:                                                                                                                   
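
One possible direction for a workaround, sketched here with plain strings standing in for the real JAX and torch dtypes so that it runs without either library. It assumes (not verified against the torchax source) that `j2t_dtype` is a table lookup with no entry for `float8_e4m3fn`, which is what the "unknown type" error suggests. The names `with_extra_dtypes`, `base_convert`, and `_BASE_TABLE` are hypothetical and do not exist in tpu-inference or torchax.

```python
from typing import Any, Callable, Mapping


def with_extra_dtypes(
    base_convert: Callable[[Any], Any],
    extra: Mapping[Any, Any],
) -> Callable[[Any], Any]:
    """Wrap a dtype converter so the `extra` table is consulted first."""

    def convert(dtype: Any) -> Any:
        # Handle dtypes the underlying converter does not know about.
        if dtype in extra:
            return extra[dtype]
        # Everything else is delegated unchanged.
        return base_convert(dtype)

    return convert


# Hypothetical stand-in for a converter whose table lacks an fp8 entry.
_BASE_TABLE = {"bfloat16": "torch.bfloat16", "float32": "torch.float32"}


def base_convert(dtype: str) -> str:
    if dtype not in _BASE_TABLE:
        raise RuntimeError(
            f"Attempting to convert unknown type: {dtype} to torch type"
        )
    return _BASE_TABLE[dtype]


# Add the missing fp8 mapping without touching the base table.
convert = with_extra_dtypes(
    base_convert, {"float8_e4m3fn": "torch.float8_e4m3fn"}
)
```

In the real code path, the extra table would presumably map `jnp.float8_e4m3fn` to `torch.float8_e4m3fn` and the wrapper would sit around the `j2t_dtype` call in `to_torch_dtype`. Whether the rest of the TPU runner handles an fp8 torch dtype correctly after that is a separate question this sketch does not address.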

