🐛 Bug
I am a new user of mlc-llm. I am following the steps in the model compilation guide (https://llm.mlc.ai/docs/compilation/compile_models.html) and hit an error during the compile step: ValueError: IntImm supports only int or uint type, but bool was supplied.
To Reproduce
Steps to reproduce the behavior:
- Follow the steps in the official tutorial for Mac Metal (https://llm.mlc.ai/docs/compilation/compile_models.html); the command sequence I ran is sketched below
- Get the error when running mlc_llm compile
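For context, here is the command sequence from the guide as I ran it before the failing step. I am reconstructing the convert_weight and gen_config invocations from the tutorial rather than from my shell history, so treat the flags and paths as approximate:

python -m mlc_llm convert_weight ./dist/models/RedPajama-INCITE-Chat-3B-v1/ --quantization q4f16_1 -o dist/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC/
python -m mlc_llm gen_config ./dist/models/RedPajama-INCITE-Chat-3B-v1/ --quantization q4f16_1 --conv-template redpajama_chat -o dist/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC/

The failing compile step and its full output: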
(mlc) paradm_mac@ParaDM-macs-Mac-Studio mlc % python -m mlc_llm compile ./dist/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC/mlc-chat-config.json
--device metal -o dist/libs/RedPajama-INCITE-Chat-3B-v1-q4f16_1-metal.so
[2025-12-01 17:41:07] INFO auto_config.py:70: Found model configuration: dist/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC/mlc-chat-config.json
[2025-12-01 17:41:08] INFO auto_device.py:82: Found device: metal:0
[2025-12-01 17:41:08] INFO auto_target.py:78: Found configuration of target device "metal:0": {'kind': 'metal', 'tag': '', 'keys': ['metal', 'gpu'], 'max_num_threads': 256, 'max_function_args': 31, 'thread_warp_size': 32, 'max_threads_per_block': 1024, 'max_shared_memory_per_block': 32768}
[17:41:08] /Users/catalyst/Workspace/mlc-ai-package-self-runner/_work/package/package/tvm/src/target/llvm/llvm_instance.cc:226: Error: Using LLVM 19.1.7 with -mcpu=apple-m3 is not valid in -mtriple=arm64-apple-darwin24.5.0, using default -mcpu=generic
[2025-12-01 17:41:08] INFO auto_target.py:110: Found host LLVM triple: arm64-apple-darwin24.5.0
[2025-12-01 17:41:08] INFO auto_target.py:111: Found host LLVM CPU: apple-m3
[2025-12-01 17:41:08] INFO auto_config.py:154: Found model type: gpt_neox. Use --model-type to override.
[2025-12-01 17:41:08] INFO compile.py:233: Active vocab size from input config: 50277
Compiling with arguments:
--config GPTNeoXConfig(use_parallel_residual=False, hidden_size=2560, intermediate_size=10240, num_attention_heads=32, num_hidden_layers=32, layer_norm_eps=1e-05, vocab_size=50432, rotary_pct=1.0, position_embedding_base=10000, context_window_size=2048, head_dim=80, prefill_chunk_size=2048, tensor_parallel_shards=1, ffn_out_dtype='float32', max_batch_size=128, kwargs={'active_vocab_size': 50277})
--quantization GroupQuantize(name='q4f16_1', kind='group-quant', group_size=32, quantize_dtype='int4', storage_dtype='uint32', model_dtype='float16', linear_weight_layout='NK', quantize_embedding=True, quantize_final_fc=True, num_elem_per_storage=8, num_storage_per_group=4, max_int_value=7, tensor_parallel_shards=0)
--model-type gpt_neox
--target {'kind': 'metal', 'tag': '', 'keys': ['metal', 'gpu'], 'host': {'kind': 'llvm', 'tag': '', 'keys': ['arm_cpu', 'cpu'], 'mcpu': 'apple-m3', 'mtriple': 'arm64-apple-darwin24.5.0'}, 'max_num_threads': 256, 'max_function_args': 31, 'thread_warp_size': 32, 'max_threads_per_block': 1024, 'max_shared_memory_per_block': 32768}
--opt flashinfer=0;cublas_gemm=0;faster_transformer=0;cudagraph=0;cutlass=0;ipc_allreduce_strategy=NONE
--system-lib-prefix ""
--output dist/libs/RedPajama-INCITE-Chat-3B-v1-q4f16_1-metal.so
--overrides context_window_size=None;sliding_window_size=None;prefill_chunk_size=None;attention_sink_size=None;max_batch_size=None;tensor_parallel_shards=None;pipeline_parallel_stages=None;disaggregation=None
[2025-12-01 17:41:08] INFO compile.py:131: TOP LEVEL MODEL CONFIG BEFORE OVERRIDES: GPTNeoXConfig(use_parallel_residual=False, hidden_size=2560, intermediate_size=10240, num_attention_heads=32, num_hidden_layers=32, layer_norm_eps=1e-05, vocab_size=50432, rotary_pct=1.0, position_embedding_base=10000, context_window_size=2048, head_dim=80, prefill_chunk_size=2048, tensor_parallel_shards=1, ffn_out_dtype='float32', max_batch_size=128, kwargs={'active_vocab_size': 50277})
[2025-12-01 17:41:08] INFO compile.py:142: Creating model from: GPTNeoXConfig(use_parallel_residual=False, hidden_size=2560, intermediate_size=10240, num_attention_heads=32, num_hidden_layers=32, layer_norm_eps=1e-05, vocab_size=50432, rotary_pct=1.0, position_embedding_base=10000, context_window_size=2048, head_dim=80, prefill_chunk_size=2048, tensor_parallel_shards=1, ffn_out_dtype='float32', max_batch_size=128, kwargs={})
[2025-12-01 17:41:08] INFO compile.py:160: Exporting the model to TVM compiler
[2025-12-01 17:41:09] INFO compile.py:166: Running optimizations using TVM
[2025-12-01 17:41:09] INFO compile.py:192: Registering metadata: {'model_type': 'gpt_neox', 'quantization': 'q4f16_1', 'context_window_size': 2048, 'sliding_window_size': -1, 'attention_sink_size': -1, 'prefill_chunk_size': 2048, 'tensor_parallel_shards': 1, 'pipeline_parallel_stages': 1, 'disaggregation': False, 'kv_state_kind': 'kv_cache', 'max_batch_size': 128, 'active_vocab_size': 50277}
error: Check failed: (dtype.is_int() || dtype.is_uint()) is false: ValueError: IntImm supports only int or uint type, but bool was supplied.
--> /Users/paradm_mac/miniconda3/envs/mlc/lib/python3.11/site-packages/mlc_llm/op/batch_spec_verify.py:114:25
|
114 | done[0] = False
| ^^^^^^^^^^^^^^^
note: run with TVM_BACKTRACE=1 environment variable to display a backtrace.
[17:41:09] /Users/catalyst/Workspace/mlc-ai-package-self-runner/_work/package/package/tvm/src/relax/ir/block_builder.cc:64: Warning: BlockBuilder destroyed with remaining blocks!
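The failure points at the boolean store in mlc_llm/op/batch_spec_verify.py:114. As far as I can tell, a minimal standalone sketch of the same pattern would be the following TIR function (my own repro attempt, not code from mlc-llm: a Python bool assigned into a "bool" buffer, which I would expect to trip the same IntImm dtype check at parse time on this build):

from tvm.script import tir as T

@T.prim_func
def repro(done: T.Buffer((1,), "bool")):
    # Mirrors batch_spec_verify.py:114: storing a Python bool constant
    # into a boolean buffer. On the affected build, the constant appears
    # to be lowered through tvm.tir.IntImm, whose dtype check rejects "bool":
    #   ValueError: IntImm supports only int or uint type, but bool was supplied.
    done[0] = False

My guess is a mismatch between the mlc-llm and tvm nightly wheels in how the bool dtype is handled, but I have not confirmed that.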
Expected behavior
mlc_llm compile should finish without errors and produce dist/libs/RedPajama-INCITE-Chat-3B-v1-q4f16_1-metal.so.
Environment
- Platform: Metal
- Operating system: macOS
- Device: Mac Studio (M3 Ultra)
- How you installed MLC-LLM (conda, source): conda, pip
- How you installed TVM (pip, source): conda, pip
- Python version (e.g. 3.10): 3.11.14
- GPU driver version (if applicable):
- CUDA/cuDNN version (if applicable):
- TVM Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):
BUILD_STATIC_RUNTIME: OFF
BUILD_DUMMY_LIBTVM: OFF
COMPILER_RT_PATH: 3rdparty/compiler-rt
CUDA_VERSION: NOT-FOUND
DMLC_PATH: 3rdparty/dmlc-core/include
GIT_COMMIT_HASH: 791909d217548937626f98743c33620116b195b5
GIT_COMMIT_TIME: 2025-11-12 13:37:43 -0500
HIDE_PRIVATE_SYMBOLS: ON
INDEX_DEFAULT_I64: ON
INSTALL_DEV: OFF
LLVM_VERSION: 19.1.7
MLIR_VERSION: NOT-FOUND
PICOJSON_PATH: 3rdparty/picojson
RANG_PATH: 3rdparty/rang/include
ROCM_PATH: /opt/rocm
SUMMARIZE: OFF
TVM_CXX_COMPILER_PATH: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang++
USE_ALTERNATIVE_LINKER: AUTO
USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR: OFF
USE_ARM_COMPUTE_LIB: OFF
USE_BLAS: none
USE_BNNS: OFF
USE_BYODT_POSIT: OFF
USE_COREML: OFF
USE_CPP_RPC: OFF
USE_CPP_RTVM:
USE_CUBLAS: OFF
USE_CUDA: OFF
USE_NVTX: OFF
USE_NCCL: OFF
USE_MSCCL: OFF
USE_CUDNN: OFF
USE_CUSTOM_LOGGING: OFF
USE_CUTLASS: OFF
USE_AMX: OFF
USE_DNNL: OFF
USE_FALLBACK_STL_MAP: OFF
USE_GTEST: AUTO
USE_HEXAGON: OFF
USE_HEXAGON_RPC: OFF
USE_HEXAGON_SDK: /path/to/sdk
USE_HEXAGON_GTEST: /path/to/hexagon/gtest
USE_HEXAGON_EXTERNAL_LIBS: OFF
USE_IOS_RPC: OFF
USE_KHRONOS_SPIRV: OFF
USE_LIBBACKTRACE: AUTO
USE_LIBTORCH: OFF
USE_LLVM: llvm-config --link-static
USE_MLIR: OFF
USE_METAL: ON
USE_MIOPEN: OFF
USE_MKL: OFF
USE_MRVL: OFF
USE_MSVC_MT: OFF
USE_NNPACK: OFF
USE_OPENCL: OFF
USE_OPENCL_ENABLE_HOST_PTR: OFF
USE_OPENCL_EXTN_QCOM: NOT-FOUND
USE_OPENCL_GTEST: /path/to/opencl/gtest
USE_OPENMP: OFF
USE_PAPI: OFF
USE_RANDOM: ON
TVM_DEBUG_WITH_ABI_CHANGE: OFF
TVM_LOG_BEFORE_THROW: OFF
USE_ROCBLAS: OFF
USE_HIPBLAS: OFF
USE_ROCM: OFF
USE_RCCL: OFF
USE_RPC: ON
TVM_BUILD_PYTHON_MODULE: ON
USE_RTTI: ON
USE_RUST_EXT: OFF
USE_SORT: ON
USE_SPIRV_KHR_INTEGER_DOT_PRODUCT: OFF
USE_TENSORFLOW_PATH: none
USE_TENSORRT_CODEGEN: OFF
USE_TENSORRT_RUNTIME: OFF
USE_TFLITE: OFF
USE_THREADS: ON
USE_THRUST: OFF
USE_CURAND: OFF
USE_VULKAN: OFF
USE_CLML: OFF
TVM_CLML_VERSION:
USE_CLML_GRAPH_EXECUTOR: OFF
USE_UMA: OFF
USE_MSC: OFF
USE_CCACHE: AUTO
USE_NVSHMEM: OFF
USE_NNAPI_CODEGEN: OFF
USE_NNAPI_RUNTIME: OFF
BACKTRACE_ON_SEGFAULT: OFF
- Any other relevant information: