[Bug] Error during model compilation ValueError: IntImm supports only int or uint type, but bool was supplied. #3389

@jimmyparadm

🐛 Bug

I am a new user of mlc-llm. While following the model compilation guide (https://llm.mlc.ai/docs/compilation/compile_models.html), I get an error at the compile step: ValueError: IntImm supports only int or uint type, but bool was supplied.

To Reproduce

Steps to reproduce the behavior:

  1. Follow the steps in the official tutorial for Mac Metal (https://llm.mlc.ai/docs/compilation/compile_models.html)
  2. Run mlc_llm compile; it fails with the error shown in the log below.

(mlc) paradm_mac@ParaDM-macs-Mac-Studio mlc % python -m mlc_llm compile ./dist/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC/mlc-chat-config.json --device metal -o dist/libs/RedPajama-INCITE-Chat-3B-v1-q4f16_1-metal.so
[2025-12-01 17:41:07] INFO auto_config.py:70: Found model configuration: dist/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC/mlc-chat-config.json
[2025-12-01 17:41:08] INFO auto_device.py:82: Found device: metal:0
[2025-12-01 17:41:08] INFO auto_target.py:78: Found configuration of target device "metal:0": {'kind': 'metal', 'tag': '', 'keys': ['metal', 'gpu'], 'max_num_threads': 256, 'max_function_args': 31, 'thread_warp_size': 32, 'max_threads_per_block': 1024, 'max_shared_memory_per_block': 32768}
[17:41:08] /Users/catalyst/Workspace/mlc-ai-package-self-runner/_work/package/package/tvm/src/target/llvm/llvm_instance.cc:226: Error: Using LLVM 19.1.7 with -mcpu=apple-m3 is not valid in -mtriple=arm64-apple-darwin24.5.0, using default -mcpu=generic
[2025-12-01 17:41:08] INFO auto_target.py:110: Found host LLVM triple: arm64-apple-darwin24.5.0
[2025-12-01 17:41:08] INFO auto_target.py:111: Found host LLVM CPU: apple-m3
[2025-12-01 17:41:08] INFO auto_config.py:154: Found model type: gpt_neox. Use --model-type to override.
[2025-12-01 17:41:08] INFO compile.py:233: Active vocab size from input config: 50277
Compiling with arguments:
--config GPTNeoXConfig(use_parallel_residual=False, hidden_size=2560, intermediate_size=10240, num_attention_heads=32, num_hidden_layers=32, layer_norm_eps=1e-05, vocab_size=50432, rotary_pct=1.0, position_embedding_base=10000, context_window_size=2048, head_dim=80, prefill_chunk_size=2048, tensor_parallel_shards=1, ffn_out_dtype='float32', max_batch_size=128, kwargs={'active_vocab_size': 50277})
--quantization GroupQuantize(name='q4f16_1', kind='group-quant', group_size=32, quantize_dtype='int4', storage_dtype='uint32', model_dtype='float16', linear_weight_layout='NK', quantize_embedding=True, quantize_final_fc=True, num_elem_per_storage=8, num_storage_per_group=4, max_int_value=7, tensor_parallel_shards=0)
--model-type gpt_neox
--target {'kind': 'metal', 'tag': '', 'keys': ['metal', 'gpu'], 'host': {'kind': 'llvm', 'tag': '', 'keys': ['arm_cpu', 'cpu'], 'mcpu': 'apple-m3', 'mtriple': 'arm64-apple-darwin24.5.0'}, 'max_num_threads': 256, 'max_function_args': 31, 'thread_warp_size': 32, 'max_threads_per_block': 1024, 'max_shared_memory_per_block': 32768}
--opt flashinfer=0;cublas_gemm=0;faster_transformer=0;cudagraph=0;cutlass=0;ipc_allreduce_strategy=NONE
--system-lib-prefix ""
--output dist/libs/RedPajama-INCITE-Chat-3B-v1-q4f16_1-metal.so
--overrides context_window_size=None;sliding_window_size=None;prefill_chunk_size=None;attention_sink_size=None;max_batch_size=None;tensor_parallel_shards=None;pipeline_parallel_stages=None;disaggregation=None
[2025-12-01 17:41:08] INFO compile.py:131: TOP LEVEL MODEL CONFIG BEFORE OVERRIDES: GPTNeoXConfig(use_parallel_residual=False, hidden_size=2560, intermediate_size=10240, num_attention_heads=32, num_hidden_layers=32, layer_norm_eps=1e-05, vocab_size=50432, rotary_pct=1.0, position_embedding_base=10000, context_window_size=2048, head_dim=80, prefill_chunk_size=2048, tensor_parallel_shards=1, ffn_out_dtype='float32', max_batch_size=128, kwargs={'active_vocab_size': 50277})
[2025-12-01 17:41:08] INFO compile.py:142: Creating model from: GPTNeoXConfig(use_parallel_residual=False, hidden_size=2560, intermediate_size=10240, num_attention_heads=32, num_hidden_layers=32, layer_norm_eps=1e-05, vocab_size=50432, rotary_pct=1.0, position_embedding_base=10000, context_window_size=2048, head_dim=80, prefill_chunk_size=2048, tensor_parallel_shards=1, ffn_out_dtype='float32', max_batch_size=128, kwargs={})
[2025-12-01 17:41:08] INFO compile.py:160: Exporting the model to TVM compiler
[2025-12-01 17:41:09] INFO compile.py:166: Running optimizations using TVM
[2025-12-01 17:41:09] INFO compile.py:192: Registering metadata: {'model_type': 'gpt_neox', 'quantization': 'q4f16_1', 'context_window_size': 2048, 'sliding_window_size': -1, 'attention_sink_size': -1, 'prefill_chunk_size': 2048, 'tensor_parallel_shards': 1, 'pipeline_parallel_stages': 1, 'disaggregation': False, 'kv_state_kind': 'kv_cache', 'max_batch_size': 128, 'active_vocab_size': 50277}
error: Check failed: (dtype.is_int() || dtype.is_uint()) is false: ValueError: IntImm supports only int or uint type, but bool was supplied.
--> /Users/paradm_mac/miniconda3/envs/mlc/lib/python3.11/site-packages/mlc_llm/op/batch_spec_verify.py:114:25
|
114 | done[0] = False
| ^^^^^^^^^^^^^^^
note: run with TVM_BACKTRACE=1 environment variable to display a backtrace.
[17:41:09] /Users/catalyst/Workspace/mlc-ai-package-self-runner/_work/package/package/tvm/src/relax/ir/block_builder.cc:64: Warning: BlockBuilder destroyed with remaining blocks!
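
The note in the log suggests rerunning with TVM_BACKTRACE=1. Below is a minimal sketch of that rerun from Python, reusing the exact command from this report. The helper names build_backtrace_invocation and rerun_with_backtrace are hypothetical; only the environment variable and the command line come from the log above.

```python
import os
import subprocess
import sys

MODEL = "RedPajama-INCITE-Chat-3B-v1-q4f16_1"


def build_backtrace_invocation(base_env=None):
    """Return (cmd, env) for the failing compile with TVM_BACKTRACE=1 set.

    Hypothetical helper: it only assembles the command from this report.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["TVM_BACKTRACE"] = "1"  # requested by the note in the error output
    cmd = [
        sys.executable, "-m", "mlc_llm", "compile",
        f"./dist/{MODEL}-MLC/mlc-chat-config.json",
        "--device", "metal",
        "-o", f"dist/libs/{MODEL}-metal.so",
    ]
    return cmd, env


def rerun_with_backtrace():
    """Run the compile and return its exit code (needs mlc_llm installed)."""
    cmd, env = build_backtrace_invocation()
    return subprocess.run(cmd, env=env, check=False).returncode
```

Calling rerun_with_backtrace() from the same conda environment and working directory should be equivalent to prefixing the shell command with TVM_BACKTRACE=1.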

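Expected behavior

The compile step should finish without errors and write dist/libs/RedPajama-INCITE-Chat-3B-v1-q4f16_1-metal.so.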

Environment

  • Platform: Metal
  • Operating system: macOS
  • Device: Mac Studio (M3 Ultra)
  • How you installed MLC-LLM (conda, source): conda, pip
  • How you installed TVM (pip, source): conda, pip
  • Python version (e.g. 3.10): 3.11.14
  • GPU driver version (if applicable):
  • CUDA/cuDNN version (if applicable):
  • TVM Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):
    BUILD_STATIC_RUNTIME: OFF
    BUILD_DUMMY_LIBTVM: OFF
    COMPILER_RT_PATH: 3rdparty/compiler-rt
    CUDA_VERSION: NOT-FOUND
    DMLC_PATH: 3rdparty/dmlc-core/include
    GIT_COMMIT_HASH: 791909d217548937626f98743c33620116b195b5
    GIT_COMMIT_TIME: 2025-11-12 13:37:43 -0500
    HIDE_PRIVATE_SYMBOLS: ON
    INDEX_DEFAULT_I64: ON
    INSTALL_DEV: OFF
    LLVM_VERSION: 19.1.7
    MLIR_VERSION: NOT-FOUND
    PICOJSON_PATH: 3rdparty/picojson
    RANG_PATH: 3rdparty/rang/include
    ROCM_PATH: /opt/rocm
    SUMMARIZE: OFF
    TVM_CXX_COMPILER_PATH: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/clang++
    USE_ALTERNATIVE_LINKER: AUTO
    USE_ARM_COMPUTE_LIB_GRAPH_EXECUTOR: OFF
    USE_ARM_COMPUTE_LIB: OFF
    USE_BLAS: none
    USE_BNNS: OFF
    USE_BYODT_POSIT: OFF
    USE_COREML: OFF
    USE_CPP_RPC: OFF
    USE_CPP_RTVM:
    USE_CUBLAS: OFF
    USE_CUDA: OFF
    USE_NVTX: OFF
    USE_NCCL: OFF
    USE_MSCCL: OFF
    USE_CUDNN: OFF
    USE_CUSTOM_LOGGING: OFF
    USE_CUTLASS: OFF
    USE_AMX: OFF
    USE_DNNL: OFF
    USE_FALLBACK_STL_MAP: OFF
    USE_GTEST: AUTO
    USE_HEXAGON: OFF
    USE_HEXAGON_RPC: OFF
    USE_HEXAGON_SDK: /path/to/sdk
    USE_HEXAGON_GTEST: /path/to/hexagon/gtest
    USE_HEXAGON_EXTERNAL_LIBS: OFF
    USE_IOS_RPC: OFF
    USE_KHRONOS_SPIRV: OFF
    USE_LIBBACKTRACE: AUTO
    USE_LIBTORCH: OFF
    USE_LLVM: llvm-config --link-static
    USE_MLIR: OFF
    USE_METAL: ON
    USE_MIOPEN: OFF
    USE_MKL: OFF
    USE_MRVL: OFF
    USE_MSVC_MT: OFF
    USE_NNPACK: OFF
    USE_OPENCL: OFF
    USE_OPENCL_ENABLE_HOST_PTR: OFF
    USE_OPENCL_EXTN_QCOM: NOT-FOUND
    USE_OPENCL_GTEST: /path/to/opencl/gtest
    USE_OPENMP: OFF
    USE_PAPI: OFF
    USE_RANDOM: ON
    TVM_DEBUG_WITH_ABI_CHANGE: OFF
    TVM_LOG_BEFORE_THROW: OFF
    USE_ROCBLAS: OFF
    USE_HIPBLAS: OFF
    USE_ROCM: OFF
    USE_RCCL: OFF
    USE_RPC: ON
    TVM_BUILD_PYTHON_MODULE: ON
    USE_RTTI: ON
    USE_RUST_EXT: OFF
    USE_SORT: ON
    USE_SPIRV_KHR_INTEGER_DOT_PRODUCT: OFF
    USE_TENSORFLOW_PATH: none
    USE_TENSORRT_CODEGEN: OFF
    USE_TENSORRT_RUNTIME: OFF
    USE_TFLITE: OFF
    USE_THREADS: ON
    USE_THRUST: OFF
    USE_CURAND: OFF
    USE_VULKAN: OFF
    USE_CLML: OFF
    TVM_CLML_VERSION:
    USE_CLML_GRAPH_EXECUTOR: OFF
    USE_UMA: OFF
    USE_MSC: OFF
    USE_CCACHE: AUTO
    USE_NVSHMEM: OFF
    USE_NNAPI_CODEGEN: OFF
    USE_NNAPI_RUNTIME: OFF
    BACKTRACE_ON_SEGFAULT: OFF
  • Any other relevant information:
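
Since the full libinfo dump is long, here is a minimal sketch that prints only the fields most likely to matter for this report. It assumes tvm.support.libinfo() returns a dict, as the command above implies; summarize_libinfo, print_tvm_summary, and the chosen key list are hypothetical.

```python
KEYS = ("GIT_COMMIT_HASH", "GIT_COMMIT_TIME", "LLVM_VERSION", "USE_METAL", "USE_LLVM")


def summarize_libinfo(info, keys=KEYS):
    """Return 'KEY: value' lines for the selected keys; absent keys show '<missing>'."""
    return [f"{k}: {info.get(k, '<missing>')}" for k in keys]


def print_tvm_summary():
    """Print the summary for the installed TVM (requires tvm to be importable)."""
    import tvm  # imported lazily so summarize_libinfo works without TVM

    print("\n".join(summarize_libinfo(tvm.support.libinfo())))
```

Run print_tvm_summary() in the environment used for compilation to get a short version block for follow-up comments.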
