
Conversation

@taradaidv

Summary

Add a Metal implementation of GGML_OP_DIAG_MASK_INF so the op can run on Apple GPUs.

Details

Previously, the Metal backend did not support GGML_OP_DIAG_MASK_INF, so models using this op failed with:

unsupported op 'DIAG_MASK_INF'

For example, stable-diffusion.cpp’s SDXL CLIP path could not run with keep_clip_on_cpu = false on Metal.

This PR:

  • Implements a Metal kernel for GGML_OP_DIAG_MASK_INF matching the existing CPU/CUDA semantics.
  • Wires it into ggml-metal dispatch so attention masks are applied correctly for batched tensors on Apple GPUs.

Related: leejet/stable-diffusion.cpp#1040
