Add Metal support for GGML_OP_DIAG_MASK_INF for CLIP on Apple GPUs #1395
Summary

Add a Metal implementation of `GGML_OP_DIAG_MASK_INF` so the op can run on Apple GPUs.

Details

Previously, the Metal backend did not support `GGML_OP_DIAG_MASK_INF`, which caused models using this op to fail on Metal. For example, stable-diffusion.cpp's SDXL CLIP path could not run with `keep_clip_on_cpu = false`.

This PR:

- Adds a Metal kernel for `GGML_OP_DIAG_MASK_INF` matching the existing CPU/CUDA semantics.
- Wires it into the `ggml-metal` dispatch so attention masks are applied correctly for batched tensors on Apple GPUs.

Related: leejet/stable-diffusion.cpp#1040