Labels: bug
What is the bug?
We are running Mimir 2.17 and recently noticed compaction failures on the Mimir ingesters caused by missing head chunks.
```
compact ooo head: chunk iter: cannot populate chunk 1744774978 from block 0000000000XX000FZMPACTHEAD: invalid head chunk: not found
```
The error appears to originate from [1].
While looking into the issue, it seems possible that a race condition is involved: the head block is being GC'd before it's compacted. I'm wondering whether it would be reasonable to skip over chunks that no longer exist.
```diff
diff --git i/vendor/github.com/prometheus/prometheus/tsdb/ooo_head_read.go w/vendor/github.com/prometheus/prometheus/tsdb/ooo_head_read.go
index 21fedc8cf6..d47bc3980b 100644
--- i/vendor/github.com/prometheus/prometheus/tsdb/ooo_head_read.go
+++ w/vendor/github.com/prometheus/prometheus/tsdb/ooo_head_read.go
@@ -279,7 +279,11 @@ func (cr *HeadAndOOOChunkReader) chunkOrIterable(meta chunks.Meta, copyLastChunk
 	default:
 		_, cid, isOOO := unpackHeadChunkRef(m.Ref)
 		iterable, _, err := cr.head.chunkFromSeries(s, cid, isOOO, m.MinTime, m.MaxTime, isoState, copyLastChunk)
-		if err != nil {
+		// During compaction, the series can be garbage collected.
+		// In that case, we should not error, but just ignore the chunk.
+		if errors.Is(err, storage.ErrNotFound) {
+			continue
+		} else if err != nil {
 			return nil, nil, 0, fmt.Errorf("invalid head chunk: %w", err)
 		}
```
How to reproduce it?
I haven't been able to reproduce this in isolation. However, it is triggered constantly in our Kubernetes environment.
What did you think would happen?
The compaction process should continue compacting other blocks that are present.
What was your environment?
Kubernetes
Any additional context to share?
No response