Bug: compact ooo head: chunk iter: cannot populate chunk X from block XX: invalid head chunk: not found #13683

@dmyerscough

What is the bug?

We are running Mimir 2.17 and recently noticed compaction failures on the Mimir ingesters caused by missing head chunks.

compact ooo head: chunk iter: cannot populate chunk 1744774978 from block 0000000000XX000FZMPACTHEAD: invalid head chunk: not found

The error appears to originate from 1.

While looking into the issue, it seems possible that a race condition is involved, as the head block is being GC'd before it's compacted. I'm wondering whether it would be reasonable to skip over chunks that no longer exist.

diff --git i/vendor/github.com/prometheus/prometheus/tsdb/ooo_head_read.go w/vendor/github.com/prometheus/prometheus/tsdb/ooo_head_read.go
index 21fedc8cf6..d47bc3980b 100644
--- i/vendor/github.com/prometheus/prometheus/tsdb/ooo_head_read.go
+++ w/vendor/github.com/prometheus/prometheus/tsdb/ooo_head_read.go
@@ -279,7 +279,11 @@ func (cr *HeadAndOOOChunkReader) chunkOrIterable(meta chunks.Meta, copyLastChunk
                default:
                        _, cid, isOOO := unpackHeadChunkRef(m.Ref)
                        iterable, _, err := cr.head.chunkFromSeries(s, cid, isOOO, m.MinTime, m.MaxTime, isoState, copyLastChunk)
-                       if err != nil {
+                       // During compaction, the series can be garbage collected.
+                       // In that case, we should not error, but just ignore the chunk.
+                       if errors.Is(err, storage.ErrNotFound) {
+                               continue
+                       } else if err != nil {
                                return nil, nil, 0, fmt.Errorf("invalid head chunk: %w", err)
                        }
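
To make the proposed behavior easier to discuss outside the vendored code, here is a minimal, self-contained sketch of the same pattern: skip entries whose lookup fails with a wrapped "not found" sentinel, and abort on any other error. The names `errNotFound`, `loadChunk`, and `collectChunks` are hypothetical stand-ins, not Prometheus or Mimir APIs, and the map lookup only simulates a chunk whose series has been garbage collected.

```go
package main

import (
	"errors"
	"fmt"
)

// errNotFound is a hypothetical stand-in for storage.ErrNotFound.
var errNotFound = errors.New("not found")

// loadChunk is a hypothetical lookup. IDs absent from the map simulate
// head chunks whose series were garbage collected before compaction.
func loadChunk(head map[int]string, id int) (string, error) {
	c, ok := head[id]
	if !ok {
		// Wrapped with %w so errors.Is can still match the sentinel.
		return "", fmt.Errorf("chunk %d: %w", id, errNotFound)
	}
	return c, nil
}

// collectChunks returns the chunks that could be loaded and the number
// of chunks skipped because they were missing. Any other error aborts.
func collectChunks(head map[int]string, ids []int) ([]string, int, error) {
	var out []string
	skipped := 0
	for _, id := range ids {
		c, err := loadChunk(head, id)
		if errors.Is(err, errNotFound) {
			skipped++
			continue
		} else if err != nil {
			return nil, 0, fmt.Errorf("invalid head chunk: %w", err)
		}
		out = append(out, c)
	}
	return out, skipped, nil
}

func main() {
	head := map[int]string{1: "a", 3: "c"}
	chunks, skipped, err := collectChunks(head, []int{1, 2, 3})
	fmt.Println(chunks, skipped, err)
}
```

The sketch counts skipped chunks rather than dropping them silently, so a caller could log or export the number. Whether skipping is actually safe for OOO head compaction is the open question in this issue; the sketch only illustrates the error-handling shape.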

How to reproduce it?

I haven't been able to reproduce this. However, it seems to trigger constantly in our Kubernetes environment.

What did you think would happen?

The compaction process should continue compacting other blocks that are present.

What was your environment?

Kubernetes

Any additional context to share?

No response

Labels: bug (Something isn't working)
