Revertible Migration #1607
Pull request overview
This PR introduces a revertible migration feature for gh-ost, enabling users to reverse a completed migration by replaying DML events that occurred after the original cutover. The implementation adds checkpoint tracking at cutover time and a new --revert operation mode.
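To make the replay step concrete, here is a minimal Go sketch of the filtering idea: given a cut-over checkpoint, only binlog events recorded after it are applied to the old table. The `Coordinates` and `Event` types and the `eventsToReplay` function are hypothetical simplifications, not gh-ost's actual code. The sketch also assumes file/offset binlog coordinates rather than GTID sets.

```go
package main

import "fmt"

// Coordinates is a hypothetical, simplified binlog position: a file name
// plus a byte offset within that file. Not gh-ost's actual type.
type Coordinates struct {
	File string // e.g. "mysql-bin.000042"
	Pos  int64
}

// Before reports whether c is strictly earlier than other. Comparing file
// names lexically assumes zero-padded sequence numbers of equal width.
func (c Coordinates) Before(other Coordinates) bool {
	if c.File != other.File {
		return c.File < other.File
	}
	return c.Pos < other.Pos
}

// Event is a simplified row event read from the binlog.
type Event struct {
	At  Coordinates
	SQL string
}

// eventsToReplay keeps only events recorded strictly after the cut-over
// checkpoint. Writes at or before the checkpoint were made while the
// original table was still live, so the old table already contains them.
func eventsToReplay(events []Event, checkpoint Coordinates) []Event {
	var out []Event
	for _, e := range events {
		if checkpoint.Before(e.At) {
			out = append(out, e)
		}
	}
	return out
}

func main() {
	checkpoint := Coordinates{File: "mysql-bin.000002", Pos: 500}
	events := []Event{
		{At: Coordinates{File: "mysql-bin.000002", Pos: 400}, SQL: "UPDATE (before cut-over)"},
		{At: Coordinates{File: "mysql-bin.000002", Pos: 900}, SQL: "INSERT (after cut-over)"},
		{At: Coordinates{File: "mysql-bin.000003", Pos: 4}, SQL: "DELETE (after cut-over)"},
	}
	for _, e := range eventsToReplay(events, checkpoint) {
		fmt.Printf("%s:%d %s\n", e.At.File, e.At.Pos, e.SQL)
	}
}
```

Only the INSERT and DELETE are printed; the UPDATE predates the checkpoint and is skipped.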
Key Changes
- Adds a new `Revert()` method that applies post-cutover DML events from the checkpoint and performs a reverse cutover
- Implements a checkpoint mechanism with an `IsCutover` flag to mark post-cutover checkpoints for revert operations
- Introduces new command-line flags `--revert` and `--old-table` to support the revert workflow
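A rough Go sketch of how an `IsCutover` flag might be used when starting a revert: among the stored checkpoints, only the latest one flagged as written after cut-over is a valid starting point. Apart from `IsCutover`, the struct fields and the `latestCutoverCheckpoint` helper are assumptions made for illustration, not the PR's code.

```go
package main

import (
	"errors"
	"fmt"
)

// Checkpoint is a simplified, hypothetical checkpoint row. Only IsCutover
// mirrors a field named in this PR; ID and Coords are illustrative.
type Checkpoint struct {
	ID        int64
	Coords    string // binlog coordinates, kept opaque here
	IsCutover bool   // true only for the checkpoint written after cut-over
}

var errNoCutoverCheckpoint = errors.New("no cut-over checkpoint found")

// latestCutoverCheckpoint returns the most recent checkpoint flagged as
// written after cut-over. Ordinary mid-copy checkpoints are skipped,
// because a revert should start from the cut-over position.
func latestCutoverCheckpoint(cps []Checkpoint) (Checkpoint, error) {
	var best Checkpoint
	found := false
	for _, cp := range cps {
		if cp.IsCutover && (!found || cp.ID > best.ID) {
			best, found = cp, true
		}
	}
	if !found {
		return Checkpoint{}, errNoCutoverCheckpoint
	}
	return best, nil
}

func main() {
	cps := []Checkpoint{
		{ID: 1, Coords: "mysql-bin.000002:120"},
		{ID: 2, Coords: "mysql-bin.000002:840"},
		{ID: 3, Coords: "mysql-bin.000003:310", IsCutover: true},
	}
	cp, err := latestCutoverCheckpoint(cps)
	if err != nil {
		fmt.Println("cannot revert:", err)
		return
	}
	fmt.Printf("revert from checkpoint %d at %s\n", cp.ID, cp.Coords)
}
```

Returning an error when no cut-over checkpoint exists mirrors the idea that a revert is only possible for migrations run with `--checkpoint`.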
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| go/logic/migrator.go | Core revert logic including new Revert() method, checkpoint after cutover, channel type change for lock processing, and DML-only execution function for revert mode |
| go/logic/checkpoint.go | Adds IsCutover boolean field to track post-cutover checkpoints |
| go/logic/applier.go | Updates checkpoint table schema with gh_ost_is_cutover column, removes NOT NULL constraints on iteration range columns, updates read/write methods |
| go/logic/applier_test.go | Updates tests for new checkpoint field and corrects timezone expectation |
| go/logic/inspect.go | Adds debug logging for table inspection |
| go/logic/migrator_test.go | Adds comprehensive TestRevert() test case and includes test MySQL config file |
| go/logic/my.cnf.test | MySQL configuration file for test containers with GTID enabled |
| go/sql/builder.go | Updates checkpoint insert query to include gh_ost_is_cutover column |
| go/sql/builder_test.go | Updates test expectations for modified checkpoint query |
| go/base/context.go | Adds Revert and OldTableName fields, modifies table naming logic for revert mode |
| go/cmd/gh-ost/main.go | Adds command-line flags for revert mode and corresponding validation |
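In revert mode, the cut-over effectively swaps two tables back. The hypothetical Go helpers below, `revDelName` and `revertRenameStatement`, show the end state of that swap as a single atomic MySQL `RENAME TABLE` statement, using the `_<table>_rev_del` pattern described in this PR. gh-ost's real cut-over uses its own lock-and-rename sequence, so treat this only as an illustration of which table ends up where.

```go
package main

import "fmt"

// revDelName returns the name the live table is moved to during a revert,
// following the "_<table>_rev_del" pattern described in this PR.
func revDelName(table string) string {
	return fmt.Sprintf("_%s_rev_del", table)
}

// revertRenameStatement builds one atomic MySQL RENAME TABLE statement:
// the live (migrated) table is moved aside, and the old table named via
// --old-table takes its place. Hypothetical helper for illustration only;
// identifiers are not escaped, which real code must do.
func revertRenameStatement(db, table, oldTable string) string {
	return fmt.Sprintf(
		"RENAME TABLE `%s`.`%s` TO `%s`.`%s`, `%s`.`%s` TO `%s`.`%s`",
		db, table, db, revDelName(table),
		db, oldTable, db, table,
	)
}

func main() {
	// Prints the statement that would reinstate "mytable" from "_mytable_del".
	fmt.Println(revertRenameStatement("test", "mytable", "_mytable_del"))
}
```

MySQL applies the renames in a multi-table `RENAME TABLE` left to right as one atomic operation, which is why `mytable` can be moved aside and reused in the same statement.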
danieljoos left a comment
this is great!
👋 is there a plan to cut a release with this change? Last release was 2024.

Love to see this shipped 🙌🏻
Description
This PR introduces revertible migrations, following the approach suggested by @jonahberquist and outlined in #302 (comment). If a migration causes production impact after the cut-over, it can be reverted quickly while preserving the writes made after the cut-over.
Usage: When gh-ost is invoked with the `--checkpoint` flag and the migration completes, the migration can be reverted by invoking gh-ost again with the `--revert` flag and the `--old-table` flag specifying the name of the "old" table from the first migration, e.g. `_mytable_del`. Also see docs/revert.md.

Note that the checkpoint table (whose name ends with `_ghk`) will not be automatically dropped unless `--ok-to-drop-table` is provided.

Hooks: gh-ost hook scripts now receive the `GH_OST_REVERT` environment variable with value "true" or "false", indicating whether gh-ost is running in revert mode.

Closes #302.
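A minimal Go sketch of the two user-facing pieces above, under simple assumed rules: `--revert` is rejected without `--old-table`, and hook scripts receive `GH_OST_REVERT` set to "true" or "false". The `validateRevertFlags` and `hookEnv` helpers are hypothetical; the actual validation in `go/cmd/gh-ost/main.go` and gh-ost's hook executor may differ.

```go
package main

import (
	"errors"
	"flag"
	"fmt"
	"os"
	"strconv"
)

// validateRevertFlags applies one assumed rule: --revert needs to know
// which old table to restore, so --old-table must be set with it.
func validateRevertFlags(revert bool, oldTable string) error {
	if revert && oldTable == "" {
		return errors.New("--revert requires --old-table")
	}
	return nil
}

// hookEnv returns a copy of base with GH_OST_REVERT set to "true" or
// "false", so hook scripts can tell whether this run is a revert.
func hookEnv(base []string, revert bool) []string {
	env := make([]string, 0, len(base)+1)
	env = append(env, base...)
	return append(env, "GH_OST_REVERT="+strconv.FormatBool(revert))
}

func main() {
	revert := flag.Bool("revert", false, "revert a completed migration (sketch)")
	oldTable := flag.String("old-table", "", "old table left by the original migration, e.g. _mytable_del (sketch)")
	flag.Parse()

	if err := validateRevertFlags(*revert, *oldTable); err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	env := hookEnv(os.Environ(), *revert)
	fmt.Println(env[len(env)-1])
}
```

If saved as `main.go`, running `go run main.go --revert --old-table=_mytable_del` prints `GH_OST_REVERT=true`, while `go run main.go --revert` exits with the validation error.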
`script/cibuild` returns with no formatting errors, build errors or unit test errors.

Details
When gh-ost is invoked with `--checkpoint`, it writes a final checkpoint to the checkpoint (`_ghk`) table after the cut-over succeeds. This checkpoint contains the binlog coordinates of the last inserted `AllEventsUpToLockProcessed` row in the changelog (`_ghc`) table. During the cut-over stage, `AllEventsUpToLockProcessed` is written to the `_ghc` table after the original table is locked. Therefore, no writes after the checkpoint coordinates have been applied to the original table.

After the migration completes, the original table is renamed to, e.g., `_mytable_del`. Invoking `gh-ost` with `--revert --old-table="_mytable_del"` reads the checkpoint and starts applying DML events to `_mytable_del` from the checkpoint coordinates. This works only as long as the binlogs containing these coordinates still exist. The cut-over then proceeds the same way as in a regular migration: the currently live (migrated) `mytable` is renamed to `_mytable_rev_del`, and `_mytable_del` takes its place.

In summary, the original table is reinstated as `mytable`, and all writes since the first cut-over have been applied to it.

Testing
On a testing MySQL primary-replica cluster, I created a sysbench test table with 5M rows:
I ran `gh-ost` with `--checkpoint --gtid --alter="drop index k_1"` under a sysbench `oltp_write_only` workload of ~3000 qps.

After the cut-over, I waited 60s before starting `gh-ost --revert`. Before the revert migration finished, I killed the sysbench workload. When the revert was complete, I checksummed the original and `_rev_del` tables to verify data integrity ✅

Next, I repeated the test, but started the revert using a different replica than the one used for the original migration. ✅