
siddharth16396 (Contributor) commented Dec 3, 2025

This PR is a clone of #18905, which I messed up pretty badly while rebasing to add DCO sign-offs 😢

Description

This PR replaces periodic config polling (every 60s) with event-driven topology server watches for QueryThrottler configuration updates. Instead of re-reading config files on a timer, the throttler now subscribes to SrvKeyspace changes and reacts as soon as configs are updated in the topo server.

Changes Made

Removed the polling model:

  • Deleted the ConfigLoader interface and the FileBasedConfigLoader implementation.
  • Removed the 60-second refresh loop from QueryThrottler.
  • Eliminated file-based config loading entirely.

Added event-driven topo watch:

  • Introduced startSrvKeyspaceWatch(), which uses srvtopo.WatchSrvKeyspace() for real-time updates.
  • Added a HandleConfigUpdate() callback for processing config changes.
  • Implemented deduplication so that an unchanged config does not trigger a redundant strategy reload.
  • Added resilient error handling: the watch continues on transient errors and stops only on permanent ones such as NoNode.

New proto definitions:

  • Created querythrottler.proto with Config, ThrottlingStrategy, and nested rule messages.
  • Added an incoming_query_throttler_config field to the SrvKeyspace message in topodata.proto.
  • Implemented ConfigFromProto() to convert the protobuf message into the internal config representation.
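One way a ConfigFromProto()-style conversion could look, sketched with hypothetical types: `pbConfig`/`pbRule` play the role of the generated querythrottler.proto messages and `Config`/`Rule` the internal representation. All field names are assumptions. The sketch maps a nil message to a disabled default and copies nested rules into plain Go values instead of aliasing the protobuf structs.

```go
package main

// pbRule and pbConfig are hypothetical stand-ins for generated protobuf types.
type pbRule struct {
	QueryType string
	Threshold float64
}

type pbConfig struct {
	Enabled  bool
	Strategy string
	Rules    []*pbRule
}

// Rule and Config are a hypothetical internal representation.
type Rule struct {
	QueryType string
	Threshold float64
}

type Config struct {
	Enabled  bool
	Strategy string
	Rules    map[string]Rule // keyed by query type for direct lookup
}

// ConfigFromProto converts the wire message into the internal form.
// A nil message yields a disabled config so callers never handle nil.
func ConfigFromProto(pb *pbConfig) Config {
	cfg := Config{Rules: map[string]Rule{}}
	if pb == nil {
		return cfg
	}
	cfg.Enabled = pb.Enabled
	cfg.Strategy = pb.Strategy
	for _, r := range pb.Rules {
		if r == nil {
			continue // skip nil entries defensively
		}
		cfg.Rules[r.QueryType] = Rule{QueryType: r.QueryType, Threshold: r.Threshold}
	}
	return cfg
}
```

Converting once at update time keeps protobuf types out of the throttler's per-query path.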

Architecture improvements:

  • Config now lives in the topology server (centralized, versioned, observable).
  • Tablets pick up changes as soon as operators update throttling rules.
  • The initial config is loaded synchronously during InitDBConfig() so the throttler is in the correct state on startup or restart.
  • The watch runs indefinitely and automatically retries on transient failures.

How it works

  1. Startup: InitDBConfig() sets the keyspace and immediately fetches the initial config via GetSrvKeyspace().
  2. Watch: A background goroutine subscribes to WatchSrvKeyspace() for the cell and keyspace.
  3. Updates: When the config changes in topo, HandleConfigUpdate() fires, deduplicates, and hot-swaps the strategy.
  4. Resilience: Network blips are retried automatically; only fatal errors (context canceled, keyspace deleted) stop the watch.

Benefits achieved:

  • Faster propagation: Config changes apply immediately instead of after a delay of up to 60s.
  • Less load: No more periodic file reads on every tablet.
  • Better ops: Config is centralized in the topo server and can be updated with vtctldclient.
  • Consistency: All tablets see changes at roughly the same time.
  • Debugging: Topo changes are auditable, unlike scattered config files.

Related Issue(s)

Checklist

  • "Backport to:" labels have been added if this change should be back-ported to release branches
  • If this change is to be back-ported to previous releases, a justification is included in the PR description
  • Tests were added or are not required
  • Did the new or modified tests pass consistently locally and on CI?
  • Documentation was added or is not required

vitess-bot (bot, Contributor) commented Dec 3, 2025

Review Checklist

Hello reviewers! 👋 Please follow this checklist when reviewing this Pull Request.

General

  • Ensure that the Pull Request has a descriptive title.
  • Ensure there is a link to an issue (except for internal cleanup and flaky test fixes); new features should have an RFC that documents use cases and test cases.

Tests

  • Bug fixes should have at least one unit or end-to-end test; enhancements and new features should have a sufficient number of tests.

Documentation

  • Apply the release notes (needs details) label if users need to know about this change.
  • New features should be documented.
  • There should be some code comments as to why things are implemented the way they are.
  • There should be a comment at the top of each new or modified test to explain what the test does.

New flags

  • Is this flag really necessary?
  • Flag names must be clear and intuitive, use dashes (-), and have a clear help text.

If a workflow is added or modified:

  • Each item in Jobs should be named in order to mark it as required.
  • If the workflow needs to be marked as required, the maintainer team must be notified.

Backward compatibility

  • Protobuf changes should be wire-compatible.
  • Changes to _vt tables and RPCs need to be backward compatible.
  • RPC changes should be compatible with vitess-operator
  • If a flag is removed, then it should also be removed from vitess-operator and arewefastyet, if used there.
  • vtctl command output order should be stable and awk-able.

vitess-bot added the NeedsBackportReason, NeedsDescriptionUpdate, NeedsIssue, and NeedsWebsiteDocsUpdate labels Dec 3, 2025
github-actions added this to the v24.0.0 milestone Dec 3, 2025
Signed-off-by: siddharth16396 <[email protected]>
mattlord added the Type: Enhancement label and removed the NeedsDescriptionUpdate, NeedsWebsiteDocsUpdate, NeedsIssue, and NeedsBackportReason labels Dec 3, 2025

Labels

Component: Query Serving, Component: Throttler, Type: Enhancement
