-
Notifications
You must be signed in to change notification settings - Fork 664
[PD Disaggregation] Decode does not cache requests for preallocating resource in default #5427
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Thanks for your contribution! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR changes the default value of the FD_ENABLE_CACHE_TASK environment variable from "1" (enabled) to "0" (disabled), which alters the default behavior of request caching for resource preallocation in the decode node. The change also updates the associated comment to better describe the setting's purpose.
Key Changes:
- Default value of
FD_ENABLE_CACHE_TASKchanged from "1" to "0" - Comment updated to clarify that this controls whether decode caches requests for preallocating resources
| "FD_ZMQ_CONTROL_CMD_SERVER_PORTS": lambda: os.getenv("FD_ZMQ_CONTROL_CMD_SERVER_PORTS", "8202"), | ||
| # Whether to enable cache task in decode node | ||
| "FD_ENABLE_CACHE_TASK": lambda: os.getenv("FD_ENABLE_CACHE_TASK", "1"), | ||
| # Whether to enable decode caches requests for preallocating resource |
Copilot
AI
Dec 8, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Grammar issue: The comment should read "Whether to enable decode to cache requests" or "Whether decode caches requests". The current phrasing "enable decode caches requests" is grammatically incorrect.
| # Whether to enable decode caches requests for preallocating resource | |
| # Whether to enable decode to cache requests for preallocating resources |
| # Whether to enable cache task in decode node | ||
| "FD_ENABLE_CACHE_TASK": lambda: os.getenv("FD_ENABLE_CACHE_TASK", "1"), | ||
| # Whether to enable decode caches requests for preallocating resource | ||
| "FD_ENABLE_CACHE_TASK": lambda: os.getenv("FD_ENABLE_CACHE_TASK", "0"), |
Copilot
AI
Dec 8, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing the default value from "1" (enabled) to "0" (disabled) is a breaking change that alters the default behavior of the decode node's request caching for resource preallocation. This could significantly impact existing deployments that rely on the default behavior.
According to the custom guidelines, the PR description should clearly explain why this modification is being made and what problem is being solved. While the motivation states "Decode does not cache requests for preallocating resource in default," it doesn't explain:
- Why is the previous default behavior (caching enabled) problematic?
- What issue does disabling it by default solve?
- What is the impact on existing users?
- Are there any migration steps for users who need the caching behavior?
Please expand the PR description to include this information so that users understand the rationale and impact of this breaking change.
| # Whether to enable cache task in decode node | ||
| "FD_ENABLE_CACHE_TASK": lambda: os.getenv("FD_ENABLE_CACHE_TASK", "1"), | ||
| # Whether to enable decode caches requests for preallocating resource | ||
| "FD_ENABLE_CACHE_TASK": lambda: os.getenv("FD_ENABLE_CACHE_TASK", "0"), |
Copilot
AI
Dec 8, 2025
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a breaking API change. The default behavior is being inverted from enabled (1) to disabled (0). This change affects how the decode node handles request caching for resource preallocation across the entire system.
When enable_decode_cache_task is False (the new default), the code at line 1184-1188 in common_engine.py will immediately send an error back to the prefill node when resources are insufficient, rather than caching and waiting. This fundamentally changes the behavior of the PD (Prefill-Decode) disaggregation feature.
Consider:
- Adding deprecation warnings in the current release before changing the default
- Providing clear migration documentation
- Adding a note in the CHANGELOG about this breaking change
- Ensuring users can easily opt back into the previous behavior by setting
FD_ENABLE_CACHE_TASK=1
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #5427 +/- ##
==========================================
Coverage ? 59.13%
==========================================
Files ? 327
Lines ? 40646
Branches ? 6171
==========================================
Hits ? 24037
Misses ? 14765
Partials ? 1844
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Motivation
Decode does not cache requests for preallocating resource in default
Modifications
env.FD_ENABLE_CACHE_TASK = 0
Usage or Command
None
Accuracy Tests
None
Checklist
[FDConfig],[APIServer],[Engine],[Scheduler],[PD Disaggregation],[Executor],[Graph Optimization],[Speculative Decoding],[RL],[Models],[Quantization],[Loader],[OP],[KVCache],[DataProcessor],[BugFix],[Docs],[CI],[Optimization],[Feature],[Benchmark],[Others],[XPU],[HPU],[GCU],[DCU],[Iluvatar],[Metax]]pre-commitbefore commit.releasebranch, make sure the PR has been submitted to thedevelopbranch, then cherry-pick it to thereleasebranch with the[Cherry-Pick]PR tag.