[PD Disaggregation] Decode does not cache requests for preallocating resource in default #5427

juncaipeng · 2025-12-08T07:05:43Z

Motivation

Decode does not cache requests for preallocating resource in default

Modifications

env.FD_ENABLE_CACHE_TASK = 0

Usage or Command

None

Accuracy Tests

None

Checklist

Add at least a tag in the PR title.
- Tag list: [[FDConfig],[APIServer],[Engine], [Scheduler], [PD Disaggregation], [Executor], [Graph Optimization], [Speculative Decoding], [RL], [Models], [Quantization], [Loader], [OP], [KVCache], [DataProcessor], [BugFix], [Docs], [CI], [Optimization], [Feature], [Benchmark], [Others], [XPU], [HPU], [GCU], [DCU], [Iluvatar], [Metax]]
- You can add new tags based on the PR content, but the semantics must be clear.
Format your code, run pre-commit before commit.
Add unit tests. Please write the reason in this PR if no unit tests.
Provide accuracy results.
If the current PR is submitting to the release branch, make sure the PR has been submitted to the develop branch, then cherry-pick it to the release branch with the [Cherry-Pick] PR tag.

paddle-bot · 2025-12-08T07:05:49Z

Thanks for your contribution!

Copilot

Pull request overview

This PR changes the default value of the FD_ENABLE_CACHE_TASK environment variable from "1" (enabled) to "0" (disabled), which alters the default behavior of request caching for resource preallocation in the decode node. The change also updates the associated comment to better describe the setting's purpose.

Key Changes:

Default value of FD_ENABLE_CACHE_TASK changed from "1" to "0"
Comment updated to clarify that this controls whether decode caches requests for preallocating resources

Copilot · 2025-12-08T07:08:24Z

fastdeploy/envs.py

    "FD_ZMQ_CONTROL_CMD_SERVER_PORTS": lambda: os.getenv("FD_ZMQ_CONTROL_CMD_SERVER_PORTS", "8202"),
-    # Whether to enable cache task in decode node
-    "FD_ENABLE_CACHE_TASK": lambda: os.getenv("FD_ENABLE_CACHE_TASK", "1"),
+    # Whether to enable decode caches requests for preallocating resource


Grammar issue: The comment should read "Whether to enable decode to cache requests" or "Whether decode caches requests". The current phrasing "enable decode caches requests" is grammatically incorrect.

Suggested change

# Whether to enable decode caches requests for preallocating resource

# Whether to enable decode to cache requests for preallocating resources

Copilot · 2025-12-08T07:08:24Z

fastdeploy/envs.py

-    # Whether to enable cache task in decode node
-    "FD_ENABLE_CACHE_TASK": lambda: os.getenv("FD_ENABLE_CACHE_TASK", "1"),
+    # Whether to enable decode caches requests for preallocating resource
+    "FD_ENABLE_CACHE_TASK": lambda: os.getenv("FD_ENABLE_CACHE_TASK", "0"),


Changing the default value from "1" (enabled) to "0" (disabled) is a breaking change that alters the default behavior of the decode node's request caching for resource preallocation. This could significantly impact existing deployments that rely on the default behavior.

According to the custom guidelines, the PR description should clearly explain why this modification is being made and what problem is being solved. While the motivation states "Decode does not cache requests for preallocating resource in default," it doesn't explain:

Why is the previous default behavior (caching enabled) problematic?

What issue does disabling it by default solve?

What is the impact on existing users?

Are there any migration steps for users who need the caching behavior?

Please expand the PR description to include this information so that users understand the rationale and impact of this breaking change.

Copilot · 2025-12-08T07:08:24Z

fastdeploy/envs.py

-    # Whether to enable cache task in decode node
-    "FD_ENABLE_CACHE_TASK": lambda: os.getenv("FD_ENABLE_CACHE_TASK", "1"),
+    # Whether to enable decode caches requests for preallocating resource
+    "FD_ENABLE_CACHE_TASK": lambda: os.getenv("FD_ENABLE_CACHE_TASK", "0"),


This is a breaking API change. The default behavior is being inverted from enabled (1) to disabled (0). This change affects how the decode node handles request caching for resource preallocation across the entire system.

When enable_decode_cache_task is False (the new default), the code at line 1184-1188 in common_engine.py will immediately send an error back to the prefill node when resources are insufficient, rather than caching and waiting. This fundamentally changes the behavior of the PD (Prefill-Decode) disaggregation feature.

Consider:

Adding deprecation warnings in the current release before changing the default

Providing clear migration documentation

Adding a note in the CHANGELOG about this breaking change

Ensuring users can easily opt back into the previous behavior by setting FD_ENABLE_CACHE_TASK=1

codecov-commenter · 2025-12-08T08:34:54Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (develop@80efe98). Learn more about missing BASE report.

Additional details and impacted files

@@            Coverage Diff             @@
##             develop    #5427   +/-   ##
==========================================
  Coverage           ?   59.13%           
==========================================
  Files              ?      327           
  Lines              ?    40646           
  Branches           ?     6171           
==========================================
  Hits               ?    24037           
  Misses             ?    14765           
  Partials           ?     1844

Flag	Coverage Δ
GPU	`59.13% <ø> (?)`

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Decode does not cache requests for preallocating resource in default

beaa5a8

Copilot AI review requested due to automatic review settings December 8, 2025 07:05

Copilot started reviewing on behalf of juncaipeng December 8, 2025 07:06 View session

Copilot AI reviewed Dec 8, 2025

View reviewed changes

juncaipeng closed this Dec 9, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[PD Disaggregation] Decode does not cache requests for preallocating resource in default #5427

[PD Disaggregation] Decode does not cache requests for preallocating resource in default #5427

Uh oh!

juncaipeng commented Dec 8, 2025 •

edited

Loading

Uh oh!

paddle-bot bot commented Dec 8, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Dec 8, 2025

Uh oh!

Copilot AI Dec 8, 2025

Uh oh!

Copilot AI Dec 8, 2025

Uh oh!

codecov-commenter commented Dec 8, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	# Whether to enable decode caches requests for preallocating resource
	# Whether to enable decode to cache requests for preallocating resources

[PD Disaggregation] Decode does not cache requests for preallocating resource in default #5427

[PD Disaggregation] Decode does not cache requests for preallocating resource in default #5427

Uh oh!

Conversation

juncaipeng commented Dec 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Motivation

Modifications

Usage or Command

Accuracy Tests

Checklist

Uh oh!

paddle-bot bot commented Dec 8, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Dec 8, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 8, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 8, 2025

Choose a reason for hiding this comment

Uh oh!

codecov-commenter commented Dec 8, 2025

Codecov Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

juncaipeng commented Dec 8, 2025 •

edited

Loading