fix: add Promise.race hard timeout to feature flags fetch (#465) #467

Merged
edgarsskore merged 2 commits into main from fix/feature-flags-fetch-timeout on May 11, 2026
Conversation

@wonderwhy-er (Owner) commented May 8, 2026

User description

Problem

On Windows + Node 24 / undici 7.x, AbortController.abort() fails to interrupt an in-progress TCP connect. The fetch hangs until the OS-level TCP timeout (~30s on Windows), blocking the MCP initialize response for users on high-latency networks.

Reported by a user in Australia — every cold start of Desktop Commander blocks for ~30s. With WiFi disabled (DNS fails fast), startup completes in ~1.6s.

See #465 for full details and logs.

Root Cause

Two paths where the slow fetch can block startup:

  1. fetchFlags() uses AbortController with a 5s timeout, but on affected platforms the abort signal doesn't interrupt the TCP socket — it hangs for ~30s
  2. waitForFreshFlags() (called in the MCP initialize handler for new users without a cache) awaits the fetch promise with no upper bound

Fix

  • fetchFlags(): Wrap fetch in Promise.race with a hard 3s timeout alongside AbortController. Even if AbortController fails to interrupt the socket, Promise.race ensures rejection at the JS level.
  • waitForFreshFlags(): Add 5s safety timeout so it can never hang indefinitely.
  • Timeout lowered from 5s to 3s — flags load from cache anyway; a fresh fetch isn't worth perceived startup latency.
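
As a rough illustration of the pattern described in the Fix list, the sketch below races a fetch against a JS-level timer while still calling `abort()`. It is not the code in `src/utils/feature-flags.ts`: `fetchWithHardTimeout` and `FetchLike` are hypothetical names, and the fetch function is passed in as a parameter only so the sketch can be exercised without a network.

```typescript
// Hypothetical helper: only the 3s budget and the AbortController +
// Promise.race combination are taken from the PR description.
type FetchLike = (url: string, init: { signal: AbortSignal }) => Promise<unknown>;

const FETCH_TIMEOUT_MS = 3000;

async function fetchWithHardTimeout(
  fetchImpl: FetchLike,
  url: string,
  timeoutMs: number = FETCH_TIMEOUT_MS,
): Promise<unknown> {
  const controller = new AbortController();
  let hardTimer: ReturnType<typeof setTimeout> | undefined;

  // Soft path: ask the runtime to cancel the request.
  const abortTimer = setTimeout(() => controller.abort(), timeoutMs);

  // Hard path: reject at the JS level even if abort() is ignored.
  const hardTimeout = new Promise<never>((_, reject) => {
    hardTimer = setTimeout(
      () => reject(new Error(`hard timeout after ${timeoutMs}ms`)),
      timeoutMs,
    );
  });

  try {
    return await Promise.race([
      fetchImpl(url, { signal: controller.signal }),
      hardTimeout,
    ]);
  } finally {
    // Always clear both timers so neither outlives the call.
    clearTimeout(abortTimer);
    clearTimeout(hardTimer);
  }
}
```

A caller would pass the global `fetch` as `fetchImpl`. Note that losing the race does not close the underlying socket; it only guarantees that the awaiting code stops waiting.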

Tests

Added test/test-feature-flags-timeout.js with 6 tests:

| Test | What it does |
| --- | --- |
| AbortController behavior (2s) | Verifies abort interrupts fetch on this platform |
| Current pattern (black-hole) | Promise.race pattern against server that never responds |
| AbortController-only (baseline) | Pre-fix pattern for comparison |
| Slow response (15s server) | Verifies timeout fires before slow response arrives |
| New-user onboarding path | Simulates waitForFreshFlags() with hanging fetch |
| Broken AbortController simulation | Mocks a fetch ignoring abort (20s), proves Promise.race recovers to 5s |

Note: Tests pass on macOS where AbortController works correctly. The broken AbortController simulation (test 6) proves the fix works regardless of platform. Needs testing on Windows to confirm the real-world fix.
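
The idea behind the broken AbortController simulation can be sketched as follows. This is an assumption-laden illustration rather than the contents of `test/test-feature-flags-timeout.js`: `makeBrokenAbortFetch`, `raceAgainstTimeout`, and `simulateBrokenAbort` are hypothetical names, and the delays are parameters so the demo can run in milliseconds.

```typescript
interface Outcome {
  status: "resolved" | "timed out";
  elapsedMs: number;
}

// Stand-in for a fetch whose abort signal is ignored: it settles only after
// delayMs, whatever the caller does. dispose() exists purely so the demo
// does not leave a timer behind.
function makeBrokenAbortFetch(delayMs: number) {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const promise = new Promise<string>((resolve) => {
    timer = setTimeout(() => resolve("late response"), delayMs);
  });
  return { promise, dispose: () => clearTimeout(timer) };
}

async function raceAgainstTimeout(work: Promise<unknown>, timeoutMs: number): Promise<Outcome> {
  const started = Date.now();
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<"timed out">((resolve) => {
    timer = setTimeout(() => resolve("timed out"), timeoutMs);
  });
  try {
    const winner = await Promise.race([work.then(() => "resolved" as const), timeout]);
    return { status: winner, elapsedMs: Date.now() - started };
  } finally {
    clearTimeout(timer);
  }
}

async function simulateBrokenAbort(fetchDelayMs: number, timeoutMs: number): Promise<Outcome> {
  const controller = new AbortController();
  const broken = makeBrokenAbortFetch(fetchDelayMs);
  controller.abort(); // no effect: the mock never observes the signal
  try {
    return await raceAgainstTimeout(broken.promise, timeoutMs);
  } finally {
    broken.dispose();
  }
}
```

With a long `fetchDelayMs` and a short `timeoutMs`, the outcome should be `"timed out"` with an elapsed time close to the timeout, which is the property the test relies on.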

Fixes #465


CodeAnt-AI Description

Prevent feature flags fetch from blocking startup

What Changed

  • Feature flags now stop waiting after 3 seconds, even if the network request does not cancel cleanly
  • New-user startup now has a 5-second safety limit when waiting for fresh flags, so the MCP initialize flow does not hang
  • Added tests that cover slow responses, stuck connections, and the new-user startup path

Impact

✅ Faster MCP startup on slow networks
✅ Fewer startup hangs for new users
✅ Clearer protection against stalled feature-flag fetches


Summary by CodeRabbit

  • Bug Fixes

    • Improved feature-flag fetching so app startup and updates no longer hang indefinitely under slow or unresponsive network conditions.
  • Tests

    • Added end-to-end tests that validate timeout and resilience behaviors for feature-flag fetching across various network scenarios.

On Windows + Node 24 / undici 7.x, AbortController.abort() fails to
interrupt an in-progress TCP connect — the fetch hangs until the OS-level
TCP timeout (~30s). This blocks the MCP initialize response for users
on high-latency networks (e.g. Australia).

Changes:
- fetchFlags(): wrap fetch in Promise.race with hard 3s timeout alongside
  AbortController, so the JS-level timeout is enforced regardless of
  whether AbortController actually interrupts the underlying socket
- waitForFreshFlags(): add 5s safety timeout so it can never hang
  indefinitely (protects the new-user onboarding path that awaits this
  inside the MCP initialize handler)
- Lower fetch timeout from 5s to 3s — flags load from cache anyway,
  a fresh fetch is not worth perceived startup latency
- Add test/test-feature-flags-timeout.js with 6 tests covering:
  AbortController behavior, Promise.race pattern, slow-response servers,
  new-user onboarding path, and simulated broken AbortController

Fixes #465
@codeant-ai (Bot, Contributor) commented May 8, 2026

CodeAnt AI is reviewing your PR.


@codeant-ai (Bot) added the size:L label (This PR changes 100-499 lines, ignoring generated files) on May 8, 2026
@coderabbitai (Bot, Contributor) commented May 8, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8ca59fe8-abae-43c7-a58d-538574e09070

📥 Commits

Reviewing files that changed from the base of the PR and between 97caa29 and be9d648.

📒 Files selected for processing (1)
  • src/utils/feature-flags.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/utils/feature-flags.ts

Walkthrough

This PR hardens feature flags fetch timeout behavior by adding a JS-level hard timeout via Promise.race alongside the existing AbortController, addressing issue #465. It also adds a Node.js test runner with six tests that simulate black-hole and slow servers, onboarding wait, and broken-abort cases to validate the timeouts.

Changes

Feature Flags Fetch Timeout Hardening

| Layer / File(s) | Summary |
| --- | --- |
| Fetch Timeout Implementation<br>src/utils/feature-flags.ts | fetchFlags() now combines a 3s AbortController abort timer with a 3s JS-level Promise.race hard timeout and clears both timers in finally. waitForFreshFlags() now races the in-flight fresh-fetch promise against a 5s safety timeout and clears its timer. |
| Test Server & Harness Setup<br>test/test-feature-flags-timeout.js | Mock TCP black-hole server and delayed HTTP server plus helpers to listen/close. currentFetchPattern() mirrors production fetch logic (AbortController + Promise.race) for tests. |
| Timeout Validation Tests<br>test/test-feature-flags-timeout.js | Tests 1–6 validate timeout behavior: AbortController interrupt on black-hole, fetchPattern validation, AbortController-only baseline, slow HTTP response handling, onboarding fresh-fetch wait simulation, and broken-abort fallback via Promise.race. |
| Test Runner & Error Handling<br>test/test-feature-flags-timeout.js | Main runner runs all tests, logs results, exits with code 1 on any failure, and includes a fatal error handler. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

size:M

Poem

🐰 A rabbit hopped in code tonight,
Quietly adding timeout light.
When TCP stalls or aborts fail,
Promise.race will tell the tale.
Startup swift — the rabbit's delight.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 69.23% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The PR title 'fix: add Promise.race hard timeout to feature flags fetch (#465)' accurately reflects the main change—wrapping the fetch in Promise.race with a hard timeout to resolve the 30s startup delay. |
| Linked Issues check | ✅ Passed | Code changes implement all primary objectives from issue #465: Promise.race hard timeout in fetchFlags() (3s), safety timeout in waitForFreshFlags() (5s), timeout cleanup in finally blocks, and comprehensive timeout tests covering AbortController behavior and broken-abort scenarios. |
| Out of Scope Changes check | ✅ Passed | All changes are directly scoped to addressing issue #465: timeout improvements in feature-flags.ts and comprehensive timeout behavior validation tests in test-feature-flags-timeout.js; no unrelated modifications detected. |

@coderabbitai (Bot, Contributor) left a comment

Actionable comments posted: 1

🧹 Nitpick comments (3)
test/test-feature-flags-timeout.js (2)

287-300: ⚡ Quick win

This test does not cover the new waitForFreshFlags() safety timeout.

Here the onboarding path awaits freshFetchPromise directly, so it only proves the fetch path returns in ~3s. If freshFetchPromise never resolves, the real protection is now the Promise.race inside waitForFreshFlags(), and this test would miss that regression.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/test-feature-flags-timeout.js` around lines 287 - 300, The test
currently awaits freshFetchPromise directly and therefore doesn't exercise the
waitForFreshFlags() timeout behavior; update the test to call the actual
waitForFreshFlags() (or the public method that races freshFetchPromise with the
timeout) instead of awaiting freshFetchPromise, e.g. trigger the fire-and-forget
fetch via currentFetchPattern/invoke initialize() and then await
waitForFreshFlags() so the Promise.race timeout path is exercised and the test
will fail if the safety timeout doesn't resolve.
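
One way to exercise the safety timeout itself is to hand `waitForFreshFlags()` a promise that never settles and measure how long the wait takes. The sketch below uses a hypothetical stand-in class, `FlagWaiter`, with a configurable timeout so it runs quickly; only the method name, the `freshFetchPromise` field, and the 5s default come from this thread, and the real `FeatureFlagManager` may be structured differently.

```typescript
// Hypothetical stand-in for the manager, reduced to the one method under test.
class FlagWaiter {
  private freshFetchPromise: Promise<void> | null;
  private safetyTimeoutMs: number;

  constructor(freshFetchPromise: Promise<void> | null, safetyTimeoutMs: number = 5000) {
    this.freshFetchPromise = freshFetchPromise;
    this.safetyTimeoutMs = safetyTimeoutMs;
  }

  async waitForFreshFlags(): Promise<void> {
    if (!this.freshFetchPromise) return;
    let timer: ReturnType<typeof setTimeout> | undefined;
    const safetyTimeout = new Promise<void>((resolve) => {
      timer = setTimeout(resolve, this.safetyTimeoutMs);
    });
    try {
      await Promise.race([this.freshFetchPromise, safetyTimeout]);
    } finally {
      clearTimeout(timer); // do not leave the safety timer running
    }
  }
}

// Test-style check: a fetch promise that never settles must not hang the wait.
async function measureWait(timeoutMs: number): Promise<number> {
  const neverSettles = new Promise<void>(() => {});
  const waiter = new FlagWaiter(neverSettles, timeoutMs);
  const started = Date.now();
  await waiter.waitForFreshFlags();
  return Date.now() - started;
}
```

If the `Promise.race` inside `waitForFreshFlags()` were removed, `measureWait` would never return, so a test built this way would catch the regression described above.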

96-113: ⚡ Quick win

This copied helper has already drifted from production.

currentFetchPattern() clears abortTimeout on both success and failure, while FeatureFlagManager.fetchFlags() only clears it on success and never clears the hard-timeout timer. That means this suite can stay green while the shipped path still leaks timers. Please share the timeout wrapper or exercise the real manager instead of copying the logic here.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/test-feature-flags-timeout.js` around lines 96 - 113, The test helper
currentFetchPattern() diverges from production behavior in
FeatureFlagManager.fetchFlags() by clearing abortTimeout on both paths and not
reusing the shared timeout wrapper, which masks a timer leak; to fix, stop
duplicating logic and either import/use the shared timeout wrapper used by
FeatureFlagManager or invoke FeatureFlagManager.fetchFlags() in the test instead
of currentFetchPattern(), ensuring the test exercises the real behavior (i.e.,
match how abortTimeout and the hard-timeout are handled in production and do not
silently clear the hard-timeout timer if production doesn't).
src/utils/feature-flags.ts (1)

122-123: ⚡ Quick win

Cancel or unref() the waitForFreshFlags() safety timer.

If freshFetchPromise resolves immediately, this 5s timer still stays active. On the initialize path that can keep the process open after the response is already sent.

Suggested fix
   async waitForFreshFlags(): Promise<void> {
     if (this.freshFetchPromise) {
-      const safetyTimeout = new Promise<void>((resolve) => setTimeout(resolve, 5000));
-      await Promise.race([this.freshFetchPromise, safetyTimeout]);
+      let timeoutHandle: NodeJS.Timeout | undefined;
+      const safetyTimeout = new Promise<void>((resolve) => {
+        timeoutHandle = setTimeout(resolve, 5000);
+        timeoutHandle.unref();
+      });
+      try {
+        await Promise.race([this.freshFetchPromise, safetyTimeout]);
+      } finally {
+        if (timeoutHandle) {
+          clearTimeout(timeoutHandle);
+        }
+      }
     }
   }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/utils/feature-flags.ts` around lines 122 - 123, The safety timer created
as safetyTimeout (new Promise with setTimeout) keeps the Node timer active even
when this.freshFetchPromise resolves; change the implementation to create the
timeout with setTimeout (store the returned timer id), call timer.unref() if
available to avoid keeping the event loop alive, and after awaiting
Promise.race([this.freshFetchPromise, safetyTimeoutPromise]) clearTimeout(timer)
(or otherwise cancel the timer) so the timer is not left running; update the
code paths around waitForFreshFlags / the initialization that reference
this.freshFetchPromise to use this cancelable/unrefable timer.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@src/utils/feature-flags.ts`:
- Around line 160-182: The fetch timeout logic leaks timers; store the hard
timeout handle and move cleanup into a finally block so both timers are always
cleared: capture the return value of setTimeout(...) for hardTimeout, wrap the
Promise.race([fetchPromise, hardTimeout]) await in try/catch/finally (or
try/finally) and in finally call clearTimeout(abortTimeout) and
clearTimeout(hardTimeout) (and if desired call controller.abort() there), and
apply the same pattern to waitForFreshFlags() to ensure no timer handles remain
active on error/timeout paths; refer to FETCH_TIMEOUT_MS, controller,
abortTimeout, hardTimeout, fetchPromise, Promise.race and waitForFreshFlags() to
locate the changes.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 11a6c6fc-d58c-4f7d-aa6e-5ffe16f39b73

📥 Commits

Reviewing files that changed from the base of the PR and between 9901344 and 97caa29.

📒 Files selected for processing (2)
  • src/utils/feature-flags.ts
  • test/test-feature-flags-timeout.js

Comment thread src/utils/feature-flags.ts Outdated
@e-grn commented May 11, 2026

Tested locally — fix confirmed working

Thanks for the rapid turnaround on this. Tested the fix/feature-flags-fetch-timeout branch on the original repro environment. Three runs today, same Windows 11 box, residential connection in Brisbane:

| Configuration | initialize → response | Notes |
| --- | --- | --- |
| hosts blackhole + DXT v0.2.40 (no fix) | 30.5 s | TCP timeout against 0.0.0.0, see note below |
| Direct connect + DXT v0.2.40 (no fix) | 4.7 s | Today's transit to desktopcommander.app is better than when I originally reported |
| Direct connect + local checkout of this PR | 1.2 s | |

1.2 s is effectively at parity with the "WiFi off" baseline from the original issue (~1.6 s) — meaning the feature flags fetch no longer blocks the initialize response under any network condition. Exactly the goal. Promise.race does its job.

Three side observations from the test run

1. The hosts workaround I suggested in #465 turns out to be broken on Windows.

Worth flagging because it's in the issue body and anyone hitting the same problem will try it. Pointing desktopcommander.app to 0.0.0.0 doesn't fail fast on Windows — the TCP stack still waits the full ~30 s connect timeout against 0.0.0.0:443 before giving up. So the hosts trick replaces "30 s waiting on real server" with "30 s waiting on 0.0.0.0" — same hang, different destination. On Linux/macOS the kernel returns ECONNREFUSED immediately, so it works there.

This actually reinforces the case for the code-level fix: a userspace workaround that relies on socket-level fail-fast behavior is platform-dependent and unreliable. Promise.race is the right answer because it doesn't care what the underlying network is doing.

2. The bug reproduces on Node 22.20 (system), not only Node 24.15 (DXT-bundled).

My original report was based on the Node bundled in Claude Desktop's DXT runtime (Node 24.15 / undici 7.24.4). For this test, npm run setup registered the server via system Node — C:\Program Files\nodejs\node.exe, Node 22.20. Both versions exhibited the original 30 s hang, both are resolved by this PR. So the issue isn't tied to a specific Node/undici pair — anything where AbortController doesn't cleanly preempt a slow Windows TCP connect will benefit.

3. Minor unrelated bug: setup-claude-server.js double-escapes the path on Windows.

After npm run setup, claude_desktop_config.json contained:

"args": [
    "C:\\\\MyProjects\\\\DesktopCommanderMCP\\\\dist\\\\index.js"
]

That's quadruple backslashes in JSON, which JSON-parses to C:\\MyProjects\\...\\index.js at runtime — a double-backslashed string. Windows is tolerant and normalizes this internally, so Node spawns fine and the server runs (as the working log confirms). But it's not the standard \\ escape pattern other entries in the same config use.

Likely the setup script applies something like path.replace(/\\/g, '\\\\') on a value that has already been JSON-escaped, doubling the escaping. Happy to file a separate issue/PR for it — orthogonal to the timeout fix.
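
The suspected double escaping can be reproduced in isolation. The snippet below is only a sketch of the hypothesis above: the `replace()` call and the path are taken from this comment, not from `setup-claude-server.js`, and `countBackslashRuns` is a hypothetical helper for inspecting the output.

```typescript
// One backslash per separator at runtime (the literal needs \\ in source).
const serverPath = "C:\\MyProjects\\DesktopCommanderMCP\\dist\\index.js";

// Expected behaviour: JSON.stringify performs the JSON escaping on its own.
const correctJson = JSON.stringify({ args: [serverPath] });

// Suspected bug: escaping by hand first, then serializing, escapes twice.
const preEscaped = serverPath.replace(/\\/g, "\\\\");
const doubledJson = JSON.stringify({ args: [preEscaped] });

// Length of every run of consecutive backslashes in the text.
function countBackslashRuns(text: string): number[] {
  return (text.match(/\\+/g) || []).map((run) => run.length);
}

console.log(countBackslashRuns(correctJson)); // every run has length 2
console.log(countBackslashRuns(doubledJson)); // every run has length 4

// Parsing the doubled file gives back a path with two backslashes per separator.
console.log(JSON.parse(doubledJson).args[0] === preEscaped); // true
```

If this is what the setup script does, dropping the manual `replace()` and relying on `JSON.stringify` alone would produce the standard `\\` form.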

Logs

Happy to attach the three logs as artifacts if useful — the relevant deltas are the timestamps between Message from client: initialize and the subsequent server response. Just let me know.
mcp-server-Desktop Commander no blocking hosts.log
mcp-server-Desktop Commander with blocking hosts.log
mcp-server-desktop-commander.log

Thanks again for prioritising this so fast.

@edgarsskore edgarsskore merged commit c4e75ff into main May 11, 2026
2 checks passed
@wonderwhy-er (Owner, Author) commented

@e-grn thank you for reporting, investigating, and testing the solution! Merged; it will go into the next release this week.

Labels

size:L This PR changes 100-499 lines, ignoring generated files

Development

Successfully merging this pull request may close these issues.

~30 second MCP startup delay caused by feature flags fetch on high-latency networks (5s AbortController doesn't fire)

3 participants