test(edit): add MCP edit performance integration #489

Merged: wonderwhy-er merged 7 commits into main from investigate/edit-block-performance on Jun 4, 2026

Conversation

edgarsskore (Collaborator) commented Jun 2, 2026

Summary by CodeRabbit

  • Tests
    • Added long-running integration/performance tests covering large-file edit workflows (Markdown and Python), including exact-match and fuzzy-fallback edit handling, periodic checkpoints, parallel workloads, responsiveness/latency probes, and end-to-end assertions.
  • Chores
    • Added a new npm script to build and run the integration suite and a test runner that executes each integration test, reports pass/fail with durations, and emits an aggregate summary.

coderabbitai (Bot, Contributor) commented Jun 2, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Adds an integration/performance test and npm script that starts the built MCP server over stdio, generates large deterministic markdown and Python fixtures, runs concurrent exact and fuzzy edit_block workflows with paged reads and checkpoints, probes server responsiveness, validates final file contents, and restores configuration on teardown.
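
To make "large deterministic fixtures" concrete, here is a minimal sketch of a generator that places replaceable markers at fixed intervals. The names `buildMarker` and `generateMarkdownFixture`, the marker format, and the line counts are assumptions for illustration only; the PR's actual generators are not shown on this page.

```javascript
// Hypothetical helpers; not the PR's real fixture code.

// Build a stable marker string for edit number `index`.
function buildMarker(index) {
  return `EDIT_MARKER_${String(index).padStart(4, '0')}`;
}

// Generate `lineCount` lines of markdown with one marker line every
// `markerEvery` lines. No randomness is used, so repeated calls with the
// same arguments produce identical text.
function generateMarkdownFixture(lineCount, markerEvery) {
  const lines = [];
  let markerIndex = 0;
  for (let i = 1; i <= lineCount; i += 1) {
    if (i % markerEvery === 0) {
      lines.push(`- ${buildMarker(markerIndex)} original`);
      markerIndex += 1;
    } else {
      lines.push(`Line ${i}: filler text for the performance fixture.`);
    }
  }
  return lines.join('\n');
}

const fixture = generateMarkdownFixture(1000, 100);
console.log(`fixture has ${fixture.split('\n').length} lines`);
```

Because each marker is unique, an exact-match edit can target one marker at a time, and a final read can count how many markers were replaced.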

Changes

Edit Block Performance Integration Test

| Layer | File(s) | Summary |
| --- | --- | --- |
| Test Configuration & Utilities | test/integration/edit-block-performance.js (lines 1–196) | Defines test constants, helpers for MCP calls and assertions, deterministic fixture generators (markdown/Python/fuzzy), marker builders, read-offset calculations, and fuzzy diff parsing. |
| Edit Block Workflow Implementations | test/integration/edit-block-performance.js (lines 198–530) | Implements three workflows: markdown same-file edits (various editCounts), Python exact-match edits (150), and Python fuzzy-fallback edits (25). Each writes fixtures, performs paged reads and edit_block calls, writes periodic checkpoints, re-reads to verify edits, and enforces performance bounds. |
| Orchestration & Responsiveness Probe | test/integration/edit-block-performance.js (lines 532–581) | Runs workflows in parallel, starts a responsiveness probe that pings the MCP server at intervals, aggregates durations, and asserts max observed latency stays below configured threshold. |
| MCP Client, Setup & Teardown | test/integration/edit-block-performance.js (lines 583–658) | Launches the built MCP server via stdio, connects an MCP SDK client, captures server stderr, recreates test directory, validates tools, sets editable config values (allowed dirs, read/write line limits), and restores config and filesystem on teardown. |
| Main Entrypoint & Process Handling | test/integration/edit-block-performance.js (lines 660–714) | Orchestrates client creation, setup, parallel workflow execution, verification and performance reporting, strict planned-vs-verified checks (with fuzzy special-casing), guarded teardown, client shutdown, and error-exit behavior. |
| Integration Test Runner & NPM Wiring | package.json, test/integration/run-all-integration-tests.js (lines 1–90) | Adds test:integration npm script to run build then the runner; runner discovers .js tests in test/integration/, spawns each test sequentially with node, records durations and pass/fail results, prints per-test timings and aggregate summary, and exits nonzero if any test failed. |
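
The aggregate-summary step of the runner can be sketched as a small pure function. This is an illustration under assumptions: the result fields `file`, `success`, and `duration` match the objects in the runner code quoted later in this conversation, while `summarizeResults` and the shape of its return value are hypothetical.

```javascript
// Hypothetical helper; the real runner's summary code is not shown here.
// Each result is assumed to look like { file, success, duration }.
function summarizeResults(results) {
  const failed = results.filter((result) => !result.success);
  const totalMs = results.reduce((sum, result) => sum + result.duration, 0);
  return {
    total: results.length,
    passed: results.length - failed.length,
    failed: failed.length,
    totalMs,
    // Nonzero when any test failed, mirroring the described exit behavior.
    exitCode: failed.length === 0 ? 0 : 1,
  };
}

const summary = summarizeResults([
  { file: 'a.js', success: true, duration: 1200 },
  { file: 'b.js', success: false, duration: 300, exitCode: 1 },
]);
console.log(`${summary.passed}/${summary.total} passed in ${summary.totalMs}ms`);
```

A runner built this way would typically assign `summary.exitCode` to `process.exitCode` once all tests have finished, so output is flushed before the process ends.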

Sequence Diagram(s)

sequenceDiagram
  participant TestMain as Test Main
  participant MCPServer as MCP Server (dist/index.js)
  participant MCPClient as MCP Client (SDK)
  participant Workflows as Concurrent Workflows
  participant Probe as Responsiveness Probe

  TestMain->>MCPServer: spawn via stdio
  MCPServer-->>TestMain: server stdout/stderr (captured)
  TestMain->>MCPClient: connect()
  TestMain->>MCPClient: enumerate_tools
  TestMain->>MCPClient: get_config / set_config (test limits)

  TestMain->>Workflows: start parallel workflows
  TestMain->>Probe: start probe loop

  par Markdown Workflow
    Workflows->>MCPClient: write_file (large markdown)
    Workflows->>MCPClient: read_file (paged reads)
    Workflows->>MCPClient: edit_block (replace marker)
    Workflows->>MCPClient: read_file (verify)
  and Python Exact Workflow
    Workflows->>MCPClient: write_file (python fixture)
    Workflows->>MCPClient: edit_block (exact match)
    Workflows->>MCPClient: read_file (verify)
  and Python Fuzzy Workflow
    Workflows->>MCPClient: write_file (fuzzy fixture)
    Workflows->>MCPClient: edit_block (near-miss)
    MCPClient-->>Workflows: Differences (extracted exact)
    Workflows->>MCPClient: edit_block (retry with exact)
  and Probe Loop
    loop every 1s
      Probe->>MCPClient: list_tools / ping
      MCPClient-->>Probe: respond (measure latency)
    end
  end

  Workflows-->>TestMain: workflows complete
  Probe-->>TestMain: probe stopped (latencies)
  TestMain->>MCPClient: set_config (restore)
  TestMain->>TestMain: teardown & shutdown

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Poem

🐰 I hopped through fixtures large and bright,
Replaced old markers in the quiet night.
Fuzzy fell back, exact matches flew,
Pings kept time as edits grew and grew.
Tests finished green — a carrot-shaped view!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately summarizes the main changes: adding integration tests for MCP edit block performance, including a performance test script, test runner, and npm script. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

coderabbitai (Bot, Contributor) left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@test/test-edit-block-performance-integration.js`:
- Around line 595-603: The test calls callTool(client, 'get_config', {}) without
first asserting the MCP advertises the 'get_config' tool; add an assertion
similar to the existing loop that checks tools.tools.some(tool => tool.name ===
'get_config') (using the same tools variable from client.listTools) so the test
validates presence of 'get_config' before invoking callTool and fails with the
correct message if the tool is missing.
- Around line 591-623: The setup() routine can fail after creating TEST_DIR and
mutating server state (via callTool('set_config_value')), leaving the MCP
process and partial config changes running; modify main() to guard the setup
call by wrapping it in try/catch (or move setup into the existing try block) and
on any setup error run the same cleanup/teardown logic used in the finally block
(stop the MCP process, remove TEST_DIR, and revert any config changes if
possible), ensuring resources created during setup are always cleaned even if
setup throws; reference setup(), main(), TEST_DIR, callTool and the
'set_config_value' tool when applying the fix.
- Around line 613-620: The test sets 'fileWriteLineLimit' to 50 via
callTool(client, 'set_config_value', { key, value, origin: 'llm' }) which is too
small for the generated fixtures (thousands of lines) and can cause writes to
fail before exercising edit_block; increase the value for the
'fileWriteLineLimit' entry (in the array iterated by the for..of that calls
set_config_value and assertToolSuccess) to a number larger than the fixture size
(e.g., several thousand) so write_file calls won't be truncated or rejected
during the integration test.
- Around line 510-521: The responsiveness probe (started via
runResponsivenessProbe with stopProbe) is not stopped/drained if Promise.all
rejects; modify runParallelWorkflows to ensure the probe is always stopped and
awaited: wrap the Promise.all([...]) call in a try/finally (or
try/catch/finally), set stopProbe.value = true in the finally block, then await
responsivenessProbe in that same finally to drain it before rethrowing any
error; keep existing variables workflowResults and responsivenessProbe names so
the change is local to runParallelWorkflows and ensures the probe won’t continue
pinging during teardown.
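
The fourth finding (always stop and drain the probe) can be sketched as follows. The names `runParallelWorkflows`, `runResponsivenessProbe`, `stopProbe`, `workflowResults`, and `responsivenessProbe` come from the review comment; `ping`, `sleep`, `intervalMs`, and the function bodies are assumptions, not the PR's code.

```javascript
// Hypothetical stand-ins: `ping` represents any cheap MCP request.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function runResponsivenessProbe(ping, stopProbe, intervalMs) {
  const latencies = [];
  while (!stopProbe.value) {
    const startedAt = Date.now();
    try {
      await ping();
      latencies.push(Date.now() - startedAt);
    } catch (error) {
      // Treat a failed ping as unresponsive instead of rejecting the probe.
      latencies.push(Number.POSITIVE_INFINITY);
    }
    await sleep(intervalMs);
  }
  return latencies;
}

async function runParallelWorkflows(workflows, ping, intervalMs = 10) {
  const stopProbe = { value: false };
  const responsivenessProbe = runResponsivenessProbe(ping, stopProbe, intervalMs);
  let workflowResults;
  try {
    workflowResults = await Promise.all(workflows.map((run) => run()));
  } finally {
    // Runs on success and on rejection: stop the loop, then drain it so no
    // ping is still in flight when teardown begins.
    stopProbe.value = true;
    await responsivenessProbe;
  }
  const latencies = await responsivenessProbe; // already settled
  return { workflowResults, latencies };
}

runParallelWorkflows([() => sleep(30)], async () => {}, 10)
  .then(({ latencies }) => console.log(`probe samples: ${latencies.length}`));
```

In this sketch the probe never rejects, so the `await` inside `finally` cannot replace the original workflow error with a probe error.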

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cac780dd-aca7-4fd3-bfc5-805c7c7fb7c7

📥 Commits

Reviewing files that changed from the base of the PR and between ce4669c and d66c3e2.

📒 Files selected for processing (1)
  • test/test-edit-block-performance-integration.js

Comment thread test/integration/edit-block-performance.js Outdated
Comment thread test/integration/edit-block-performance.js
Comment thread test/integration/edit-block-performance.js
Comment thread test/integration/edit-block-performance.js

coderabbitai (Bot, Contributor) left a comment

🧹 Nitpick comments (1)
test/integration/run-all-integration-tests.js (1)

16-43: 🏗️ Heavy lift

Consider adding a timeout mechanism for hung tests.

Currently, if a test hangs indefinitely (e.g., waiting for a resource that never arrives), the runner will also hang. While CI pipelines typically have external timeouts, adding a configurable per-test timeout would improve developer experience and provide clearer error messages when tests exceed expected durations.

💡 Example timeout implementation
-function runTestFile(testFile) {
+function runTestFile(testFile, timeoutMs = 300000) { // default 5 min
   return new Promise((resolve) => {
     console.log(`\nRunning integration test: ${testFile}`);
     const startedAt = Date.now();
     const proc = spawn('node', [testFile], {
       cwd: __dirname,
       stdio: 'inherit',
       shell: false,
     });

+    const timer = setTimeout(() => {
+      proc.kill('SIGTERM');
+      const duration = Date.now() - startedAt;
+      console.error(`FAIL ${testFile} (${duration}ms): timeout after ${timeoutMs}ms`);
+      resolve({ file: testFile, success: false, duration, error: 'timeout' });
+    }, timeoutMs);
+
     proc.on('close', (code) => {
+      clearTimeout(timer);
       const duration = Date.now() - startedAt;
       if (code === 0) {
         console.log(`PASS ${testFile} (${duration}ms)`);
         resolve({ file: testFile, success: true, duration });
       } else {
         console.error(`FAIL ${testFile} (${duration}ms, exit code ${code})`);
         resolve({ file: testFile, success: false, duration, exitCode: code });
       }
     });

     proc.on('error', (error) => {
+      clearTimeout(timer);
       const duration = Date.now() - startedAt;
       console.error(`FAIL ${testFile} (${duration}ms): ${error.message}`);
       resolve({ file: testFile, success: false, duration, error: error.message });
     });
   });
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@test/integration/run-all-integration-tests.js` around lines 16 - 43, The
runTestFile function lacks a per-test timeout and can hang indefinitely; add a
configurable timeout (e.g., from an env var or argument like TEST_TIMEOUT_MS
with a sane default) inside runTestFile that starts a timer after spawning the
child, and on timeout logs a clear FAIL message, kills the child process
(proc.kill()), and resolves the promise with success:false, duration and a
timeout flag/message; ensure you clear the timeout when proc emits 'close' or
'error' to avoid leaks and race conditions.
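
One way to address the race mentioned above is to resolve at most once, so a timeout followed by the child's 'close' event yields a single result and a single log line. The sketch below is self-contained and hedged: `runWithTimeout`, `nodeArgs`, and `settle` are illustrative names, `TEST_TIMEOUT_MS` is the env var suggested in the prompt rather than an existing setting, and `stdio: 'ignore'` replaces the runner's `'inherit'` only to keep the example quiet.

```javascript
const { spawn } = require('node:child_process');

// Illustrative sketch, not the PR's runner. Spawns `node <nodeArgs>` and
// resolves with a result object once, whichever of close/error/timeout
// happens first.
function runWithTimeout(nodeArgs, timeoutMs = Number(process.env.TEST_TIMEOUT_MS) || 300000) {
  return new Promise((resolve) => {
    const startedAt = Date.now();
    const proc = spawn(process.execPath, nodeArgs, { stdio: 'ignore', shell: false });
    let settled = false;

    // Resolve at most once and always clear the timer.
    const settle = (result) => {
      if (settled) return;
      settled = true;
      clearTimeout(timer);
      resolve({ ...result, duration: Date.now() - startedAt });
    };

    const timer = setTimeout(() => {
      proc.kill('SIGTERM');
      settle({ success: false, error: 'timeout' });
    }, timeoutMs);

    proc.on('close', (code) => settle({ success: code === 0, exitCode: code }));
    proc.on('error', (error) => settle({ success: false, error: error.message }));
  });
}

// A child that would run for 10 s is stopped after 200 ms.
runWithTimeout(['-e', 'setTimeout(() => {}, 10000)'], 200).then((result) => {
  console.log(`success=${result.success} error=${result.error}`);
});
```

The `settled` flag is the design choice worth noting: without it, the 'close' handler would still run after a timeout and report the same test a second time.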

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 601046cf-19b1-4482-86ff-828497e16c48

📥 Commits

Reviewing files that changed from the base of the PR and between 98d7745 and 5bd6800.

📒 Files selected for processing (2)
  • package.json
  • test/integration/run-all-integration-tests.js
🚧 Files skipped from review as they are similar to previous changes (1)
  • package.json

…itch

The integration test set DESKTOP_COMMANDER_DISABLE_TELEMETRY on the spawned
server, but nothing read it — telemetry was gated only on the persisted
`telemetryEnabled` config, so test/CI runs fired real GA4 + BigQuery-proxy
events. Add an env-based kill-switch that short-circuits both send paths,
independent of config and without mutating the user's config.
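
A minimal sketch of such an env-based kill-switch is shown here. Only the variable name `DESKTOP_COMMANDER_DISABLE_TELEMETRY` and the `telemetryEnabled` config key come from the commit message; the accepted values, `isTelemetryDisabledByEnv`, `sendTelemetry`, and the `transport` callback are assumptions for illustration.

```javascript
// Assumption: any value other than empty, "0", or "false" disables telemetry.
// The commit message does not say which values the real check accepts.
function isTelemetryDisabledByEnv(env = process.env) {
  const raw = env.DESKTOP_COMMANDER_DISABLE_TELEMETRY;
  if (raw === undefined) return false;
  const value = raw.trim().toLowerCase();
  return value !== '' && value !== '0' && value !== 'false';
}

// Hypothetical send wrapper: `config` and `transport` are illustrative.
// Returns true only when the event was handed to the transport.
function sendTelemetry(event, config, transport, env = process.env) {
  if (isTelemetryDisabledByEnv(env)) return false; // env wins; config is not modified
  if (!config.telemetryEnabled) return false;
  transport(event);
  return true;
}

const delivered = [];
sendTelemetry('edit_block', { telemetryEnabled: true }, (e) => delivered.push(e), {
  DESKTOP_COMMANDER_DISABLE_TELEMETRY: '1',
});
console.log(`events delivered: ${delivered.length}`);
```

Checking the environment before the persisted config keeps the switch independent of user settings, which is the property the commit message asks for.
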
Adds a DOCX same-file edit workflow (40 edits) to the parallel performance
suite so the docx edit_block path (find/replace on pretty-printed document
XML + zip repack) is exercised under the concurrent responsiveness probe.
Targets <w:t xml:space="preserve">...</w:t> elements and verifies via the
DOCX outline read.
@wonderwhy-er wonderwhy-er merged commit 3881aed into main Jun 4, 2026
2 checks passed

2 participants