Commit fea0681

edgarsskore and claude authored
fix(edit_block): run fuzzy search in a worker thread to keep event lo… (#500)
* fix(edit_block): run fuzzy search in a worker thread to keep event loop responsive

  When edit_block found no exact match, the fuzzy-search fallback ran recursiveFuzzyIndexOf() synchronously on the main thread. On large files this pinned the event loop for seconds, and several parallel edit_block calls serialized their scans back-to-back, freezing pings and all other tool calls — the server appeared hung.

  The scan now runs in a worker thread via runFuzzySearchInWorker():
  - inline eval worker that imports this module, so no separate worker file ships with the build
  - 30s timeout terminates runaway scans (errors surface via the existing handleEditBlock catch)
  - worker and timer are unref'd so a running scan never delays shutdown

  Measured: 4 parallel 2MB scans drop from 16.0s (serialized) to 4.8s (concurrent), with max ping latency 1-14ms during scans vs ~3.5s before.

  Adds a regression scenario to the edit-block performance integration test that pings every 200ms during a deliberately slow scan and fails if max latency exceeds 500ms (the existing 5s responsiveness probe was too loose to catch this).

  Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix: terminate fuzzy search worker on success

  Without an explicit terminate the worker lingers until its module graph drains (the in-worker telemetry HTTPS call alone holds it ~3s), keeping a structured-clone copy of the full file content in memory per call.

  Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* refactor: extract fuzzy search into a dependency-free core module

  The worker previously imported fuzzySearch.ts, which pulls in capture.js and through it server.ts — booting the entire app module graph per fuzzy search. That cost ~275ms per call on small files and fired the search telemetry from inside the worker, where the client identity is never initialized.

  fuzzySearchCore.ts now holds the pure search functions with fastest-levenshtein as its only import; the worker loads just that and returns timing metrics as data, which the main thread captures with the real client id (same event names and payloads as before). Worker round-trip drops to ~18ms.

  The worker snippet also gained a .catch so import/search failures surface as a structured error instead of an opaque worker crash.

  Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* test: handle rejection on the detached editPromise.finally chain

  The .finally() call created a derived promise with no rejection handler; if edit_block failed it raised unhandledRejection even though editPromise itself is awaited later. Swallow rejections on the detached chain only — errors still surface via the awaited editPromise.

  Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

* fix: seed iterativeReduction with the measured distance of its slice

  In all recursive paths parentDistance is exact — the parent passes the distance it measured on precisely the child's slice, so the original branch-and-bound seeding was sound there. The one broken path is a top-level call where the whole text fits the small-segment branch (text <= 2x query length): parentDistance defaults to Infinity, the first shrink check always passes, and a match at position 0 can never be returned at position 0 — e.g. matching 'the quick brown fox' in 'the quick brwn fox jumps' returned start 1 / distance 2 instead of start 0 / distance 1.

  Measuring the slice distance on entry fixes that case and is a no-op for recursive calls (the computed seed equals the value the parent passed; scan timings are unchanged). Costs one extra distance() call per search, the same size as a single shrink-loop iteration.

  Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
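
The seeding fix described in the last commit of the message above can be reproduced in isolation. The following is a minimal sketch, not the code from fuzzySearchCore.ts (which is not shown on this page): `levenshtein` is a plain dynamic-programming stand-in for fastest-levenshtein's `distance()`, and `shrink` is a hypothetical reduction of the removed `iterativeReduction` loop with telemetry omitted and the seed passed explicitly.

```typescript
// Plain dynamic-programming Levenshtein distance; a stand-in for
// fastest-levenshtein's distance() so the sketch has no dependencies.
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

interface Match { start: number; end: number; value: string; distance: number }

// Hypothetical shrink loop modelled on the removed iterativeReduction.
// `seed` is the best distance assumed before any shrinking happens.
function shrink(text: string, query: string, seed: number): Match {
  let best = seed;
  let start = 0;
  let end = text.length;

  // Improve start position (the guard keeps the slice bounds ordered).
  let next = levenshtein(text.substring(start + 1, end), query);
  while (start + 1 < end && next < best) {
    best = next;
    start++;
    next = levenshtein(text.substring(start + 1, end), query);
  }

  // Improve end position.
  next = levenshtein(text.substring(start, end - 1), query);
  while (start < end - 1 && next < best) {
    best = next;
    end--;
    next = levenshtein(text.substring(start, end - 1), query);
  }

  return { start, end, value: text.substring(start, end), distance: best };
}

const text = 'the quick brwn fox jumps';
const query = 'the quick brown fox';

// Old top-level behaviour: the seed defaults to Infinity.
const unseeded = shrink(text, query, Infinity);
// Fixed behaviour: measure the slice first and use that as the seed.
const seeded = shrink(text, query, levenshtein(text, query));

console.log(unseeded.start, unseeded.distance); // 1 2
console.log(seeded.start, seeded.distance);     // 0 1
```

With an Infinity seed, the first "drop one leading character" check always succeeds, so the match can never begin at index 0. Seeding with the measured distance of the whole slice makes that first check a real comparison.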
1 parent 8f425e2 commit fea0681

4 files changed

Lines changed: 328 additions & 134 deletions

src/tools/edit.ts

Lines changed: 5 additions & 4 deletions
@@ -18,7 +18,7 @@
 import { getDefaultEditorMetadata, readFile, writeFile, readFileInternal, validatePath } from './filesystem.js';
 import fs from 'fs/promises';
 import { ServerResult } from '../types.js';
-import { recursiveFuzzyIndexOf, getSimilarityRatio } from './fuzzySearch.js';
+import { runFuzzySearchInWorker, getSimilarityRatio } from './fuzzySearch.js';
 import { capture } from '../utils/capture.js';
 import { createErrorResponse } from '../error-handlers.js';
 import { EditBlockArgsSchema } from "./schemas.js";
@@ -251,9 +251,10 @@ RECOMMENDATION: For large search/replace operations, consider breaking them into
     if (count === 0) {
         // Track fuzzy search time
         const startTime = performance.now();
-
-        // Perform fuzzy search
-        const fuzzyResult = recursiveFuzzyIndexOf(content, block.search);
+
+        // Perform fuzzy search in a worker thread so the main event loop stays
+        // responsive to pings and parallel tool calls during the scan
+        const fuzzyResult = await runFuzzySearchInWorker(content, block.search);
         const similarity = getSimilarityRatio(block.search, fuzzyResult.value);
 
         // Calculate execution time in milliseconds
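
The point of awaiting the worker instead of scanning inline is that timers and pings on the main thread keep firing during the scan. One way to observe this is to record timer ticks and look at how late the worst one arrived. The sketch below is a hypothetical probe, not the regression test from this commit; `maxLateness` and `probeEventLoop` are names invented for illustration.

```typescript
// Largest amount (ms) by which a tick arrived later than its scheduled interval.
function maxLateness(timestamps: number[], intervalMs: number): number {
  let worst = 0;
  for (let i = 1; i < timestamps.length; i++) {
    const lateness = timestamps[i] - timestamps[i - 1] - intervalMs;
    if (lateness > worst) worst = lateness;
  }
  return worst;
}

// Record a timestamp every intervalMs for durationMs, then report the worst
// lateness. A synchronous scan on the main thread would show up as one large gap.
function probeEventLoop(intervalMs: number, durationMs: number): Promise<number> {
  return new Promise((resolve) => {
    const stamps: number[] = [Date.now()];
    const ticker = setInterval(() => stamps.push(Date.now()), intervalMs);
    setTimeout(() => {
      clearInterval(ticker);
      resolve(maxLateness(stamps, intervalMs));
    }, durationMs);
  });
}

// Usage: run the probe while some other work is in flight.
probeEventLoop(50, 300).then((worst) => {
  console.log(`max tick lateness: ${worst}ms`);
});
```

Started alongside an edit_block call that falls back to fuzzy search, a probe like this should report only a few milliseconds of lateness once the scan runs in a worker. With the old synchronous call it would report roughly the full scan time.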

src/tools/fuzzySearch.ts

Lines changed: 78 additions & 130 deletions
@@ -1,140 +1,88 @@
-import { distance } from 'fastest-levenshtein';
 import { capture } from '../utils/capture.js';
+import { Worker } from 'worker_threads';
+import type { FuzzyMatch, FuzzySearchMetrics } from './fuzzySearchCore.js';
+
+// Re-export so existing callers keep importing from this module.
+export { recursiveFuzzyIndexOf, getSimilarityRatio } from './fuzzySearchCore.js';
+
+/** Abort fuzzy search in the worker after this many ms to avoid unbounded CPU burn. */
+export const FUZZY_SEARCH_TIMEOUT_MS = 30000;
 
 /**
- * Recursively finds the closest match to a query string within text using fuzzy matching
- * @param text The text to search within
- * @param query The query string to find
- * @param start Start index in the text (default: 0)
- * @param end End index in the text (default: text.length)
- * @param parentDistance Best distance found so far (default: Infinity)
- * @returns Object with start and end indices, matched value, and Levenshtein distance
+ * Inline worker entry: imports the dependency-free core module (passed as
+ * moduleUrl) and runs the search off the main thread. The core module is
+ * deliberately a leaf — importing this module (or anything app-level) from the
+ * worker would boot the whole server per search. Kept as an eval'd snippet so
+ * the worker needs no separate file to ship alongside the compiled output.
  */
-export function recursiveFuzzyIndexOf(text: string, query: string, start: number = 0, end: number | null = null, parentDistance: number = Infinity, depth: number = 0): {
-    start: number;
-    end: number;
-    value: string;
-    distance: number;
-} {
-    // For debugging and performance tracking purposes
-    if (depth === 0) {
-        const startTime = performance.now();
-        const result = recursiveFuzzyIndexOf(text, query, start, end, parentDistance, depth + 1);
-        const executionTime = performance.now() - startTime;
-
-        // Capture detailed metrics for the recursive search for in-depth analysis
-        capture('fuzzy_search_recursive_metrics', {
-            execution_time_ms: executionTime,
-            text_length: text.length,
-            query_length: query.length,
-            result_distance: result.distance
-        });
-
-        return result;
-    }
-
-    if (end === null) end = text.length;
-
-    // For small text segments, use iterative approach
-    if (end - start <= 2 * query.length) {
-        return iterativeReduction(text, query, start, end, parentDistance);
-    }
-
-    let midPoint = start + Math.floor((end - start) / 2);
-    let leftEnd = Math.min(end, midPoint + query.length); // Include query length to cover overlaps
-    let rightStart = Math.max(start, midPoint - query.length); // Include query length to cover overlaps
-
-    // Calculate distance for current segments
-    let leftDistance = distance(text.substring(start, leftEnd), query);
-    let rightDistance = distance(text.substring(rightStart, end), query);
-    let bestDistance = Math.min(leftDistance, parentDistance, rightDistance);
-
-    // If parent distance is already the best, use iterative approach
-    if (parentDistance === bestDistance) {
-        return iterativeReduction(text, query, start, end, parentDistance);
-    }
-
-    // Recursively search the better half
-    if (leftDistance < rightDistance) {
-        return recursiveFuzzyIndexOf(text, query, start, leftEnd, bestDistance, depth + 1);
-    } else {
-        return recursiveFuzzyIndexOf(text, query, rightStart, end, bestDistance, depth + 1);
-    }
-}
+const WORKER_CODE = `
+const { workerData, parentPort } = require('worker_threads');
+import(workerData.moduleUrl)
+    .then((m) => {
+        parentPort.postMessage({ ok: true, ...m.runFuzzySearch(workerData.text, workerData.query) });
+    })
+    .catch((err) => {
+        parentPort.postMessage({ ok: false, error: String(err && err.stack || err) });
+    });
+`;
+
+const CORE_MODULE_URL = new URL('./fuzzySearchCore.js', import.meta.url).href;
 
 /**
- * Iteratively refines the best match by reducing the search area
- * @param text The text to search within
- * @param query The query string to find
- * @param start Start index in the text
- * @param end End index in the text
- * @param parentDistance Best distance found so far
- * @returns Object with start and end indices, matched value, and Levenshtein distance
+ * Runs the fuzzy search in a Worker thread so the main MCP event loop stays
+ * responsive to pings and other tool calls during heavy scans. Rejects if the
+ * scan exceeds timeoutMs, terminating the worker so it doesn't linger in the
+ * background. Search metrics come back with the result and are captured here,
+ * on the main thread, where the client identity is initialized.
  */
-function iterativeReduction(text: string, query: string, start: number, end: number, parentDistance: number): {
-    start: number;
-    end: number;
-    value: string;
-    distance: number;
-} {
-    const startTime = performance.now();
-    let iterations = 0;
-
-    let bestDistance = parentDistance;
-    let bestStart = start;
-    let bestEnd = end;
-
-    // Improve start position
-    let nextDistance = distance(text.substring(bestStart + 1, bestEnd), query);
-
-    while (nextDistance < bestDistance) {
-        bestDistance = nextDistance;
-        bestStart++;
-        const smallerString = text.substring(bestStart + 1, bestEnd);
-        nextDistance = distance(smallerString, query);
-        iterations++;
-    }
-
-    // Improve end position
-    nextDistance = distance(text.substring(bestStart, bestEnd - 1), query);
-
-    while (nextDistance < bestDistance) {
-        bestDistance = nextDistance;
-        bestEnd--;
-        const smallerString = text.substring(bestStart, bestEnd - 1);
-        nextDistance = distance(smallerString, query);
-        iterations++;
-    }
-
-    const executionTime = performance.now() - startTime;
-
-    // Capture metrics for the iterative refinement phase
-    capture('fuzzy_search_iterative_metrics', {
-        execution_time_ms: executionTime,
-        iterations: iterations,
-        segment_length: end - start,
-        query_length: query.length,
-        final_distance: bestDistance
+export function runFuzzySearchInWorker(
+    text: string,
+    query: string,
+    timeoutMs: number = FUZZY_SEARCH_TIMEOUT_MS
+): Promise<FuzzyMatch> {
+    return new Promise((resolve, reject) => {
+        const worker = new Worker(WORKER_CODE, { eval: true, workerData: { moduleUrl: CORE_MODULE_URL, text, query } });
+        // Never let a scan keep the server process alive during shutdown.
+        worker.unref();
+
+        const timer = setTimeout(() => {
+            worker.terminate();
+            reject(new Error(`Fuzzy search timed out after ${timeoutMs}ms`));
+        }, timeoutMs);
+        timer.unref();
+
+        worker.on('message', (msg: { ok: true; result: FuzzyMatch; metrics: FuzzySearchMetrics } | { ok: false; error: string }) => {
+            clearTimeout(timer);
+            if (msg.ok) {
+                captureFuzzySearchMetrics(msg.metrics);
+                resolve(msg.result);
+            } else {
+                reject(new Error(`Fuzzy search worker failed: ${msg.error}`));
+            }
+            // Don't let the worker wind down on its own; the answer is already
+            // here, and a lingering worker holds its copy of the file text.
+            // The promise is settled, so the exit-code rejection below is a no-op.
+            worker.terminate();
+        });
+
+        worker.on('error', (err) => {
+            clearTimeout(timer);
+            reject(err);
+        });
+
+        worker.on('exit', (code) => {
+            clearTimeout(timer);
+            if (code !== 0) {
+                reject(new Error(`Fuzzy search worker exited with code ${code}`));
+            }
+        });
     });
-
-    return {
-        start: bestStart,
-        end: bestEnd,
-        value: text.substring(bestStart, bestEnd),
-        distance: bestDistance
-    };
 }
 
-/**
- * Calculates the similarity ratio between two strings
- * @param a First string
- * @param b Second string
- * @returns Similarity ratio (0-1)
- */
-export function getSimilarityRatio(a: string, b: string): number {
-    const maxLength = Math.max(a.length, b.length);
-    if (maxLength === 0) return 1; // Both strings are empty
-
-    const levenshteinDistance = distance(a, b);
-    return 1 - (levenshteinDistance / maxLength);
-}
+/** Same telemetry events the search used to emit inline, now sent from the main thread. */
+function captureFuzzySearchMetrics(metrics: FuzzySearchMetrics): void {
+    capture('fuzzy_search_recursive_metrics', metrics.recursive);
+    if (metrics.iterative) {
+        capture('fuzzy_search_iterative_metrics', metrics.iterative);
+    }
+}
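
The diff for fuzzySearchCore.ts is not shown on this page, so the exact definitions of `FuzzyMatch` and `FuzzySearchMetrics` are not visible here. The sketch below reconstructs plausible shapes from two things that are visible: the return type of the removed search functions and the payloads of the removed `capture()` calls. Treat the field lists as assumptions. `WorkerReply` and `unwrapReply` are hypothetical names that mirror the branches of the `'message'` handler above.

```typescript
// Assumed shape: matches the return type of the removed recursiveFuzzyIndexOf.
interface FuzzyMatch {
  start: number;
  end: number;
  value: string;
  distance: number;
}

// Assumed shape: fields taken from the removed capture() payloads. `iterative`
// is optional because the handler only captures it when it is present.
interface FuzzySearchMetrics {
  recursive: {
    execution_time_ms: number;
    text_length: number;
    query_length: number;
    result_distance: number;
  };
  iterative?: {
    execution_time_ms: number;
    iterations: number;
    segment_length: number;
    query_length: number;
    final_distance: number;
  };
}

// Discriminated union on `ok`, as in the worker 'message' handler.
type WorkerReply =
  | { ok: true; result: FuzzyMatch; metrics: FuzzySearchMetrics }
  | { ok: false; error: string };

// Hypothetical helper: narrow on `ok`, return the match or throw.
function unwrapReply(reply: WorkerReply): FuzzyMatch {
  if (reply.ok) {
    return reply.result;
  }
  throw new Error(`Fuzzy search worker failed: ${reply.error}`);
}

// Example reply with made-up timing numbers.
const success: WorkerReply = {
  ok: true,
  result: { start: 0, end: 18, value: 'the quick brwn fox', distance: 1 },
  metrics: {
    recursive: { execution_time_ms: 0.4, text_length: 24, query_length: 19, result_distance: 1 },
  },
};

console.log(unwrapReply(success).value);
```

Because the worker posts `{ ok: true, ...m.runFuzzySearch(...) }`, the core module's `runFuzzySearch` presumably returns an object with `result` and `metrics` keys. That is the shape the success branch of the union assumes.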
