Few more fixes for search in files#232
Conversation
|
Caution Review failedThe pull request is closed. WalkthroughAdds an earlyTermination option to search flows. Updates search-manager to validate paths, wait for first data chunk, stream results with optional early stop on exact filename, refine ripgrep args, and tighten error/telemetry handling. Handlers and tools pass through the option. Schemas and server docs updated; minor text and .gitignore tweaks. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Client
participant Handler as search-handlers.ts
participant Manager as search-manager.ts
participant RG as ripgrep process
Client->>Handler: startSearch({ pattern, earlyTermination? })
Handler->>Manager: startSearch(options)
Manager->>Manager: validatePath(rootPath) -> validPath
Manager->>RG: spawn rg with args (glob/exact-filename aware)
Note over Manager: Wait for first data chunk or 40ms
RG-->>Manager: stdout chunk(s)
Manager->>Manager: processBufferedOutput()
alt exact filename match AND earlyTermination != false
Manager-->>RG: SIGTERM (delay ~100ms)
Note over Manager: mark session complete
end
Manager-->>Handler: initial results (+hasMoreResults, isError/error trimmed)
Handler-->>Client: response
sequenceDiagram
autonumber
participant Client
participant Handler as search-handlers.ts
participant Manager as search-manager.ts
Client->>Handler: getMoreResults(sessionId, offset, length)
Handler->>Manager: readSearchResults(...)
Manager-->>Handler: { results, error?, hasMoreResults, isComplete }
alt results.totalResults == 0 AND error present
Handler-->>Client: error surfaced
else
Handler-->>Client: return results (partial OK), error omitted
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested reviewers
Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 💡 Knowledge Base configuration:
You can enable these sources in your CodeRabbit configuration. 📒 Files selected for processing (1)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/tools/filesystem.ts (1)
905-919: Apply configurable earlyTermination, dedupe results, and always terminate the search session
File: src/tools/filesystem.ts (lines 905–919)
Apply this consolidated diff:-export async function searchFiles(rootPath: string, pattern: string): Promise<string[]> { +export async function searchFiles( + rootPath: string, + pattern: string, + opts?: { earlyTermination?: boolean; maxResults?: number; ignoreCase?: boolean } +): Promise<string[]> { @@ - const result = await searchManager.startSearch({ + const result = await searchManager.startSearch({ rootPath, pattern, searchType: 'files', - ignoreCase: true, - maxResults: 5000, // Higher limit for compatibility - earlyTermination: true, // Use early termination for better performance + ignoreCase: opts?.ignoreCase ?? true, + maxResults: opts?.maxResults ?? 5000, + earlyTermination: opts?.earlyTermination ?? true, }); @@ - let allResults: string[] = []; + const seen = new Set<string>(); + let allResults: string[] = []; @@ - for (const searchResult of result.results) { - if (searchResult.type === 'file') { - allResults.push(searchResult.file); - } - } + for (const searchResult of result.results) { + if (searchResult.type === 'file' + && searchResult.file !== '__LAST_READ_MARKER__' + && !seen.has(searchResult.file) + ) { + seen.add(searchResult.file); + allResults.push(searchResult.file); + } + } @@ - for (const searchResult of results.results) { - if (searchResult.file !== '__LAST_READ_MARKER__' && searchResult.type === 'file') { - allResults.push(searchResult.file); - } - } + for (const searchResult of results.results) { + if (searchResult.type === 'file' + && searchResult.file !== '__LAST_READ_MARKER__' + && !seen.has(searchResult.file) + ) { + seen.add(searchResult.file); + allResults.push(searchResult.file); + } + } @@ - // Log only the count of found files, not their paths + // Proactively terminate the search session on completion + try { searchManager.terminateSearch(sessionId); } catch {} + + // Log only the count of found files, not their pathsNo existing call sites of
searchFileswere found; ensure future callers passopts(e.g.{ earlyTermination: false }) when they require full result sets.src/search-manager.ts (1)
127-134: Avoid sending full filesystem paths in telemetry.
requestedPath/validatedPath can leak PII. Send only basename or a hash.Apply this diff:
- capture('search_session_started', { + capture('search_session_started', { sessionId, searchType: options.searchType, hasTimeout: !!timeoutMs, timeoutMs, - requestedPath: options.rootPath, - validatedPath: validPath + requestedPathBasename: path.basename(options.rootPath), + validatedPathBasename: path.basename(validPath) });Additionally, consider a safe wrapper to prevent unhandled rejections:
// near imports const safeCapture = (...args: Parameters<typeof capture>) => { void capture(...args).catch(() => {}); };Then replace capture(...) with safeCapture(...).
🧹 Nitpick comments (7)
.gitignore (1)
42-45: Ignore test outputs: LGTM, with a tiny pattern nitWorks as intended. Consider anchoring to repo root to avoid accidental matches in nested paths.
Apply this diff:
- test/test_output/ + /test/test_output/src/tools/schemas.ts (1)
111-112: Schema addition: good; surface the description to clientsThe inline comment won’t reach clients via JSON Schema. Add a Zod description for generated docs/UIs.
Apply this diff:
- earlyTermination: z.boolean().optional(), // Stop search early when exact filename match is found (default: true for files, false for content) + earlyTermination: z + .boolean() + .optional() + .describe('Stop search early when exact filename match is found (default: true for file searches, false for content searches)'),src/utils/system-info.ts (1)
609-611: macOS guidance: LGTMNice addition. Consider adding a concrete example to reduce ambiguity, e.g.,
mdfind -name "config.json".src/handlers/search-handlers.ts (2)
100-107: Good change: don’t block results on non-fatal errors.
Consider truncating very long errors to keep responses compact.Apply this diff:
- text: `Search session ${parsed.data.sessionId} encountered an error: ${results.error}` + text: `Search session ${parsed.data.sessionId} encountered an error: ${(results.error || '').slice(0, 600)}${(results.error || '').length > 600 ? '…' : ''}`
114-116: Unify runtime units across handlers.
start_search prints ms; get_more prints seconds. Pick one for consistency.Apply this diff to keep ms here:
- output += `Runtime: ${Math.round(results.runtime / 1000)}s\n`; + output += `Runtime: ${Math.round(results.runtime)}ms\n`;src/search-manager.ts (2)
203-205: hasMoreResults computation is fine for range reads.
For parity, you could set hasMoreResults on tail to !session.isComplete.Apply this diff (tail branch):
- hasMoreResults: false, // Tail always returns what's available + hasMoreResults: !session.isComplete,
393-419: Filter is good; also redact paths before logging/sending.
Errors may still contain absolute paths. Redact before capture.Apply this diff:
- if (filteredErrors.length > 0) { - const meaningfulErrors = filteredErrors.join('\n').trim(); + if (filteredErrors.length > 0) { + const redactPaths = (s: string) => + s + // Unix-like absolute paths + .replace(/\b\/[^ \n\r\t]*/g, '<path>') + // Windows absolute paths + .replace(/\b[A-Za-z]:\\[^ \n\r\t]*/g, '<path>'); + const meaningfulErrors = filteredErrors.map(redactPaths).join('\n').trim(); if (meaningfulErrors) { session.error = (session.error || '') + meaningfulErrors + '\n'; - capture('search_session_error', { + capture('search_session_error', { sessionId: session.id, error: meaningfulErrors.substring(0, 200) }); } }
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (7)
.gitignore(1 hunks)src/handlers/search-handlers.ts(2 hunks)src/search-manager.ts(5 hunks)src/server.ts(1 hunks)src/tools/filesystem.ts(1 hunks)src/tools/schemas.ts(1 hunks)src/utils/system-info.ts(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/tools/filesystem.ts (3)
test/test_search_truncation.js (1)
testSearchTruncation(46-93)test/test_improved_search_truncation.js (1)
testImprovedSearchTruncation(46-106)src/tools/search.ts (1)
searchCodeFallback(149-257)
src/search-manager.ts (2)
src/utils/capture.ts (1)
capture(277-284)src/tools/search.ts (1)
searchCode(16-146)
🔇 Additional comments (7)
src/server.ts (1)
316-316: Defaults verified: earlyTermination=true for file searches and false for content searches; handler passes user overrides correctly. LGTM.src/handlers/search-handlers.ts (1)
33-34: Plumbing earlyTermination looks good — confirm defaults match docs.
Pass-through is correct. Please verify StartSearchArgsSchema defaults enforce: files=true, content=false.src/search-manager.ts (5)
39-40: API surface OK.
Optional boolean with comment reads well.
138-144: Nice startup optimization.
Event-based first chunk with 40ms cap is reasonable.
195-199: Error gating on tail reads is sensible.
Returns data while suppressing noisy errors; LGTM.
214-216: Trimmed error exposure is good.
Prevents spurious error states; LGTM.
430-445: Exit-code policy: confirm intent for code 2.
You’re not marking code 2 (some files unreadable) as error when there are results. Confirm this matches UX expectations.
Summary by CodeRabbit