Few more fixes for search in files #232

Merged
wonderwhy-er merged 2 commits into main from additional-tweaks-for-file-search
Sep 1, 2025
Conversation

@wonderwhy-er wonderwhy-er commented Sep 1, 2025

Owner

Summary by CodeRabbit

  • New Features
    • Optional early stop for file searches on exact filename matches, improving speed.
    • Faster initial feedback by briefly awaiting first results at search start.
    • More accurate indication of whether more results are available.
  • Bug Fixes
    • Partial results are preserved when permission errors occur.
    • Cleaner, clearer error messages with reduced noise and fewer false error states.
  • Documentation
    • Added guidance and examples for the early-stop option.
    • Updated macOS tips for file search.
  • Chores
    • Ignore test output directory; minor .gitignore formatting updates.

coderabbitai Bot commented Sep 1, 2025

Contributor

Caution

Review failed

The pull request is closed.

Walkthrough

Adds an earlyTermination option to search flows. Updates search-manager to validate paths, wait for first data chunk, stream results with optional early stop on exact filename, refine ripgrep args, and tighten error/telemetry handling. Handlers and tools pass through the option. Schemas and server docs updated; minor text and .gitignore tweaks.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Ignore rules: .gitignore | Add test/test_output/ ignore; minor reflow in documentation-related entries; no behavior change for plans/. |
| Handlers: src/handlers/search-handlers.ts | Pass earlyTermination into startSearch. Adjust get-more-results error surfacing: only show error when no results and non-empty error; otherwise return partial results. |
| Search core: src/search-manager.ts | Add earlyTermination?: boolean to options; validate root path; include new telemetry fields; build ripgrep args with validated path and improved glob/exact-filename handling; wait for first data chunk (~40ms cap); implement early termination on exact filename match; filter stderr noise; refine isError logic; add hasMoreResults to reads; new helpers for exact filename/globs. |
| Server docs: src/server.ts | Document earlyTermination in start_search tool description; add two usage examples; no functional changes. |
| Tooling and schemas: src/tools/schemas.ts, src/tools/filesystem.ts | Schema: add optional earlyTermination to StartSearchArgsSchema. Filesystem: set earlyTermination: true for file searches when calling startSearch. |
| System guidance: src/utils/system-info.ts | Update macOS guidance text: fix stray backtick; add note about using mdfind for exact filename searches. |

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client
  participant Handler as search-handlers.ts
  participant Manager as search-manager.ts
  participant RG as ripgrep process

  Client->>Handler: startSearch({ pattern, earlyTermination? })
  Handler->>Manager: startSearch(options)
  Manager->>Manager: validatePath(rootPath) -> validPath
  Manager->>RG: spawn rg with args (glob/exact-filename aware)
  Note over Manager: Wait for first data chunk or 40ms
  RG-->>Manager: stdout chunk(s)
  Manager->>Manager: processBufferedOutput()

  alt exact filename match AND earlyTermination != false
    Manager-->>RG: SIGTERM (delay ~100ms)
    Note over Manager: mark session complete
  end

  Manager-->>Handler: initial results (+hasMoreResults, isError/error trimmed)
  Handler-->>Client: response
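
The early-stop branch in the diagram above can be sketched as a pure predicate. This is a minimal sketch, not the code in src/search-manager.ts: the names looksLikeExactFilename, basenameOf, and shouldStopEarly are hypothetical, and so are the rules that a pattern without glob metacharacters or path separators counts as an exact filename and that the comparison is case-insensitive. It follows the diagram's condition, where only an explicit false disables early termination.

```typescript
// Hypothetical helper: treat a pattern as an exact filename when it contains
// no glob metacharacters and no path separators (an assumption for this sketch).
function looksLikeExactFilename(pattern: string): boolean {
  return pattern.length > 0 && !/[*?\[\]{}\/\\]/.test(pattern);
}

// Hypothetical helper: last path segment, accepting both separator styles.
function basenameOf(filePath: string): string {
  const parts = filePath.split(/[\\\/]/);
  return parts[parts.length - 1] ?? "";
}

// Mirrors "exact filename match AND earlyTermination != false" from the diagram.
function shouldStopEarly(
  pattern: string,
  matchedPath: string,
  earlyTermination?: boolean
): boolean {
  if (earlyTermination === false) return false;
  if (!looksLikeExactFilename(pattern)) return false;
  return basenameOf(matchedPath).toLowerCase() === pattern.toLowerCase();
}
```

When the predicate returns true, the manager in the diagram sends SIGTERM to ripgrep after a short delay and marks the session complete; that part involves process handling and is left out here.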
sequenceDiagram
  autonumber
  participant Client
  participant Handler as search-handlers.ts
  participant Manager as search-manager.ts

  Client->>Handler: getMoreResults(sessionId, offset, length)
  Handler->>Manager: readSearchResults(...)
  Manager-->>Handler: { results, error?, hasMoreResults, isComplete }
  alt results.totalResults == 0 AND error present
    Handler-->>Client: error surfaced
  else
    Handler-->>Client: return results (partial OK), error omitted
  end
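
The error-surfacing rule in this second diagram can be sketched as a small function. This is a sketch under assumptions: the ReadResult shape and the name errorToSurface are hypothetical stand-ins for whatever readSearchResults actually returns. The only rule taken from the review is that an error is shown when there are zero results and the error text is non-empty; otherwise partial results are returned and the error is omitted.

```typescript
// Hypothetical shape of a read result, based on the fields named in the diagram.
interface ReadResult {
  totalResults: number;
  error?: string;
  hasMoreResults: boolean;
  isComplete: boolean;
}

// Returns the error text to show the client, or undefined when results
// (even partial ones) should be returned instead.
function errorToSurface(result: ReadResult): string | undefined {
  const message = result.error?.trim();
  if (result.totalResults === 0 && message) {
    return message;
  }
  return undefined;
}
```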

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested reviewers

  • serg33v

Poem

Hop hop! I sniff the names, then stop—so fast!
A file found quick, the spinning wheels at last
Take rest. If not, I’ll burrow, fetch more crumbs,
Stream by stream, till every echo comes.
Early to finish, or happily roam—
I’m a searchy bun, bringing findings home. 🐇🔎


📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 18c22bb and d642231.

📒 Files selected for processing (1)
  • src/search-manager.ts (5 hunks)
@coderabbitai coderabbitai Bot left a comment

Contributor

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/tools/filesystem.ts (1)

905-919: Apply configurable earlyTermination, dedupe results, and always terminate the search session
File: src/tools/filesystem.ts (lines 905–919)
Apply this consolidated diff:

-export async function searchFiles(rootPath: string, pattern: string): Promise<string[]> {
+export async function searchFiles(
+  rootPath: string,
+  pattern: string,
+  opts?: { earlyTermination?: boolean; maxResults?: number; ignoreCase?: boolean }
+): Promise<string[]> {
@@
-        const result = await searchManager.startSearch({
+        const result = await searchManager.startSearch({
             rootPath,
             pattern,
             searchType: 'files',
-            ignoreCase: true,
-            maxResults: 5000, // Higher limit for compatibility
-            earlyTermination: true, // Use early termination for better performance
+            ignoreCase: opts?.ignoreCase ?? true,
+            maxResults: opts?.maxResults ?? 5000,
+            earlyTermination: opts?.earlyTermination ?? true,
         });
@@
-        let allResults: string[] = [];
+        const seen = new Set<string>();
+        let allResults: string[] = [];
@@
-        for (const searchResult of result.results) {
-            if (searchResult.type === 'file') {
-                allResults.push(searchResult.file);
-            }
-        }
+        for (const searchResult of result.results) {
+            if (searchResult.type === 'file'
+              && searchResult.file !== '__LAST_READ_MARKER__'
+              && !seen.has(searchResult.file)
+            ) {
+                seen.add(searchResult.file);
+                allResults.push(searchResult.file);
+            }
+        }
@@
-            for (const searchResult of results.results) {
-                if (searchResult.file !== '__LAST_READ_MARKER__' && searchResult.type === 'file') {
-                    allResults.push(searchResult.file);
-                }
-            }
+            for (const searchResult of results.results) {
+                if (searchResult.type === 'file'
+                  && searchResult.file !== '__LAST_READ_MARKER__'
+                  && !seen.has(searchResult.file)
+                ) {
+                    seen.add(searchResult.file);
+                    allResults.push(searchResult.file);
+                }
+            }
@@
-        // Log only the count of found files, not their paths
+        // Proactively terminate the search session on completion
+        try { searchManager.terminateSearch(sessionId); } catch {}
+
+        // Log only the count of found files, not their paths

No existing call sites of searchFiles were found; ensure future callers pass opts (e.g. { earlyTermination: false }) when they require full result sets.
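
The two loops in the diff above apply the same filter, so the dedupe step could live in one helper. A minimal sketch, assuming a simplified SearchResult shape with only type and file fields; the name collectUniqueFiles is hypothetical and does not exist in src/tools/filesystem.ts.

```typescript
// Simplified result shape assumed for this sketch.
interface SearchResult {
  type: string;
  file: string;
}

const LAST_READ_MARKER = "__LAST_READ_MARKER__";

// Collects file paths across result batches, skipping non-file entries,
// the read marker, and paths that were already seen.
function collectUniqueFiles(batches: SearchResult[][]): string[] {
  const seen = new Set<string>();
  const files: string[] = [];
  for (const batch of batches) {
    for (const result of batch) {
      if (result.type !== "file") continue;
      if (result.file === LAST_READ_MARKER) continue;
      if (seen.has(result.file)) continue;
      seen.add(result.file);
      files.push(result.file);
    }
  }
  return files;
}
```

First-seen order is preserved, which keeps the output stable when the same path shows up in both the initial results and a later read.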

src/search-manager.ts (1)

127-134: Avoid sending full filesystem paths in telemetry.
requestedPath/validatedPath can leak PII. Send only basename or a hash.

Apply this diff:

-    capture('search_session_started', {
+    capture('search_session_started', {
       sessionId,
       searchType: options.searchType,
       hasTimeout: !!timeoutMs,
       timeoutMs,
-      requestedPath: options.rootPath,
-      validatedPath: validPath
+      requestedPathBasename: path.basename(options.rootPath),
+      validatedPathBasename: path.basename(validPath)
     });

Additionally, consider a safe wrapper to prevent unhandled rejections:

// near imports
const safeCapture = (...args: Parameters<typeof capture>) => { void capture(...args).catch(() => {}); };

Then replace capture(...) with safeCapture(...).

🧹 Nitpick comments (7)
.gitignore (1)

42-45: Ignore test outputs: LGTM, with a tiny pattern nit

Works as intended. Consider anchoring to repo root to avoid accidental matches in nested paths.

Apply this diff:

- test/test_output/
+ /test/test_output/
src/tools/schemas.ts (1)

111-112: Schema addition: good; surface the description to clients

The inline comment won’t reach clients via JSON Schema. Add a Zod description for generated docs/UIs.

Apply this diff:

-  earlyTermination: z.boolean().optional(), // Stop search early when exact filename match is found (default: true for files, false for content)
+  earlyTermination: z
+    .boolean()
+    .optional()
+    .describe('Stop search early when exact filename match is found (default: true for file searches, false for content searches)'),
src/utils/system-info.ts (1)

609-611: macOS guidance: LGTM

Nice addition. Consider adding a concrete example to reduce ambiguity, e.g., mdfind -name "config.json".

src/handlers/search-handlers.ts (2)

100-107: Good change: don’t block results on non-fatal errors.
Consider truncating very long errors to keep responses compact.

Apply this diff:

-          text: `Search session ${parsed.data.sessionId} encountered an error: ${results.error}`
+          text: `Search session ${parsed.data.sessionId} encountered an error: ${(results.error || '').slice(0, 600)}${(results.error || '').length > 600 ? '…' : ''}`
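
The inline expression above evaluates (results.error || '') twice. A small helper keeps the same behavior and is easier to read. This is a sketch only: the name truncateError is hypothetical, and the 600-character default simply mirrors the number used in the suggested diff.

```typescript
// Truncates an optional error string to `max` characters and appends an
// ellipsis only when something was actually cut off.
function truncateError(error: string | undefined, max: number = 600): string {
  const text = error ?? "";
  return text.length > max ? text.slice(0, max) + "…" : text;
}
```

The handler line could then interpolate truncateError(results.error) in place of the doubled expression.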

114-116: Unify runtime units across handlers.
start_search prints ms; get_more prints seconds. Pick one for consistency.

Apply this diff to keep ms here:

-  output += `Runtime: ${Math.round(results.runtime / 1000)}s\n`;
+  output += `Runtime: ${Math.round(results.runtime)}ms\n`;
src/search-manager.ts (2)

203-205: hasMoreResults computation is fine for range reads.
For parity, you could set hasMoreResults on tail to !session.isComplete.

Apply this diff (tail branch):

-        hasMoreResults: false, // Tail always returns what's available
+        hasMoreResults: !session.isComplete,
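
For reference, one plausible shape for the combined computation is sketched here. The review does not show the range-read logic in src/search-manager.ts, so everything in this block is an assumption: the SessionView type, the convention that a negative offset means a tail read, and the rule that a range read has more results when the requested window ends before the known total or the search is still running.

```typescript
// Hypothetical view of the session fields this decision needs.
interface SessionView {
  totalResults: number;
  isComplete: boolean;
}

function hasMoreResults(session: SessionView, offset: number, length: number): boolean {
  if (offset < 0) {
    // Tail read (assumed convention): more can only arrive while the search runs.
    return !session.isComplete;
  }
  // Range read: unread results remain, or the search may still produce some.
  return offset + length < session.totalResults || !session.isComplete;
}
```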

393-419: Filter is good; also redact paths before logging/sending.
Errors may still contain absolute paths. Redact before capture.

Apply this diff:

-      if (filteredErrors.length > 0) {
-        const meaningfulErrors = filteredErrors.join('\n').trim();
+      if (filteredErrors.length > 0) {
+        const redactPaths = (s: string) =>
+          s
+            // Unix-like absolute paths
+            .replace(/\b\/[^ \n\r\t]*/g, '<path>')
+            // Windows absolute paths
+            .replace(/\b[A-Za-z]:\\[^ \n\r\t]*/g, '<path>');
+        const meaningfulErrors = filteredErrors.map(redactPaths).join('\n').trim();
         if (meaningfulErrors) {
           session.error = (session.error || '') + meaningfulErrors + '\n';
-          capture('search_session_error', {
+          capture('search_session_error', {
             sessionId: session.id,
             error: meaningfulErrors.substring(0, 200)
           });
         }
       }
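
One caveat on the Unix pattern in the diff above: \b only matches between a word character and a non-word character, so a path that starts the string or follows a space is not matched at its first slash, and the first segment (for example /home) stays in the output. A hedged alternative sketch follows. The function name redactPaths comes from the suggestion, but the regexes here are a rewrite under my own assumptions: paths end at whitespace, quotes, or a colon, and only absolute paths are redacted.

```typescript
// Replaces absolute filesystem paths with a placeholder before logging.
function redactPaths(message: string): string {
  return (
    message
      // Windows absolute paths such as C:\Users\bob or C:/Users/bob.
      .replace(/\b[A-Za-z]:[\\\/][^\s'"]*/g, "<path>")
      // Unix absolute paths: a slash at the start of the string or right after
      // whitespace, a quote, "(", "=", or ":". The prefix character is kept.
      .replace(/(^|[\s'"(=:])\/[^\s'":]+/g, "$1<path>")
  );
}
```

Relative paths such as src/a.ts are left alone, since the Unix rule requires the slash to begin the token.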
📜 Review details

📥 Commits

Reviewing files that changed from the base of the PR and between 3505d21 and 18c22bb.

📒 Files selected for processing (7)
  • .gitignore (1 hunks)
  • src/handlers/search-handlers.ts (2 hunks)
  • src/search-manager.ts (5 hunks)
  • src/server.ts (1 hunks)
  • src/tools/filesystem.ts (1 hunks)
  • src/tools/schemas.ts (1 hunks)
  • src/utils/system-info.ts (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
src/tools/filesystem.ts (3)
test/test_search_truncation.js (1)
  • testSearchTruncation (46-93)
test/test_improved_search_truncation.js (1)
  • testImprovedSearchTruncation (46-106)
src/tools/search.ts (1)
  • searchCodeFallback (149-257)
src/search-manager.ts (2)
src/utils/capture.ts (1)
  • capture (277-284)
src/tools/search.ts (1)
  • searchCode (16-146)
🔇 Additional comments (7)
src/server.ts (1)

316-316: Defaults verified: earlyTermination=true for file searches and false for content searches; handler passes user overrides correctly. LGTM.

src/handlers/search-handlers.ts (1)

33-34: Plumbing earlyTermination looks good — confirm defaults match docs.
Pass-through is correct. Please verify StartSearchArgsSchema defaults enforce: files=true, content=false.

src/search-manager.ts (5)

39-40: API surface OK.
Optional boolean with comment reads well.


138-144: Nice startup optimization.
Event-based first chunk with 40ms cap is reasonable.


195-199: Error gating on tail reads is sensible.
Returns data while suppressing noisy errors; LGTM.


214-216: Trimmed error exposure is good.
Prevents spurious error states; LGTM.


430-445: Exit-code policy: confirm intent for code 2.
You’re not marking code 2 (some files unreadable) as error when there are results. Confirm this matches UX expectations.
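
The policy under discussion can be written down as a small predicate. ripgrep exits with 0 when a match is found, 1 when nothing matches, and 2 when an error occurred. The function name isErrorExit and its signature are hypothetical, and the sketch ignores the case where the process is killed by a signal and reports no exit code.

```typescript
// Decides whether a finished ripgrep run should be flagged as an error.
function isErrorExit(exitCode: number, resultCount: number): boolean {
  // 0 = matches found, 1 = no matches; neither is an error.
  if (exitCode === 0 || exitCode === 1) return false;
  // 2 = ripgrep hit an error (for example unreadable files). Per the policy
  // described above, usable results take priority over the error flag.
  if (exitCode === 2 && resultCount > 0) return false;
  return true;
}
```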

Comment thread src/search-manager.ts
@wonderwhy-er wonderwhy-er merged commit ac36d70 into main Sep 1, 2025
1 of 2 checks passed
@wonderwhy-er wonderwhy-er deleted the additional-tweaks-for-file-search branch September 10, 2025 08:49
@coderabbitai coderabbitai Bot mentioned this pull request Sep 12, 2025