Skip to content

Refactor file search#229

Merged
wonderwhy-er merged 7 commits into
mainfrom
refactor-file-search
Aug 28, 2025
Merged

Refactor file search#229
wonderwhy-er merged 7 commits into
mainfrom
refactor-file-search

Conversation

@wonderwhy-er

@wonderwhy-er wonderwhy-er commented Aug 25, 2025

Copy link
Copy Markdown
Owner

Summary by CodeRabbit

  • New Features

    • Session-based streaming search: start_search, get_more_search_results, stop_search, list_searches — progressive results, pagination and cancellation.
  • Refactor

    • Replaces one-shot search with streaming sessions and clearer options (files vs content, case/includeHidden/context defaults, pagination).
  • Chores

    • File-search now prefers a session-backed engine with a Node.js fallback; usage tracking updated for new tools.
  • Tests

    • Tests migrated to drive streaming sessions, poll for completion, and ensure session cleanup.

@coderabbitai

coderabbitai Bot commented Aug 25, 2025

Copy link
Copy Markdown
Contributor

Walkthrough

Adds a session-based streaming search subsystem: new search-manager and search-handlers, replaces one-shot search_files/search_code with start/get_more/stop/list search tools, updates schemas and server routing, adapts filesystem search to prefer ripgrep sessions with a NodeJS fallback, and migrates tests to the streaming API.

Changes

Cohort / File(s) Summary
Handlers — search refactor
src/handlers/edit-search-handlers.ts, src/handlers/filesystem-handlers.ts, src/handlers/index.ts, src/handlers/search-handlers.ts
Removes legacy handleSearchCode and handleSearchFiles; adds streaming handlers handleStartSearch, handleGetMoreSearchResults, handleStopSearch, handleListSearches; exposes search-handlers via barrel export; updates handler exports.
Search manager
src/search-manager.ts
New session-based ripgrep orchestrator (SearchManager singleton) with start/read/terminate/list APIs, pagination/tail reads, process wiring/parse logic, auto-cleanup, timeouts, and telemetry.
Server wiring
src/server.ts
Replaces tools search_files/search_code with start_search/get_more_search_results; adds stop_search/list_searches; swaps input schemas and routes calls to new handlers.
Schemas
src/tools/schemas.ts
Removes SearchFilesArgsSchema/SearchCodeArgsSchema; adds StartSearchArgsSchema, GetMoreSearchResultsArgsSchema, StopSearchArgsSchema, ListSearchesArgsSchema with renamed/expanded fields (e.g., searchType, timeout_ms, defaults).
Tools / Filesystem
src/tools/filesystem.ts, src/tools/search.ts
searchFiles now prefers search-manager sessions and falls back to a NodeJS recursive search; telemetry updated to indicate ripgrep vs fallback; searchTextInFiles adds ripgrep-fallback logging and adjusts exclude dirs.
Usage tracking
src/utils/usageTracker.ts
Updates TOOL_CATEGORIES to include new search tool names (start_search, get_more_search_results, stop_search, list_searches) under searchOperations.
Tests (streaming migration)
test/test-search-code-edge-cases.js, test/test-search-code.js, test/test_improved_search_truncation.js, test/test_search_truncation.js
Tests migrated from single-shot API to streaming API: use handleStartSearch, poll via handleGetMoreSearchResults, call handleStopSearch, add session cleanup and teardown, expanded scenarios, and updated assertions/imports to new handlers.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Client
  participant Server
  participant Handlers as search-handlers
  participant Manager as searchManager
  participant RG as ripgrep process

  Client->>Server: start_search(args)
  Server->>Handlers: handleStartSearch(args)
  Handlers->>Manager: startSearch(options)
  Manager->>RG: spawn ripgrep (stream JSON)
  RG-->>Manager: streaming lines
  Manager-->>Handlers: { sessionId, initial results, status }
  Handlers-->>Server: ServerResult (text summary)
  Server-->>Client: sessionId + initial summary

  loop polling
    Client->>Server: get_more_search_results({ sessionId, offset, length })
    Server->>Handlers: handleGetMoreSearchResults(...)
    Handlers->>Manager: readSearchResults(sessionId, ...)
    Manager-->>Handlers: { results, hasMore, isComplete }
    Handlers-->>Server: ServerResult (paged summary)
    Server-->>Client: incremental results/status
  end

  alt stop requested
    Client->>Server: stop_search({ sessionId })
    Server->>Handlers: handleStopSearch(...)
    Handlers->>Manager: terminateSearch(sessionId)
    Manager-->>Handlers: terminated status
    Handlers-->>Server: confirmation
    Server-->>Client: stopped message
  end
Loading
sequenceDiagram
  autonumber
  actor Client
  participant Server
  participant Handlers as search-handlers
  participant Manager as searchManager

  Client->>Server: list_searches()
  Server->>Handlers: handleListSearches()
  Handlers->>Manager: listSearchSessions()
  Manager-->>Handlers: sessions summary
  Handlers-->>Server: ServerResult (list)
  Server-->>Client: sessions list
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

size:L

Suggested reviewers

  • serg33v
  • dmitry-ottic-ai

Poem

I thump my paw and start the quest,
Sessions hum and ripgrep does its best.
I fetch some bits, then peek some more,
Stop, list, and tidy up the store.
Hooray — the burrow’s search is blessed! 🥕🐇

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch refactor-file-search

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbit in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbit in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbit gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbit read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbit help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbit ignore or @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbit summary or @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbit or @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 10

🧹 Nitpick comments (25)
src/tools/search.ts (2)

270-281: Telemetry on ripgrep fallback: OK — ensure messages are sanitized.

The new capture('searchTextInFiles_ripgrep_fallback', { error }) is helpful. Keep error payloads strictly to sanitized strings (no file paths, patterns, or user content) to avoid accidental PII/log leakage. Given other capture calls, this is a minor consistency reminder.


277-281: Behavior drift: fallback excludes "dist" but primary ripgrep path does not.

This will produce different results depending on the path taken. Either exclude dist consistently in ripgrep or stop excluding it in the fallback. Easiest immediate fix: remove "dist" from the fallback exclude list.

Apply this minimal diff:

-    return searchCodeFallback({
-      ...options,
-      excludeDirs: ['node_modules', '.git', 'dist']
-    });
+    return searchCodeFallback({
+      ...options,
+      excludeDirs: ['node_modules', '.git']
+    });

Alternative (longer-term): add a consistent exclude to ripgrep with a glob like -g '!dist/**', or thread an excludeDirs option through searchCode so both code paths honor the same exclusions.

src/handlers/edit-search-handlers.ts (1)

13-13: Export-only module for edit handler — LGTM.

This simplifies responsibility after moving search handling to search-handlers.

Consider renaming this file to edit-handlers.ts to better reflect its current scope and reduce confusion with the new search-handlers.ts.

test/test_search_truncation.js (3)

10-16: Make sessionId extraction regex tighter and more robust.

Current pattern is greedy and may capture unintended trailing text if the header line format changes.

Use a stricter capture:

-  const sessionIdMatch = result.content[0].text.match(/Started .+ session: (.+)/);
+  const sessionIdMatch = result.content[0].text.match(/Started .* session:\s*([a-zA-Z0-9_-]+)/);

69-77: Use a single constant for the API limit to avoid drift.

You’re hardcoding 1,000,000 here while other tests use 1,048,576. Centralize to one value for consistency and future tweaks.

Suggested change within this test:

-        if (combinedLength > 1000000) {
+        const apiLimit = 1048576; // 1 MiB
+        if (combinedLength > apiLimit) {
             console.log('⚠️  Combined results quite large - may need truncation handling');
-        } else if (finalResult.content[0].text.includes('Results truncated')) {
+        } else if (finalResult.content[0].text.includes('Results truncated')) {

7-8: Reduce flakiness by relaxing the timeout for large repos.

A 10s overall timeout with 100ms polling might be tight on CI or larger trees. Consider 20–30s.

Example:

-async function searchAndWaitForCompletion(searchArgs, timeout = 10000) {
+async function searchAndWaitForCompletion(searchArgs, timeout = 30000) {

Also applies to: 20-33

test/test_improved_search_truncation.js (3)

11-16: Tighten the sessionId regex.

Avoid greedy capture to keep just the session ID token.

-  const sessionIdMatch = result.content[0].text.match(/Started .+ session: (.+)/);
+  const sessionIdMatch = result.content[0].text.match(/Started .* session:\s*([a-zA-Z0-9_-]+)/);

70-79: Good diagnostics and safety analysis. Minor polish suggestion.

You already use apiLimit = 1048576. Consider reusing it for the earlier thresholds to keep a single source of truth (e.g., warn if totalLength > apiLimit, caution if > 0.8 * apiLimit).

Example:

-        if (totalLength > 1000000) {
+        if (totalLength > apiLimit) {
             console.log('❌ Results still quite large - over 1MB combined');
-        } else if (totalLength > 800000) {
+        } else if (totalLength > Math.floor(0.8 * apiLimit)) {
             console.log('⚠️  Results approaching limits but acceptable');
         } else {
             console.log('✅ Results well within safe limits');
         }

Also applies to: 93-102


7-8: Bump timeout to reduce CI flakiness.

Same rationale as the sibling test — consider increasing to 30s.

-async function searchAndWaitForCompletion(searchArgs, timeout = 10000) {
+async function searchAndWaitForCompletion(searchArgs, timeout = 30000) {

Also applies to: 20-33

src/search-manager.ts (4)

368-387: Context lines are counted as matches, inflating totals.

processBufferedOutput increments totalMatches for both match and context records. That makes “Total results” ambiguous and higher than actual match count.

Recommendation: track two counters (e.g., matchCount and displayCount), and ensure handlers label them clearly (e.g., “Total matches” vs “Lines output (with context)”). Keep displayCount for pagination math if needed.

Would you like a small patch to separate counters and update the handlers’ summary strings accordingly?


4-4: Cyclic dependency risk with validatePath import.

This module imports validatePath from ./tools/filesystem.js, while src/tools/filesystem.ts dynamically imports ../search-manager.js. Although the dynamic import mitigates hard cycles at load time, it’s fragile and easy to regress.

Suggestion: extract validatePath (and path-allowlisting helpers) into a standalone path-utils.ts used by both modules to break the cycle cleanly.


98-105: Timeout cleanup: clear the termination timer on process exit.

You arm a setTimeout to kill ripgrep, but never clear it. If the process exits early, the timer still fires later (harmless but noisy).

Add a handle to the timer and clear it in both close and error handlers.

@@
-    if (options.timeout) {
-      setTimeout(() => {
+    let killTimer: NodeJS.Timeout | null = null;
+    if (options.timeout) {
+      killTimer = setTimeout(() => {
         if (!session.isComplete && !session.process.killed) {
           session.process.kill('SIGTERM');
         }
       }, options.timeout);
     }
@@
-    process.on('close', (code: number) => {
+    process.on('close', (code: number) => {
+      if (killTimer) { clearTimeout(killTimer); killTimer = null; }
@@
-    process.on('error', (error: Error) => {
+    process.on('error', (error: Error) => {
+      if (killTimer) { clearTimeout(killTimer); killTimer = null; }

Also applies to: 322-349


389-427: Lossy multi-match handling per line.

For parsed.type === 'match', only the first submatch is returned (submatches[0]). Lines with multiple occurrences are collapsed to one result.

If downstream consumers care about every occurrence, emit one result per submatch (or include submatches in the payload). Otherwise, document that matches are at most one per line.

src/tools/filesystem.ts (1)

971-1026: Fallback Node.js search semantics differ (no glob support, heavy per-file validatePath).

  • The fallback uses name.includes(pattern) (substring) so *.js won’t work. Consider supporting simple globs and | splitting for parity with ripgrep.
  • Calling validatePath(fullPath) for every entry is expensive. Given you validated rootPath and you remain inside it, you can:
    • avoid per-file validation, or
    • at least skip following symlinks (use lstat + isSymbolicLink) to prevent escaping the allowed tree.

Optional parity improvement:

-                if (entry.name.toLowerCase().includes(pattern.toLowerCase())) {
+                const patterns = pattern.split('|').map(p => p.trim()).filter(Boolean);
+                const matches = patterns.length
+                  ? patterns.some(glob => minimatch(entry.name, glob, { nocase: true }))
+                  : entry.name.toLowerCase().includes(pattern.toLowerCase());
+                if (matches) {
                     results.push(fullPath);
                 }

You’ll need minimatch (or implement a tiny glob matcher). If you want, I can wire this with zero new deps.

test/test-search-code.js (1)

526-543: Tests reach into private internals (searchManager.sessions?.clear?.()), increasing coupling.

Prefer a public shutdown/reset API to keep tests decoupled from private state. For example, expose:

// in src/search-manager.ts
export function shutdownSearchManager() {
  const sessions = searchManager.listSearchSessions();
  for (const s of sessions) searchManager.terminateSearch(s.id);
  stopSearchManagerCleanup();
  // Optional: expose a safe clear method on SearchManager guarded for tests
  (searchManager as any).clearAllSessions?.();
}

Or add a clearAllSessions() method to SearchManager under a __TEST__ guard.

test/test-search-code-edge-cases.js (3)

261-275: Concurrent searches test doesn’t assert completion.

You start multiple searches concurrently but only check the initial responses. Consider driving them to completion (via searchAndWaitForCompletion) to validate concurrency in the streaming pipeline.


285-300: Very short timeout test: broaden assertions to accept graceful termination text.

Depending on the exact ripgrep exit code and handler logic, the initial start text may still be “running”. It’s safer to assert for “Status: RUNNING” or a termination hint (e.g., “terminated”/“completed”) in addition to current checks.


328-340: Zero max results semantics are unclear.

Passing maxResults: 0 currently results in no -m flag (effectively “unlimited”). If “0 means unrestricted” is the intended contract, add an assertion here that ensures results can still be returned; otherwise, tighten the schema and handler to reject zero.

src/handlers/index.ts (1)

6-6: Barrel export looks good. Verify deprecation of legacy search handlers.

You still export ./edit-search-handlers.js. If the legacy handlers are deprecated, consider removing that export to avoid accidental use; keep it if you need backward compatibility.

src/tools/schemas.ts (1)

113-117: Constrain pagination types and semantics

Make intent explicit and prevent accidental zero/negative lengths. Also clarify that offset supports negative values.

 export const GetMoreSearchResultsArgsSchema = z.object({
   sessionId: z.string(),
-  offset: z.number().optional().default(0),    // Same as file reading
-  length: z.number().optional().default(100),  // Same as file reading (but smaller default)
+  offset: z.number().int().default(0),         // Negative = tail behavior
+  length: z.number().int().positive().default(100),
 });
src/handlers/search-handlers.ts (3)

95-98: Avoid redundant || fallbacks; rely on Zod defaults

safeParse with .default() already provides values. || masks intentional zeroes and complicates negatives.

-    const results = searchManager.readSearchResults(
-      parsed.data.sessionId,
-      parsed.data.offset || 0,
-      parsed.data.length || 100
-    );
+    const results = searchManager.readSearchResults(
+      parsed.data.sessionId,
+      parsed.data.offset,
+      parsed.data.length
+    );
@@
-    const offset = parsed.data.offset || 0;
+    const offset = parsed.data.offset;

Also applies to: 116-117


14-14: Prefer unknown over any for handler inputs

You’re validating with Zod; accept unknown and keep any out of the type surface.

-export async function handleStartSearch(args: any): Promise<ServerResult> {
+export async function handleStartSearch(args: unknown): Promise<ServerResult> {
@@
-export async function handleGetMoreSearchResults(args: any): Promise<ServerResult> {
+export async function handleGetMoreSearchResults(args: unknown): Promise<ServerResult> {
@@
-export async function handleStopSearch(args: any): Promise<ServerResult> {
+export async function handleStopSearch(args: unknown): Promise<ServerResult> {

Optionally, define precise types via z.infer<typeof Schema> after Zod parse.

Also applies to: 84-84, 172-172


2-7: Unused import

ListSearchesArgsSchema isn’t used in this module.

Remove it from the import list to keep the file tidy.

-  StopSearchArgsSchema,
-  ListSearchesArgsSchema
+  StopSearchArgsSchema
src/server.ts (2)

304-331: Tool doc: mention that initial results are included on start

The handler prints initial results (up to 10) on start. The description says it “returns immediately with a session ID”, which might imply no results. Clarify that it may include initial results and status.

-                        immediately with a session ID. Use get_more_search_results to get results as they
+                        immediately with a session ID and may include initial results. Use get_more_search_results to get results as they

808-822: Routing: new search tools wired correctly

Switch cases call the new handlers. Consider temporary aliases for removed tools to ease migration (return a helpful error pointing to the new API).

Example:

             default:
                 capture('server_unknown_tool', {name});
                 result = {
                     content: [{type: "text", text: `Error: Unknown tool: ${name}`}],
                     isError: true,
                 };

could be extended as:

+            case "search_files":
+            case "search_code":
+                result = {
+                    content: [{
+                        type: "text",
+                        text: `The tools "${name}" have been replaced by start_search/get_more_search_results/stop_search/list_searches. Please migrate.`
+                    }],
+                    isError: true,
+                };
+                break;
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between a7e10a7 and 02ed3f9.

📒 Files selected for processing (14)
  • src/handlers/edit-search-handlers.ts (1 hunks)
  • src/handlers/filesystem-handlers.ts (0 hunks)
  • src/handlers/index.ts (1 hunks)
  • src/handlers/search-handlers.ts (1 hunks)
  • src/search-manager.ts (1 hunks)
  • src/server.ts (3 hunks)
  • src/tools/filesystem.ts (2 hunks)
  • src/tools/schemas.ts (1 hunks)
  • src/tools/search.ts (1 hunks)
  • src/utils/usageTracker.ts (1 hunks)
  • test/test-search-code-edge-cases.js (19 hunks)
  • test/test-search-code.js (17 hunks)
  • test/test_improved_search_truncation.js (1 hunks)
  • test/test_search_truncation.js (1 hunks)
💤 Files with no reviewable changes (1)
  • src/handlers/filesystem-handlers.ts
🧰 Additional context used
🧬 Code graph analysis (11)
src/tools/search.ts (1)
src/utils/capture.ts (1)
  • capture (277-284)
src/tools/filesystem.ts (3)
src/search-manager.ts (1)
  • searchManager (431-431)
src/utils/capture.ts (1)
  • capture (277-284)
src/handlers/filesystem-handlers.ts (1)
  • handleSearchFiles (244-281)
src/handlers/search-handlers.ts (4)
src/types.ts (1)
  • ServerResult (51-55)
src/tools/schemas.ts (3)
  • StartSearchArgsSchema (101-111)
  • GetMoreSearchResultsArgsSchema (113-117)
  • StopSearchArgsSchema (119-121)
src/search-manager.ts (1)
  • searchManager (431-431)
src/utils/capture.ts (1)
  • capture (277-284)
test/test_search_truncation.js (1)
src/handlers/search-handlers.ts (3)
  • handleStartSearch (14-79)
  • handleGetMoreSearchResults (84-167)
  • handleStopSearch (172-207)
src/search-manager.ts (3)
src/tools/search.ts (1)
  • SearchResult (9-13)
src/tools/filesystem.ts (1)
  • validatePath (226-281)
src/utils/capture.ts (1)
  • capture (277-284)
src/handlers/edit-search-handlers.ts (2)
src/tools/edit.ts (3)
  • handleEditBlock (345-354)
  • performSearchReplace (96-300)
  • SearchReplace (12-15)
test/test-edit-block-occurrences.js (1)
  • testEmptySearchString (282-306)
src/tools/schemas.ts (1)
src/tools/edit.ts (2)
  • performSearchReplace (96-300)
  • SearchReplace (12-15)
test/test_improved_search_truncation.js (1)
src/handlers/search-handlers.ts (3)
  • handleStartSearch (14-79)
  • handleGetMoreSearchResults (84-167)
  • handleStopSearch (172-207)
src/server.ts (1)
src/tools/schemas.ts (4)
  • StartSearchArgsSchema (101-111)
  • GetMoreSearchResultsArgsSchema (113-117)
  • StopSearchArgsSchema (119-121)
  • ListSearchesArgsSchema (123-123)
test/test-search-code-edge-cases.js (3)
test/test-search-code.js (33)
  • searchAndWaitForCompletion (203-207)
  • searchAndWaitForCompletion (227-232)
  • searchAndWaitForCompletion (247-252)
  • searchAndWaitForCompletion (269-274)
  • searchAndWaitForCompletion (291-296)
  • searchAndWaitForCompletion (319-324)
  • searchAndWaitForCompletion (344-349)
  • searchAndWaitForCompletion (369-374)
  • searchAndWaitForCompletion (393-397)
  • searchAndWaitForCompletion (475-479)
  • result (34-34)
  • result (415-419)
  • result (442-445)
  • result (455-458)
  • sessionIdMatch (37-37)
  • sessionId (41-41)
  • sessionId (165-165)
  • startTime (45-45)
  • moreResults (47-47)
  • text (212-212)
  • text (234-234)
  • text (254-254)
  • text (276-276)
  • text (301-301)
  • text (326-326)
  • text (351-351)
  • text (379-379)
  • text (402-402)
  • text (423-423)
  • text (446-446)
  • text (459-459)
  • text (481-481)
  • isValidResponse (424-424)
src/handlers/search-handlers.ts (4)
  • handleStartSearch (14-79)
  • handleGetMoreSearchResults (84-167)
  • handleStopSearch (172-207)
  • handleListSearches (212-248)
src/search-manager.ts (2)
  • searchManager (431-431)
  • stopSearchManagerCleanup (459-464)
test/test-search-code.js (3)
test/test-search-code-edge-cases.js (22)
  • result (30-30)
  • result (285-290)
  • result (309-314)
  • result (328-333)
  • result (348-353)
  • result (370-374)
  • sessionIdMatch (33-33)
  • sessionId (37-37)
  • sessionId (125-125)
  • startTime (41-41)
  • startTime (234-234)
  • text (164-164)
  • text (184-184)
  • text (202-202)
  • text (220-220)
  • text (246-246)
  • text (293-293)
  • text (335-335)
  • text (356-356)
  • text (378-378)
  • text (419-419)
  • text (451-451)
src/handlers/search-handlers.ts (4)
  • handleStartSearch (14-79)
  • handleGetMoreSearchResults (84-167)
  • handleStopSearch (172-207)
  • handleListSearches (212-248)
src/search-manager.ts (2)
  • searchManager (431-431)
  • stopSearchManagerCleanup (459-464)
🔇 Additional comments (7)
test/test-search-code.js (1)

470-485: Strengthen file-search test to assert exclusions

  • File: test/test-search-code.js, lines 470–485

Add negative assertions to ensure only .js files are returned and catch any regression in the glob‐filtering logic:

   const text = finalResult.content[0].text;
   assert(text.includes('test1.js'), 'Should find JavaScript files');
+  assert(!text.includes('test2.ts'),  'Should not list non-matching TypeScript files');
+  assert(!text.includes('nested.py'), 'Should not list non-matching Python files');

Please verify that using pattern: '*.js' is correctly applied in buildRipgrepArgs and that finalResult.content[0].text reflects the complete list of matched files so these negative checks will fail if non‐matching files slip through.

src/tools/schemas.ts (2)

98-99: Empty feedback schema: OK to leave empty

This matches the tool description that the form is user-filled. No issues.


119-123: Stop/List search schemas look good

Schemas are minimal and consistent with tool wiring.

src/handlers/search-handlers.ts (1)

23-34: No changes needed: startSearch arguments conform to SearchSessionOptions

I’ve verified that SearchSessionOptions is defined as:

export interface SearchSessionOptions {
  rootPath: string;
  pattern: string;
  searchType: 'files' | 'content';
  filePattern?: string;
  ignoreCase?: boolean;
  maxResults?: number;
  includeHidden?: boolean;
  contextLines?: number;
  timeout?: number;
}

And the handler passes exactly those keys—rootPath, pattern, searchType, filePattern, ignoreCase, maxResults, includeHidden, contextLines, and timeout—to startSearch. No renaming or adjustments are required.

src/server.ts (3)

42-46: Schema imports: OK

New search schemas are correctly imported and used below.


333-357: Get-more tool doc aligns with handler behavior

Offset/length semantics and tail behavior match the implementation.


374-383: List tool doc consistent

Docs reflect the output built in handleListSearches.

Comment thread src/handlers/search-handlers.ts
Comment on lines +48 to +54
for (const searchResult of result.results.slice(0, 10)) {
if (searchResult.type === 'content') {
output += `📄 ${searchResult.file}:${searchResult.line} - ${searchResult.match?.substring(0, 100)}${searchResult.match && searchResult.match.length > 100 ? '...' : ''}\n`;
} else {
output += `📁 ${searchResult.file}\n`;
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Guard against undefined match text to avoid “undefined” in output

When match is missing, template interpolation prints “undefined”. Use a safe default and compute ellipsis robustly.

-      for (const searchResult of result.results.slice(0, 10)) {
-        if (searchResult.type === 'content') {
-          output += `📄 ${searchResult.file}:${searchResult.line} - ${searchResult.match?.substring(0, 100)}${searchResult.match && searchResult.match.length > 100 ? '...' : ''}\n`;
-        } else {
-          output += `📁 ${searchResult.file}\n`;
-        }
-      }
+      for (const searchResult of result.results.slice(0, 10)) {
+        if (searchResult.type === 'content') {
+          const raw = searchResult.match ?? '';
+          const snippet = raw.slice(0, 100);
+          const ellipsis = raw.length > 100 ? '...' : '';
+          output += `📄 ${searchResult.file}:${searchResult.line} - ${snippet}${ellipsis}\n`;
+        } else {
+          output += `📁 ${searchResult.file}\n`;
+        }
+      }
-      for (const result of results.results) {
-        if (result.type === 'content') {
-          output += `📄 ${result.file}:${result.line} - ${result.match?.substring(0, 100)}${result.match && result.match.length > 100 ? '...' : ''}\n`;
-        } else {
-          output += `📁 ${result.file}\n`;
-        }
-      }
+      for (const result of results.results) {
+        if (result.type === 'content') {
+          const raw = result.match ?? '';
+          const snippet = raw.slice(0, 100);
+          const ellipsis = raw.length > 100 ? '...' : '';
+          output += `📄 ${result.file}:${result.line} - ${snippet}${ellipsis}\n`;
+        } else {
+          output += `📁 ${result.file}\n`;
+        }
+      }

Also applies to: 137-143

🤖 Prompt for AI Agents
In src/handlers/search-handlers.ts around lines 48-54 (and similarly at
137-143), the template interpolation can print "undefined" when
searchResult.match is missing; change the logic to first coerce match to a safe
string (e.g. const matchText = (searchResult.match ?? '') as string), compute a
boolean for truncation using matchText.length > 100, and then use
matchText.substring(0, 100) plus '...' only when truncated; update both
locations so they never interpolate undefined and the ellipsis is added only
when the actual string is longer than the limit.

Comment on lines +71 to +73
const errorMessage = error instanceof Error ? error.message : String(error);
capture('search_session_start_error', { error: errorMessage });

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Telemetry may leak PII/file paths via raw error messages

capture('search_session_start_error', { error: errorMessage }) can include absolute paths, patterns, or environment details. The capture pipeline sends to Google Analytics (see src/utils/capture.ts lines 276-283), which risks unintended data exfiltration.

Harden by redacting the message and only sending error class + coarse flags:

   } catch (error) {
     const errorMessage = error instanceof Error ? error.message : String(error);
-    capture('search_session_start_error', { error: errorMessage });
+    const errorName = error instanceof Error ? error.name : 'UnknownError';
+    const containsPathLikeData = /[\\/][^ \n\r\t]{1,}([\\/][^ \n\r\t]{1,})+/.test(errorMessage);
+    capture('search_session_start_error', { error_name: errorName, contains_path_like_data: containsPathLikeData });

Optionally log the full error locally (not to telemetry) if needed. Do not send raw strings off-box.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const errorMessage = error instanceof Error ? error.message : String(error);
capture('search_session_start_error', { error: errorMessage });
} catch (error) {
const errorMessage = error instanceof Error ? error.message : String(error);
const errorName = error instanceof Error ? error.name : 'UnknownError';
const containsPathLikeData = /[\\/][^ \n\r\t]{1,}([\\/][^ \n\r\t]{1,})+/.test(errorMessage);
capture('search_session_start_error', { error_name: errorName, contains_path_like_data: containsPathLikeData });
// …any additional error handling…
}
🤖 Prompt for AI Agents
In src/handlers/search-handlers.ts around lines 71-73, sending the raw error
message to telemetry can leak PII or file paths; instead redact the message
before sending by replacing it with the error class/name and coarse flags (e.g.,
{ type: error.constructor?.name || 'Error', isTimeout: boolean, isNetwork:
boolean }) and remove or truncate any string details; optionally write the full
error (stack/message) to local logs only (not via capture) for debugging; ensure
capture('search_session_start_error', ...) no longer includes raw errorMessage
but only the sanitized fields.

Comment thread src/search-manager.ts
Comment thread src/server.ts
Comment on lines +359 to +372
name: "stop_search",
description: `
Stop an active search.

Stops the background search process gracefully. Use this when you've found
what you need or if a search is taking too long. Similar to force_terminate
for terminal processes.

The search will still be available for reading final results until it's
automatically cleaned up after 5 minutes.

${CMD_PREFIX_DESCRIPTION}`,
inputSchema: zodToJsonSchema(StopSearchArgsSchema),
},

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

Stop tool doc: verify auto-cleanup timing

Doc promises cleanup after 5 minutes. Ensure searchManager actually performs TTL cleanup.

If not implemented, either implement cleanup or adjust the description.


🏁 Script executed:

#!/bin/bash
rg -n "cleanup|TTL|setTimeout|setInterval|expire|gc|Garbage|5 ?min" src -C2

Length of output: 11491


🏁 Script executed:

#!/bin/bash
sed -n '430,480p' src/search-manager.ts

Length of output: 1122


Stop tool doc: clarify cleanup timing

The implementation does include a periodic TTL cleanup that runs every 5 minutes (via startCleanupIfNeeded → setInterval) and deletes sessions older than 5 minutes (cleanupSessions default maxAge = 5 * 60 * 1000). However, completed (and errored) searches are also removed after just 2 minutes by a separate setTimeout in SearchManager.

Please update the stop_search description in src/server.ts (around lines 366–372) to accurately reflect the two-phase cleanup:

  • Completed/error sessions auto-clean after ~2 minutes.
  • A full sweep removes any remaining stale sessions every 5 minutes.

This will keep the documentation in sync with the actual behavior.

🤖 Prompt for AI Agents
In src/server.ts around lines 359 to 372, the stop_search description text is
inaccurate about cleanup timing; update the multi-line description so it states
that completed or errored searches are removed after approximately 2 minutes
(via SearchManager setTimeout), and that a periodic full sweep runs every 5
minutes to clean up any remaining stale sessions (via the TTL cleanup); keep the
rest of the wording (including ${CMD_PREFIX_DESCRIPTION}) intact and ensure the
new text clearly mentions the two-phase cleanup durations (~2 minutes for
completed/errored sessions, 5 minutes for full sweeps).

Comment thread src/tools/filesystem.ts
Comment on lines 905 to +969
export async function searchFiles(rootPath: string, pattern: string): Promise<string[]> {
// Use the new search manager for better performance
// This provides a temporary compatibility layer until we fully migrate to search sessions
const { searchManager } = await import('../search-manager.js');

try {
const result = await searchManager.startSearch({
rootPath,
pattern,
searchType: 'files',
ignoreCase: true,
maxResults: 5000, // Higher limit for compatibility
});

const sessionId = result.sessionId;

// Poll for results until complete
let allResults: string[] = [];
let isComplete = result.isComplete;
let startTime = Date.now();

// Add initial results
for (const searchResult of result.results) {
if (searchResult.type === 'file') {
allResults.push(searchResult.file);
}
}

while (!isComplete) {
await new Promise(resolve => setTimeout(resolve, 100)); // Wait 100ms

const results = searchManager.readSearchResults(sessionId);
isComplete = results.isComplete;

// Add new file paths to results
for (const searchResult of results.results) {
if (searchResult.file !== '__LAST_READ_MARKER__' && searchResult.type === 'file') {
allResults.push(searchResult.file);
}
}

// Safety check to prevent infinite loops (30 second timeout)
if (Date.now() - startTime > 30000) {
searchManager.terminateSearch(sessionId);
break;
}
}

// Log only the count of found files, not their paths
capture('server_search_files_complete', {
resultsCount: allResults.length,
patternLength: pattern.length,
usedRipgrep: true
});

return allResults;
} catch (error) {
// Fallback to original Node.js implementation if ripgrep fails
capture('server_search_files_ripgrep_fallback', {
error: error instanceof Error ? error.message : 'Unknown error'
});

return await searchFilesNodeJS(rootPath, pattern);
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

⚠️ Potential issue

Duplicate and truncated results when polling search sessions (offset is never advanced).

searchFiles repeatedly calls searchManager.readSearchResults(sessionId) without offset, which always returns the first page. You then keep appending, causing duplicates and possibly never enumerating beyond the first 100 entries.

Advance the offset and de-duplicate:

 export async function searchFiles(rootPath: string, pattern: string): Promise<string[]> {
@@
-        // Poll for results until complete
-        let allResults: string[] = [];
+        // Poll for results until complete
+        const allResults: string[] = [];
+        const seen = new Set<string>();
         let isComplete = result.isComplete;
         let startTime = Date.now();
+        let offset = 0;
+        const PAGE = 1000;
         
         // Add initial results
-        for (const searchResult of result.results) {
+        for (const searchResult of result.results) {
             if (searchResult.type === 'file') {
-                allResults.push(searchResult.file);
+                if (!seen.has(searchResult.file)) {
+                    seen.add(searchResult.file);
+                    allResults.push(searchResult.file);
+                }
             }
         }
+        offset += result.results.length;
         
         while (!isComplete) {
           await new Promise(resolve => setTimeout(resolve, 100)); // Wait 100ms
-          
-          const results = searchManager.readSearchResults(sessionId);
-          isComplete = results.isComplete;
-          
-          // Add new file paths to results
-          for (const searchResult of results.results) {
-              if (searchResult.file !== '__LAST_READ_MARKER__' && searchResult.type === 'file') {
-                  allResults.push(searchResult.file);
-              }
-          }
+          const page = searchManager.readSearchResults(sessionId, offset, PAGE);
+          isComplete = page.isComplete;
+          // Add new file paths to results
+          for (const searchResult of page.results) {
+              if (searchResult.type === 'file' && !seen.has(searchResult.file)) {
+                  seen.add(searchResult.file);
+                  allResults.push(searchResult.file);
+              }
+          }
+          offset += page.returnedCount;
@@
           if (Date.now() - startTime > 30000) {
             searchManager.terminateSearch(sessionId);
             break;
           }
         }

Also, __LAST_READ_MARKER__ isn’t produced anywhere; remove related filtering.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
export async function searchFiles(rootPath: string, pattern: string): Promise<string[]> {
// Use the new search manager for better performance
// This provides a temporary compatibility layer until we fully migrate to search sessions
const { searchManager } = await import('../search-manager.js');
try {
const result = await searchManager.startSearch({
rootPath,
pattern,
searchType: 'files',
ignoreCase: true,
maxResults: 5000, // Higher limit for compatibility
});
const sessionId = result.sessionId;
// Poll for results until complete
let allResults: string[] = [];
let isComplete = result.isComplete;
let startTime = Date.now();
// Add initial results
for (const searchResult of result.results) {
if (searchResult.type === 'file') {
allResults.push(searchResult.file);
}
}
while (!isComplete) {
await new Promise(resolve => setTimeout(resolve, 100)); // Wait 100ms
const results = searchManager.readSearchResults(sessionId);
isComplete = results.isComplete;
// Add new file paths to results
for (const searchResult of results.results) {
if (searchResult.file !== '__LAST_READ_MARKER__' && searchResult.type === 'file') {
allResults.push(searchResult.file);
}
}
// Safety check to prevent infinite loops (30 second timeout)
if (Date.now() - startTime > 30000) {
searchManager.terminateSearch(sessionId);
break;
}
}
// Log only the count of found files, not their paths
capture('server_search_files_complete', {
resultsCount: allResults.length,
patternLength: pattern.length,
usedRipgrep: true
});
return allResults;
} catch (error) {
// Fallback to original Node.js implementation if ripgrep fails
capture('server_search_files_ripgrep_fallback', {
error: error instanceof Error ? error.message : 'Unknown error'
});
return await searchFilesNodeJS(rootPath, pattern);
}
}
export async function searchFiles(rootPath: string, pattern: string): Promise<string[]> {
// Use the new search manager for better performance
// This provides a temporary compatibility layer until we fully migrate to search sessions
const { searchManager } = await import('../search-manager.js');
try {
const result = await searchManager.startSearch({
rootPath,
pattern,
searchType: 'files',
ignoreCase: true,
maxResults: 5000, // Higher limit for compatibility
});
const sessionId = result.sessionId;
// Poll for results until complete
const allResults: string[] = [];
const seen = new Set<string>();
let isComplete = result.isComplete;
let startTime = Date.now();
let offset = 0;
const PAGE = 1000;
// Add initial results
for (const searchResult of result.results) {
if (searchResult.type === 'file') {
if (!seen.has(searchResult.file)) {
seen.add(searchResult.file);
allResults.push(searchResult.file);
}
}
}
offset += result.results.length;
while (!isComplete) {
await new Promise(resolve => setTimeout(resolve, 100)); // Wait 100ms
const page = searchManager.readSearchResults(sessionId, offset, PAGE);
isComplete = page.isComplete;
// Add new file paths to results
for (const searchResult of page.results) {
if (searchResult.type === 'file' && !seen.has(searchResult.file)) {
seen.add(searchResult.file);
allResults.push(searchResult.file);
}
}
offset += page.returnedCount;
// Safety check to prevent infinite loops (30 second timeout)
if (Date.now() - startTime > 30000) {
searchManager.terminateSearch(sessionId);
break;
}
}
// Log only the count of found files, not their paths
capture('server_search_files_complete', {
resultsCount: allResults.length,
patternLength: pattern.length,
usedRipgrep: true
});
return allResults;
} catch (error) {
// Fallback to original Node.js implementation if ripgrep fails
capture('server_search_files_ripgrep_fallback', {
error: error instanceof Error ? error.message : 'Unknown error'
});
return await searchFilesNodeJS(rootPath, pattern);
}
}
🤖 Prompt for AI Agents
In src/tools/filesystem.ts around lines 905 to 969, the polling loop calls
searchManager.readSearchResults(sessionId) without passing an offset so it
always returns the first page, causing duplicates and truncation; remove the
bogus '__LAST_READ_MARKER__' check, track and advance an offset on each read
(initialize offset to the number of items already consumed from the initial
result), pass that offset into readSearchResults(sessionId, offset), and append
only new file entries while deduplicating (e.g., maintain a Set of seen file
paths or check against allResults) so you never re-add the same path; also
increment the offset by the number of results returned each poll and keep the
existing timeout/termination logic.

Comment thread src/tools/schemas.ts
Comment on lines +101 to +111
export const StartSearchArgsSchema = z.object({
path: z.string(),
pattern: z.string(),
searchType: z.enum(['files', 'content']).default('files'),
filePattern: z.string().optional(),
ignoreCase: z.boolean().optional().default(true),
maxResults: z.number().optional(),
includeHidden: z.boolean().optional().default(false),
contextLines: z.number().optional().default(5),
timeout_ms: z.number().optional(), // Match process naming convention
});

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Tighten validation and simplify defaults in StartSearchArgsSchema

  • Prevent empty strings and invalid ranges.
  • Remove redundant .optional() before .default(); default() already handles undefined.
  • Add integer/positive guards to avoid resource spikes and accidental negatives.

Apply:

 export const StartSearchArgsSchema = z.object({
-  path: z.string(),
-  pattern: z.string(),
+  path: z.string().min(1, 'path cannot be empty'),
+  pattern: z.string().min(1, 'pattern cannot be empty'),
   searchType: z.enum(['files', 'content']).default('files'),
   filePattern: z.string().optional(),
-  ignoreCase: z.boolean().optional().default(true),
-  maxResults: z.number().optional(),
-  includeHidden: z.boolean().optional().default(false),
-  contextLines: z.number().optional().default(5),
-  timeout_ms: z.number().optional(), // Match process naming convention
+  ignoreCase: z.boolean().default(true),
+  maxResults: z.number().int().positive().max(50000).optional(),
+  includeHidden: z.boolean().default(false),
+  contextLines: z.number().int().min(0).default(5),
+  timeout_ms: z.number().int().positive().max(300000).optional(), // cap to 5 minutes
 });
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
export const StartSearchArgsSchema = z.object({
path: z.string(),
pattern: z.string(),
searchType: z.enum(['files', 'content']).default('files'),
filePattern: z.string().optional(),
ignoreCase: z.boolean().optional().default(true),
maxResults: z.number().optional(),
includeHidden: z.boolean().optional().default(false),
contextLines: z.number().optional().default(5),
timeout_ms: z.number().optional(), // Match process naming convention
});
export const StartSearchArgsSchema = z.object({
path: z.string().min(1, 'path cannot be empty'),
pattern: z.string().min(1, 'pattern cannot be empty'),
searchType: z.enum(['files', 'content']).default('files'),
filePattern: z.string().optional(),
ignoreCase: z.boolean().default(true),
maxResults: z.number().int().positive().max(50000).optional(),
includeHidden: z.boolean().default(false),
contextLines: z.number().int().min(0).default(5),
timeout_ms: z.number().int().positive().max(300000).optional(), // cap to 5 minutes
});
🤖 Prompt for AI Agents
In src/tools/schemas.ts around lines 101 to 111, tighten validation and simplify
defaults for StartSearchArgsSchema: make path and pattern required non-empty
strings (e.g., z.string().min(1)), make filePattern optional but if present
require non-empty string, remove unnecessary .optional() before .default() on
boolean/defaulted fields (use z.boolean().default(...)), enforce
integer/positive guards on numeric fields (e.g., maxResults:
z.number().int().positive().max(10000).optional(), contextLines:
z.number().int().min(0).max(100).default(5), timeout_ms:
z.number().int().positive().max(1200000).optional()), and keep searchType
enum/default as is; adjust ranges/caps as shown to prevent negatives and
resource spikes.

Comment thread src/utils/usageTracker.ts
Comment on lines +43 to 46
search: ['start_search', 'get_more_search_results', 'stop_search', 'list_searches'],
config: ['get_config', 'set_config_value'],
process: ['list_processes', 'kill_process']
};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Verification agent

🧩 Analysis chain

New streaming search tools are correctly mapped to searchOperations.

Looks good and aligns with the new session-based API. Recommend scanning the repo for stale references to the old tools to keep analytics clean.

Run this quick check to ensure no lingering old tool names remain:


🏁 Script executed:

#!/bin/bash
# Find references to deprecated tool names
rg -nP --ignore-case '(search_files|search_code)\b' -g '!dist/**' -S

Length of output: 354


Documentation Cleanup: Remove deprecated search_files and search_code references

We’ve identified two occurrences of the old tool names that should be updated:

  • FAQ.md (line 271): “Both search_files and search_code have a default 30-second timeout”
  • README.md (lines 404–405): table entries for search_files and search_code

Please update or remove these references to keep the docs in sync with the new session-based API.

🤖 Prompt for AI Agents
In src/utils/usageTracker.ts around lines 43 to 46, the documentation references
to deprecated tools `search_files` and `search_code` are out of sync with the
new session-based API; update the docs by removing or replacing those tool
names: remove occurrences in FAQ.md (line ~271) and README.md (lines ~404-405)
or replace them with the current session-based equivalents and any relevant
timeout text, ensuring table entries and explanatory sentences reflect the new
API naming and behavior.

Comment thread test/test_improved_search_truncation.js
// Test script to verify search result truncation
import { handleSearchCode } from '../dist/handlers/edit-search-handlers.js';
// Test script to verify search result behavior using new streaming API
import { handleStartSearch, handleGetMoreSearchResults } from '../dist/handlers/search-handlers.js';

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

ReferenceError: handleStopSearch is used but not imported.

handleStopSearch is called in finally but isn't imported, which will break the test at runtime.

Apply this diff:

-import { handleStartSearch, handleGetMoreSearchResults } from '../dist/handlers/search-handlers.js';
+import { handleStartSearch, handleGetMoreSearchResults, handleStopSearch } from '../dist/handlers/search-handlers.js';
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
import { handleStartSearch, handleGetMoreSearchResults } from '../dist/handlers/search-handlers.js';
import { handleStartSearch, handleGetMoreSearchResults, handleStopSearch } from '../dist/handlers/search-handlers.js';
🤖 Prompt for AI Agents
In test/test_search_truncation.js around line 2, the test calls handleStopSearch
in a finally block but the import statement only imports handleStartSearch and
handleGetMoreSearchResults; add handleStopSearch to the named import from
'../dist/handlers/search-handlers.js' so the function is available at runtime
and the test no longer throws a ReferenceError.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

♻️ Duplicate comments (3)
src/search-manager.ts (3)

293-301: Scope -i to content searches only.

-i has no effect for rg --files and can confuse behavior. Limit it to content mode.

-    // Common options
-    if (options.ignoreCase !== false) {
-      args.push('-i');
-    }
+    // Case-insensitive: only meaningful for content searches
+    if (options.searchType === 'content' && options.ignoreCase !== false) {
+      args.push('-i');
+    }

306-316: filePattern flag: use '-g' for content filters, '--glob' for file mode.

Keeps behavior consistent and avoids option collisions with --files. Also split multiple patterns by '|'.

-    if (options.filePattern) {
+    if (options.filePattern) {
       const patterns = options.filePattern
         .split('|')
         .map(p => p.trim())
         .filter(Boolean);
-      
-      patterns.forEach(pattern => {
-        args.push('-g', pattern);
-      });
+      for (const p of patterns) {
+        if (options.searchType === 'content') {
+          args.push('-g', p);
+        } else {
+          args.push('--glob', p);
+        }
+      }
     }

318-335: Content mode missing '--' separator; may treat pattern starting with '-' as a flag.

Insert '--' before PATTERN and pass PATH after it. This prevents accidental option parsing.

-    if (options.searchType === 'files') {
+    if (options.searchType === 'files') {
@@
-    } else {
-      // Content search: pattern is the search term
-      args.push(options.pattern);
-    }
-    
-    // Add the root path
-    args.push(options.rootPath);
+    } else {
+      // Content search: terminate options before the pattern
+      args.push('--', options.pattern, options.rootPath);
+      return args; // rootPath already added
+    }
+    // Add the root path (files mode)
+    args.push(options.rootPath);

Also applies to: 336-338

🧹 Nitpick comments (5)
src/search-manager.ts (5)

97-107: Timeout telemetry and default may mislead; record actual timeout used.

You set a default 1000ms for exact filenames but log hasTimeout based only on options.timeout. Capture the resolved timeout and consider a more generous default.

-    const timeoutMs = options.timeout || (this.isExactFilename(options.pattern) ? 1000 : undefined);
+    const timeoutMs = options.timeout ?? (this.isExactFilename(options.pattern) ? 1500 : undefined);
@@
-    capture('search_session_started', {
+    capture('search_session_started', {
       sessionId,
       searchType: options.searchType,
-      hasTimeout: !!options.timeout
+      hasTimeout: !!timeoutMs,
+      timeoutMs
     });

Also applies to: 109-114


115-119: Avoid arbitrary sleep before returning initial state.

Fixed waits (50/100ms) add latency and are brittle. Prefer returning immediately and let the client poll, or await the first 'data' event (with a short cap).

-    const waitTime = this.isExactFilename(options.pattern) ? 50 : 100;
-    await new Promise(resolve => setTimeout(resolve, waitTime));
+    const firstChunk = new Promise<void>(resolve => {
+      const onData = () => { session.process.stdout?.off('data', onData); resolve(); };
+      session.process.stdout?.once('data', onData);
+      setTimeout(resolve, 40); // cap
+    });
+    await firstChunk;

359-385: Double cleanup logic; unify retention policy.

You both auto-delete after 2 minutes in handlers and have cleanupSessions(maxAge=5m). Pick one source of truth to avoid surprise eviction.

-      // Auto-cleanup completed sessions after 2 minutes
-      setTimeout(() => {
-        this.sessions.delete(session.id);
-        capture('search_session_auto_cleaned', { sessionId: session.id });
-      }, 2 * 60 * 1000);
+      // Rely on cleanupSessions(maxAge) only; no per-session timer

And same for the error handler block.

Also applies to: 397-402


426-464: Content JSON parsing: handle 'summary' and missing submatches defensively.

Avoid accessing submatches[0] when empty; optionally use summary to confirm totals.

-        if (parsed.type === 'match') {
-          // Return first submatch (ripgrep can have multiple matches per line)
-          const submatch = parsed.data.submatches[0];
+        if (parsed.type === 'match') {
+          const submatch = parsed.data?.submatches?.[0];
           return {
             file: parsed.data.path.text,
             line: parsed.data.line_number,
             match: submatch?.match?.text || parsed.data.lines.text,
             type: 'content'
           };
         }
+        if (parsed.type === 'summary') {
+          // Optional: could reconcile totalMatches with parsed.data.stats?.matchedLines
+          return null;
+        }

471-493: Cleanup interval start condition may stop too early.

You start interval only when there are active (not complete) sessions. If all sessions become complete quickly, the immediate 1s check may stop the interval before cleanupSessions can retire old sessions. Consider running cleanup on a fixed schedule or track completed-but-not-cleaned sessions count.

-  if (!cleanupInterval && searchManager.getActiveSessionCount() > 0) {
+  if (!cleanupInterval) {
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 02ed3f9 and 428a606.

📒 Files selected for processing (1)
  • src/search-manager.ts (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/search-manager.ts (3)
src/tools/search.ts (4)
  • SearchResult (9-13)
  • searchCode (16-146)
  • code (98-144)
  • resolve (78-145)
src/tools/filesystem.ts (1)
  • validatePath (226-281)
src/utils/capture.ts (1)
  • capture (277-284)
🔇 Additional comments (1)
src/search-manager.ts (1)

7-12: Remove misleading SearchResult divergence warning
No file imports both SearchResult definitions—each lives in its own module without collision risk. Interfaces can remain as-is.

Likely an incorrect or invalid review comment.

Comment thread src/search-manager.ts
Comment on lines +134 to +171
readSearchResults(
sessionId: string,
offset: number = 0,
length: number = 100
): {
results: SearchResult[];
returnedCount: number; // Renamed from newResultsCount
totalResults: number;
isComplete: boolean;
isError: boolean;
error?: string;
hasMoreResults: boolean; // New field
runtime: number;
} {
const session = this.sessions.get(sessionId);

if (!session) {
throw new Error(`Search session ${sessionId} not found`);
}

// Get all results (excluding internal markers)
const allResults = session.results.filter(r => r.file !== '__LAST_READ_MARKER__');

// Handle negative offsets (tail behavior) - like file reading
if (offset < 0) {
const tailCount = Math.abs(offset);
const tailResults = allResults.slice(-tailCount);
return {
results: tailResults,
returnedCount: tailResults.length,
totalResults: session.totalMatches,
isComplete: session.isComplete,
isError: session.isError,
error: session.error,
hasMoreResults: false, // Tail always returns what's available
runtime: Date.now() - session.startTime
};
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Tail reads don't refresh lastReadTime, risking premature cleanup.

Update lastReadTime for all reads, including tails, to honor activity.

-    // Handle negative offsets (tail behavior) - like file reading
+    // Refresh last read time for any access
+    session.lastReadTime = Date.now();
+    // Handle negative offsets (tail behavior) - like file reading
     if (offset < 0) {
@@
-    session.lastReadTime = Date.now();
+    // lastReadTime already updated above

Also applies to: 177-179

🤖 Prompt for AI Agents
In src/search-manager.ts around lines 134-171 (and also apply the same change at
177-179), the tail-read branch returns results without updating the session's
lastReadTime which can cause the session to be cleaned up prematurely; update
session.lastReadTime = Date.now() for all read paths (including the negative
offset/tail branch) before any return, and likewise ensure the normal (non-tail)
return path and the subsequent returns at lines ~177-179 also set
session.lastReadTime so every read counts as activity.

Comment thread src/search-manager.ts
Comment on lines +302 to +305
if (options.maxResults && options.maxResults > 0) {
args.push('-m', options.maxResults.toString());
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

⚠️ Potential issue

Apply -m only to content mode; enforce maxResults in file mode in-process.

rg -m limits matches, not files from --files. Gate it to content; for file mode, kill the process once enough paths are collected.

-    if (options.maxResults && options.maxResults > 0) {
-      args.push('-m', options.maxResults.toString());
-    }
+    if (options.searchType === 'content' && options.maxResults && options.maxResults > 0) {
+      args.push('-m', options.maxResults.toString());
+    }

And enforce for file mode in the collector:

@@
       const result = this.parseLine(line, session.options.searchType);
       if (result) {
         session.results.push(result);
         session.totalMatches++;
+        // Enforce file-mode maxResults proactively
+        if (
+          session.options.searchType === 'files' &&
+          session.options.maxResults &&
+          session.options.maxResults > 0 &&
+          session.results.filter(r => r.type === 'file').length >= session.options.maxResults &&
+          !session.process.killed
+        ) {
+          session.process.kill('SIGTERM');
+        }
       }
🤖 Prompt for AI Agents
In src/search-manager.ts around lines 302-305, the code currently
unconditionally appends the rg "-m" arg from options.maxResults; change this so
"-m" is only added when options.mode === 'content'; when options.mode === 'file'
do not pass "-m" to rg, and instead enforce the limit in-process by counting
unique file paths in the collector and terminating the ripgrep child process
once the count reaches options.maxResults (close/kill the child and stop
collecting). Also ensure the collector returns at most maxResults items
(deduplicate file paths if needed) and that any cleanup/stream listeners are
handled when you terminate the process.

Comment thread src/search-manager.ts
Comment thread src/search-manager.ts
High Priority Fixes:
- Fixed ripgrep command construction: case-insensitive flag only for content searches
- Fixed missing '--' separator for content searches to prevent pattern interpretation as flags
- Fixed different glob flag usage: use '--glob' for file mode, '-g' for content mode
- Fixed search result counting: separate matches from context lines to avoid inflated totals
- Fixed fallback search behavior: consistent exclusions between ripgrep and NodeJS fallback

Medium Priority Fixes:
- Improved timeout management: added timer cleanup and better defaults
- Better type safety: use 'unknown' instead of 'any' for handler inputs
- Improved test reliability: increased timeouts from 10s to 30s, tighter regex for session ID extraction
- Fixed API limit constants: use single source of truth (1048576) across tests
- Enhanced search result reporting: show separate match counts vs total results

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

♻️ Duplicate comments (4)
test/test_improved_search_truncation.js (1)

2-2: Import handleStopSearch to prevent ReferenceError.

handleStopSearch is used later (Line 39) but not imported.

Apply:

-import { handleStartSearch, handleGetMoreSearchResults } from '../dist/handlers/search-handlers.js';
+import { handleStartSearch, handleGetMoreSearchResults, handleStopSearch } from '../dist/handlers/search-handlers.js';
src/search-manager.ts (3)

329-331: Apply -m only in content mode and enforce maxResults in file mode by terminating rg

-m has no effect with --files. Gate it to content searches and proactively stop the process once enough file paths are collected.

@@
-    if (options.maxResults && options.maxResults > 0) {
-      args.push('-m', options.maxResults.toString());
-    }
+    if (options.searchType === 'content' && options.maxResults && options.maxResults > 0) {
+      args.push('-m', options.maxResults.toString());
+    }
@@
       if (result) {
         session.results.push(result);
         // Separate counting of matches vs context lines
         if (result.type === 'content' && line.includes('"type":"context"')) {
           session.totalContextLines++;
         } else {
           session.totalMatches++;
         }
+        // Enforce file-mode limits locally (rg -m doesn't limit --files)
+        if (
+          session.options.searchType === 'files' &&
+          session.options.maxResults &&
+          session.options.maxResults > 0 &&
+          session.totalMatches >= session.options.maxResults &&
+          !session.process.killed
+        ) {
+          session.process.kill('SIGTERM');
+        }
       }

Also applies to: 439-451


349-364: Support multi-glob patterns in file mode (a|b) via repeated --glob and reduce fuzzy matches

Split options.pattern on | and push one --glob per token; only wrap non-glob tokens with wildcards.

-    if (options.searchType === 'files') {
-      // For file search: determine how to treat the pattern
-      if (this.isExactFilename(options.pattern)) {
-        // Exact filename: use --glob with the exact pattern
-        args.push('--glob', options.pattern);
-      } else if (this.isGlobPattern(options.pattern)) {
-        // Already a glob pattern: use --glob as-is
-        args.push('--glob', options.pattern);
-      } else {
-        // Substring/fuzzy search: wrap with wildcards
-        args.push('--glob', `*${options.pattern}*`);
-      }
-      // Add the root path for file mode
-      args.push(options.rootPath);
-    } else {
+    if (options.searchType === 'files') {
+      const tokens = (options.pattern || '')
+        .split('|')
+        .map(t => t.trim())
+        .filter(Boolean);
+      for (const token of tokens.length ? tokens : [options.pattern]) {
+        if (this.isExactFilename(token) || this.isGlobPattern(token)) {
+          args.push('--glob', token);
+        } else {
+          args.push('--glob', `*${token}*`);
+        }
+      }
+      // Add the root path for file mode
+      args.push(options.rootPath);
+    } else {

173-197: Refresh lastReadTime for all reads; tail should still reflect “more to come” while session is open

Tail reads don’t update activity and set hasMoreResults=false even if the process is still running.

@@
-    const session = this.sessions.get(sessionId);
+    const session = this.sessions.get(sessionId);
@@
-    // Handle negative offsets (tail behavior) - like file reading
+    // Refresh last read time for any access
+    session.lastReadTime = Date.now();
+    // Handle negative offsets (tail behavior) - like file reading
     if (offset < 0) {
@@
-      return {
+      return {
         results: tailResults,
         returnedCount: tailResults.length,
         totalResults: session.totalMatches + session.totalContextLines,
         totalMatches: session.totalMatches, // Actual matches only
         isComplete: session.isComplete,
         isError: session.isError,
         error: session.error,
-        hasMoreResults: false, // Tail always returns what's available
+        hasMoreResults: !session.isComplete,
         runtime: Date.now() - session.startTime
       };
     }
🧹 Nitpick comments (5)
test/test_improved_search_truncation.js (3)

51-57: Reduce flakiness: bound search scope and set backend timeout.

Searching . with pattern . and maxResults: 50000 can overwhelm CI and cause timeouts. Narrow the scope and set timeout_ms.

Apply:

         const searchArgs = {
             path: '.',
             pattern: '.',  // Match almost every line - this should be a lot of results
             searchType: 'content',
             maxResults: 50000,  // Very high limit to get lots of results, but may be capped
-            ignoreCase: true
+            ignoreCase: true,
+            filePattern: '**/*.{ts,tsx,js,jsx,md,json}',
+            includeHidden: false,
+            timeout_ms: 20000
         };

11-15: Session ID regex: align with other tests or relax.

Other tests use /Started .+ session: (.+)/. Your stricter class may break if session IDs include characters beyond [A-Za-z0-9_-].

Apply:

-  const sessionIdMatch = result.content[0].text.match(/Started .* session:\s*([a-zA-Z0-9_-]+)/);
+  const sessionIdMatch = result.content[0].text.match(/Started .+ session:\s*(\S+)/);

82-88: Don’t rely on a “Results truncated” sentinel not emitted by handlers.

search-handlers don’t guarantee that phrase. Prefer checking for pagination hints or completion status.

Apply:

-        if (finalResult.content[0].text.includes('Results truncated')) {
+        if (finalResult.content[0].text.includes('Results truncated')) {
             console.log('✅ Results properly truncated with warning message');
             const truncationIndex = finalResult.content[0].text.indexOf('Results truncated');
             console.log('Truncation message:', finalResult.content[0].text.substring(truncationIndex, truncationIndex + 150));
-        } else {
-            console.log('ℹ️  Results complete, no truncation needed with new streaming API');
+        } else if (finalResult.content[0].text.includes('More results available')) {
+            console.log('ℹ️  Many results; pagination hint present.');
+        } else {
+            console.log('ℹ️  Results complete, no truncation or pagination hint.');
         }
src/search-manager.ts (2)

3-3: Remove unused import

path is unused.

-import path from 'path';

7-12: Rename SearchResult in src/search-manager.ts to avoid collision
Two distinct SearchResult interfaces exist (src/tools/search.ts vs src/search-manager.ts); rename this one to SearchStreamResult (re-export an alias for compatibility).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 428a606 and 79f9114.

📒 Files selected for processing (5)
  • src/handlers/search-handlers.ts (1 hunks)
  • src/search-manager.ts (1 hunks)
  • src/tools/search.ts (1 hunks)
  • test/test_improved_search_truncation.js (1 hunks)
  • test/test_search_truncation.js (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/tools/search.ts
  • test/test_search_truncation.js
  • src/handlers/search-handlers.ts
🧰 Additional context used
🧬 Code graph analysis (2)
src/search-manager.ts (3)
src/tools/search.ts (1)
  • SearchResult (9-13)
src/tools/filesystem.ts (1)
  • validatePath (226-281)
src/utils/capture.ts (1)
  • capture (277-284)
test/test_improved_search_truncation.js (3)
test/test-search-code.js (7)
  • result (34-34)
  • result (415-419)
  • result (442-445)
  • result (455-458)
  • sessionIdMatch (37-37)
  • sessionId (41-41)
  • sessionId (165-165)
test/test-search-code-edge-cases.js (9)
  • result (30-30)
  • result (285-290)
  • result (309-314)
  • result (328-333)
  • result (348-353)
  • result (370-374)
  • sessionIdMatch (33-33)
  • sessionId (37-37)
  • sessionId (125-125)
src/handlers/search-handlers.ts (3)
  • handleStartSearch (13-78)
  • handleGetMoreSearchResults (83-166)
  • handleStopSearch (171-206)
🪛 Biome (2.1.2)
test/test_improved_search_truncation.js

[error] 74-74: This variable is used before its declaration.

The variable is declared here:

(lint/correctness/noInvalidUseBeforeDeclaration)


[error] 76-77: This variable is used before its declaration.

The variable is declared here:

(lint/correctness/noInvalidUseBeforeDeclaration)


[error] 94-94: Shouldn't redeclare 'apiLimit'. Consider to delete it or rename it.

'apiLimit' is defined here:

(lint/suspicious/noRedeclare)

🔇 Additional comments (3)
test/test_improved_search_truncation.js (1)

109-115: process.exit in tests: confirm runner expectations.

Hard exits can bypass cleanup/reporting in some runners. If this test is executed standalone, fine; otherwise consider returning a rejected promise instead of process.exit.

src/search-manager.ts (2)

320-323: Good: -i scoped to content and -- used before the pattern

These address flag misparsing and unnecessary case folding.

Also applies to: 365-368


445-450: Good: context lines tracked separately from matches

This aligns counters with rg’s match semantics.

Comment thread src/search-manager.ts
Comment on lines +144 to +151
return {
sessionId,
isComplete: session.isComplete,
isError: session.isError,
results: [...session.results],
totalResults: session.totalMatches,
runtime: Date.now() - session.startTime
};

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Align totalResults semantics across startSearch/read/list to avoid confusion

startSearch returns matches-only, while readSearchResults and listSearchSessions include context lines. Return the same aggregate everywhere.

   return {
     sessionId,
     isComplete: session.isComplete,
     isError: session.isError,
     results: [...session.results],
-    totalResults: session.totalMatches,
+    totalResults: session.totalMatches + session.totalContextLines,
     runtime: Date.now() - session.startTime
   };

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In src/search-manager.ts around lines 144 to 151, startSearch currently returns
totalResults as session.totalMatches (matches-only) while readSearchResults and
listSearchSessions return an aggregate that includes context lines; change
startSearch to return the same aggregate by using the aggregate field (e.g.
session.totalResults) instead of session.totalMatches, and ensure all three
functions consistently use the same field name and value (or compute the
aggregate the same way) so totalResults semantics match across
startSearch/read/list.

Comment thread src/search-manager.ts
Comment on lines +381 to +388
process.stderr?.on('data', (data: Buffer) => {
const errorText = data.toString();
session.error = (session.error || '') + errorText;
capture('search_session_error', {
sessionId: session.id,
error: errorText.substring(0, 200) // Limit error length for telemetry
});
});

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Sanitize error telemetry (paths/patterns can leak PII to GA)

Avoid sending raw stderr (absolute paths, query patterns). Sanitize before storing/sending.

-    process.stderr?.on('data', (data: Buffer) => {
-      const errorText = data.toString();
-      session.error = (session.error || '') + errorText;
-      capture('search_session_error', { 
-        sessionId: session.id,
-        error: errorText.substring(0, 200) // Limit error length for telemetry
-      });
-    });
+    process.stderr?.on('data', (data: Buffer) => {
+      const errorText = data.toString();
+      const sanitized = this.sanitizeError(errorText, session.options.rootPath, session.options.pattern);
+      session.error = (session.error || '') + sanitized;
+      capture('search_session_error', {
+        sessionId: session.id,
+        error: sanitized.substring(0, 200)
+      });
+    });

Add this helper inside the class:

private sanitizeError(text: string, rootPath?: string, pattern?: string): string {
  let t = text.replaceAll('\u0000', '');
  if (rootPath) {
    // Collapse absolute paths under the validated root
    const normRoot = rootPath.replace(/[-/\\^$*+?.()|[\]{}]/g, '\\$&');
    t = t.replace(new RegExp(normRoot + '[^\\s"]*', 'g'), '<path>');
  }
  if (pattern) {
    const escaped = pattern.replace(/[-/\\^$*+?.()|[\]{}]/g, '\\$&');
    t = t.replace(new RegExp(escaped, 'g'), '<pattern>');
  }
  // Drop overly verbose permission lines
  return t.slice(0, 2000);
}
🤖 Prompt for AI Agents
In src/search-manager.ts around lines 381 to 388, raw stderr is being appended
to session.error and sent to telemetry which may leak absolute paths or query
patterns; add a private sanitizeError(text: string, rootPath?: string, pattern?:
string): string helper inside the class that removes null chars, collapses any
occurrences of a validated rootPath to '<path>' (escaping regex metacharacters),
replaces occurrences of a provided pattern with '<pattern>' (escaped), and trims
the result to a safe maximum length (e.g. 2000 chars); then call this sanitizer
before appending to session.error and before capture(), and still send only the
first 200 chars to telemetry (capture uses sanitizedText.substring(0,200)).

Comment on lines +8 to +16
const result = await handleStartSearch(searchArgs);

// Extract session ID from result with tighter regex
const sessionIdMatch = result.content[0].text.match(/Started .* session:\s*([a-zA-Z0-9_-]+)/);
if (!sessionIdMatch) {
throw new Error('Could not extract session ID from search result');
}
const sessionId = sessionIdMatch[1];

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Guard against start_search failures.

If handleStartSearch returns isError, the test keeps going and fails later with a vague error.

Apply:

   const result = await handleStartSearch(searchArgs);
+  if (result.isError) {
+    throw new Error(`start_search failed: ${result.content?.[0]?.text ?? 'Unknown error'}`);
+  }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const result = await handleStartSearch(searchArgs);
// Extract session ID from result with tighter regex
const sessionIdMatch = result.content[0].text.match(/Started .* session:\s*([a-zA-Z0-9_-]+)/);
if (!sessionIdMatch) {
throw new Error('Could not extract session ID from search result');
}
const sessionId = sessionIdMatch[1];
const result = await handleStartSearch(searchArgs);
if (result.isError) {
throw new Error(
`start_search failed: ${result.content?.[0]?.text ?? 'Unknown error'}`
);
}
// Extract session ID from result with tighter regex
const sessionIdMatch = result.content[0].text.match(
/Started .* session:\s*([a-zA-Z0-9_-]+)/
);
if (!sessionIdMatch) {
throw new Error('Could not extract session ID from search result');
}
const sessionId = sessionIdMatch[1];
🤖 Prompt for AI Agents
In test/test_improved_search_truncation.js around lines 8 to 16, the test
assumes handleStartSearch succeeded; add an early guard that checks if
result.isError (or equivalent failure flag) and immediately throw or fail the
test with the error message/details from result to stop execution and produce a
clear failure; then proceed to extract sessionId as before, but also validate
result.content and result.content[0].text exist before running the regex and
throw an explicit error if they are missing.

Comment on lines +19 to +33
const startTime = Date.now();
while (Date.now() - startTime < timeout) {
const moreResults = await handleGetMoreSearchResults({ sessionId });

if (moreResults.content[0].text.includes('✅ Search completed')) {
return { initialResult: result, finalResult: moreResults, sessionId };
}

if (moreResults.content[0].text.includes('❌ ERROR')) {
throw new Error(`Search failed: ${moreResults.content[0].text}`);
}

// Wait a bit before polling again
await new Promise(resolve => setTimeout(resolve, 100));
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Poll with explicit tail pagination and check isError.

get_more schema typically expects offset and length. Relying on implicit defaults is brittle; also checking for '❌ ERROR' text is unreliable—use isError.

Apply:

-    // Wait for completion by polling
-    const startTime = Date.now();
+    // Wait for completion by polling (tail last N results)
+    const startTime = Date.now();
+    const length = 200;
+    const offset = -length;

     while (Date.now() - startTime < timeout) {
-      const moreResults = await handleGetMoreSearchResults({ sessionId });
+      const moreResults = await handleGetMoreSearchResults({ sessionId, offset, length });
       
-      if (moreResults.content[0].text.includes('✅ Search completed')) {
+      if (moreResults.content[0].text.includes('✅ Search completed')) {
         return { initialResult: result, finalResult: moreResults, sessionId };
       }
       
-      if (moreResults.content[0].text.includes('❌ ERROR')) {
-        throw new Error(`Search failed: ${moreResults.content[0].text}`);
-      }
+      if (moreResults.isError) {
+        throw new Error(`get_more_search_results failed: ${moreResults.content?.[0]?.text ?? 'Unknown error'}`);
+      }
       
       // Wait a bit before polling again
-      await new Promise(resolve => setTimeout(resolve, 100));
+      await new Promise(resolve => setTimeout(resolve, 200));
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const startTime = Date.now();
while (Date.now() - startTime < timeout) {
const moreResults = await handleGetMoreSearchResults({ sessionId });
if (moreResults.content[0].text.includes('✅ Search completed')) {
return { initialResult: result, finalResult: moreResults, sessionId };
}
if (moreResults.content[0].text.includes('❌ ERROR')) {
throw new Error(`Search failed: ${moreResults.content[0].text}`);
}
// Wait a bit before polling again
await new Promise(resolve => setTimeout(resolve, 100));
}
// Wait for completion by polling (tail last N results)
const startTime = Date.now();
const length = 200;
const offset = -length;
while (Date.now() - startTime < timeout) {
const moreResults = await handleGetMoreSearchResults({ sessionId, offset, length });
if (moreResults.content[0].text.includes('✅ Search completed')) {
return { initialResult: result, finalResult: moreResults, sessionId };
}
if (moreResults.isError) {
throw new Error(`get_more_search_results failed: ${moreResults.content?.[0]?.text ?? 'Unknown error'}`);
}
// Wait a bit before polling again
await new Promise(resolve => setTimeout(resolve, 200));
}
🤖 Prompt for AI Agents
In test/test_improved_search_truncation.js around lines 19 to 33, the polling
loop calls handleGetMoreSearchResults without explicit tail pagination and
checks for errors by inspecting result text; update the call to pass explicit
pagination parameters (e.g., include offset and length or a tail object as the
get_more schema requires) so we don't rely on implicit defaults, and replace the
text-based error detection with checking the returned isError flag (throw if
isError is true); keep the existing completion check but reference the correct
returned fields after adding pagination and isError checks.

Comment on lines +71 to +77
const apiLimit = 1048576; // 1 MiB - use consistent constant

// Check if we're within the safe limits using single source of truth
if (totalLength > apiLimit) {
console.log('❌ Results still quite large - over 1MB combined');
} else if (totalLength > Math.floor(0.8 * apiLimit)) {
console.log('⚠️ Results approaching limits but acceptable');

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix duplicate/redeclared apiLimit and “use before declaration” lints.

apiLimit is declared twice (Lines 71, 94), triggering Biome errors. Define once and reuse.

Apply:

-        const apiLimit = 1048576; // 1 MiB - use consistent constant
+        const API_LIMIT_BYTES = 1024 * 1024; // 1 MiB

-        if (totalLength > apiLimit) {
+        if (totalLength > API_LIMIT_BYTES) {
-        } else if (totalLength > Math.floor(0.8 * apiLimit)) {
+        } else if (totalLength > Math.floor(0.8 * API_LIMIT_BYTES)) {
-        const apiLimit = 1048576; // 1MB API limit
-        const safetyMargin = apiLimit - totalLength;
+        // reuse API_LIMIT_BYTES defined above
+        const safetyMargin = API_LIMIT_BYTES - totalLength;
-        console.log(`   API limit: ${apiLimit.toLocaleString()} characters`);
+        console.log(`   API limit: ${API_LIMIT_BYTES.toLocaleString()} characters`);
         console.log(`   Safety margin: ${safetyMargin.toLocaleString()} characters`);
-        console.log(`   Utilization: ${((totalLength / apiLimit) * 100).toFixed(1)}%`);
+        console.log(`   Utilization: ${((totalLength / API_LIMIT_BYTES) * 100).toFixed(1)}%`);

Also applies to: 94-103

🧰 Tools
🪛 Biome (2.1.2)

[error] 74-74: This variable is used before its declaration.

The variable is declared here:

(lint/correctness/noInvalidUseBeforeDeclaration)


[error] 76-77: This variable is used before its declaration.

The variable is declared here:

(lint/correctness/noInvalidUseBeforeDeclaration)

🤖 Prompt for AI Agents
In test/test_improved_search_truncation.js around lines 71-77 and 94-103,
apiLimit is declared twice causing duplicate/redeclare and
use-before-declaration lint errors; remove the second declaration and define
apiLimit once (e.g., at the top of the test or before its first use), then
update all references to reuse that single constant so both checks use the same
apiLimit and no variable is redeclared or referenced before initialization.

@wonderwhy-er wonderwhy-er merged commit 227050b into main Aug 28, 2025
2 checks passed
@wonderwhy-er wonderwhy-er deleted the refactor-file-search branch September 10, 2025 08:49
@coderabbitai coderabbitai Bot mentioned this pull request Sep 12, 2025
@coderabbitai coderabbitai Bot mentioned this pull request Nov 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant