Fix file pattern multiple values #221

Merged
wonderwhy-er merged 6 commits into main from fix-file-pattern-multiple-values on Aug 22, 2025

Conversation

@wonderwhy-er commented Aug 22, 2025 (Owner)

Based on #159

Summary by CodeRabbit

  • New Features

    • Search now supports multiple file patterns separated by “|”. Whitespace is trimmed, empty entries are ignored, and if no valid patterns remain, no results are returned. The same parsing applies to both the ripgrep search and the fallback search.
  • Tests

    • Added edge-case tests validating multiple pattern parsing (including whitespace and empty tokens) to ensure accurate file matching across supported extensions.

coderabbitai Bot commented Aug 22, 2025 (Contributor)

Walkthrough

Updated search file-pattern handling to support multiple patterns separated by '|', with trimming and empty-token filtering. The ripgrep (rg) path passes one -g argument per pattern; the fallback builds a combined full-match regex. Added edge-case tests covering multiple, whitespace-padded, empty-token, and empty-only patterns.
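The parsing step described above can be sketched as a standalone function. This is illustrative only: `parseFilePatterns` is a hypothetical name, not the helper used in src/tools/search.ts.

```typescript
// Hypothetical helper illustrating the '|' parsing described in the walkthrough:
// split on '|', trim each token, and drop empty tokens.
function parseFilePatterns(filePattern: string): string[] {
  return filePattern
    .split("|")
    .map((p) => p.trim()) // " *.js " -> "*.js"
    .filter((p) => p.length > 0); // drops "" produced by "||" or a trailing "|"
}

// An empty result means "no valid patterns": the search should return no
// matches rather than silently matching every file.
console.log(parseFilePatterns("*.ts| *.js ||*.py")); // [ '*.ts', '*.js', '*.py' ]
console.log(parseFilePatterns(" | | ")); // []
```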

Changes

  • Search tool: multi-pattern support (src/tools/search.ts): Refactored filePattern parsing to split on '|', trim, and drop empties. For rg, adds one -g per non-empty pattern. The fallback builds a single ^(...)$ regex, converting wildcards (* -> .* and ? -> .). Returns early when no valid patterns remain. Minor formatting tweaks.
  • Edge-case tests (test/test-search-code-edge-cases.js): Added tests validating multiple file patterns, whitespace handling, empty tokens, and empty-only cases. Integrated into the main edge-case runner.
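For the ripgrep side, "one -g per non-empty pattern" amounts to flattening the parsed list into flag/value pairs. A minimal sketch, assuming the patterns are already parsed and that rg is spawned with an argument array; `buildGlobArgs` is a hypothetical name.

```typescript
// Hypothetical helper: turn parsed patterns into ripgrep glob arguments.
// Multiple -g flags may be given; a file is searched if it matches any of
// the include globs.
function buildGlobArgs(patterns: string[]): string[] {
  return patterns.flatMap((p) => ["-g", p]);
}

const args = [...buildGlobArgs(["*.ts", "*.js", "*.py"]), "pattern", "."];
console.log(args.join(" ")); // -g *.ts -g *.js -g *.py pattern .
```

Passing each glob as its own array element (for example to child_process.spawn) instead of through a shell keeps the shell from expanding *.ts before rg sees it.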

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant U as Caller
  participant H as handleSearchCode
  participant S as searchCode
  participant RG as rg (ripgrep)
  participant F as Fallback (FS+Regex)

  U->>H: handleSearchCode(query, filePattern="*.ts|*.js|*.py")
  H->>S: searchCode(query, filePattern)
  Note over S: Split on '|' → trim → drop empties
  alt Valid patterns exist
    S->>RG: rg ... -g *.ts -g *.js -g *.py
    RG-->>S: Matches
    S-->>H: Results
  else No valid patterns
    S-->>H: []
  end
  opt rg error/unsupported
    H->>F: Fallback search
    Note over F: Build ^(pattern1\|pattern2\|...)$ regex<br/>Convert \* → .*, ? → .
    F-->>H: Filtered results
  end
  H-->>U: Results

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

I twitch my ears at pipes that gleam, |
Split and trim—a tidy dream.
With many * I hop through code,
And ?s dot my hopping road.
When ripgrep naps, I regex-scout—
A clever bun who sniffs it out. 🐇✨

@coderabbitai coderabbitai Bot left a comment (Contributor)

Actionable comments posted: 1

🧹 Nitpick comments (2)
src/tools/search.ts (1)

61-71: Multiple -g patterns: good; consider de-duping inputs

Splitting on '|' and trimming/ignoring empties is spot on for rg's -g semantics. To avoid redundant work (and noisy args) when callers repeat patterns, de-duplicate after filtering.

Apply this minimal change:

-    const patterns = filePattern
-      .split('|')
-      .map(p => p.trim())      // remove surrounding spaces
-      .filter(Boolean);        // drop empty tokens
+    const patterns = [
+      ...new Set(
+        filePattern
+          .split('|')
+          .map(p => p.trim())  // remove surrounding spaces
+          .filter(Boolean)     // drop empty tokens
+      ),
+    ];
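Because a `Set` iterates in insertion order, the suggested de-duplication keeps the first occurrence of each pattern and therefore preserves the caller's ordering of the -g arguments. A small check of that behavior; `parseUniquePatterns` is a hypothetical name.

```typescript
// Hypothetical variant of the parser with the suggested de-duplication.
// Set preserves insertion order, so the first occurrence of each pattern wins.
function parseUniquePatterns(filePattern: string): string[] {
  return [
    ...new Set(
      filePattern
        .split("|")
        .map((p) => p.trim()) // remove surrounding spaces
        .filter(Boolean), // drop empty tokens
    ),
  ];
}

console.log(parseUniquePatterns("*.ts|*.js| *.ts |*.py|*.js")); // [ '*.ts', '*.js', '*.py' ]
```

De-duplication is by exact string after trimming, so overlapping globs such as *.ts and **/*.ts are both kept.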
test/test-search-code-edge-cases.js (1)

364-424: Good coverage for multi-value filePattern parsing

This exercises the intended behaviors: multiple values, whitespace tolerance, empty-token filtering, and “all-empty” yielding no matches. Nicely scoped and readable.

Consider adding a couple of patterns that include:

  • ? and character classes [] (glob features users commonly try).
  • Nested path globs (sub/**/file?.ext) to catch the fallback’s path-vs-name handling.

This will harden the suite against regressions when the fallback is used.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 218531b and efeb5d0.

📒 Files selected for processing (2)
  • src/tools/search.ts (4 hunks)
  • test/test-search-code-edge-cases.js (2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
test/test-search-code-edge-cases.js (3)
test/run-all-tests.js (2)
  • colors (16-25)
  • result (140-140)
test/test-search-code.js (24)
  • colors (22-28)
  • result (136-139)
  • result (160-164)
  • result (179-183)
  • result (200-204)
  • result (221-225)
  • result (263-267)
  • result (287-291)
  • result (310-314)
  • result (333-336)
  • result (355-358)
  • result (409-412)
  • text (145-145)
  • text (166-166)
  • text (185-185)
  • text (206-206)
  • text (231-231)
  • text (269-269)
  • text (293-293)
  • text (319-319)
  • text (342-342)
  • text (362-362)
  • text (414-414)
  • testFilePatternFiltering (196-212)
src/handlers/edit-search-handlers.ts (1)
  • handleSearchCode (25-112)
src/tools/search.ts (1)
test/test-search-code.js (1)
  • testFilePatternFiltering (196-212)
🪛 ast-grep (0.38.6)
src/tools/search.ts

[warning] 188-188: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(`^(${combinedPattern})$`)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html

(regexp-from-variable)

🔇 Additional comments (2)
src/tools/search.ts (1)

171-190: Add tests to lock in path-aware globs and glob features

To prevent regressions and ensure parity with ripgrep, add edge-case tests that exercise:

  • Path globs: subdir/**/*.ts and **/file?.js
  • Character classes: file[0-3].txt
  • Mixed multi-patterns: *.ts|**/*.spec.js|src/**/index.*

Here’s a small addition to your edge-case suite (illustrative):

// e.g., in testFilePatternWithMultipleValues()
await fs.mkdir(path.join(EDGE_CASE_TEST_DIR, 'sub/a/b'), { recursive: true });
await fs.writeFile(path.join(EDGE_CASE_TEST_DIR, 'sub/a/b/deep.ts'), 'const x = "pattern";');
await fs.writeFile(path.join(EDGE_CASE_TEST_DIR, 'sub/a/b/file2.js'), 'const y = "pattern";');
await fs.writeFile(path.join(EDGE_CASE_TEST_DIR, 'file1.txt'), 'pattern');
await fs.writeFile(path.join(EDGE_CASE_TEST_DIR, 'file2.txt'), 'pattern');

let r = await handleSearchCode({
  path: EDGE_CASE_TEST_DIR,
  pattern: 'pattern',
  filePattern: 'sub/**/*.ts|**/file?.js'
});
let t = r.content[0].text;
assert(t.includes('sub/a/b/deep.ts'), 'Should match deep path TypeScript file');
assert(t.includes('sub/a/b/file2.js'), 'Should match file? pattern for JS');
test/test-search-code-edge-cases.js (1)

451-452: Integration into the edge-case suite looks good

Including this test in the main runner ensures it executes with the rest of the edge cases. No issues spotted.

Comment thread src/tools/search.ts
Comment on lines +171 to +190

// Handle filePattern similarly to main implementation
let fileRegex: RegExp | null = null;
if (filePattern) {
  const patterns = filePattern
    .split('|')
    .map(p => p.trim())
    .filter(Boolean);

  // If all patterns were empty, return no results
  if (patterns.length === 0) {
    return [];
  }

  // Create a regex that matches any of the patterns
  const combinedPattern = patterns.map(p =>
    p.replace(/\./g, '\\.').replace(/\*/g, '.*').replace(/\?/g, '.')
  ).join('|');
  fileRegex = new RegExp(`^(${combinedPattern})$`);
}

🛠️ Refactor suggestion

Fallback glob handling: ReDoS risk and mismatch with ripgrep path semantics

The fallback currently synthesizes a single regex from user-supplied globs. Two problems:

  • Security/DoS: Variable-driven regex with partial escaping can be exploited for catastrophic backtracking (ReDoS), and allows unintended regex metacharacters (e.g., +, {}, [], ()), as flagged by static analysis.
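  • Behavior mismatch: ripgrep’s -g follows .gitignore-style glob rules: a pattern containing a '/' (such as src/**/*.ts) is matched against the file’s path, while a pattern without one matches the file name at any depth. The fallback only tests entry.name, so path patterns like src/**/*.ts never match when the fallback is used.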
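Both points can be reproduced by lifting the conversion from the reviewed hunk into a standalone function. `naiveGlobToRegExp` is a hypothetical name; its replace chain is copied from the hunk, and the inputs are illustrative.

```typescript
// Conversion copied from the reviewed hunk: escape '.', then '*' -> '.*', '?' -> '.'.
// Every other regex metacharacter in the glob passes through unescaped.
function naiveGlobToRegExp(patterns: string[]): RegExp {
  const combined = patterns
    .map((p) => p.replace(/\./g, "\\.").replace(/\*/g, ".*").replace(/\?/g, "."))
    .join("|");
  return new RegExp(`^(${combined})$`);
}

// 1. Unescaped metacharacters: '+' becomes a regex quantifier.
const re1 = naiveGlobToRegExp(["a+b.txt"]);
console.log(re1.test("a+b.txt")); // false: the literal file name does not match
console.log(re1.test("aab.txt")); // true: unintended match

// 2. Name-only matching: a path glob can only match a path, but the
//    fallback passes entry.name, which never contains '/'.
const re2 = naiveGlobToRegExp(["src/**/*.ts"]);
console.log(re2.test("index.ts")); // false
console.log(re2.test("src/a/index.ts")); // true, but the fallback never tests this form
```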

Prefer using a robust glob matcher (e.g., picomatch/micromatch) and match against the file’s relative path from rootPath. This removes the need for hand-rolled regex and aligns behavior with rg.

Apply this focused refactor within this hunk:

-  // Handle filePattern similarly to main implementation
-  let fileRegex: RegExp | null = null;
+  // Handle filePattern similarly to main implementation
+  // Use a safe glob matcher instead of constructing a regex from user input
+  let fileMatcher: ((relPath: string) => boolean) | null = null;
   if (filePattern) {
     const patterns = filePattern
       .split('|')
       .map(p => p.trim())
       .filter(Boolean);
 
     // If all patterns were empty, return no results
     if (patterns.length === 0) {
       return [];
     }
 
-    // Create a regex that matches any of the patterns
-    const combinedPattern = patterns.map(p => 
-      p.replace(/\./g, '\\.').replace(/\*/g, '.*').replace(/\?/g, '.')
-    ).join('|');
-    fileRegex = new RegExp(`^(${combinedPattern})$`);
+    // Defer to a glob engine for correctness and safety
+    // Note: match against normalized relative paths to mimic ripgrep -g behavior
+    const picomatch = (await import('picomatch')).default;
+    const normalized = patterns.map(p => p.replaceAll('\\', '/'));
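+    // basename: true makes slash-free globs such as *.ts match the file name at any depth, like rg -g
+    const isMatch = picomatch(normalized, { basename: true, nocase: false }); // keep case-sensitive unless you intentionally want otherwise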
+    fileMatcher = (relPath: string) => isMatch(relPath.replaceAll('\\', '/'));
   }

And update the usage a few lines below to test the relative path (example snippet; adjust exact placement accordingly):

// near lines ~201-213 where files are handled
const relPath = path.relative(validPath, fullPath);
// If no matcher, accept all; else match by relative path (mimics rg -g)
if (!fileMatcher || fileMatcher(relPath)) {
  const content = await fs.readFile(fullPath, 'utf-8');
  // ...
}
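The relative-path step in the suggested usage can be isolated as below. A sketch assuming Node's path module; `toGlobPath` is a hypothetical name. Converting separators matters on Windows, where path.relative returns backslash-separated paths while glob patterns use '/'.

```typescript
import * as path from "node:path";

// Hypothetical helper: the path a glob should be tested against, relative to
// the search root and always '/'-separated. The path implementation is a
// parameter only so that POSIX and Windows behavior can both be shown.
function toGlobPath(
  rootPath: string,
  fullPath: string,
  impl: Pick<typeof path, "relative" | "sep"> = path,
): string {
  return impl.relative(rootPath, fullPath).split(impl.sep).join("/");
}

console.log(toGlobPath("/repo", "/repo/src/tools/search.ts", path.posix)); // src/tools/search.ts
console.log(toGlobPath("C:\\repo", "C:\\repo\\src\\a.ts", path.win32)); // src/a.ts
```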

Optionally, if you want parity with the includeHidden flag, plumb it into searchCodeFallback and pass { dot: includeHidden === true } to picomatch.

This removes the ReDoS vector, aligns behavior with ripgrep, and supports path-aware globs like src/**/*.ts.
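Where adding picomatch is not wanted, the path-versus-name behavior can be approximated by hand. This sketch is hypothetical (`makeMatcher` and `globToRegExpSource` are not in the codebase) and only approximates ripgrep's .gitignore-style rules: slash-free patterns match the file name at any depth, patterns containing '/' match the '/'-separated relative path, and only '**', '*' and '?' are special. Character classes, '!' negation and leading-'/' anchoring are not handled, so a vetted glob library remains the safer choice.

```typescript
// Translate one glob into a regex source; every non-wildcard character is
// escaped, so user input cannot inject groups, alternations or quantifiers.
function globToRegExpSource(glob: string): string {
  let out = "";
  for (let i = 0; i < glob.length; i++) {
    const ch = glob.charAt(i);
    if (ch === "*") {
      if (glob.charAt(i + 1) === "*") {
        if (glob.charAt(i + 2) === "/") {
          out += "(?:.*/)?"; // '**/' matches zero or more directories
          i += 2;
        } else {
          out += ".*"; // any other '**' matches across separators
          i += 1;
        }
      } else {
        out += "[^/]*"; // '*' stays within one path segment
      }
    } else if (ch === "?") {
      out += "[^/]"; // '?' is exactly one non-separator character
    } else {
      out += ch.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    }
  }
  return out;
}

// Slash-free patterns are tested against the file name, others against the path.
function makeMatcher(patterns: string[]): (relPath: string) => boolean {
  const compiled = patterns.map((p) => ({
    nameOnly: !p.includes("/"),
    re: new RegExp(`^${globToRegExpSource(p)}$`),
  }));
  return (relPath) => {
    const name = relPath.slice(relPath.lastIndexOf("/") + 1);
    return compiled.some(({ nameOnly, re }) => re.test(nameOnly ? name : relPath));
  };
}

const isMatch = makeMatcher(["*.ts", "sub/**/*.js"]);
console.log(isMatch("src/tools/search.ts")); // true (name-only pattern)
console.log(isMatch("sub/a/b/file2.js")); // true (path pattern)
console.log(isMatch("other/file2.js")); // false
```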

@wonderwhy-er wonderwhy-er merged commit 9189937 into main Aug 22, 2025
2 checks passed
@wonderwhy-er wonderwhy-er deleted the fix-file-pattern-multiple-values branch September 10, 2025 08:49
Labels

None yet

Projects

None yet

1 participant