Fix file pattern multiple values#221
Conversation
…*.ts|*.js|*.py|*.java|*.go`
…t contains multiple file extensions separated via `|`
WalkthroughUpdated search file-pattern handling to support multiple patterns separated by '|', with trimming and empty-token filtering. The CLI path passes multiple -g arguments; the fallback builds a combined full-match regex. Added edge-case tests covering multiple, whitespace-padded, empty-token, and empty-only patterns. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant U as Caller
participant H as handleSearchCode
participant S as searchCode
participant RG as rg (ripgrep)
participant F as Fallback (FS+Regex)
U->>H: handleSearchCode(query, filePattern="*.ts|*.js|*.py")
H->>S: searchCode(query, filePattern)
Note over S: Split on '|' → trim → drop empties
alt Valid patterns exist
S->>RG: rg ... -g *.ts -g *.js -g *.py
RG-->>S: Matches
S-->>H: Results
else No valid patterns
S-->>H: []
end
opt rg error/unsupported
H->>F: Fallback search
Note over F: Build ^(pattern1\|pattern2\|...)$ regex<br/>Convert \* → .*, ? → .
F-->>H: Filtered results
end
H-->>U: Results
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
src/tools/search.ts (1)
61-71: Multiple -g patterns: good; consider de-duping inputsSplitting on '|' and trimming/ignoring empties is spot on for rg's -g semantics. To avoid redundant work (and noisy args) when callers repeat patterns, de-duplicate after filtering.
Apply this minimal change:
- const patterns = filePattern - .split('|') - .map(p => p.trim()) // remove surrounding spaces - .filter(Boolean); // drop empty tokens + const patterns = [ + ...new Set( + filePattern + .split('|') + .map(p => p.trim()) // remove surrounding spaces + .filter(Boolean) // drop empty tokens + ), + ];test/test-search-code-edge-cases.js (1)
364-424: Good coverage for multi-value filePattern parsingThis exercises the intended behaviors: multiple values, whitespace tolerance, empty-token filtering, and “all-empty” yielding no matches. Nicely scoped and readable.
Consider adding a couple of patterns that include:
?and character classes[](glob features users commonly try).- Nested path globs (
sub/**/file?.ext) to catch the fallback’s path-vs-name handling.This will harden the suite against regressions when the fallback is used.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
src/tools/search.ts(4 hunks)test/test-search-code-edge-cases.js(2 hunks)
🧰 Additional context used
🧬 Code graph analysis (2)
test/test-search-code-edge-cases.js (3)
test/run-all-tests.js (2)
colors(16-25)result(140-140)test/test-search-code.js (24)
colors(22-28)result(136-139)result(160-164)result(179-183)result(200-204)result(221-225)result(263-267)result(287-291)result(310-314)result(333-336)result(355-358)result(409-412)text(145-145)text(166-166)text(185-185)text(206-206)text(231-231)text(269-269)text(293-293)text(319-319)text(342-342)text(362-362)text(414-414)testFilePatternFiltering(196-212)src/handlers/edit-search-handlers.ts (1)
handleSearchCode(25-112)
src/tools/search.ts (1)
test/test-search-code.js (1)
testFilePatternFiltering(196-212)
🪛 ast-grep (0.38.6)
src/tools/search.ts
[warning] 188-188: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(^(${combinedPattern})$)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html
(regexp-from-variable)
🔇 Additional comments (2)
src/tools/search.ts (1)
171-190: Add tests to lock in path-aware globs and glob featuresTo prevent regressions and ensure parity with ripgrep, add edge-case tests that exercise:
- Path globs:
subdir/**/*.tsand**/file?.js- Character classes:
file[0-3].txt- Mixed multi-patterns:
*.ts|**/*.spec.js|src/**/index.*Here’s a small addition to your edge-case suite (illustrative):
// e.g., in testFilePatternWithMultipleValues() await fs.mkdir(path.join(EDGE_CASE_TEST_DIR, 'sub/a/b'), { recursive: true }); await fs.writeFile(path.join(EDGE_CASE_TEST_DIR, 'sub/a/b/deep.ts'), 'const x = "pattern";'); await fs.writeFile(path.join(EDGE_CASE_TEST_DIR, 'sub/a/b/file2.js'), 'const y = "pattern";'); await fs.writeFile(path.join(EDGE_CASE_TEST_DIR, 'file1.txt'), 'pattern'); await fs.writeFile(path.join(EDGE_CASE_TEST_DIR, 'file2.txt'), 'pattern'); let r = await handleSearchCode({ path: EDGE_CASE_TEST_DIR, pattern: 'pattern', filePattern: 'sub/**/*.ts|**/file?.js' }); let t = r.content[0].text; assert(t.includes('sub/a/b/deep.ts'), 'Should match deep path TypeScript file'); assert(t.includes('sub/a/b/file2.js'), 'Should match file? pattern for JS');test/test-search-code-edge-cases.js (1)
451-452: Integration into the edge-case suite looks goodIncluding this test in the main runner ensures it executes with the rest of the edge cases. No issues spotted.
|
|
||
| // Handle filePattern similarly to main implementation | ||
| let fileRegex: RegExp | null = null; | ||
| if (filePattern) { | ||
| const patterns = filePattern | ||
| .split('|') | ||
| .map(p => p.trim()) | ||
| .filter(Boolean); | ||
|
|
||
| // If all patterns were empty, return no results | ||
| if (patterns.length === 0) { | ||
| return []; | ||
| } | ||
|
|
||
| // Create a regex that matches any of the patterns | ||
| const combinedPattern = patterns.map(p => | ||
| p.replace(/\./g, '\\.').replace(/\*/g, '.*').replace(/\?/g, '.') | ||
| ).join('|'); | ||
| fileRegex = new RegExp(`^(${combinedPattern})$`); | ||
| } |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Fallback glob handling: ReDoS risk and mismatch with ripgrep path semantics
The fallback currently synthesizes a single regex from user-supplied globs. Two problems:
- Security/DoS: Variable-driven regex with partial escaping can be exploited for catastrophic backtracking (ReDoS), and allows unintended regex metacharacters (e.g.,
+,{},[],()), as flagged by static analysis. - Behavior mismatch: ripgrep’s -g matches against the full relative path. The fallback only tests
entry.name, so patterns likesrc/**/*.tswon’t work consistently when the fallback is used.
Prefer using a robust glob matcher (e.g., picomatch/micromatch) and match against the file’s relative path from rootPath. This removes the need for hand-rolled regex and aligns behavior with rg.
Apply this focused refactor within this hunk:
- // Handle filePattern similarly to main implementation
- let fileRegex: RegExp | null = null;
+ // Handle filePattern similarly to main implementation
+ // Use a safe glob matcher instead of constructing a regex from user input
+ let fileMatcher: ((relPath: string) => boolean) | null = null;
if (filePattern) {
const patterns = filePattern
.split('|')
.map(p => p.trim())
.filter(Boolean);
// If all patterns were empty, return no results
if (patterns.length === 0) {
return [];
}
- // Create a regex that matches any of the patterns
- const combinedPattern = patterns.map(p =>
- p.replace(/\./g, '\\.').replace(/\*/g, '.*').replace(/\?/g, '.')
- ).join('|');
- fileRegex = new RegExp(`^(${combinedPattern})$`);
+ // Defer to a glob engine for correctness and safety
+ // Note: match against normalized relative paths to mimic ripgrep -g behavior
+ const picomatch = (await import('picomatch')).default;
+ const normalized = patterns.map(p => p.replaceAll('\\', '/'));
+ const isMatch = picomatch(normalized, { nocase: false }); // keep case-sensitive unless you intentionally want otherwise
+ fileMatcher = (relPath: string) => isMatch(relPath.replaceAll('\\', '/'));
}And update the usage a few lines below to test the relative path (example snippet; adjust exact placement accordingly):
// near lines ~201-213 where files are handled
const relPath = path.relative(validPath, fullPath);
// If no matcher, accept all; else match by relative path (mimics rg -g)
if (!fileMatcher || fileMatcher(relPath)) {
const content = await fs.readFile(fullPath, 'utf-8');
// ...
}Optionally, if you want parity with the includeHidden flag, plumb it into searchCodeFallback and pass { dot: includeHidden === true } to picomatch.
This removes the ReDoS vector, aligns behavior with ripgrep, and supports path-aware globs like src/**/*.ts.
🧰 Tools
🪛 ast-grep (0.38.6)
[warning] 188-188: Regular expression constructed from variable input detected. This can lead to Regular Expression Denial of Service (ReDoS) attacks if the variable contains malicious patterns. Use libraries like 'recheck' to validate regex safety or use static patterns.
Context: new RegExp(^(${combinedPattern})$)
Note: [CWE-1333] Inefficient Regular Expression Complexity [REFERENCES]
- https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS
- https://cwe.mitre.org/data/definitions/1333.html
(regexp-from-variable)
Based on #159
Summary by CodeRabbit
New Features
Tests