feat: DOCX support via existing read_file, write_file, edit_block, and start_search tools#341
Conversation
- Add DocxFileHandler (src/utils/files/docx.ts) - read_file default: returns text-bearing outline (paragraphs, tables, images, headers/footers) - read_file with offset/length: returns raw pretty-printed XML with line pagination - edit_block: find/replace on pretty-printed XML, compact + repack into valid DOCX - Also searches headers/footers when edit target not found in document body - Register DocxFileHandler in factory.ts (priority before binary detection) - Route .docx files through DocxFileHandler.editRange in edit.ts - Update read_file and edit_block tool descriptions with DOCX usage - Add isDocx to FileMetadata in base.ts - Add pizzip dependency for DOCX ZIP handling No new tools added — DOCX works through the same read_file/edit_block workflow that already exists for text, Excel, and PDF files.
Only switch to raw XML mode when offset is explicitly non-zero. Previously, the default fileReadLineLimit (1000) passed as length would incorrectly trigger raw XML mode for large documents.
…y w:t The regex <w:t[^>]*> matched any tag starting with 'w:t' including w:tbl, w:tc, w:tr etc. Fixed to <w:t(?:\s[^>]*)?> which requires the tag name to end at > or whitespace. Also added full-document text preview at top of outline so content is always visible regardless of XML nesting depth (text boxes, etc).
The regex fix made the outline show text correctly in all elements including table cells with text boxes. A flat text dump loses the structural awareness that makes the outline useful for editing.
- Paragraphs and SDTs now show <w:t>text</w:t> fragments instead of plain text, so Claude can copy them directly into edit_block calls - Table cells still show plain text for readability - Updated tool description: offset must be non-zero for raw XML mode - Added extractTextFragments() helper
Extracts <w:t> text from document.xml, headers, and footers, then searches with regex matching. Runs async alongside ripgrep, same pattern as searchExcelFiles. Triggered when filePattern includes *.docx or rootPath is a .docx file.
- Show all table rows instead of truncating at 4 (tables contain real content) - Increase cell text preview to 60 chars, paragraph text to 500 chars - Update outline header: show edit_block example with <w:t> syntax - Add guidance for bulk operations (translation): use Python script - Clarify raw XML access requires offset=1
write_file('doc.docx', content) creates a valid DOCX file with:
- Markdown headings (#, ##, ###) converted to Word heading styles
- Plain text lines as Normal paragraphs
- Empty lines as empty paragraphs
- Proper Calibri font defaults and page margins
- Complete ZIP structure with styles.xml, content types, relationships
Updated write_file tool description to reflect DOCX creation support.
|
CodeAnt AI is reviewing your PR. Thanks for using CodeAnt! 🎉We're free for open-source projects. if you're enjoying it, help us grow by sharing. Share on X · |
📝 WalkthroughWalkthroughThis change adds DOCX support: a new runtime dependency ( Changes
Sequence Diagram(s)sequenceDiagram
actor Client
participant SearchTool
participant SearchManager
participant DocxHandler
participant FileSystem
Client->>SearchTool: submit search query
SearchTool->>SearchManager: startSearch(pattern, options)
SearchManager->>SearchManager: shouldIncludeDocxSearch()
alt DOCX search enabled
SearchManager->>SearchManager: findDocxFiles(root)
loop per found file
SearchManager->>DocxHandler: searchDocxFiles(file, pattern)
DocxHandler->>FileSystem: read .docx (binary)
FileSystem-->>DocxHandler: docx buffer
DocxHandler->>DocxHandler: extract document.xml, headers/footers
DocxHandler->>DocxHandler: parse XML, apply regex, build matches
DocxHandler-->>SearchManager: return matches
end
end
SearchManager->>SearchTool: merge results (text/excel/docx)
SearchTool-->>Client: return aggregated results
sequenceDiagram
actor Client
participant EditTool
participant EditHandler
participant DocxHandler
participant FileSystem
Client->>EditTool: request edit (path, old_string/new_string or range+content)
EditTool->>EditHandler: handleEditBlock(...)
EditHandler->>EditHandler: resolve handler & hasEditRange?
alt handler supports editRange (.docx)
EditHandler->>DocxHandler: editRange(path, range|payload, content)
DocxHandler->>FileSystem: read .docx
FileSystem-->>DocxHandler: docx buffer
DocxHandler->>DocxHandler: pretty-print XML, apply find/replace
DocxHandler->>FileSystem: write updated .docx
FileSystem-->>DocxHandler: write result
DocxHandler-->>EditHandler: EditResult
else fallback
EditHandler->>EditHandler: performSearchReplace(...) (legacy flow)
end
EditHandler-->>EditTool: result/confirmation
EditTool-->>Client: reply
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Sequence DiagramThe PR adds a DocxFileHandler and wires it into read_file, edit_block, and start_search so .docx files are exposed as a readable outline or raw pretty-printed XML, editable via XML find/replace, and searchable by extracting <w:t> text from DOCX parts. sequenceDiagram
participant Client
participant Server
participant DocxHandler
participant ZIP
participant FS
%% Read flow (default outline)
Client->>Server: read_file(path.docx, offset=0)
Server->>DocxHandler: handle read(path)
DocxHandler->>ZIP: unzip & load word/document.xml
DocxHandler-->>Server: return Outline (paragraphs, tables, headers/footers)
Server-->>Client: 200 OK (outline)
%% Edit flow (XML find/replace)
Client->>Server: edit_block(file.docx, old_string, new_string)
Server->>DocxHandler: editRange(path, old_string, new_string)
DocxHandler->>ZIP: pretty-print target XML part, apply replacement, compact & repack
DocxHandler->>FS: write updated .docx
DocxHandler-->>Server: editsApplied / success
Server-->>Client: Edit result
%% Search flow (content)
Client->>Server: start_search(root or pattern *.docx, regex)
Server->>DocxHandler: searchDocxFiles(root, pattern)
DocxHandler->>ZIP: read document/header/footer XML parts
DocxHandler-->>Server: matches (file:part, line, snippet)
Server-->>Client: aggregated search results
Generated by CodeAnt AI |
Nitpicks 🔍
|
There was a problem hiding this comment.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
Verify each finding against the current code and only fix it if needed.
In `@src/search-manager.ts`:
- Line 8: Remove the top-level PizZip import and instead dynamically import it
inside searchDocxFiles just like ExcelJS is imported; update searchDocxFiles to
await import('pizzip') (e.g., const { default: PizZip } = await
import('pizzip')) before using PizZip so the ZIP library is only loaded when
docx search runs and startup costs remain minimal.
- Around line 610-643: The findDocxFiles method duplicates logic from
findExcelFiles; extract a shared helper like findFilesByExtension(rootPath:
string, isMatch: (name: string) => boolean) that contains the recursive walk,
hidden/node_modules skip, and single-file root handling, then refactor
findDocxFiles and findExcelFiles to call that helper (use the existing isDocx
and corresponding isExcel predicates when invoking the new function).
In `@src/tools/edit.ts`:
- Around line 421-451: The DOCX branch can fall through if the handler lacks an
editRange method, causing later performSearchReplace to treat a .docx as plain
text; after obtaining handler from getFileHandler(validatedPath) add a defensive
return when 'editRange' is not in handler or not a function (e.g., return
createErrorResponse or a structured failure) so execution exits the
.docx-specific branch instead of continuing to performSearchReplace; reference
parsed.file_path, validatePath, getFileHandler, handler and editRange to locate
the exact spot to add this early return.
In `@src/utils/files/docx.ts`:
- Around line 569-606: The write method in docx.ts generates Heading1–Heading6
paragraph styles from the /^(#{1,6})\s+(.+)/ regex but createMinimalDocxZip only
defines Heading1–Heading3, causing references to Heading4–6 that don't exist;
fix by either capping detected heading levels to 3 in write (map any level >3 to
Heading3 before emitting w:pStyle) or by adding Heading4, Heading5 and Heading6
style definitions inside createMinimalDocxZip so styles.xml includes those
entries, and ensure the unique symbols referenced are write and
createMinimalDocxZip.
- Around line 353-405: The function splitTopLevelElements can enter an infinite
loop when xml.indexOf('>', i) returns -1; update splitTopLevelElements to check
for a -1 result for every call that assigns tagEnd or closeEnd (used in the
closing-tag branch, processing-instruction/comment branch,
self-closing/opening-tag branch) and handle it safely (e.g., bail out by
breaking the loop, set i = xml.length to terminate, or throw a controlled error)
instead of doing i = tagEnd + 1; ensure currentStart is cleared or any partially
collected element is handled before exiting so the function returns
deterministically for malformed input.
- Around line 750-758: The returned object in docx.ts currently sets fileType:
'docx' as any which bypasses the FileInfo type; update the FileInfo union in
base.ts (the FileInfo type / fileType property) to include 'docx' as a valid
member, then remove the cast in the object returned by the function in
src/utils/files/docx.ts (the return that currently sets fileType) so it reads
fileType: 'docx' without any casting; alternatively, if DOCX should be
categorized under an existing type, change the returned fileType to that
existing union value (for example 'binary') and remove the as any cast.
🧹 Nitpick comments (3)
🤖 Fix all nitpicks with AI agents
Verify each finding against the current code and only fix it if needed. In `@src/search-manager.ts`: - Line 8: Remove the top-level PizZip import and instead dynamically import it inside searchDocxFiles just like ExcelJS is imported; update searchDocxFiles to await import('pizzip') (e.g., const { default: PizZip } = await import('pizzip')) before using PizZip so the ZIP library is only loaded when docx search runs and startup costs remain minimal. - Around line 610-643: The findDocxFiles method duplicates logic from findExcelFiles; extract a shared helper like findFilesByExtension(rootPath: string, isMatch: (name: string) => boolean) that contains the recursive walk, hidden/node_modules skip, and single-file root handling, then refactor findDocxFiles and findExcelFiles to call that helper (use the existing isDocx and corresponding isExcel predicates when invoking the new function).src/search-manager.ts (3)
8-8: Top-levelPizZipimport diverges from the dynamic-import pattern used forExcelJS.
ExcelJSis dynamically imported on line 380 (await import('exceljs')) to avoid loading it at startup when Excel search isn't needed.PizZipshould follow the same pattern to keep startup costs consistent and avoid loading the ZIP library unconditionally.♻️ Proposed fix — dynamic import inside searchDocxFiles
Remove line 8 and dynamically import inside
searchDocxFiles:-import PizZip from 'pizzip';Then in
searchDocxFiles(around line 560):try { const buf = await fs.readFile(filePath); - const zip = new PizZip(buf); + const PizZip = (await import('pizzip')).default; + const zip = new PizZip(buf);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/search-manager.ts` at line 8, Remove the top-level PizZip import and instead dynamically import it inside searchDocxFiles just like ExcelJS is imported; update searchDocxFiles to await import('pizzip') (e.g., const { default: PizZip } = await import('pizzip')) before using PizZip so the ZIP library is only loaded when docx search runs and startup costs remain minimal.
610-643:findDocxFilesduplicatesfindExcelFiles— consider extracting a shared directory walker.Both methods have identical structure: recursive walk, skip hidden/
node_modulesdirs, filter by extension, handle single-file root paths. A sharedfindFilesByExtension(rootPath, extensionCheck)helper would eliminate the duplication.The existing TODO on line 335 already acknowledges that file-type search should be modularized, so this can be deferred.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/search-manager.ts` around lines 610 - 643, The findDocxFiles method duplicates logic from findExcelFiles; extract a shared helper like findFilesByExtension(rootPath: string, isMatch: (name: string) => boolean) that contains the recursive walk, hidden/node_modules skip, and single-file root handling, then refactor findDocxFiles and findExcelFiles to call that helper (use the existing isDocx and corresponding isExcel predicates when invoking the new function).
575-600: DOCX text search only matches within individual<w:t>elements — cross-run patterns will be missed.Word frequently splits logical text across multiple
<w:t>runs (e.g., due to spell-check, formatting, or revision marks). A search for"Hello World"won't match if the DOCX stores<w:t>Hello </w:t>and<w:t>World</w:t>in separate runs.This is acceptable for a first implementation, but worth noting as a known limitation — especially since the Excel search concatenates all cell values per row (line 425) to enable cross-cell matching.
| async write(path: string, content: any, mode?: 'rewrite' | 'append'): Promise<void> { | ||
| if (mode === 'append') { | ||
| throw new Error('DOCX append not supported. Use edit_block to modify existing DOCX files.'); | ||
| } | ||
|
|
||
| const text = typeof content === 'string' ? content : String(content); | ||
| const lines = text.split('\n'); | ||
|
|
||
| // Build paragraph XML from lines | ||
| const paragraphs: string[] = []; | ||
| for (const line of lines) { | ||
| const headingMatch = line.match(/^(#{1,6})\s+(.+)/); | ||
| if (headingMatch) { | ||
| const level = headingMatch[1].length; | ||
| const headingText = headingMatch[2]; | ||
| paragraphs.push( | ||
| `<w:p><w:pPr><w:pStyle w:val="Heading${level}"/></w:pPr>` + | ||
| `<w:r><w:t>${escapeXml(headingText)}</w:t></w:r></w:p>` | ||
| ); | ||
| } else if (line.trim() === '') { | ||
| paragraphs.push(`<w:p/>`); | ||
| } else { | ||
| paragraphs.push( | ||
| `<w:p><w:r><w:t xml:space="preserve">${escapeXml(line)}</w:t></w:r></w:p>` | ||
| ); | ||
| } | ||
| } | ||
|
|
||
| const docXml = `<?xml version="1.0" encoding="UTF-8" standalone="yes"?>` + | ||
| `<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">` + | ||
| `<w:body>${paragraphs.join('')}` + | ||
| `<w:sectPr><w:pgSz w:w="12240" w:h="15840"/><w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440"/></w:sectPr>` + | ||
| `</w:body></w:document>`; | ||
|
|
||
| const zip = createMinimalDocxZip(docXml); | ||
| const buf = zip.generate({ type: 'nodebuffer', compression: 'DEFLATE', compressionOptions: { level: 6 } }); | ||
| await fs.writeFile(path, buf); | ||
| } |
There was a problem hiding this comment.
Heading levels 4–6 produce style references (Heading4–Heading6) not defined in the generated styles.xml.
The heading regex on line 580 matches up to #{1,6}, but createMinimalDocxZip (line 466–477) only defines Heading1–Heading3 styles. Using ####, #####, or ###### in content will create <w:pStyle w:val="Heading4"/> etc., referencing non-existent styles.
Word typically degrades gracefully (falls back to Normal), but it may show warnings or produce inconsistent formatting.
🔧 Option: Cap heading level at 3
- const headingMatch = line.match(/^(#{1,6})\s+(.+)/);
+ const headingMatch = line.match(/^(#{1,3})\s+(.+)/);Or add Heading4–Heading6 styles to createMinimalDocxZip.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| async write(path: string, content: any, mode?: 'rewrite' | 'append'): Promise<void> { | |
| if (mode === 'append') { | |
| throw new Error('DOCX append not supported. Use edit_block to modify existing DOCX files.'); | |
| } | |
| const text = typeof content === 'string' ? content : String(content); | |
| const lines = text.split('\n'); | |
| // Build paragraph XML from lines | |
| const paragraphs: string[] = []; | |
| for (const line of lines) { | |
| const headingMatch = line.match(/^(#{1,6})\s+(.+)/); | |
| if (headingMatch) { | |
| const level = headingMatch[1].length; | |
| const headingText = headingMatch[2]; | |
| paragraphs.push( | |
| `<w:p><w:pPr><w:pStyle w:val="Heading${level}"/></w:pPr>` + | |
| `<w:r><w:t>${escapeXml(headingText)}</w:t></w:r></w:p>` | |
| ); | |
| } else if (line.trim() === '') { | |
| paragraphs.push(`<w:p/>`); | |
| } else { | |
| paragraphs.push( | |
| `<w:p><w:r><w:t xml:space="preserve">${escapeXml(line)}</w:t></w:r></w:p>` | |
| ); | |
| } | |
| } | |
| const docXml = `<?xml version="1.0" encoding="UTF-8" standalone="yes"?>` + | |
| `<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">` + | |
| `<w:body>${paragraphs.join('')}` + | |
| `<w:sectPr><w:pgSz w:w="12240" w:h="15840"/><w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440"/></w:sectPr>` + | |
| `</w:body></w:document>`; | |
| const zip = createMinimalDocxZip(docXml); | |
| const buf = zip.generate({ type: 'nodebuffer', compression: 'DEFLATE', compressionOptions: { level: 6 } }); | |
| await fs.writeFile(path, buf); | |
| } | |
| async write(path: string, content: any, mode?: 'rewrite' | 'append'): Promise<void> { | |
| if (mode === 'append') { | |
| throw new Error('DOCX append not supported. Use edit_block to modify existing DOCX files.'); | |
| } | |
| const text = typeof content === 'string' ? content : String(content); | |
| const lines = text.split('\n'); | |
| // Build paragraph XML from lines | |
| const paragraphs: string[] = []; | |
| for (const line of lines) { | |
| const headingMatch = line.match(/^(#{1,3})\s+(.+)/); | |
| if (headingMatch) { | |
| const level = headingMatch[1].length; | |
| const headingText = headingMatch[2]; | |
| paragraphs.push( | |
| `<w:p><w:pPr><w:pStyle w:val="Heading${level}"/></w:pPr>` + | |
| `<w:r><w:t>${escapeXml(headingText)}</w:t></w:r></w:p>` | |
| ); | |
| } else if (line.trim() === '') { | |
| paragraphs.push(`<w:p/>`); | |
| } else { | |
| paragraphs.push( | |
| `<w:p><w:r><w:t xml:space="preserve">${escapeXml(line)}</w:t></w:r></w:p>` | |
| ); | |
| } | |
| } | |
| const docXml = `<?xml version="1.0" encoding="UTF-8" standalone="yes"?>` + | |
| `<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">` + | |
| `<w:body>${paragraphs.join('')}` + | |
| `<w:sectPr><w:pgSz w:w="12240" w:h="15840"/><w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440"/></w:sectPr>` + | |
| `</w:body></w:document>`; | |
| const zip = createMinimalDocxZip(docXml); | |
| const buf = zip.generate({ type: 'nodebuffer', compression: 'DEFLATE', compressionOptions: { level: 6 } }); | |
| await fs.writeFile(path, buf); | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/utils/files/docx.ts` around lines 569 - 606, The write method in docx.ts
generates Heading1–Heading6 paragraph styles from the /^(#{1,6})\s+(.+)/ regex
but createMinimalDocxZip only defines Heading1–Heading3, causing references to
Heading4–6 that don't exist; fix by either capping detected heading levels to 3
in write (map any level >3 to Heading3 before emitting w:pStyle) or by adding
Heading4, Heading5 and Heading6 style definitions inside createMinimalDocxZip so
styles.xml includes those entries, and ensure the unique symbols referenced are
write and createMinimalDocxZip.
| return { | ||
| size: stats.size, | ||
| created: stats.birthtime, | ||
| modified: stats.mtime, | ||
| accessed: stats.atime, | ||
| isDirectory: false, | ||
| isFile: true, | ||
| permissions: (stats.mode & 0o777).toString(8), | ||
| fileType: 'docx' as any, |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Check if base.ts exists and view FileInfo type definition
if [ -f "src/utils/files/base.ts" ]; then
echo "=== FileInfo type in base.ts ==="
rg -n "fileType" src/utils/files/base.ts -B2 -A2
else
echo "base.ts not found"
fiRepository: wonderwhy-er/DesktopCommanderMCP
Length of output: 253
🏁 Script executed:
# View the actual code in docx.ts at lines 750-758
if [ -f "src/utils/files/docx.ts" ]; then
echo "=== docx.ts lines 750-758 ==="
sed -n '745,760p' src/utils/files/docx.ts
else
echo "docx.ts not found"
fiRepository: wonderwhy-er/DesktopCommanderMCP
Length of output: 562
🏁 Script executed:
# Search for fileType usage patterns in the codebase
echo "=== All fileType references ==="
rg -n "fileType" --type=ts -B1 -A1Repository: wonderwhy-er/DesktopCommanderMCP
Length of output: 9275
🏁 Script executed:
# Check PreviewFileType definition in types.ts
echo "=== PreviewFileType in types.ts ==="
rg -n "PreviewFileType" src/types.ts -B3 -A3Repository: wonderwhy-er/DesktopCommanderMCP
Length of output: 605
🏁 Script executed:
# Search for switch/case or pattern matching on FileInfo.fileType
echo "=== Switch/case/pattern matching on fileType from FileInfo ==="
rg -n "switch.*fileType|case.*['\"](?:text|excel|image|binary)" --type=ts src/utils/files/ src/tools/Repository: wonderwhy-er/DesktopCommanderMCP
Length of output: 138
🏁 Script executed:
# Check if docx is handled elsewhere or expected to be a valid type
echo "=== References to 'docx' fileType ==="
rg -n "docx" --type=ts -i | grep -i "file\|type"Repository: wonderwhy-er/DesktopCommanderMCP
Length of output: 6194
🏁 Script executed:
# Check PreviewFileType definition
if [ -f "src/ui/file-preview/shared/preview-file-types.js" ]; then
echo "=== PreviewFileType definition ==="
cat src/ui/file-preview/shared/preview-file-types.js
else
echo "File not found, searching for preview-file-types"
find . -name "*preview-file-types*" -type f
fiRepository: wonderwhy-er/DesktopCommanderMCP
Length of output: 175
🏁 Script executed:
# Check how FileInfo is converted to FilePreviewStructuredContent
echo "=== Mapping from FileInfo to FilePreviewStructuredContent ==="
rg -n "FilePreviewStructuredContent|FileInfo" src/handlers/filesystem-handlers.ts -B2 -A2 | head -100Repository: wonderwhy-er/DesktopCommanderMCP
Length of output: 705
🏁 Script executed:
# Verify if docx fileType is ever actually used in a way that would break
echo "=== Check downstream usage of fileType property ==="
rg -n "\.fileType" src/ --type=ts | head -50Repository: wonderwhy-er/DesktopCommanderMCP
Length of output: 2288
🏁 Script executed:
# Check the actual PreviewFileType definition
echo "=== PreviewFileType definition ==="
cat src/ui/file-preview/shared/preview-file-types.tsRepository: wonderwhy-er/DesktopCommanderMCP
Length of output: 1604
🏁 Script executed:
# Check how FileInfo.fileType is converted/mapped to FilePreviewStructuredContent.fileType
echo "=== Check conversion logic from FileInfo to FilePreviewStructuredContent ==="
rg -n "fileType:" src/handlers/filesystem-handlers.ts -B5 -A5Repository: wonderwhy-er/DesktopCommanderMCP
Length of output: 1564
🏁 Script executed:
# Check if 'docx' value ever appears as FilePreviewStructuredContent fileType
echo "=== Look for 'docx' in FilePreviewStructuredContent creation ==="
rg -n "previewFileType|FilePreviewStructuredContent" src/ --type=ts -A3 | grep -E "docx|fileType" | head -30Repository: wonderwhy-er/DesktopCommanderMCP
Length of output: 529
🏁 Script executed:
# Let's verify the actual flow - does getFileInfo output get used anywhere with switch statements?
echo "=== Trace FileInfo usage in filesystem.ts and tools ==="
rg -n "getFileInfo|fileInfo.fileType" src/tools/filesystem.ts src/handlers/ --type=ts -B2 -A2 | head -80Repository: wonderwhy-er/DesktopCommanderMCP
Length of output: 1403
🏁 Script executed:
# Check if docx.ts fileInfo is used downstream at all
echo "=== Check where getFileInfo is called and how result is used ==="
rg -n "await getFileInfo|await.*getFileInfo\|getFileInfo\(" src/ --type=ts -B1 -A3 | head -100Repository: wonderwhy-er/DesktopCommanderMCP
Length of output: 551
🏁 Script executed:
# Let's see the full context of how the FileInfo result is used
echo "=== Full handleGetFileInfo function ==="
sed -n '390,410p' src/handlers/filesystem-handlers.tsRepository: wonderwhy-er/DesktopCommanderMCP
Length of output: 809
🏁 Script executed:
# Double-check if FileInfo type is ever actually checked with pattern matching
echo "=== Search for any conditional logic on fileType values ==="
rg -n "fileType.*==|fileType.*===|case.*['\"]text['\"]|case.*['\"]excel['\"]|case.*['\"]binary['\"]|case.*['\"]image['\"]" --type=tsRepository: wonderwhy-er/DesktopCommanderMCP
Length of output: 1716
Remove as any bypass by extending the FileInfo union type.
fileType: 'docx' as any violates the FileInfo.fileType contract defined in base.ts line 218. While the architectural separation (FileInfo is independent from PreviewFileType used in the UI) prevents immediate downstream breakage, this type safety violation creates a maintenance risk and should be corrected.
Either extend the union in src/utils/files/base.ts:
- fileType: 'text' | 'excel' | 'image' | 'binary';
+ fileType: 'text' | 'excel' | 'image' | 'binary' | 'docx';Then remove the as any cast in docx.ts:
- fileType: 'docx' as any,
+ fileType: 'docx',Alternatively, map to 'binary' if docx should not be a distinct type category.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/utils/files/docx.ts` around lines 750 - 758, The returned object in
docx.ts currently sets fileType: 'docx' as any which bypasses the FileInfo type;
update the FileInfo union in base.ts (the FileInfo type / fileType property) to
include 'docx' as a valid member, then remove the cast in the object returned by
the function in src/utils/files/docx.ts (the return that currently sets
fileType) so it reads fileType: 'docx' without any casting; alternatively, if
DOCX should be categorized under an existing type, change the returned fileType
to that existing union value (for example 'binary') and remove the as any cast.
| const xmlParts = ['word/document.xml', 'word/header1.xml', 'word/header2.xml', | ||
| 'word/header3.xml', 'word/footer1.xml', 'word/footer2.xml', 'word/footer3.xml']; |
There was a problem hiding this comment.
Suggestion: The DOCX search only checks a hard-coded set of header and footer XML parts (header1-3.xml, footer1-3.xml), so any text in additional headers/footers like header4.xml or footer5.xml will never be found, even though the DOCX outline and metadata logic handle all header/footer parts dynamically; this causes real content in some documents to be silently missed by start_search. [logic error]
Severity Level: Major ⚠️
- ❌ start_search misses text in DOCX section headers.
- ⚠️ Multi-section Word documents return incomplete search results.
- ⚠️ Inconsistency between DOCX read_file outline and search coverage.| const xmlParts = ['word/document.xml', 'word/header1.xml', 'word/header2.xml', | |
| 'word/header3.xml', 'word/footer1.xml', 'word/footer2.xml', 'word/footer3.xml']; | |
| const xmlParts: string[] = ['word/document.xml']; | |
| const zipFiles = zip.files; | |
| for (const relativePath of Object.keys(zipFiles)) { | |
| if ( | |
| (relativePath.startsWith('word/header') || relativePath.startsWith('word/footer')) && | |
| relativePath.endsWith('.xml') | |
| ) { | |
| xmlParts.push(relativePath); | |
| } | |
| } |
Steps of Reproduction ✅
1. From the MCP client, call the `start_search` tool, which is handled by
`handleStartSearch()` in `src/handlers/search-handlers.ts:13-35`. Pass `searchType:
"content"`, `pattern: "UniqueHeaderText"`, `path: "/some/dir"`, and `filePattern:
"*.docx"` so that content search is started against DOCX files.
2. Ensure `/some/dir` contains a DOCX file (e.g., `multi-section.docx`) whose body text
does NOT contain `"UniqueHeaderText"`, but whose additional section header (e.g.,
`word/header4.xml` or `word/header5.xml`) does contain `"UniqueHeaderText"`. Word will
create these additional header parts when the document has different headers in later
sections.
3. In `SearchManager.startSearch()` at `src/search-manager.ts:59-218`, the call is routed
to `this.searchDocxFiles(...)` (lines 175-194) because `searchType === 'content'` and
`this.shouldIncludeDocxSearch(options.filePattern, validPath)` (lines 495-517) returns
true for `filePattern: "*.docx"` and the validated root path.
4. Inside `searchDocxFiles()` at `src/search-manager.ts:523-607`, the DOCX is opened and
only the hard-coded XML parts listed in `xmlParts = ['word/document.xml',
'word/header1.xml', 'word/header2.xml', 'word/header3.xml', 'word/footer1.xml',
'word/footer2.xml', 'word/footer3.xml'];` (lines 564-566) are scanned. Because
`header4.xml` and above are never iterated in the `for (const xmlPath of xmlParts)` loop
at line 568, `"UniqueHeaderText"` in those later headers is never matched, so the
`start_search` response (initial result from `handleStartSearch` or later from
`handleGetMoreSearchResults` in `src/handlers/search-handlers.ts:85-147`) shows no
results, even though the DOCX actually contains the text.Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** src/search-manager.ts
**Line:** 565:566
**Comment:**
*Logic Error: The DOCX search only checks a hard-coded set of header and footer XML parts (`header1-3.xml`, `footer1-3.xml`), so any text in additional headers/footers like `header4.xml` or `footer5.xml` will never be found, even though the DOCX outline and metadata logic handle all header/footer parts dynamically; this causes real content in some documents to be silently missed by `start_search`.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.|
CodeAnt AI finished reviewing your PR. |
|
@scutuatua-crypto can you share what and how you reviewed and tested? |
- Add 'docx' to FileInfo.fileType union in base.ts; remove `as any` cast in DocxFileHandler.getInfo - Fix splitTopLevelElements infinite loop: indexOf returning -1 would set i=0 and restart the loop; now bails out with break on malformed XML in all three tag branches (closing, processing-instruction, opening/self-closing) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Relax validation so new_string can be an empty string (delete the matched text), while old_string must still be non-empty. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ispatch
- Validate path and resolve file handler once at the top instead of
duplicating validatePath + getFileHandler in each branch
- Compute hasEditRange once and reuse across both dispatch paths
- Remove hardcoded .endsWith('.docx') extension check — DocxFileHandler
is picked up automatically via hasEditRange like any other handler
- Path 1 (range + content): Excel and any future range-based handlers
- Path 2 (old_string + new_string): handlers with editRange own their
own text-replacement (DOCX); plain text falls through to performSearchReplace
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Feat/docx via existing tools
|
CodeAnt AI is running Incremental review Thanks for using CodeAnt! 🎉We're free for open-source projects. if you're enjoying it, help us grow by sharing. Share on X · |
|
|
||
| // Parse content if it's a JSON string (AI often sends arrays as JSON strings) | ||
| let content = parsed.content; | ||
| if (typeof content === 'string') { | ||
| try { | ||
| content = JSON.parse(content); | ||
| } catch { | ||
| // Leave as-is if not valid JSON - let handler decide | ||
| } | ||
| // Path 1: Range rewrite (Excel, etc.) — range + content | ||
| if (hasRange && hasContent) { | ||
| // Parse content if it's a JSON string (AI often sends arrays as JSON strings) | ||
| let content = parsed.content; | ||
| if (typeof content === 'string') { | ||
| try { | ||
| content = JSON.parse(content); | ||
| } catch { | ||
| // Leave as-is if not valid JSON - let handler decide | ||
| } | ||
| } | ||
|
|
||
| // Check if handler supports range editing | ||
| if ('editRange' in handler && typeof handler.editRange === 'function') { | ||
| if (hasEditRange) { | ||
| try { | ||
| // parsed.range is guaranteed non-empty string by hasRange check above | ||
| await handler.editRange(validatedPath, parsed.range!, content, parsed.options); | ||
| await handler.editRange!(validatedPath!, parsed.range!, content, parsed.options); | ||
| return { | ||
| content: [{ | ||
| type: "text", | ||
| text: `Successfully updated range ${parsed.range} in ${parsed.file_path}` | ||
| }], | ||
| }; | ||
| } else { | ||
| return createErrorResponse(`Range-based editing not supported for ${parsed.file_path}. For text files, use old_string and new_string parameters instead. If your client requires range/content parameters, set them to empty strings ("").`); | ||
| } catch (error) { | ||
| const errorMessage = error instanceof Error ? error.message : String(error); | ||
| return createErrorResponse(errorMessage); | ||
| } | ||
| } catch (error) { | ||
| const errorMessage = error instanceof Error ? error.message : String(error); | ||
| return createErrorResponse(errorMessage); | ||
| } | ||
|
|
||
| return createErrorResponse(`Range-based editing not supported for ${parsed.file_path}. For text files, use old_string and new_string parameters instead. If your client requires range/content parameters, set them to empty strings ("").`); | ||
| } | ||
|
|
||
| // Text files: String replacement | ||
| // Validate required parameters for text replacement | ||
| // Path 2: Text replacement — old_string + new_string | ||
| if (parsed.old_string === undefined || parsed.new_string === undefined) { | ||
| return createErrorResponse(`Text replacement requires both old_string and new_string parameters`); | ||
| } | ||
|
|
||
| const searchReplace = { | ||
| // If the handler implements editRange it owns text-replacement for its file type | ||
| // (e.g. DocxFileHandler does find/replace on pretty-printed XML rather than raw bytes). | ||
| // Plain text files fall through to performSearchReplace. | ||
| if (hasEditRange) { |
There was a problem hiding this comment.
Suggestion: Text-replacement requests for structured files that implement editRange (notably Excel) will now incorrectly be routed through editRange with an empty range and a DOCX-style { old_string, new_string } payload, even though editRange is intended only for range rewrites on those types; this can cause Excel edits to behave unexpectedly or fail, and silently changes the previous behavior where such calls fell back to plain text search/replace. Restricting the editRange-based text path to DOCX files only restores the intended contract (Excel uses range+content only, DOCX uses editRange for XML-aware text edits) while preserving the new DOCX functionality. [logic error]
Severity Level: Major ⚠️
- ⚠️ Excel edit_block text replacements always error with 2D-array message.
- ⚠️ Structured handlers cannot support simple text find/replace semantics.
- ⚠️ Client behavior diverges from documented text-edit expectations.
- ⚠️ Future editRange handlers risk same misrouting for text edits.| // Parse content if it's a JSON string (AI often sends arrays as JSON strings) | |
| let content = parsed.content; | |
| if (typeof content === 'string') { | |
| try { | |
| content = JSON.parse(content); | |
| } catch { | |
| // Leave as-is if not valid JSON - let handler decide | |
| } | |
| // Path 1: Range rewrite (Excel, etc.) — range + content | |
| if (hasRange && hasContent) { | |
| // Parse content if it's a JSON string (AI often sends arrays as JSON strings) | |
| let content = parsed.content; | |
| if (typeof content === 'string') { | |
| try { | |
| content = JSON.parse(content); | |
| } catch { | |
| // Leave as-is if not valid JSON - let handler decide | |
| } | |
| } | |
| // Check if handler supports range editing | |
| if ('editRange' in handler && typeof handler.editRange === 'function') { | |
| if (hasEditRange) { | |
| try { | |
| // parsed.range is guaranteed non-empty string by hasRange check above | |
| await handler.editRange(validatedPath, parsed.range!, content, parsed.options); | |
| await handler.editRange!(validatedPath!, parsed.range!, content, parsed.options); | |
| return { | |
| content: [{ | |
| type: "text", | |
| text: `Successfully updated range ${parsed.range} in ${parsed.file_path}` | |
| }], | |
| }; | |
| } else { | |
| return createErrorResponse(`Range-based editing not supported for ${parsed.file_path}. For text files, use old_string and new_string parameters instead. If your client requires range/content parameters, set them to empty strings ("").`); | |
| } catch (error) { | |
| const errorMessage = error instanceof Error ? error.message : String(error); | |
| return createErrorResponse(errorMessage); | |
| } | |
| } catch (error) { | |
| const errorMessage = error instanceof Error ? error.message : String(error); | |
| return createErrorResponse(errorMessage); | |
| } | |
| return createErrorResponse(`Range-based editing not supported for ${parsed.file_path}. For text files, use old_string and new_string parameters instead. If your client requires range/content parameters, set them to empty strings ("").`); | |
| } | |
| // Text files: String replacement | |
| // Validate required parameters for text replacement | |
| // Path 2: Text replacement — old_string + new_string | |
| if (parsed.old_string === undefined || parsed.new_string === undefined) { | |
| return createErrorResponse(`Text replacement requires both old_string and new_string parameters`); | |
| } | |
| const searchReplace = { | |
| // If the handler implements editRange it owns text-replacement for its file type | |
| // (e.g. DocxFileHandler does find/replace on pretty-printed XML rather than raw bytes). | |
| // Plain text files fall through to performSearchReplace. | |
| if (hasEditRange) { | |
| const isDocx = parsed.file_path.toLowerCase().endsWith('.docx'); | |
| // Path 1: Range rewrite (Excel, etc.) — range + content | |
| if (hasRange && hasContent) { | |
| // Parse content if it's a JSON string (AI often sends arrays as JSON strings) | |
| let content = parsed.content; | |
| if (typeof content === 'string') { | |
| try { | |
| content = JSON.parse(content); | |
| } catch { | |
| // Leave as-is if not valid JSON - let handler decide | |
| } | |
| } | |
| if (hasEditRange) { | |
| try { | |
| // parsed.range is guaranteed non-empty string by hasRange check above | |
| await handler.editRange!(validatedPath!, parsed.range!, content, parsed.options); | |
| return { | |
| content: [{ | |
| type: "text", | |
| text: `Successfully updated range ${parsed.range} in ${parsed.file_path}` | |
| }], | |
| }; | |
| } catch (error) { | |
| const errorMessage = error instanceof Error ? error.message : String(error); | |
| return createErrorResponse(errorMessage); | |
| } | |
| } | |
| return createErrorResponse(`Range-based editing not supported for ${parsed.file_path}. For text files, use old_string and new_string parameters instead. If your client requires range/content parameters, set them to empty strings ("").`); | |
| } | |
| // Path 2: Text replacement — old_string + new_string | |
| if (parsed.old_string === undefined || parsed.new_string === undefined) { | |
| return createErrorResponse(`Text replacement requires both old_string and new_string parameters`); | |
| } | |
| // If the handler implements editRange it owns text-replacement for DOCX files | |
| // (DocxFileHandler does find/replace on pretty-printed XML rather than raw bytes). | |
| // Other file types (including Excel) fall through to performSearchReplace. | |
| if (hasEditRange && isDocx) { |
Steps of Reproduction ✅
1. Start the MCP server and ensure it exposes the `edit_block` command via
`handleEditBlock` re-exported in `src/handlers/edit-search-handlers.ts:1-13`.
2. Create or reuse an Excel file (e.g., `EDIT_EXCEL` in
`test/test-excel-files.js:211-218`, which is a `.xlsx` created via `writeFile`).
3. From any client (or by modifying `test/test-excel-files.js`), call `handleEditBlock`
with that `.xlsx` file path, *without* `range` or `content`, but with `old_string` and
`new_string`, for example:
- `handleEditBlock({ file_path: EDIT_EXCEL, old_string: 'Apple', new_string: 'Pear' })`
This mirrors the existing call pattern in `test/test-excel-files.js:221-226` but omits
`range`/`content` and supplies text parameters instead.
4. The call reaches `handleEditBlock` in `src/tools/edit.ts:371-459`, where:
- `hasRange` and `hasContent` are false, so Path 1 is skipped.
- `validatedPath` is resolved and `handler` is obtained from `getFileHandler`, which
returns an `ExcelFileHandler` for `.xlsx` files (validated by
`test/test-excel-files.js:81-87`).
- `hasEditRange` is true for `ExcelFileHandler` (`src/utils/files/excel.ts:158-236`
implements `editRange`).
- In the text-replacement path at `src/tools/edit.ts:423-454`, the code calls
`handler.editRange!(validatedPath!, '', { old_string, new_string, expected_replacements
})`.
5. Inside `ExcelFileHandler.editRange` (`src/utils/files/excel.ts:158-236`), the method
validates `content` with `if (!Array.isArray(content)) { throw new Error('Content must be
a 2D array for range editing'); }` at lines 170-173; since the payload is an object `{
old_string, new_string, ... }` instead of a 2D array, this throws.
6. The thrown error is caught back in `handleEditBlock` at `src/tools/edit.ts:404-417`,
and `createErrorResponse` is returned to the client with the message `"Content must be a
2D array for range editing"`, instead of performing any text search/replace or falling
through to `performSearchReplace`.
7. This misrouting will occur for any file type whose handler implements `editRange` but
expects structured range edits (currently `ExcelFileHandler`, per
`src/utils/files/base.ts:31-41` and `src/utils/files/excel.ts:33-40`), whenever a caller
issues a text-style `edit_block` request (old_string/new_string only) against such a file.Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** src/tools/edit.ts
**Line:** 391:431
**Comment:**
*Logic Error: Text-replacement requests for structured files that implement `editRange` (notably Excel) will now incorrectly be routed through `editRange` with an empty range and a DOCX-style `{ old_string, new_string }` payload, even though `editRange` is intended only for range rewrites on those types; this can cause Excel edits to behave unexpectedly or fail, and silently changes the previous behavior where such calls fell back to plain text search/replace. Restricting the `editRange`-based text path to DOCX files only restores the intended contract (Excel uses range+content only, DOCX uses `editRange` for XML-aware text edits) while preserving the new DOCX functionality.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.|
CodeAnt AI Incremental review completed. |
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Nitpick comments (2)
src/tools/edit.ts (1)
456-459: PassvalidatedPathtoperformSearchReplaceto avoid re-validating the same path.
validatedPathis already resolved on line 382. Passingparsed.file_pathcausesperformSearchReplace→validatePath→writeFile→validatePathto re-run the same validation three times for a single edit.♻️ Proposed fix
- return performSearchReplace(parsed.file_path, { + return performSearchReplace(validatedPath, { search: parsed.old_string, replace: parsed.new_string }, parsed.expected_replacements);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/tools/edit.ts` around lines 456 - 459, The call to performSearchReplace currently passes parsed.file_path causing redundant validations; instead pass the already-resolved validatedPath variable to avoid repeated validatePath/writeFile validatePath cycles. Update the call site that invokes performSearchReplace to use validatedPath (instead of parsed.file_path) so performSearchReplace, validatePath, and writeFile operate on the pre-validated path.src/utils/files/docx.ts (1)
629-632:||operator silently bypasses an explicit emptynew_string— use??instead.newStr = content.newStr || content.new_string || content.replace || '';
||treats''as falsy, sonew_string: ''(a deletion) falls through tocontent.replace. In the current call path fromedit.tscontent.replaceis alwaysundefined, so the final value happens to be''— correct only by coincidence. If a future caller passes{ new_string: '', replace: 'fallback' }, the deletion intent is silently discarded.The
oldStrline has the same pattern, though emptyold_stringis prevented by schema validation.♻️ Proposed fix — use nullish coalescing
- oldStr = content.oldStr || content.old_string || content.search || ''; - newStr = content.newStr || content.new_string || content.replace || ''; + oldStr = content.oldStr ?? content.old_string ?? content.search ?? ''; + newStr = content.newStr ?? content.new_string ?? content.replace ?? '';🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/files/docx.ts` around lines 629 - 632, The object field fallbacks for oldStr, newStr and expectedReplacements use the || operator which treats empty strings and 0 as falsy and can silently override explicit values; update those fallbacks to use nullish coalescing (??) instead so explicit empty_string or 0 are preserved — specifically change the expressions that set oldStr, newStr and expectedReplacements (the code that reads content.oldStr / content.old_string / content.search and content.newStr / content.new_string / content.replace and content.expectedReplacements / content.expected_replacements) to use ?? with the same final defaults.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/tools/edit.ts`:
- Around line 404-418: The range-edit branch currently awaits handler.editRange!
but discards its EditResult, always returning a success message; update the
hasEditRange branch in the function handling edits to capture the result from
handler.editRange!(validatedPath!, parsed.range!, content, parsed.options), then
check the returned EditResult (e.g., result.success): if false, return
createErrorResponse with the result.errors (or a joined message) so failures
from DocxFileHandler.editRange propagate, otherwise return the existing success
content; keep the existing try/catch to handle true exceptions and convert them
to createErrorResponse as before.
In `@src/utils/files/docx.ts`:
- Around line 312-344: The loop in extractNestedElements can infinite-loop when
xml.indexOf('>', openPos) returns -1 (malformed XML); add the same safeguard
used in splitTopLevelElements: if nextClose === -1 then break out of the outer
loop (or set searchFrom to xml.length) before proceeding to the nesting logic so
pos doesn't become 0 and re-find the same openPos; update the block that
computes nextClose (and references openPos/nextClose) to bail when no '>' is
found.
---
Duplicate comments:
In `@src/utils/files/docx.ts`:
- Around line 572-598: The write method emits Heading1–Heading6 (via the regex
in write) but createMinimalDocxZip only defines Heading1–Heading3, causing
undefined style references; either map any heading levels >3 to "Heading3"
inside write (e.g., clamp level = Math.min(level, 3) before emitting w:pStyle)
or update createMinimalDocxZip to include style definitions for
Heading4–Heading6 in styles.xml so Heading4/5/6 exist; locate the write function
and createMinimalDocxZip to implement one of these fixes and ensure escapeXml
usage remains unchanged.
---
Nitpick comments:
In `@src/tools/edit.ts`:
- Around line 456-459: The call to performSearchReplace currently passes
parsed.file_path causing redundant validations; instead pass the
already-resolved validatedPath variable to avoid repeated validatePath/writeFile
validatePath cycles. Update the call site that invokes performSearchReplace to
use validatedPath (instead of parsed.file_path) so performSearchReplace,
validatePath, and writeFile operate on the pre-validated path.
In `@src/utils/files/docx.ts`:
- Around line 629-632: The object field fallbacks for oldStr, newStr and
expectedReplacements use the || operator which treats empty strings and 0 as
falsy and can silently override explicit values; update those fallbacks to use
nullish coalescing (??) instead so explicit empty_string or 0 are preserved —
specifically change the expressions that set oldStr, newStr and
expectedReplacements (the code that reads content.oldStr / content.old_string /
content.search and content.newStr / content.new_string / content.replace and
content.expectedReplacements / content.expected_replacements) to use ?? with the
same final defaults.
| if (hasEditRange) { | ||
| try { | ||
| // parsed.range is guaranteed non-empty string by hasRange check above | ||
| await handler.editRange(validatedPath, parsed.range!, content, parsed.options); | ||
| await handler.editRange!(validatedPath!, parsed.range!, content, parsed.options); | ||
| return { | ||
| content: [{ | ||
| type: "text", | ||
| text: `Successfully updated range ${parsed.range} in ${parsed.file_path}` | ||
| }], | ||
| }; | ||
| } else { | ||
| return createErrorResponse(`Range-based editing not supported for ${parsed.file_path}. For text files, use old_string and new_string parameters instead. If your client requires range/content parameters, set them to empty strings ("").`); | ||
| } catch (error) { | ||
| const errorMessage = error instanceof Error ? error.message : String(error); | ||
| return createErrorResponse(errorMessage); | ||
| } | ||
| } catch (error) { | ||
| const errorMessage = error instanceof Error ? error.message : String(error); | ||
| return createErrorResponse(errorMessage); | ||
| } |
There was a problem hiding this comment.
editRange return value discarded in the range-rewrite path — failures silently reported as success.
await handler.editRange!(validatedPath!, parsed.range!, content, parsed.options);
return {
content: [{ type: "text", text: `Successfully updated range ${parsed.range} ...` }],
};DocxFileHandler.editRange never throws — it returns { success: false, errors: [...] } for all error conditions (missing old_string, no match, etc.). If a caller uses range + content syntax on a .docx file, the EditResult is discarded and the caller unconditionally receives a success message regardless of the actual outcome. Errors are visible only in Path 2 (old_string/new_string) where the result is checked.
🐛 Proposed fix — propagate EditResult failure
- await handler.editRange!(validatedPath!, parsed.range!, content, parsed.options);
- return {
- content: [{
- type: "text",
- text: `Successfully updated range ${parsed.range} in ${parsed.file_path}`
- }],
- };
+ const rangeResult = await handler.editRange!(validatedPath!, parsed.range!, content, parsed.options);
+ if (rangeResult && !rangeResult.success) {
+ const errorMsg = rangeResult.errors?.map(e => e.error).join('; ') || 'Unknown error';
+ return createErrorResponse(errorMsg);
+ }
+ return {
+ content: [{
+ type: "text",
+ text: `Successfully updated range ${parsed.range} in ${parsed.file_path}`
+ }],
+ };🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/tools/edit.ts` around lines 404 - 418, The range-edit branch currently
awaits handler.editRange! but discards its EditResult, always returning a
success message; update the hasEditRange branch in the function handling edits
to capture the result from handler.editRange!(validatedPath!, parsed.range!,
content, parsed.options), then check the returned EditResult (e.g.,
result.success): if false, return createErrorResponse with the result.errors (or
a joined message) so failures from DocxFileHandler.editRange propagate,
otherwise return the existing success content; keep the existing try/catch to
handle true exceptions and convert them to createErrorResponse as before.
| // Check for self-closing | ||
| const nextClose = xml.indexOf('>', openPos); | ||
| if (nextClose !== -1 && xml[nextClose - 1] === '/') { | ||
| results.push(xml.substring(openPos, nextClose + 1)); | ||
| searchFrom = nextClose + 1; | ||
| continue; | ||
| } | ||
|
|
||
| // Find matching close tag respecting nesting | ||
| let depth = 1; | ||
| let pos = nextClose + 1; | ||
| while (depth > 0 && pos < xml.length) { | ||
| const nextOpen = xml.indexOf(openTag, pos); | ||
| const nextCloseTag = xml.indexOf(closeTag, pos); | ||
|
|
||
| if (nextCloseTag === -1) break; // malformed XML | ||
|
|
||
| if (nextOpen !== -1 && nextOpen < nextCloseTag) { | ||
| // Check it's actually an open tag | ||
| const ca = xml[nextOpen + openTag.length]; | ||
| if (ca === ' ' || ca === '>' || ca === '/') { | ||
| depth++; | ||
| } | ||
| pos = nextOpen + openTag.length; | ||
| } else { | ||
| depth--; | ||
| if (depth === 0) { | ||
| results.push(xml.substring(openPos, nextCloseTag + closeTag.length)); | ||
| } | ||
| pos = nextCloseTag + closeTag.length; | ||
| } | ||
| } | ||
| searchFrom = pos; |
There was a problem hiding this comment.
Potential infinite loop in extractNestedElements when > is missing — same class of bug as the fixed splitTopLevelElements.
Line 313 assigns nextClose = xml.indexOf('>', openPos). If this returns -1 (malformed XML), the self-closing branch on line 314 is skipped (short-circuit on nextClose !== -1), and line 322 evaluates to pos = -1 + 1 = 0. The inner while loop then re-searches from position 0 with depth = 1; if the close tag is also absent (nextCloseTag === -1), the inner loop breaks and searchFrom = pos = 0, so the outer loop finds the same openPos again — infinite loop.
splitTopLevelElements had an identical issue and was already fixed (guarded with if (... === -1) break); the same guard is missing here.
🐛 Proposed fix — bail out when no `>` is found
// Check for self-closing
const nextClose = xml.indexOf('>', openPos);
+ if (nextClose === -1) { searchFrom = xml.length; continue; } // malformed — bail out
if (nextClose !== -1 && xml[nextClose - 1] === '/') {
results.push(xml.substring(openPos, nextClose + 1));🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/utils/files/docx.ts` around lines 312 - 344, The loop in
extractNestedElements can infinite-loop when xml.indexOf('>', openPos) returns
-1 (malformed XML); add the same safeguard used in splitTopLevelElements: if
nextClose === -1 then break out of the outer loop (or set searchFrom to
xml.length) before proceeding to the nesting logic so pos doesn't become 0 and
re-find the same openPos; update the block that computes nextClose (and
references openPos/nextClose) to bail when no '>' is found.
User description
Summary
Adds comprehensive DOCX file support without introducing any new MCP tools. Everything works through the existing tool interface.
Features
Read (
read_file)<w:t>XML fragments, tables with cell content, styles, image refs, headers/footersoffset≠0): returns pretty-printed XML with line pagination for drill-downCreate (
write_file)write_file('doc.docx', content)creates a valid DOCX with proper styles, fonts, and page layout#,##,###) converted to Word heading styles (Heading1-3)Edit (
edit_block)old_string/new_stringSearch (
start_search)<w:t>text from document.xml, headers, and footersfilePatternincludes*.docxor root path is a.docxfileTechnical Details
src/utils/files/docx.ts— DocxFileHandler implementing FileHandler interfacepizzip(for ZIP manipulation, already used pattern from PR review)Key Fixes During Development
<w:t>regex was matching<w:tbl>,<w:tc>,<w:tr>— fixed to require tag name boundaryfileReadLineLimitlength parameterFiles Changed
src/utils/files/docx.ts(new — handler, outline, XML helpers, creation)src/utils/files/factory.ts(register DocxFileHandler)src/utils/files/base.ts(add isDocx metadata)src/tools/edit.ts(route .docx to DocxFileHandler)src/search-manager.ts(DOCX content search)src/server.ts(updated tool descriptions)package.json(added pizzip)Summary by CodeRabbit
New Features
Documentation
Behavior
CodeAnt-AI Description
Add first-class DOCX support through existing read_file, write_file, edit_block, and search
What Changed
Impact
✅ Clearer DOCX edit workflow (outline → raw XML → XML find/replace)✅ Search finds matches inside .docx bodies and headers/footers✅ Create DOCX from plain text and Markdown headings without external tools💡 Usage Guide
Checking Your Pull Request
Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.
Talking to CodeAnt AI
Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:
This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.
Example
Preserve Org Learnings with CodeAnt
You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:
This helps CodeAnt AI learn and adapt to your team's coding style and standards.
Example
Retrigger review
Ask CodeAnt AI to review the PR again, by typing:
Check Your Repository Health
To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.