Skip to content

feat: DOCX support via existing read_file, write_file, edit_block, and start_search tools#341

Merged
edgarsskore merged 14 commits into
mainfrom
feat/docx-via-existing-tools
Feb 19, 2026
Merged

feat: DOCX support via existing read_file, write_file, edit_block, and start_search tools#341
edgarsskore merged 14 commits into
mainfrom
feat/docx-via-existing-tools

Conversation

@wonderwhy-er

@wonderwhy-er wonderwhy-er commented Feb 16, 2026

Copy link
Copy Markdown
Owner

User description

Summary

Adds comprehensive DOCX file support without introducing any new MCP tools. Everything works through the existing tool interface.

Features

Read (read_file)

  • Default mode: returns a text-bearing outline showing paragraphs with <w:t> XML fragments, tables with cell content, styles, image refs, headers/footers
  • Raw XML mode (offset≠0): returns pretty-printed XML with line pagination for drill-down

Create (write_file)

  • write_file('doc.docx', content) creates a valid DOCX with proper styles, fonts, and page layout
  • Markdown headings (#, ##, ###) converted to Word heading styles (Heading1-3)
  • Plain text lines become Normal paragraphs

Edit (edit_block)

  • Find/replace on pretty-printed XML using old_string/new_string
  • Automatically searches headers and footers if not found in document body
  • Lossless round-trip: pretty-print → edit → compact → repack ZIP

Search (start_search)

  • Content search inside DOCX files, parallel to existing Excel search
  • Extracts <w:t> text from document.xml, headers, and footers
  • Triggered when filePattern includes *.docx or root path is a .docx file

Technical Details

  • New file: src/utils/files/docx.ts — DocxFileHandler implementing FileHandler interface
  • Dependencies: pizzip (for ZIP manipulation, already used pattern from PR review)
  • Registered in factory.ts with priority before binary detection
  • DOCX routing added to edit.ts for edit_block
  • Search integration in search-manager.ts following Excel search pattern

Key Fixes During Development

  • <w:t> regex was matching <w:tbl>, <w:tc>, <w:tr> — fixed to require tag name boundary
  • Outline mode was incorrectly triggered by default fileReadLineLimit length parameter
  • Table rows now fully displayed (not truncated at 4) since tables contain actual document content

Files Changed

  • src/utils/files/docx.ts (new — handler, outline, XML helpers, creation)
  • src/utils/files/factory.ts (register DocxFileHandler)
  • src/utils/files/base.ts (add isDocx metadata)
  • src/tools/edit.ts (route .docx to DocxFileHandler)
  • src/search-manager.ts (DOCX content search)
  • src/server.ts (updated tool descriptions)
  • package.json (added pizzip)

Summary by CodeRabbit

  • New Features

    • Full DOCX support: search inside DOCX files, read as outline or raw XML, create DOCX from plain text, and perform range-based or find/replace edits with result counts.
    • File handling updates: DOCX recognized as a distinct file type and included in handler selection.
  • Documentation

    • Tool descriptions updated with DOCX usage modes, editing guidance, and examples.
  • Behavior

    • Edit input validation refined to allow empty replacements when appropriate.

CodeAnt-AI Description

Add first-class DOCX support through existing read_file, write_file, edit_block, and search

What Changed

  • read_file(.docx) now returns a human-friendly outline by default (paragraphs, tables, styles, image references); requesting non-zero offset returns pretty-printed raw XML with line pagination for drilling down
  • write_file(... .docx) creates a valid DOCX from plain text where Markdown headings (#, ##, ###) become Word heading styles and blank lines become paragraphs
  • edit_block for DOCX performs find/replace on the pretty-printed XML inside the .docx (including headers/footers), validates expected match counts, compacts and repacks the DOCX, and reports precise errors when searches fail
  • Global search now includes DOCX content: searches text extracted from <w:t> across document.xml, headers, and footers and returns file:part, line, and match context
  • File handler selection prioritizes .docx, the file metadata now tags DOCX files, and server/tool help text documents the DOCX read/edit workflow and required parameters
  • Fixed XML extraction and outline bugs so text in tables and nested elements appears correctly; raw-XML mode only triggers when offset != 0

Impact

✅ Clearer DOCX edit workflow (outline → raw XML → XML find/replace)
✅ Search finds matches inside .docx bodies and headers/footers
✅ Create DOCX from plain text and Markdown headings without external tools

💡 Usage Guide

Checking Your Pull Request

Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.

Talking to CodeAnt AI

Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:

@codeant-ai ask: Your question here

This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.

Example

@codeant-ai ask: Can you suggest a safer alternative to storing this secret?

Preserve Org Learnings with CodeAnt

You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:

@codeant-ai: Your feedback here

This helps CodeAnt AI learn and adapt to your team's coding style and standards.

Example

@codeant-ai: Do not flag unused imports.

Retrigger review

Ask CodeAnt AI to review the PR again, by typing:

@codeant-ai: review

Check Your Repository Health

To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.

- Add DocxFileHandler (src/utils/files/docx.ts)
  - read_file default: returns text-bearing outline (paragraphs, tables, images, headers/footers)
  - read_file with offset/length: returns raw pretty-printed XML with line pagination
  - edit_block: find/replace on pretty-printed XML, compact + repack into valid DOCX
  - Also searches headers/footers when edit target not found in document body
- Register DocxFileHandler in factory.ts (priority before binary detection)
- Route .docx files through DocxFileHandler.editRange in edit.ts
- Update read_file and edit_block tool descriptions with DOCX usage
- Add isDocx to FileMetadata in base.ts
- Add pizzip dependency for DOCX ZIP handling

No new tools added — DOCX works through the same read_file/edit_block workflow
that already exists for text, Excel, and PDF files.
Only switch to raw XML mode when offset is explicitly non-zero.
Previously, the default fileReadLineLimit (1000) passed as length
would incorrectly trigger raw XML mode for large documents.
…y w:t

The regex <w:t[^>]*> matched any tag starting with 'w:t' including
w:tbl, w:tc, w:tr etc. Fixed to <w:t(?:\s[^>]*)?>  which requires
the tag name to end at > or whitespace.

Also added full-document text preview at top of outline so content
is always visible regardless of XML nesting depth (text boxes, etc).
The regex fix made the outline show text correctly in all elements
including table cells with text boxes. A flat text dump loses the
structural awareness that makes the outline useful for editing.
- Paragraphs and SDTs now show <w:t>text</w:t> fragments instead of
  plain text, so Claude can copy them directly into edit_block calls
- Table cells still show plain text for readability
- Updated tool description: offset must be non-zero for raw XML mode
- Added extractTextFragments() helper
Extracts <w:t> text from document.xml, headers, and footers, then
searches with regex matching. Runs async alongside ripgrep, same
pattern as searchExcelFiles. Triggered when filePattern includes
*.docx or rootPath is a .docx file.
- Show all table rows instead of truncating at 4 (tables contain real content)
- Increase cell text preview to 60 chars, paragraph text to 500 chars
- Update outline header: show edit_block example with <w:t> syntax
- Add guidance for bulk operations (translation): use Python script
- Clarify raw XML access requires offset=1
write_file('doc.docx', content) creates a valid DOCX file with:
- Markdown headings (#, ##, ###) converted to Word heading styles
- Plain text lines as Normal paragraphs
- Empty lines as empty paragraphs
- Proper Calibri font defaults and page margins
- Complete ZIP structure with styles.xml, content types, relationships

Updated write_file tool description to reflect DOCX creation support.
@codeant-ai

codeant-ai Bot commented Feb 16, 2026

Copy link
Copy Markdown
Contributor

CodeAnt AI is reviewing your PR.


Thanks for using CodeAnt! 🎉

We're free for open-source projects. if you're enjoying it, help us grow by sharing.

Share on X ·
Reddit ·
LinkedIn

@coderabbitai

coderabbitai Bot commented Feb 16, 2026

Copy link
Copy Markdown
Contributor
📝 Walkthrough

Walkthrough

This change adds DOCX support: a new runtime dependency (pizzip), a comprehensive DocxFileHandler (read/write/edit/outline/XML utilities), integration into the file handler factory, DOCX-aware search in the search manager (non-blocking, parallel to other searches, and a new wasIncomplete flag), edits routed through handler editRange, and related docs/schema adjustments.

Changes

Cohort / File(s) Summary
Dependencies
package.json
Added runtime dependency pizzip ^3.2.0 for DOCX (ZIP) handling.
DOCX File Handler
src/utils/files/docx.ts
New large handler implementing DOCX read/write/edit, XML pretty-print/compact utilities, outline extraction, creating minimal DOCX, XML fragment parsing, and a DocxFileHandler class with read, write, editRange, getInfo, and canHandle.
File System Infrastructure
src/utils/files/base.ts, src/utils/files/factory.ts
Added isDocx?: boolean and extended fileType union to include 'docx'; integrated DocxFileHandler into factory with lazy init and priority before PDF/Excel.
Search Integration
src/search-manager.ts
Added DOCX search: shouldIncludeDocxSearch, findDocxFiles, searchDocxFiles; runs in parallel with existing searches; merges DOCX matches into sessions; added wasIncomplete?: boolean on SearchSession and updated readSearchResults output to include hasMoreResults and optional wasIncomplete.
Edit Integration & Validation
src/tools/edit.ts, src/tools/schemas.ts
Centralized path validation and handler resolution; added hasEditRange branching to call editRange for structured files (including DOCX) and fall back to legacy text-replace when not supported; schema change allows empty new_string while requiring old_string or range+content.
Documentation
src/server.ts
Added descriptive docs for DOCX support in Read File, Write File, and Edit Block tool descriptions (modes, examples, XML find/replace guidance).

Sequence Diagram(s)

sequenceDiagram
    actor Client
    participant SearchTool
    participant SearchManager
    participant DocxHandler
    participant FileSystem

    Client->>SearchTool: submit search query
    SearchTool->>SearchManager: startSearch(pattern, options)
    SearchManager->>SearchManager: shouldIncludeDocxSearch()
    alt DOCX search enabled
        SearchManager->>SearchManager: findDocxFiles(root)
        loop per found file
            SearchManager->>DocxHandler: searchDocxFiles(file, pattern)
            DocxHandler->>FileSystem: read .docx (binary)
            FileSystem-->>DocxHandler: docx buffer
            DocxHandler->>DocxHandler: extract document.xml, headers/footers
            DocxHandler->>DocxHandler: parse XML, apply regex, build matches
            DocxHandler-->>SearchManager: return matches
        end
    end
    SearchManager->>SearchTool: merge results (text/excel/docx)
    SearchTool-->>Client: return aggregated results
Loading
sequenceDiagram
    actor Client
    participant EditTool
    participant EditHandler
    participant DocxHandler
    participant FileSystem

    Client->>EditTool: request edit (path, old_string/new_string or range+content)
    EditTool->>EditHandler: handleEditBlock(...)
    EditHandler->>EditHandler: resolve handler & hasEditRange?
    alt handler supports editRange (.docx)
        EditHandler->>DocxHandler: editRange(path, range|payload, content)
        DocxHandler->>FileSystem: read .docx
        FileSystem-->>DocxHandler: docx buffer
        DocxHandler->>DocxHandler: pretty-print XML, apply find/replace
        DocxHandler->>FileSystem: write updated .docx
        FileSystem-->>DocxHandler: write result
        DocxHandler-->>EditHandler: EditResult
    else fallback
        EditHandler->>EditHandler: performSearchReplace(...) (legacy flow)
    end
    EditHandler-->>EditTool: result/confirmation
    EditTool-->>Client: reply
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Suggested labels

size:L

Suggested reviewers

  • serg33v
  • dmitry-ottic-ai

Poem

🐰 I hopped through folders, found .docx gold,
Zipped, unwrapped, read XML bold,
I searched and replaced with a twitch of my nose,
Packaged it back where the document grows,
A carrot-sized patch that helps text unfold.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 75.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title clearly and specifically describes the main feature: adding DOCX support through integration with existing tools (read_file, write_file, edit_block, start_search) without introducing new tools.
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/docx-via-existing-tools

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@codeant-ai codeant-ai Bot added the size:XXL This PR changes 1000+ lines, ignoring generated files label Feb 16, 2026
@codeant-ai

codeant-ai Bot commented Feb 16, 2026

Copy link
Copy Markdown
Contributor

Sequence Diagram

The PR adds a DocxFileHandler and wires it into read_file, edit_block, and start_search so .docx files are exposed as a readable outline or raw pretty-printed XML, editable via XML find/replace, and searchable by extracting <w:t> text from DOCX parts.

sequenceDiagram
    participant Client
    participant Server
    participant DocxHandler
    participant ZIP
    participant FS

    %% Read flow (default outline)
    Client->>Server: read_file(path.docx, offset=0)
    Server->>DocxHandler: handle read(path)
    DocxHandler->>ZIP: unzip & load word/document.xml
    DocxHandler-->>Server: return Outline (paragraphs, tables, headers/footers)
    Server-->>Client: 200 OK (outline)

    %% Edit flow (XML find/replace)
    Client->>Server: edit_block(file.docx, old_string, new_string)
    Server->>DocxHandler: editRange(path, old_string, new_string)
    DocxHandler->>ZIP: pretty-print target XML part, apply replacement, compact & repack
    DocxHandler->>FS: write updated .docx
    DocxHandler-->>Server: editsApplied / success
    Server-->>Client: Edit result

    %% Search flow (content)
    Client->>Server: start_search(root or pattern *.docx, regex)
    Server->>DocxHandler: searchDocxFiles(root, pattern)
    DocxHandler->>ZIP: read document/header/footer XML parts
    DocxHandler-->>Server: matches (file:part, line, snippet)
    Server-->>Client: aggregated search results
Loading

Generated by CodeAnt AI

@codeant-ai

codeant-ai Bot commented Feb 16, 2026

Copy link
Copy Markdown
Contributor

Nitpicks 🔍

🔒 No security issues identified
⚡ Recommended areas for review

  • Risky editing workflow
    The edit_block DOCX flow encourages direct XML find/replace and repacking of .docx files. There is no documented or enforced safety net (backup, XML well-formedness validation, mismatch checks for expected_replacements) and careless replacements can corrupt a DOCX. The handler should validate edits, ensure replacement counts match expectations, and create a safe backup or dry-run validation step before writing the final DOCX.

  • Header/Footer search
    Searching only a hard-coded list (header1..3/footer1..3) can miss other header/footer parts (different numbering, multiple headers/footers, alt parts, or files with different names). This can cause edits/searches to silently miss occurrences located in other header/footer parts. Consider scanning the zip entries (xmlParts/zip.files) for any path starting with "word/header" or "word/footer" instead of a fixed list.

  • Regex ReDoS
    The code constructs a RegExp from the user-supplied search pattern and uses it to test extracted text from DOCX parts. Arbitrary/complex regexes (or very long patterns) may cause catastrophic backtracking or excessive CPU usage. Consider limiting pattern length, falling back to literal matching for long/untrusted patterns, or using a safe matching strategy.

  • DOCX routing fallback
    The new routing sends .docx files to handler.editRange. If a handler does not implement editRange the code falls through to the text search/replace flow. Ensure this fallback behavior is intentional (and safe) for docx files, or explicitly handle the non-editRange case to avoid surprising results.

  • XML part discovery coverage
    The DOCX search enumerates a hard-coded list of xmlParts (header1..3/footer1..3). DOCX files may have different header/footer names (e.g., header4,/headerDefault, multiple headers/footers) or numbered sections. Relying on a fixed list can miss matches. Consider enumerating zip entries that match the word/(document|header|footer).*.xml pattern.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Fix all issues with AI agents
Verify each finding against the current code and only fix it if needed.


In `@src/search-manager.ts`:
- Line 8: Remove the top-level PizZip import and instead dynamically import it
inside searchDocxFiles just like ExcelJS is imported; update searchDocxFiles to
await import('pizzip') (e.g., const { default: PizZip } = await
import('pizzip')) before using PizZip so the ZIP library is only loaded when
docx search runs and startup costs remain minimal.
- Around line 610-643: The findDocxFiles method duplicates logic from
findExcelFiles; extract a shared helper like findFilesByExtension(rootPath:
string, isMatch: (name: string) => boolean) that contains the recursive walk,
hidden/node_modules skip, and single-file root handling, then refactor
findDocxFiles and findExcelFiles to call that helper (use the existing isDocx
and corresponding isExcel predicates when invoking the new function).

In `@src/tools/edit.ts`:
- Around line 421-451: The DOCX branch can fall through if the handler lacks an
editRange method, causing later performSearchReplace to treat a .docx as plain
text; after obtaining handler from getFileHandler(validatedPath) add a defensive
return when 'editRange' is not in handler or not a function (e.g., return
createErrorResponse or a structured failure) so execution exits the
.docx-specific branch instead of continuing to performSearchReplace; reference
parsed.file_path, validatePath, getFileHandler, handler and editRange to locate
the exact spot to add this early return.

In `@src/utils/files/docx.ts`:
- Around line 569-606: The write method in docx.ts generates Heading1–Heading6
paragraph styles from the /^(#{1,6})\s+(.+)/ regex but createMinimalDocxZip only
defines Heading1–Heading3, causing references to Heading4–6 that don't exist;
fix by either capping detected heading levels to 3 in write (map any level >3 to
Heading3 before emitting w:pStyle) or by adding Heading4, Heading5 and Heading6
style definitions inside createMinimalDocxZip so styles.xml includes those
entries, and ensure the unique symbols referenced are write and
createMinimalDocxZip.
- Around line 353-405: The function splitTopLevelElements can enter an infinite
loop when xml.indexOf('>', i) returns -1; update splitTopLevelElements to check
for a -1 result for every call that assigns tagEnd or closeEnd (used in the
closing-tag branch, processing-instruction/comment branch,
self-closing/opening-tag branch) and handle it safely (e.g., bail out by
breaking the loop, set i = xml.length to terminate, or throw a controlled error)
instead of doing i = tagEnd + 1; ensure currentStart is cleared or any partially
collected element is handled before exiting so the function returns
deterministically for malformed input.
- Around line 750-758: The returned object in docx.ts currently sets fileType:
'docx' as any which bypasses the FileInfo type; update the FileInfo union in
base.ts (the FileInfo type / fileType property) to include 'docx' as a valid
member, then remove the cast in the object returned by the function in
src/utils/files/docx.ts (the return that currently sets fileType) so it reads
fileType: 'docx' without any casting; alternatively, if DOCX should be
categorized under an existing type, change the returned fileType to that
existing union value (for example 'binary') and remove the as any cast.
🧹 Nitpick comments (3)
🤖 Fix all nitpicks with AI agents
Verify each finding against the current code and only fix it if needed.


In `@src/search-manager.ts`:
- Line 8: Remove the top-level PizZip import and instead dynamically import it
inside searchDocxFiles just like ExcelJS is imported; update searchDocxFiles to
await import('pizzip') (e.g., const { default: PizZip } = await
import('pizzip')) before using PizZip so the ZIP library is only loaded when
docx search runs and startup costs remain minimal.
- Around line 610-643: The findDocxFiles method duplicates logic from
findExcelFiles; extract a shared helper like findFilesByExtension(rootPath:
string, isMatch: (name: string) => boolean) that contains the recursive walk,
hidden/node_modules skip, and single-file root handling, then refactor
findDocxFiles and findExcelFiles to call that helper (use the existing isDocx
and corresponding isExcel predicates when invoking the new function).
src/search-manager.ts (3)

8-8: Top-level PizZip import diverges from the dynamic-import pattern used for ExcelJS.

ExcelJS is dynamically imported on line 380 (await import('exceljs')) to avoid loading it at startup when Excel search isn't needed. PizZip should follow the same pattern to keep startup costs consistent and avoid loading the ZIP library unconditionally.

♻️ Proposed fix — dynamic import inside searchDocxFiles

Remove line 8 and dynamically import inside searchDocxFiles:

-import PizZip from 'pizzip';

Then in searchDocxFiles (around line 560):

       try {
         const buf = await fs.readFile(filePath);
-        const zip = new PizZip(buf);
+        const PizZip = (await import('pizzip')).default;
+        const zip = new PizZip(buf);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/search-manager.ts` at line 8, Remove the top-level PizZip import and
instead dynamically import it inside searchDocxFiles just like ExcelJS is
imported; update searchDocxFiles to await import('pizzip') (e.g., const {
default: PizZip } = await import('pizzip')) before using PizZip so the ZIP
library is only loaded when docx search runs and startup costs remain minimal.

610-643: findDocxFiles duplicates findExcelFiles — consider extracting a shared directory walker.

Both methods have identical structure: recursive walk, skip hidden/node_modules dirs, filter by extension, handle single-file root paths. A shared findFilesByExtension(rootPath, extensionCheck) helper would eliminate the duplication.

The existing TODO on line 335 already acknowledges that file-type search should be modularized, so this can be deferred.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/search-manager.ts` around lines 610 - 643, The findDocxFiles method
duplicates logic from findExcelFiles; extract a shared helper like
findFilesByExtension(rootPath: string, isMatch: (name: string) => boolean) that
contains the recursive walk, hidden/node_modules skip, and single-file root
handling, then refactor findDocxFiles and findExcelFiles to call that helper
(use the existing isDocx and corresponding isExcel predicates when invoking the
new function).

575-600: DOCX text search only matches within individual <w:t> elements — cross-run patterns will be missed.

Word frequently splits logical text across multiple <w:t> runs (e.g., due to spell-check, formatting, or revision marks). A search for "Hello World" won't match if the DOCX stores <w:t>Hello </w:t> and <w:t>World</w:t> in separate runs.

This is acceptable for a first implementation, but worth noting as a known limitation — especially since the Excel search concatenates all cell values per row (line 425) to enable cross-cell matching.

Comment thread src/tools/edit.ts Outdated
Comment thread src/utils/files/docx.ts
Comment thread src/utils/files/docx.ts
Comment on lines +569 to +606
async write(path: string, content: any, mode?: 'rewrite' | 'append'): Promise<void> {
if (mode === 'append') {
throw new Error('DOCX append not supported. Use edit_block to modify existing DOCX files.');
}

const text = typeof content === 'string' ? content : String(content);
const lines = text.split('\n');

// Build paragraph XML from lines
const paragraphs: string[] = [];
for (const line of lines) {
const headingMatch = line.match(/^(#{1,6})\s+(.+)/);
if (headingMatch) {
const level = headingMatch[1].length;
const headingText = headingMatch[2];
paragraphs.push(
`<w:p><w:pPr><w:pStyle w:val="Heading${level}"/></w:pPr>` +
`<w:r><w:t>${escapeXml(headingText)}</w:t></w:r></w:p>`
);
} else if (line.trim() === '') {
paragraphs.push(`<w:p/>`);
} else {
paragraphs.push(
`<w:p><w:r><w:t xml:space="preserve">${escapeXml(line)}</w:t></w:r></w:p>`
);
}
}

const docXml = `<?xml version="1.0" encoding="UTF-8" standalone="yes"?>` +
`<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">` +
`<w:body>${paragraphs.join('')}` +
`<w:sectPr><w:pgSz w:w="12240" w:h="15840"/><w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440"/></w:sectPr>` +
`</w:body></w:document>`;

const zip = createMinimalDocxZip(docXml);
const buf = zip.generate({ type: 'nodebuffer', compression: 'DEFLATE', compressionOptions: { level: 6 } });
await fs.writeFile(path, buf);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Heading levels 4–6 produce style references (Heading4Heading6) not defined in the generated styles.xml.

The heading regex on line 580 matches up to #{1,6}, but createMinimalDocxZip (line 466–477) only defines Heading1Heading3 styles. Using ####, #####, or ###### in content will create <w:pStyle w:val="Heading4"/> etc., referencing non-existent styles.

Word typically degrades gracefully (falls back to Normal), but it may show warnings or produce inconsistent formatting.

🔧 Option: Cap heading level at 3
-            const headingMatch = line.match(/^(#{1,6})\s+(.+)/);
+            const headingMatch = line.match(/^(#{1,3})\s+(.+)/);

Or add Heading4Heading6 styles to createMinimalDocxZip.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
async write(path: string, content: any, mode?: 'rewrite' | 'append'): Promise<void> {
if (mode === 'append') {
throw new Error('DOCX append not supported. Use edit_block to modify existing DOCX files.');
}
const text = typeof content === 'string' ? content : String(content);
const lines = text.split('\n');
// Build paragraph XML from lines
const paragraphs: string[] = [];
for (const line of lines) {
const headingMatch = line.match(/^(#{1,6})\s+(.+)/);
if (headingMatch) {
const level = headingMatch[1].length;
const headingText = headingMatch[2];
paragraphs.push(
`<w:p><w:pPr><w:pStyle w:val="Heading${level}"/></w:pPr>` +
`<w:r><w:t>${escapeXml(headingText)}</w:t></w:r></w:p>`
);
} else if (line.trim() === '') {
paragraphs.push(`<w:p/>`);
} else {
paragraphs.push(
`<w:p><w:r><w:t xml:space="preserve">${escapeXml(line)}</w:t></w:r></w:p>`
);
}
}
const docXml = `<?xml version="1.0" encoding="UTF-8" standalone="yes"?>` +
`<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">` +
`<w:body>${paragraphs.join('')}` +
`<w:sectPr><w:pgSz w:w="12240" w:h="15840"/><w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440"/></w:sectPr>` +
`</w:body></w:document>`;
const zip = createMinimalDocxZip(docXml);
const buf = zip.generate({ type: 'nodebuffer', compression: 'DEFLATE', compressionOptions: { level: 6 } });
await fs.writeFile(path, buf);
}
async write(path: string, content: any, mode?: 'rewrite' | 'append'): Promise<void> {
if (mode === 'append') {
throw new Error('DOCX append not supported. Use edit_block to modify existing DOCX files.');
}
const text = typeof content === 'string' ? content : String(content);
const lines = text.split('\n');
// Build paragraph XML from lines
const paragraphs: string[] = [];
for (const line of lines) {
const headingMatch = line.match(/^(#{1,3})\s+(.+)/);
if (headingMatch) {
const level = headingMatch[1].length;
const headingText = headingMatch[2];
paragraphs.push(
`<w:p><w:pPr><w:pStyle w:val="Heading${level}"/></w:pPr>` +
`<w:r><w:t>${escapeXml(headingText)}</w:t></w:r></w:p>`
);
} else if (line.trim() === '') {
paragraphs.push(`<w:p/>`);
} else {
paragraphs.push(
`<w:p><w:r><w:t xml:space="preserve">${escapeXml(line)}</w:t></w:r></w:p>`
);
}
}
const docXml = `<?xml version="1.0" encoding="UTF-8" standalone="yes"?>` +
`<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">` +
`<w:body>${paragraphs.join('')}` +
`<w:sectPr><w:pgSz w:w="12240" w:h="15840"/><w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440"/></w:sectPr>` +
`</w:body></w:document>`;
const zip = createMinimalDocxZip(docXml);
const buf = zip.generate({ type: 'nodebuffer', compression: 'DEFLATE', compressionOptions: { level: 6 } });
await fs.writeFile(path, buf);
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/files/docx.ts` around lines 569 - 606, The write method in docx.ts
generates Heading1–Heading6 paragraph styles from the /^(#{1,6})\s+(.+)/ regex
but createMinimalDocxZip only defines Heading1–Heading3, causing references to
Heading4–6 that don't exist; fix by either capping detected heading levels to 3
in write (map any level >3 to Heading3 before emitting w:pStyle) or by adding
Heading4, Heading5 and Heading6 style definitions inside createMinimalDocxZip so
styles.xml includes those entries, and ensure the unique symbols referenced are
write and createMinimalDocxZip.

Comment thread src/utils/files/docx.ts Outdated
Comment on lines +750 to +758
return {
size: stats.size,
created: stats.birthtime,
modified: stats.mtime,
accessed: stats.atime,
isDirectory: false,
isFile: true,
permissions: (stats.mode & 0o777).toString(8),
fileType: 'docx' as any,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

# Check if base.ts exists and view FileInfo type definition
if [ -f "src/utils/files/base.ts" ]; then
  echo "=== FileInfo type in base.ts ===" 
  rg -n "fileType" src/utils/files/base.ts -B2 -A2
else
  echo "base.ts not found"
fi

Repository: wonderwhy-er/DesktopCommanderMCP

Length of output: 253


🏁 Script executed:

# View the actual code in docx.ts at lines 750-758
if [ -f "src/utils/files/docx.ts" ]; then
  echo "=== docx.ts lines 750-758 ==="
  sed -n '745,760p' src/utils/files/docx.ts
else
  echo "docx.ts not found"
fi

Repository: wonderwhy-er/DesktopCommanderMCP

Length of output: 562


🏁 Script executed:

# Search for fileType usage patterns in the codebase
echo "=== All fileType references ==="
rg -n "fileType" --type=ts -B1 -A1

Repository: wonderwhy-er/DesktopCommanderMCP

Length of output: 9275


🏁 Script executed:

# Check PreviewFileType definition in types.ts
echo "=== PreviewFileType in types.ts ==="
rg -n "PreviewFileType" src/types.ts -B3 -A3

Repository: wonderwhy-er/DesktopCommanderMCP

Length of output: 605


🏁 Script executed:

# Search for switch/case or pattern matching on FileInfo.fileType
echo "=== Switch/case/pattern matching on fileType from FileInfo ==="
rg -n "switch.*fileType|case.*['\"](?:text|excel|image|binary)" --type=ts src/utils/files/ src/tools/

Repository: wonderwhy-er/DesktopCommanderMCP

Length of output: 138


🏁 Script executed:

# Check if docx is handled elsewhere or expected to be a valid type
echo "=== References to 'docx' fileType ==="
rg -n "docx" --type=ts -i | grep -i "file\|type"

Repository: wonderwhy-er/DesktopCommanderMCP

Length of output: 6194


🏁 Script executed:

# Check PreviewFileType definition
if [ -f "src/ui/file-preview/shared/preview-file-types.js" ]; then
  echo "=== PreviewFileType definition ==="
  cat src/ui/file-preview/shared/preview-file-types.js
else
  echo "File not found, searching for preview-file-types"
  find . -name "*preview-file-types*" -type f
fi

Repository: wonderwhy-er/DesktopCommanderMCP

Length of output: 175


🏁 Script executed:

# Check how FileInfo is converted to FilePreviewStructuredContent
echo "=== Mapping from FileInfo to FilePreviewStructuredContent ==="
rg -n "FilePreviewStructuredContent|FileInfo" src/handlers/filesystem-handlers.ts -B2 -A2 | head -100

Repository: wonderwhy-er/DesktopCommanderMCP

Length of output: 705


🏁 Script executed:

# Verify if docx fileType is ever actually used in a way that would break
echo "=== Check downstream usage of fileType property ==="
rg -n "\.fileType" src/ --type=ts | head -50

Repository: wonderwhy-er/DesktopCommanderMCP

Length of output: 2288


🏁 Script executed:

# Check the actual PreviewFileType definition
echo "=== PreviewFileType definition ==="
cat src/ui/file-preview/shared/preview-file-types.ts

Repository: wonderwhy-er/DesktopCommanderMCP

Length of output: 1604


🏁 Script executed:

# Check how FileInfo.fileType is converted/mapped to FilePreviewStructuredContent.fileType
echo "=== Check conversion logic from FileInfo to FilePreviewStructuredContent ==="
rg -n "fileType:" src/handlers/filesystem-handlers.ts -B5 -A5

Repository: wonderwhy-er/DesktopCommanderMCP

Length of output: 1564


🏁 Script executed:

# Check if 'docx' value ever appears as FilePreviewStructuredContent fileType
echo "=== Look for 'docx' in FilePreviewStructuredContent creation ==="
rg -n "previewFileType|FilePreviewStructuredContent" src/ --type=ts -A3 | grep -E "docx|fileType" | head -30

Repository: wonderwhy-er/DesktopCommanderMCP

Length of output: 529


🏁 Script executed:

# Let's verify the actual flow - does getFileInfo output get used anywhere with switch statements?
echo "=== Trace FileInfo usage in filesystem.ts and tools ==="
rg -n "getFileInfo|fileInfo.fileType" src/tools/filesystem.ts src/handlers/ --type=ts -B2 -A2 | head -80

Repository: wonderwhy-er/DesktopCommanderMCP

Length of output: 1403


🏁 Script executed:

# Check if docx.ts fileInfo is used downstream at all
echo "=== Check where getFileInfo is called and how result is used ==="
rg -n "await getFileInfo|await.*getFileInfo\|getFileInfo\(" src/ --type=ts -B1 -A3 | head -100

Repository: wonderwhy-er/DesktopCommanderMCP

Length of output: 551


🏁 Script executed:

# Let's see the full context of how the FileInfo result is used
echo "=== Full handleGetFileInfo function ==="
sed -n '390,410p' src/handlers/filesystem-handlers.ts

Repository: wonderwhy-er/DesktopCommanderMCP

Length of output: 809


🏁 Script executed:

# Double-check if FileInfo type is ever actually checked with pattern matching
echo "=== Search for any conditional logic on fileType values ==="
rg -n "fileType.*==|fileType.*===|case.*['\"]text['\"]|case.*['\"]excel['\"]|case.*['\"]binary['\"]|case.*['\"]image['\"]" --type=ts

Repository: wonderwhy-er/DesktopCommanderMCP

Length of output: 1716


Remove as any bypass by extending the FileInfo union type.

fileType: 'docx' as any violates the FileInfo.fileType contract defined in base.ts line 218. While the architectural separation (FileInfo is independent from PreviewFileType used in the UI) prevents immediate downstream breakage, this type safety violation creates a maintenance risk and should be corrected.

Either extend the union in src/utils/files/base.ts:

-    fileType: 'text' | 'excel' | 'image' | 'binary';
+    fileType: 'text' | 'excel' | 'image' | 'binary' | 'docx';

Then remove the as any cast in docx.ts:

-            fileType: 'docx' as any,
+            fileType: 'docx',

Alternatively, map to 'binary' if docx should not be a distinct type category.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/files/docx.ts` around lines 750 - 758, The returned object in
docx.ts currently sets fileType: 'docx' as any which bypasses the FileInfo type;
update the FileInfo union in base.ts (the FileInfo type / fileType property) to
include 'docx' as a valid member, then remove the cast in the object returned by
the function in src/utils/files/docx.ts (the return that currently sets
fileType) so it reads fileType: 'docx' without any casting; alternatively, if
DOCX should be categorized under an existing type, change the returned fileType
to that existing union value (for example 'binary') and remove the as any cast.

Comment thread src/search-manager.ts
Comment on lines +565 to +566
const xmlParts = ['word/document.xml', 'word/header1.xml', 'word/header2.xml',
'word/header3.xml', 'word/footer1.xml', 'word/footer2.xml', 'word/footer3.xml'];

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: The DOCX search only checks a hard-coded set of header and footer XML parts (header1-3.xml, footer1-3.xml), so any text in additional headers/footers like header4.xml or footer5.xml will never be found, even though the DOCX outline and metadata logic handle all header/footer parts dynamically; this causes real content in some documents to be silently missed by start_search. [logic error]

Severity Level: Major ⚠️
- ❌ start_search misses text in DOCX section headers.
- ⚠️ Multi-section Word documents return incomplete search results.
- ⚠️ Inconsistency between DOCX read_file outline and search coverage.
Suggested change
const xmlParts = ['word/document.xml', 'word/header1.xml', 'word/header2.xml',
'word/header3.xml', 'word/footer1.xml', 'word/footer2.xml', 'word/footer3.xml'];
const xmlParts: string[] = ['word/document.xml'];
const zipFiles = zip.files;
for (const relativePath of Object.keys(zipFiles)) {
if (
(relativePath.startsWith('word/header') || relativePath.startsWith('word/footer')) &&
relativePath.endsWith('.xml')
) {
xmlParts.push(relativePath);
}
}
Steps of Reproduction ✅
1. From the MCP client, call the `start_search` tool, which is handled by
`handleStartSearch()` in `src/handlers/search-handlers.ts:13-35`. Pass `searchType:
"content"`, `pattern: "UniqueHeaderText"`, `path: "/some/dir"`, and `filePattern:
"*.docx"` so that content search is started against DOCX files.

2. Ensure `/some/dir` contains a DOCX file (e.g., `multi-section.docx`) whose body text
does NOT contain `"UniqueHeaderText"`, but whose additional section header (e.g.,
`word/header4.xml` or `word/header5.xml`) does contain `"UniqueHeaderText"`. Word will
create these additional header parts when the document has different headers in later
sections.

3. In `SearchManager.startSearch()` at `src/search-manager.ts:59-218`, the call is routed
to `this.searchDocxFiles(...)` (lines 175-194) because `searchType === 'content'` and
`this.shouldIncludeDocxSearch(options.filePattern, validPath)` (lines 495-517) returns
true for `filePattern: "*.docx"` and the validated root path.

4. Inside `searchDocxFiles()` at `src/search-manager.ts:523-607`, the DOCX is opened and
only the hard-coded XML parts listed in `xmlParts = ['word/document.xml',
'word/header1.xml', 'word/header2.xml', 'word/header3.xml', 'word/footer1.xml',
'word/footer2.xml', 'word/footer3.xml'];` (lines 564-566) are scanned. Because
`header4.xml` and above are never iterated in the `for (const xmlPath of xmlParts)` loop
at line 568, `"UniqueHeaderText"` in those later headers is never matched, so the
`start_search` response (initial result from `handleStartSearch` or later from
`handleGetMoreSearchResults` in `src/handlers/search-handlers.ts:85-147`) shows no
results, even though the DOCX actually contains the text.
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** src/search-manager.ts
**Line:** 565:566
**Comment:**
	*Logic Error: The DOCX search only checks a hard-coded set of header and footer XML parts (`header1-3.xml`, `footer1-3.xml`), so any text in additional headers/footers like `header4.xml` or `footer5.xml` will never be found, even though the DOCX outline and metadata logic handle all header/footer parts dynamically; this causes real content in some documents to be silently missed by `start_search`.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
👍 | 👎

@codeant-ai

codeant-ai Bot commented Feb 16, 2026

Copy link
Copy Markdown
Contributor

CodeAnt AI finished reviewing your PR.

@wonderwhy-er

Copy link
Copy Markdown
Owner Author

@scutuatua-crypto can you share what and how you reviewed and tested?

edgarsskore and others added 4 commits February 19, 2026 16:15
- Add 'docx' to FileInfo.fileType union in base.ts; remove `as any` cast in DocxFileHandler.getInfo
- Fix splitTopLevelElements infinite loop: indexOf returning -1 would set i=0 and restart the loop; now bails out with break on malformed XML in all three tag branches (closing, processing-instruction, opening/self-closing)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Relax validation so new_string can be an empty string (delete the matched
text), while old_string must still be non-empty.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ispatch

- Validate path and resolve file handler once at the top instead of
  duplicating validatePath + getFileHandler in each branch
- Compute hasEditRange once and reuse across both dispatch paths
- Remove hardcoded .endsWith('.docx') extension check — DocxFileHandler
  is picked up automatically via hasEditRange like any other handler
- Path 1 (range + content): Excel and any future range-based handlers
- Path 2 (old_string + new_string): handlers with editRange own their
  own text-replacement (DOCX); plain text falls through to performSearchReplace

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@codeant-ai

codeant-ai Bot commented Feb 19, 2026

Copy link
Copy Markdown
Contributor

CodeAnt AI is running Incremental review


Thanks for using CodeAnt! 🎉

We're free for open-source projects. if you're enjoying it, help us grow by sharing.

Share on X ·
Reddit ·
LinkedIn

@codeant-ai codeant-ai Bot added size:XXL This PR changes 1000+ lines, ignoring generated files and removed size:XXL This PR changes 1000+ lines, ignoring generated files labels Feb 19, 2026
Comment thread src/tools/edit.ts
Comment on lines 391 to +431

// Parse content if it's a JSON string (AI often sends arrays as JSON strings)
let content = parsed.content;
if (typeof content === 'string') {
try {
content = JSON.parse(content);
} catch {
// Leave as-is if not valid JSON - let handler decide
}
// Path 1: Range rewrite (Excel, etc.) — range + content
if (hasRange && hasContent) {
// Parse content if it's a JSON string (AI often sends arrays as JSON strings)
let content = parsed.content;
if (typeof content === 'string') {
try {
content = JSON.parse(content);
} catch {
// Leave as-is if not valid JSON - let handler decide
}
}

// Check if handler supports range editing
if ('editRange' in handler && typeof handler.editRange === 'function') {
if (hasEditRange) {
try {
// parsed.range is guaranteed non-empty string by hasRange check above
await handler.editRange(validatedPath, parsed.range!, content, parsed.options);
await handler.editRange!(validatedPath!, parsed.range!, content, parsed.options);
return {
content: [{
type: "text",
text: `Successfully updated range ${parsed.range} in ${parsed.file_path}`
}],
};
} else {
return createErrorResponse(`Range-based editing not supported for ${parsed.file_path}. For text files, use old_string and new_string parameters instead. If your client requires range/content parameters, set them to empty strings ("").`);
} catch (error) {
const errorMessage = error instanceof Error ? error.message : String(error);
return createErrorResponse(errorMessage);
}
} catch (error) {
const errorMessage = error instanceof Error ? error.message : String(error);
return createErrorResponse(errorMessage);
}

return createErrorResponse(`Range-based editing not supported for ${parsed.file_path}. For text files, use old_string and new_string parameters instead. If your client requires range/content parameters, set them to empty strings ("").`);
}

// Text files: String replacement
// Validate required parameters for text replacement
// Path 2: Text replacement — old_string + new_string
if (parsed.old_string === undefined || parsed.new_string === undefined) {
return createErrorResponse(`Text replacement requires both old_string and new_string parameters`);
}

const searchReplace = {
// If the handler implements editRange it owns text-replacement for its file type
// (e.g. DocxFileHandler does find/replace on pretty-printed XML rather than raw bytes).
// Plain text files fall through to performSearchReplace.
if (hasEditRange) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggestion: Text-replacement requests for structured files that implement editRange (notably Excel) will now incorrectly be routed through editRange with an empty range and a DOCX-style { old_string, new_string } payload, even though editRange is intended only for range rewrites on those types; this can cause Excel edits to behave unexpectedly or fail, and silently changes the previous behavior where such calls fell back to plain text search/replace. Restricting the editRange-based text path to DOCX files only restores the intended contract (Excel uses range+content only, DOCX uses editRange for XML-aware text edits) while preserving the new DOCX functionality. [logic error]

Severity Level: Major ⚠️
- ⚠️ Excel edit_block text replacements always error with 2D-array message.
- ⚠️ Structured handlers cannot support simple text find/replace semantics.
- ⚠️ Client behavior diverges from documented text-edit expectations.
- ⚠️ Future editRange handlers risk same misrouting for text edits.
Suggested change
// Parse content if it's a JSON string (AI often sends arrays as JSON strings)
let content = parsed.content;
if (typeof content === 'string') {
try {
content = JSON.parse(content);
} catch {
// Leave as-is if not valid JSON - let handler decide
}
// Path 1: Range rewrite (Excel, etc.) — range + content
if (hasRange && hasContent) {
// Parse content if it's a JSON string (AI often sends arrays as JSON strings)
let content = parsed.content;
if (typeof content === 'string') {
try {
content = JSON.parse(content);
} catch {
// Leave as-is if not valid JSON - let handler decide
}
}
// Check if handler supports range editing
if ('editRange' in handler && typeof handler.editRange === 'function') {
if (hasEditRange) {
try {
// parsed.range is guaranteed non-empty string by hasRange check above
await handler.editRange(validatedPath, parsed.range!, content, parsed.options);
await handler.editRange!(validatedPath!, parsed.range!, content, parsed.options);
return {
content: [{
type: "text",
text: `Successfully updated range ${parsed.range} in ${parsed.file_path}`
}],
};
} else {
return createErrorResponse(`Range-based editing not supported for ${parsed.file_path}. For text files, use old_string and new_string parameters instead. If your client requires range/content parameters, set them to empty strings ("").`);
} catch (error) {
const errorMessage = error instanceof Error ? error.message : String(error);
return createErrorResponse(errorMessage);
}
} catch (error) {
const errorMessage = error instanceof Error ? error.message : String(error);
return createErrorResponse(errorMessage);
}
return createErrorResponse(`Range-based editing not supported for ${parsed.file_path}. For text files, use old_string and new_string parameters instead. If your client requires range/content parameters, set them to empty strings ("").`);
}
// Text files: String replacement
// Validate required parameters for text replacement
// Path 2: Text replacement — old_string + new_string
if (parsed.old_string === undefined || parsed.new_string === undefined) {
return createErrorResponse(`Text replacement requires both old_string and new_string parameters`);
}
const searchReplace = {
// If the handler implements editRange it owns text-replacement for its file type
// (e.g. DocxFileHandler does find/replace on pretty-printed XML rather than raw bytes).
// Plain text files fall through to performSearchReplace.
if (hasEditRange) {
const isDocx = parsed.file_path.toLowerCase().endsWith('.docx');
// Path 1: Range rewrite (Excel, etc.) — range + content
if (hasRange && hasContent) {
// Parse content if it's a JSON string (AI often sends arrays as JSON strings)
let content = parsed.content;
if (typeof content === 'string') {
try {
content = JSON.parse(content);
} catch {
// Leave as-is if not valid JSON - let handler decide
}
}
if (hasEditRange) {
try {
// parsed.range is guaranteed non-empty string by hasRange check above
await handler.editRange!(validatedPath!, parsed.range!, content, parsed.options);
return {
content: [{
type: "text",
text: `Successfully updated range ${parsed.range} in ${parsed.file_path}`
}],
};
} catch (error) {
const errorMessage = error instanceof Error ? error.message : String(error);
return createErrorResponse(errorMessage);
}
}
return createErrorResponse(`Range-based editing not supported for ${parsed.file_path}. For text files, use old_string and new_string parameters instead. If your client requires range/content parameters, set them to empty strings ("").`);
}
// Path 2: Text replacement — old_string + new_string
if (parsed.old_string === undefined || parsed.new_string === undefined) {
return createErrorResponse(`Text replacement requires both old_string and new_string parameters`);
}
// If the handler implements editRange it owns text-replacement for DOCX files
// (DocxFileHandler does find/replace on pretty-printed XML rather than raw bytes).
// Other file types (including Excel) fall through to performSearchReplace.
if (hasEditRange && isDocx) {
Steps of Reproduction ✅
1. Start the MCP server and ensure it exposes the `edit_block` command via
`handleEditBlock` re-exported in `src/handlers/edit-search-handlers.ts:1-13`.

2. Create or reuse an Excel file (e.g., `EDIT_EXCEL` in
`test/test-excel-files.js:211-218`, which is a `.xlsx` created via `writeFile`).

3. From any client (or by modifying `test/test-excel-files.js`), call `handleEditBlock`
with that `.xlsx` file path, *without* `range` or `content`, but with `old_string` and
`new_string`, for example:

   - `handleEditBlock({ file_path: EDIT_EXCEL, old_string: 'Apple', new_string: 'Pear' })`

   This mirrors the existing call pattern in `test/test-excel-files.js:221-226` but omits
   `range`/`content` and supplies text parameters instead.

4. The call reaches `handleEditBlock` in `src/tools/edit.ts:371-459`, where:

   - `hasRange` and `hasContent` are false, so Path 1 is skipped.

   - `validatedPath` is resolved and `handler` is obtained from `getFileHandler`, which
   returns an `ExcelFileHandler` for `.xlsx` files (validated by
   `test/test-excel-files.js:81-87`).

   - `hasEditRange` is true for `ExcelFileHandler` (`src/utils/files/excel.ts:158-236`
   implements `editRange`).

   - In the text-replacement path at `src/tools/edit.ts:423-454`, the code calls
   `handler.editRange!(validatedPath!, '', { old_string, new_string, expected_replacements
   })`.

5. Inside `ExcelFileHandler.editRange` (`src/utils/files/excel.ts:158-236`), the method
validates `content` with `if (!Array.isArray(content)) { throw new Error('Content must be
a 2D array for range editing'); }` at lines 170-173; since the payload is an object `{
old_string, new_string, ... }` instead of a 2D array, this throws.

6. The thrown error is caught back in `handleEditBlock` at `src/tools/edit.ts:404-417`,
and `createErrorResponse` is returned to the client with the message `"Content must be a
2D array for range editing"`, instead of performing any text search/replace or falling
through to `performSearchReplace`.

7. This misrouting will occur for any file type whose handler implements `editRange` but
expects structured range edits (currently `ExcelFileHandler`, per
`src/utils/files/base.ts:31-41` and `src/utils/files/excel.ts:33-40`), whenever a caller
issues a text-style `edit_block` request (old_string/new_string only) against such a file.
Prompt for AI Agent 🤖
This is a comment left during a code review.

**Path:** src/tools/edit.ts
**Line:** 391:431
**Comment:**
	*Logic Error: Text-replacement requests for structured files that implement `editRange` (notably Excel) will now incorrectly be routed through `editRange` with an empty range and a DOCX-style `{ old_string, new_string }` payload, even though `editRange` is intended only for range rewrites on those types; this can cause Excel edits to behave unexpectedly or fail, and silently changes the previous behavior where such calls fell back to plain text search/replace. Restricting the `editRange`-based text path to DOCX files only restores the intended contract (Excel uses range+content only, DOCX uses `editRange` for XML-aware text edits) while preserving the new DOCX functionality.

Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.
👍 | 👎

@codeant-ai

codeant-ai Bot commented Feb 19, 2026

Copy link
Copy Markdown
Contributor

CodeAnt AI Incremental review completed.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
src/tools/edit.ts (1)

456-459: Pass validatedPath to performSearchReplace to avoid re-validating the same path.

validatedPath is already resolved on line 382. Passing parsed.file_path causes performSearchReplacevalidatePathwriteFilevalidatePath to re-run the same validation three times for a single edit.

♻️ Proposed fix
-    return performSearchReplace(parsed.file_path, {
+    return performSearchReplace(validatedPath, {
         search: parsed.old_string,
         replace: parsed.new_string
     }, parsed.expected_replacements);
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tools/edit.ts` around lines 456 - 459, The call to performSearchReplace
currently passes parsed.file_path causing redundant validations; instead pass
the already-resolved validatedPath variable to avoid repeated
validatePath/writeFile validatePath cycles. Update the call site that invokes
performSearchReplace to use validatedPath (instead of parsed.file_path) so
performSearchReplace, validatePath, and writeFile operate on the pre-validated
path.
src/utils/files/docx.ts (1)

629-632: || operator silently bypasses an explicit empty new_string — use ?? instead.

newStr = content.newStr || content.new_string || content.replace || '';

|| treats '' as falsy, so new_string: '' (a deletion) falls through to content.replace. In the current call path from edit.ts content.replace is always undefined, so the final value happens to be '' — correct only by coincidence. If a future caller passes { new_string: '', replace: 'fallback' }, the deletion intent is silently discarded.

The oldStr line has the same pattern, though empty old_string is prevented by schema validation.

♻️ Proposed fix — use nullish coalescing
-            oldStr = content.oldStr || content.old_string || content.search || '';
-            newStr = content.newStr || content.new_string || content.replace || '';
+            oldStr = content.oldStr ?? content.old_string ?? content.search ?? '';
+            newStr = content.newStr ?? content.new_string ?? content.replace ?? '';
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/files/docx.ts` around lines 629 - 632, The object field fallbacks
for oldStr, newStr and expectedReplacements use the || operator which treats
empty strings and 0 as falsy and can silently override explicit values; update
those fallbacks to use nullish coalescing (??) instead so explicit empty_string
or 0 are preserved — specifically change the expressions that set oldStr, newStr
and expectedReplacements (the code that reads content.oldStr /
content.old_string / content.search and content.newStr / content.new_string /
content.replace and content.expectedReplacements /
content.expected_replacements) to use ?? with the same final defaults.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/tools/edit.ts`:
- Around line 404-418: The range-edit branch currently awaits handler.editRange!
but discards its EditResult, always returning a success message; update the
hasEditRange branch in the function handling edits to capture the result from
handler.editRange!(validatedPath!, parsed.range!, content, parsed.options), then
check the returned EditResult (e.g., result.success): if false, return
createErrorResponse with the result.errors (or a joined message) so failures
from DocxFileHandler.editRange propagate, otherwise return the existing success
content; keep the existing try/catch to handle true exceptions and convert them
to createErrorResponse as before.

In `@src/utils/files/docx.ts`:
- Around line 312-344: The loop in extractNestedElements can infinite-loop when
xml.indexOf('>', openPos) returns -1 (malformed XML); add the same safeguard
used in splitTopLevelElements: if nextClose === -1 then break out of the outer
loop (or set searchFrom to xml.length) before proceeding to the nesting logic so
pos doesn't become 0 and re-find the same openPos; update the block that
computes nextClose (and references openPos/nextClose) to bail when no '>' is
found.

---

Duplicate comments:
In `@src/utils/files/docx.ts`:
- Around line 572-598: The write method emits Heading1–Heading6 (via the regex
in write) but createMinimalDocxZip only defines Heading1–Heading3, causing
undefined style references; either map any heading levels >3 to "Heading3"
inside write (e.g., clamp level = Math.min(level, 3) before emitting w:pStyle)
or update createMinimalDocxZip to include style definitions for
Heading4–Heading6 in styles.xml so Heading4/5/6 exist; locate the write function
and createMinimalDocxZip to implement one of these fixes and ensure escapeXml
usage remains unchanged.

---

Nitpick comments:
In `@src/tools/edit.ts`:
- Around line 456-459: The call to performSearchReplace currently passes
parsed.file_path causing redundant validations; instead pass the
already-resolved validatedPath variable to avoid repeated validatePath/writeFile
validatePath cycles. Update the call site that invokes performSearchReplace to
use validatedPath (instead of parsed.file_path) so performSearchReplace,
validatePath, and writeFile operate on the pre-validated path.

In `@src/utils/files/docx.ts`:
- Around line 629-632: The object field fallbacks for oldStr, newStr and
expectedReplacements use the || operator which treats empty strings and 0 as
falsy and can silently override explicit values; update those fallbacks to use
nullish coalescing (??) instead so explicit empty_string or 0 are preserved —
specifically change the expressions that set oldStr, newStr and
expectedReplacements (the code that reads content.oldStr / content.old_string /
content.search and content.newStr / content.new_string / content.replace and
content.expectedReplacements / content.expected_replacements) to use ?? with the
same final defaults.

Comment thread src/tools/edit.ts
Comment on lines +404 to 418
if (hasEditRange) {
try {
// parsed.range is guaranteed non-empty string by hasRange check above
await handler.editRange(validatedPath, parsed.range!, content, parsed.options);
await handler.editRange!(validatedPath!, parsed.range!, content, parsed.options);
return {
content: [{
type: "text",
text: `Successfully updated range ${parsed.range} in ${parsed.file_path}`
}],
};
} else {
return createErrorResponse(`Range-based editing not supported for ${parsed.file_path}. For text files, use old_string and new_string parameters instead. If your client requires range/content parameters, set them to empty strings ("").`);
} catch (error) {
const errorMessage = error instanceof Error ? error.message : String(error);
return createErrorResponse(errorMessage);
}
} catch (error) {
const errorMessage = error instanceof Error ? error.message : String(error);
return createErrorResponse(errorMessage);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

editRange return value discarded in the range-rewrite path — failures silently reported as success.

await handler.editRange!(validatedPath!, parsed.range!, content, parsed.options);
return {
    content: [{ type: "text", text: `Successfully updated range ${parsed.range} ...` }],
};

DocxFileHandler.editRange never throws — it returns { success: false, errors: [...] } for all error conditions (missing old_string, no match, etc.). If a caller uses range + content syntax on a .docx file, the EditResult is discarded and the caller unconditionally receives a success message regardless of the actual outcome. Errors are visible only in Path 2 (old_string/new_string) where the result is checked.

🐛 Proposed fix — propagate EditResult failure
-            await handler.editRange!(validatedPath!, parsed.range!, content, parsed.options);
-            return {
-                content: [{
-                    type: "text",
-                    text: `Successfully updated range ${parsed.range} in ${parsed.file_path}`
-                }],
-            };
+            const rangeResult = await handler.editRange!(validatedPath!, parsed.range!, content, parsed.options);
+            if (rangeResult && !rangeResult.success) {
+                const errorMsg = rangeResult.errors?.map(e => e.error).join('; ') || 'Unknown error';
+                return createErrorResponse(errorMsg);
+            }
+            return {
+                content: [{
+                    type: "text",
+                    text: `Successfully updated range ${parsed.range} in ${parsed.file_path}`
+                }],
+            };
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tools/edit.ts` around lines 404 - 418, The range-edit branch currently
awaits handler.editRange! but discards its EditResult, always returning a
success message; update the hasEditRange branch in the function handling edits
to capture the result from handler.editRange!(validatedPath!, parsed.range!,
content, parsed.options), then check the returned EditResult (e.g.,
result.success): if false, return createErrorResponse with the result.errors (or
a joined message) so failures from DocxFileHandler.editRange propagate,
otherwise return the existing success content; keep the existing try/catch to
handle true exceptions and convert them to createErrorResponse as before.

Comment thread src/utils/files/docx.ts
Comment on lines +312 to +344
// Check for self-closing
const nextClose = xml.indexOf('>', openPos);
if (nextClose !== -1 && xml[nextClose - 1] === '/') {
results.push(xml.substring(openPos, nextClose + 1));
searchFrom = nextClose + 1;
continue;
}

// Find matching close tag respecting nesting
let depth = 1;
let pos = nextClose + 1;
while (depth > 0 && pos < xml.length) {
const nextOpen = xml.indexOf(openTag, pos);
const nextCloseTag = xml.indexOf(closeTag, pos);

if (nextCloseTag === -1) break; // malformed XML

if (nextOpen !== -1 && nextOpen < nextCloseTag) {
// Check it's actually an open tag
const ca = xml[nextOpen + openTag.length];
if (ca === ' ' || ca === '>' || ca === '/') {
depth++;
}
pos = nextOpen + openTag.length;
} else {
depth--;
if (depth === 0) {
results.push(xml.substring(openPos, nextCloseTag + closeTag.length));
}
pos = nextCloseTag + closeTag.length;
}
}
searchFrom = pos;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Potential infinite loop in extractNestedElements when > is missing — same class of bug as the fixed splitTopLevelElements.

Line 313 assigns nextClose = xml.indexOf('>', openPos). If this returns -1 (malformed XML), the self-closing branch on line 314 is skipped (short-circuit on nextClose !== -1), and line 322 evaluates to pos = -1 + 1 = 0. The inner while loop then re-searches from position 0 with depth = 1; if the close tag is also absent (nextCloseTag === -1), the inner loop breaks and searchFrom = pos = 0, so the outer loop finds the same openPos again — infinite loop.

splitTopLevelElements had an identical issue and was already fixed (guarded with if (... === -1) break); the same guard is missing here.

🐛 Proposed fix — bail out when no `>` is found
         // Check for self-closing
         const nextClose = xml.indexOf('>', openPos);
+        if (nextClose === -1) { searchFrom = xml.length; continue; } // malformed — bail out
         if (nextClose !== -1 && xml[nextClose - 1] === '/') {
             results.push(xml.substring(openPos, nextClose + 1));
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/files/docx.ts` around lines 312 - 344, The loop in
extractNestedElements can infinite-loop when xml.indexOf('>', openPos) returns
-1 (malformed XML); add the same safeguard used in splitTopLevelElements: if
nextClose === -1 then break out of the outer loop (or set searchFrom to
xml.length) before proceeding to the nesting logic so pos doesn't become 0 and
re-find the same openPos; update the block that computes nextClose (and
references openPos/nextClose) to bail when no '>' is found.

@edgarsskore edgarsskore merged commit 899879c into main Feb 19, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:XXL This PR changes 1000+ lines, ignoring generated files

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants