feat(content-router): lossless-excluded compaction (grep/log/json) + enable in coding/general personas #1762

Open
chopratejas wants to merge 3 commits into main from tejas/coding-persona-lossless

@chopratejas chopratejas commented Jul 3, 2026

Collaborator

Builds on the now-merged personas (#1732). Two pieces:

1. Lossless compaction for EXCLUDED tool output

Excluded tools (Read/Grep/Glob/Write/Edit) stay out of lossy compression, but their output is compacted by detected shape:

| shape | transform | guarantee |
|---|---|---|
| grep (SEARCH) | ripgrep --heading fold | byte-lossless (search_unheading recovers) |
| log (BUILD_OUTPUT) | ANSI strip + run-collapse | byte-lossless modulo non-semantic ANSI |
| json | whitespace-minify | data-lossless (json.loads equal), NOT byte-exact |

Source code and glob path-lists pass through verbatim. The grep fold is gated on _try_detect_search, because the general/Magika classifier labels grep-over-code as SOURCE_CODE and would miss it. The feature is off by default and is enabled with compact_excluded_lossless.
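As a rough illustration of the shape dispatch described above, here is a minimal sketch. The detection heuristics, the function names `detect_shape` and `route_excluded`, and the returned labels are assumptions made for this example; they are not the PR's `_try_detect_search` or its router code.

```python
import json
import re
from typing import Optional

# Hypothetical heuristics, for illustration only.
SEARCH_LINE = re.compile(r"^.+?:\d+:")   # path:line:content
ANSI_ESCAPE = re.compile(r"\x1b\[")      # start of an ANSI CSI sequence


def detect_shape(text: str) -> Optional[str]:
    """Return "json", "search", "log", or None when nothing matches."""
    stripped = text.strip()
    if stripped[:1] in ("{", "["):
        try:
            json.loads(stripped)
            return "json"
        except ValueError:
            pass
    lines = [line for line in text.split("\n") if line]
    if lines and all(SEARCH_LINE.match(line) for line in lines):
        return "search"
    if ANSI_ESCAPE.search(text):
        return "log"
    return None  # source code, glob path-lists, anything else


def route_excluded(text: str, compact_excluded_lossless: bool = False) -> str:
    """Pick a (hypothetical) route label for an excluded tool's output."""
    if not compact_excluded_lossless:  # off by default
        return "verbatim"
    shape = detect_shape(text)
    return f"lossless_{shape}" if shape else "verbatim"
```

With the flag off every input stays verbatim; with it on, only inputs that match one of the three shapes are compacted, so a glob path-list or a Python source file still returns `"verbatim"`.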

2. Enable it in the coding/general personas

compact_excluded_lossless=True on the coding + general profiles, threaded via proxy_env + proxy_pipeline_kwargs + a per-request ContentRouter.apply override. So HEADROOM_SAVINGS_PROFILE=coding auto-folds excluded grep/log/json.
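The threading described above could look roughly like the sketch below. `Profile` is a stand-in for AgentSavingsProfile, and `resolve_profile`, `pipeline_kwargs`, `effective_flag`, and the `"default"` profile name are hypothetical helpers invented for this example; only the `compact_excluded_lossless` field and the `HEADROOM_SAVINGS_PROFILE` variable come from the PR description.

```python
from dataclasses import dataclass
from typing import Dict, Mapping, Optional


@dataclass(frozen=True)
class Profile:
    """Stand-in for AgentSavingsProfile (assumed shape)."""
    name: str
    compact_excluded_lossless: bool = False  # default off


PROFILES: Dict[str, Profile] = {
    "coding": Profile("coding", compact_excluded_lossless=True),
    "general": Profile("general", compact_excluded_lossless=True),
    "default": Profile("default"),  # hypothetical fallback profile
}


def resolve_profile(env: Mapping[str, str]) -> Profile:
    """Select a profile from HEADROOM_SAVINGS_PROFILE, else the fallback."""
    return PROFILES.get(env.get("HEADROOM_SAVINGS_PROFILE", ""), PROFILES["default"])


def pipeline_kwargs(profile: Profile) -> Dict[str, bool]:
    """Emit the flag the way a pipeline-kwargs builder might."""
    return {"compact_excluded_lossless": profile.compact_excluded_lossless}


def effective_flag(config_value: bool, runtime_override: Optional[bool]) -> bool:
    """A per-request override wins when given; otherwise use the config value."""
    return config_value if runtime_override is None else runtime_override
```

For example, `pipeline_kwargs(resolve_profile({"HEADROOM_SAVINGS_PROFILE": "coding"}))` yields `{"compact_excluded_lossless": True}`, while an empty environment leaves the flag off.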

Why

The coding persona was getting only ~2.5% savings on OpenCode because its dominant traffic (Grep/Read) is excluded, and RTK (shell-only, lossy) never sees OpenCode's native tools. This recovers those savings losslessly.

Measured (end-to-end via coding-persona kwargs, real rg output)

41,589 → 26,562 chars (−36%), router:excluded:lossless_search, byte-recoverable.
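The percentage follows directly from the two character counts; a quick check using only the figures quoted above:

```python
before_chars = 41_589
after_chars = 26_562

saved_chars = before_chars - after_chars   # characters removed by the fold
reduction = saved_chars / before_chars     # fraction of the original removed

print(f"{saved_chars} chars saved, {reduction:.1%} reduction")
```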

Accuracy

grep/log = byte-lossless → edit-safe. json = data-lossless (edit-caveat for read-then-edit-JSON, documented). Read of source code → untouched (tested).
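The JSON edit-caveat can be reproduced with the standard library alone. This is a sketch of whitespace-minification in general, assuming a round trip through `json.loads`/`json.dumps`; `minify_json` is a hypothetical name, and the PR's own minifier may work differently.

```python
import json


def minify_json(text: str) -> str:
    """Drop insignificant whitespace by re-serializing with compact separators."""
    return json.dumps(json.loads(text), separators=(",", ":"), ensure_ascii=False)


original = '{\n  "name": "demo",\n  "retries": 3\n}\n'
minified = minify_json(original)

# Data-lossless: both texts parse to the same object.
same_data = json.loads(minified) == json.loads(original)

# Not byte-exact: a string copied from the minified view may not occur in the
# file on disk, so an exact-match Edit(old_string) built from it would miss.
old_string = '"retries":3'
edit_would_miss = old_string in minified and old_string not in original
```

Here `same_data` and `edit_would_miss` are both true, which is why the JSON tier is described as data-lossless rather than byte-lossless.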

47 tests (personas + all three tiers + persona-enablement + end-to-end). ruff + mypy clean. No personas duplication — rebased onto main after #1732 landed. Supersedes #1755.

🤖 Generated with Claude Code

…ed grep output

Excluded tools (Read/Grep/Glob/Write/Edit) are protected from *lossy*
compression for accuracy. But grep-shaped output of an excluded tool can still
be search-folded (path:line:content -> ripgrep --heading form), which is
byte-recoverable: search_unheading reproduces the original exactly. Measured
~36% off real code-grep with zero information loss; Read/source and glob
path-lists pass through untouched (not search-shaped -> no-op).

Gated on the dedicated _try_detect_search detector rather than the general
classifier, which labels grep-over-a-codebase SOURCE_CODE (the matched lines
are code) and would otherwise reject the exact case this targets. Off by
default via ContentRouterConfig.compact_excluded_search.

Closes the OpenCode native-Grep savings gap that RTK (shell-only + lossy)
does not cover; the fold is lossless, so it is safe for the accuracy-first
coding workload.
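To make the byte-recoverability claim concrete, here is a minimal sketch of a path:line:content fold and its inverse. `fold_heading` and `unfold_heading` are hypothetical names written for this example; they are not the PR's fold or its search_unheading, and they assume every input line has the path:line:content form.

```python
import re
from typing import List, Optional, Tuple

MATCH_LINE = re.compile(r"^(.+?):\d+:")  # shortest path prefix before ":<line>:"


def fold_heading(flat: str) -> Optional[str]:
    """Fold path:line:content lines into heading form, or None if not search-shaped."""
    body, tail = (flat[:-1], "\n") if flat.endswith("\n") else (flat, "")
    groups: List[Tuple[str, List[str]]] = []
    for line in body.split("\n"):
        m = MATCH_LINE.match(line)
        if m is None:
            return None  # e.g. source code or a glob path-list: leave untouched
        path, rest = m.group(1), line[m.end(1) + 1:]  # rest is "line:content"
        if groups and groups[-1][0] == path:
            groups[-1][1].append(rest)      # same file as previous line
        else:
            groups.append((path, [rest]))   # new heading
    return "\n\n".join("\n".join([path, *rests]) for path, rests in groups) + tail


def unfold_heading(folded: str) -> str:
    """Inverse of fold_heading: re-attach each heading to its match lines."""
    body, tail = (folded[:-1], "\n") if folded.endswith("\n") else (folded, "")
    out: List[str] = []
    for block in body.split("\n\n"):
        path, *rests = block.split("\n")
        out.extend(f"{path}:{rest}" for rest in rests)
    return "\n".join(out) + tail
```

The round trip works because headings and match lines are never empty, so a blank line can only be a group separator. The saving comes from writing each path once per run of matches instead of once per line.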
… search/log/json

Extends the excluded-tool lossless compaction beyond grep, dispatched by
detected shape (excluded tools stay out of *lossy* compression for accuracy):

- SEARCH (grep) -> ripgrep --heading fold. Byte-lossless. Gated on
  _try_detect_search (the general/Magika classifier calls grep-over-code
  SOURCE_CODE and would miss it).
- LOG -> ANSI strip + run-collapse. Byte-lossless modulo non-semantic ANSI.
- JSON -> whitespace-minify. DATA-lossless (json.loads equals the original
  object) but NOT byte-exact — a read-then-Edit(old_string) on the same JSON
  file could miss; documented and gated.

Renames compact_excluded_search -> compact_excluded_lossless. Source code and
glob path-lists match nothing -> verbatim. 10 tests (byte-exact for search/log,
data-equal for json, no-op on source/glob, off-by-default). ruff + mypy clean.
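The LOG tier can be sketched as two steps: strip ANSI CSI sequences, then collapse runs of identical lines in a reversible way. The ` [xN]` marker and the function names below are assumptions made for this example; the PR does not state how its run-collapse encodes repeats.

```python
import itertools
import re
from typing import List, Optional

# CSI sequence: ESC [ parameter bytes, intermediate bytes, one final byte.
ANSI_CSI = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")
RUN_MARK = re.compile(r"^(.*) \[x(\d+)\]$")  # hypothetical repeat marker


def strip_ansi(text: str) -> str:
    """Remove colour/cursor escape sequences (the non-semantic part)."""
    return ANSI_CSI.sub("", text)


def collapse_runs(text: str) -> Optional[str]:
    """Collapse runs of identical lines; None if the marker would be ambiguous."""
    lines = text.split("\n")
    if any(RUN_MARK.match(line) for line in lines):
        return None  # an input line already looks like a marker: do not collapse
    out: List[str] = []
    for line, run in itertools.groupby(lines):
        count = sum(1 for _ in run)
        out.append(line if count == 1 else f"{line} [x{count}]")
    return "\n".join(out)


def expand_runs(collapsed: str) -> str:
    """Inverse of collapse_runs for any text it actually produced."""
    out: List[str] = []
    for line in collapsed.split("\n"):
        m = RUN_MARK.match(line)
        if m:
            out.extend([m.group(1)] * int(m.group(2)))
        else:
            out.append(line)
    return "\n".join(out)
```

Expanding the collapsed text reproduces the ANSI-stripped log exactly, which is the sense in which this tier is byte-lossless modulo ANSI: only the escape sequences are not restored.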
…neral personas

Wire compact_excluded_lossless through the persona path: add it to
AgentSavingsProfile (default off), set it True for the coding + general
workload personas, emit it in proxy_env + proxy_pipeline_kwargs, and honor it
as a per-request override in ContentRouter.apply (_runtime_compact_excluded_lossless).

So HEADROOM_SAVINGS_PROFILE=coding now losslessly folds excluded grep/log
(byte-lossless) and minifies excluded json (data-lossless) — recovering the
Read/Grep-heavy savings the exclude list otherwise fully protects, with no
accuracy loss on the byte-lossless tiers. End-to-end: coding-persona kwargs
fold a real grep tool output 36% (router:excluded:lossless_search), recoverable.
@github-actions bot added the label "status: has conflicts" (Pull request has merge conflicts with the base branch), Jul 3, 2026
github-actions Bot commented Jul 3, 2026

Contributor

PR governance

This PR does not yet satisfy the required template fields:

  • Missing required section Description.
  • Missing required section Type of Change.
  • Missing required section Changes Made.
  • Missing required section Testing.
  • Missing required section Real Behavior Proof.
  • Missing required section Review Readiness.
  • Check I have performed a self-review before requesting human review.
  • Check This PR is ready for human review or convert the PR back to draft.

Please update the PR body, or move the PR back to draft while it is still in progress.

@github-actions bot added the label "status: needs author action" (Pull request body or readiness checklist still needs author updates), Jul 3, 2026
@chopratejas force-pushed the tejas/coding-persona-lossless branch from 48fc85c to 0d77e37, July 3, 2026 14:36
@chopratejas changed the title from "feat: coding/general personas + lossless-excluded compaction (grep/log/json), wired together" to "feat(content-router): lossless-excluded compaction (grep/log/json) + enable in coding/general personas", Jul 3, 2026
@github-actions bot removed the label "status: has conflicts" (Pull request has merge conflicts with the base branch), Jul 3, 2026
Labels

status: needs author action Pull request body or readiness checklist still needs author updates
