
fix(content-router): token-measure lossless folds at the acceptance gate #1772

Merged
chopratejas merged 1 commit into main from tejas/lossless-gate-byte-ratio
Jul 3, 2026

Conversation

@chopratejas

Collaborator

Description

Unit-mismatch bug in the compression acceptance gate. router.apply() computes compression_ratio from len(text.split()) (word count), but a lossless search/log fold (compact_lossless) saves bytes by collapsing a repeated path prefix into a single heading. The word count stays flat or even rises, because the heading adds a word. The gate therefore saw a ratio ≥ 1.0 and discarded every free, byte-recoverable win as ratio_too_high. Raising the floor to 1.0 in #1771 did not fix this, because the word ratio was already ≥ 1.0.
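The mismatch can be reproduced with two literal strings. This is a minimal sketch, not the project's code: the grep lines, the path, and the folded layout are made up for illustration, and the real compact_lossless output format may differ.

```python
# Hypothetical single-file grep output; path and matches are invented.
original = "\n".join([
    "src/headroom/transforms/content_router.py:10: def apply(self, messages):",
    "src/headroom/transforms/content_router.py:42: ratio = compressed / original",
    "src/headroom/transforms/content_router.py:57: return result",
])

# One plausible folded form (assumed layout): the shared path becomes
# a single heading line and each match keeps only "<lineno>: <text>".
folded = "\n".join([
    "src/headroom/transforms/content_router.py",
    "10: def apply(self, messages):",
    "42: ratio = compressed / original",
    "57: return result",
])


def word_ratio(before: str, after: str) -> float:
    """Ratio measured the way the description says the gate did: by words."""
    return len(after.split()) / len(before.split())


def byte_ratio(before: str, after: str) -> float:
    """Ratio measured in UTF-8 bytes."""
    return len(after.encode("utf-8")) / len(before.encode("utf-8"))


if __name__ == "__main__":
    print("word ratio:", word_ratio(original, folded))
    print("byte ratio:", byte_ratio(original, folded))
```

In this example the heading adds one whitespace-separated word, so the word ratio lands above 1.0 while the byte ratio falls well below it.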

Measure lossless results (those whose strategy_chain carries a lossless_* entry) by byte ratio, the real saving, both at the gate and in the result cache. Lossy strategies are unchanged (word count tracks their token savings), and the reversibility gate is untouched (LOG/SEARCH/DIFF aren't in LOSSY_UNMARKED_STRATEGIES). The excluded-tool and bash-search paths already bypass this gate via continue; this fixes the main strategy dispatch (the lossless-mode LOG/SEARCH/DIFF path).

Follow-up to #1771.

Closes #

Type of Change

  • Bug fix (non-breaking change that fixes an issue)

Changes Made

  • At the apply() acceptance gate: compute accept_ratio = byte ratio for lossless results (strategy_chain has lossless_*), else the existing word ratio. Gate + result-cache entry now use accept_ratio.
  • Added an end-to-end regression test that drives the full router.apply() path.
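The gate change described in the list above might look roughly like the following. This is a sketch under stated assumptions, not the code in content_router.py: the function names, the strategy names used in the usage note, and the "reject when ratio >= max_ratio" rule are all hypothetical stand-ins inferred from the description.

```python
from typing import Sequence


def word_ratio(original: str, compressed: str) -> float:
    # Whitespace word count, as the description says the gate used.
    return len(compressed.split()) / max(len(original.split()), 1)


def byte_ratio(original: str, compressed: str) -> float:
    # UTF-8 byte count, used here for lossless results.
    return len(compressed.encode("utf-8")) / max(len(original.encode("utf-8")), 1)


def is_lossless(strategy_chain: Sequence[str]) -> bool:
    # A result counts as lossless if any chain entry starts with "lossless_".
    return any(name.startswith("lossless_") for name in strategy_chain)


def accept_ratio(original: str, compressed: str, strategy_chain: Sequence[str]) -> float:
    # Pick the measure by result kind; lossy results keep the word ratio.
    if is_lossless(strategy_chain):
        return byte_ratio(original, compressed)
    return word_ratio(original, compressed)


def passes_gate(original: str, compressed: str, strategy_chain: Sequence[str],
                max_ratio: float = 1.0) -> bool:
    # Assumed rule: anything at or above max_ratio is rejected (ratio_too_high).
    return accept_ratio(original, compressed, strategy_chain) < max_ratio
```

For example, with `original = "a/b/c.py:1: x\na/b/c.py:2: y"` and `compressed = "a/b/c.py\n1: x\n2: y"`, a chain such as `["lossless_search"]` (a made-up name) passes the gate on bytes, while the same pair tagged with a non-lossless chain is rejected on words.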

Testing

  • Unit tests pass (pytest)
  • Linting passes (ruff check)
  • Type checking passes (mypy headroom)
  • New tests added for new functionality
  • Manual testing performed

Test Output

tests/test_lossless_mode.py::test_router_apply_accepts_lossless_search_byte_measured PASSED
tests/test_content_router_tool_role_reversibility.py .......... (10 passed)
# broader (pre-move) sweep on the same change:
tests/test_lossless_mode.py / test_transforms/test_content_router.py /
test_lossless_excluded_compaction.py / test_bash_search_lossless_fold.py — 121 passed
ruff check headroom/transforms/content_router.py  -> All checks passed!
mypy headroom/transforms/content_router.py         -> Success: no issues found

Real Behavior Proof

  • Environment: local worktree, Python 3.12, PYTHONPATH pinned to the branch.
  • Exact command / steps: new regression test constructs a single-file grep result, runs it through ContentRouter(lossless=True).apply(...), and asserts the tool output is byte-smaller and recovers exactly (search_unheading(out) == original).
  • Observed result: before this fix the fold was rejected (out == original, counted ratio_too_high); after, it's applied (len(out) < len(original), marker-free, byte-exact recovery). The test also asserts that the fold's word count is ≥ the original's, so a gate that still measured by word count could not make the test pass.
  • Not tested: no live end-to-end proxy run; validated via the full apply() path in unit tests.
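The shape of the assertions described above can be sketched with a toy fold and its inverse. Both helpers below are hypothetical stand-ins written for this example; they are not compact_lossless or search_unheading, and the sketch does not go through ContentRouter.apply(). The unfold helper assumes its input really was folded.

```python
def fold_single_file(text: str) -> str:
    """Toy fold: lift a shared '<path>:' prefix into one heading line."""
    lines = text.split("\n")
    # Each line is assumed to look like "<path>:<lineno>: <match>".
    path = lines[0].split(":", 1)[0]
    prefix = path + ":"
    if len(lines) < 2 or not all(line.startswith(prefix) for line in lines):
        return text  # nothing to fold; return the input unchanged
    return "\n".join([path] + [line[len(prefix):] for line in lines])


def unfold_heading(folded: str) -> str:
    """Toy inverse: re-attach the heading path to every following line."""
    path, *rest = folded.split("\n")
    return "\n".join(f"{path}:{line}" for line in rest)


def test_fold_round_trip() -> None:
    original = (
        "pkg/mod.py:3: import os\n"
        "pkg/mod.py:9: os.getcwd()\n"
        "pkg/mod.py:20: return path"
    )
    out = fold_single_file(original)
    # Byte-smaller than the input.
    assert len(out.encode("utf-8")) < len(original.encode("utf-8"))
    # Byte-exact recovery.
    assert unfold_heading(out) == original
    # Word count did not drop, so a word-based measure would not see a win.
    assert len(out.split()) >= len(original.split())


if __name__ == "__main__":
    test_fold_round_trip()
```

The third assertion is the guard: it pins down that the input is one where word counting reports no saving, so the test only passes when the acceptance measure is something other than words.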

Review Readiness

  • I have performed a self-review
  • This PR is ready for human review

Checklist

  • My code follows the project's style guidelines
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective
  • New and existing unit tests pass locally with my changes
  • I have updated the CHANGELOG.md if applicable (handled at release time)

Additional Notes

Why prior tests missed it: compress() and _apply_strategy_to_content return the folded result directly and never touch the apply() acceptance gate, so the existing lossless-mode unit tests (which call those) passed while the real proxy path silently discarded the fold. The new test exercises apply() end-to-end.

Unit mismatch: the apply() acceptance gate computes compression_ratio from
len(text.split()) (word count), but a lossless search/log fold cuts TOKENS by
collapsing a repeated path prefix into one heading — word count stays flat or
rises (the heading adds a word). So the gate saw ratio >= 1.0 and discarded
every free, recoverable win as ratio_too_high. Raising the floor to 1.0 (#1771)
didn't help; the word-ratio was already >= 1.0.

Measure lossless results (strategy_chain has a lossless_* entry) by REAL TOKEN
count via the tokenizer already in scope — not words, not bytes — so a fold is
accepted iff it genuinely reduces tokens. Gate + result cache use this ratio.
Lossy strategies are unchanged (word count tracks their savings) and the
reversibility gate is untouched (LOG/SEARCH/DIFF aren't lossy-unmarked). The
excluded and bash-search paths already bypass this gate; this fixes the main
strategy dispatch.
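A token-measured ratio, as the commit message above describes, could be sketched like this. The tokenizer here is a deliberately crude regex stand-in and is NOT the project's tokenizer; `toy_count_tokens` and `token_ratio` are hypothetical names, and a real tokenizer will give different counts.

```python
import re
from typing import Callable


def toy_count_tokens(text: str) -> int:
    # Stand-in tokenizer (assumption): each run of word characters or each
    # single punctuation mark counts as one token.
    return len(re.findall(r"\w+|[^\w\s]", text))


def token_ratio(original: str, compressed: str,
                count_tokens: Callable[[str], int] = toy_count_tokens) -> float:
    # Ratio of token counts; the counter is injectable so a real tokenizer
    # can be passed in instead of the toy one.
    return count_tokens(compressed) / max(count_tokens(original), 1)


def accept_fold(original: str, compressed: str,
                count_tokens: Callable[[str], int] = toy_count_tokens) -> bool:
    # Accept only if the fold genuinely reduces the token count.
    return count_tokens(compressed) < count_tokens(original)
```

Under this toy tokenizer a path such as `a/b/c.py` costs several tokens, so removing its repetitions lowers the token count even though the whitespace word count goes up.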

Regression test drives the full router.apply() path and asserts fewer TOKENS
(compress()/_apply_strategy_to_content bypass the gate, which is why prior unit
tests missed it).
@chopratejas chopratejas force-pushed the tejas/lossless-gate-byte-ratio branch from eec2f73 to 3e6cb34 July 3, 2026 21:50
@chopratejas chopratejas changed the title from "fix(content-router): byte-measure lossless folds at the acceptance gate" to "fix(content-router): token-measure lossless folds at the acceptance gate" Jul 3, 2026
@github-actions

github-actions Bot commented Jul 3, 2026

Contributor

PR governance

This PR follows the template and is marked ready for human review.

@github-actions github-actions Bot added the "status: ready for review" label (Pull request body is complete and the author marked it ready for human review) Jul 3, 2026
@chopratejas chopratejas merged commit c5493ea into main Jul 3, 2026
28 of 31 checks passed
