fix(content-router): token-measure lossless folds at the acceptance gate by chopratejas · Pull Request #1772 · headroomlabs-ai/headroom

chopratejas · 2026-07-03T21:44:53Z

Description

Unit-mismatch bug in the compression acceptance gate. router.apply() computes compression_ratio from len(text.split()) (word count), but a lossless search/log fold (compact_lossless) saves bytes by collapsing a repeated path prefix into a single heading — word count stays flat or even rises (the heading adds a word). So the gate saw ratio ≥ 1.0 and discarded every free, byte-recoverable win as ratio_too_high. (Raising the floor to 1.0 in #1771 did not fix this — the word-ratio was already ≥ 1.0.)
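A minimal illustration of the mismatch, using a made-up grep result and an illustrative folded layout (the exact heading format that compact_lossless emits is not shown in this PR, so the `folded` string below is an assumption):

```python
# Hypothetical single-file grep output: every line repeats the same path prefix.
original = (
    "src/headroom/router.py:10:def apply(self):\n"
    "src/headroom/router.py:42:    return self.apply(x)\n"
    "src/headroom/router.py:77:# apply again"
)

# Same content with the shared path hoisted into one heading line
# (assumed layout, for illustration only).
folded = (
    "src/headroom/router.py\n"
    "10:def apply(self):\n"
    "42:    return self.apply(x)\n"
    "77:# apply again"
)

# Word ratio, as computed from len(text.split()): the heading adds a word.
word_ratio = len(folded.split()) / len(original.split())

# Byte ratio: the repeated prefix is gone, so the output is smaller.
byte_ratio = len(folded.encode("utf-8")) / len(original.encode("utf-8"))

print(f"word ratio: {word_ratio:.3f}")
print(f"byte ratio: {byte_ratio:.3f}")
```

With this sample the word ratio comes out above 1.0 while the byte ratio is well below it, which is the situation in which a word-based gate reports ratio_too_high.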

Measure lossless results (those whose strategy_chain carries a lossless_* entry) by byte ratio at the gate and in the result cache — the real saving. Lossy strategies are unchanged (word count tracks their token savings), and the reversibility gate is untouched (LOG/SEARCH/DIFF aren't in LOSSY_UNMARKED_STRATEGIES). The excluded-tool and bash-search paths already bypass this gate via continue; this fixes the main strategy dispatch (the lossless-mode LOG/SEARCH/DIFF path).

Follow-up to #1771.

Closes #

Type of Change

Bug fix (non-breaking change that fixes an issue)

Changes Made

At the apply() acceptance gate: compute accept_ratio = byte ratio for lossless results (strategy_chain has lossless_*), else the existing word ratio. Gate + result-cache entry now use accept_ratio.
Added an end-to-end regression test that drives the full router.apply() path.
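The gate change in the first bullet can be sketched roughly as follows. This is not the code in content_router.py: `FoldResult`, `gate`, `MAX_RATIO`, and the chain entry name `lossless_search` are hypothetical stand-ins, and the 1.0 threshold is assumed from the description above.

```python
from dataclasses import dataclass, field


@dataclass
class FoldResult:
    """Hypothetical stand-in for the router's result object."""
    original: str
    compressed: str
    strategy_chain: list[str] = field(default_factory=list)


def word_ratio(r: FoldResult) -> float:
    return len(r.compressed.split()) / max(1, len(r.original.split()))


def byte_ratio(r: FoldResult) -> float:
    return len(r.compressed.encode("utf-8")) / max(1, len(r.original.encode("utf-8")))


def accept_ratio(r: FoldResult) -> float:
    # Lossless results are identified by a lossless_* entry in strategy_chain.
    is_lossless = any(step.startswith("lossless_") for step in r.strategy_chain)
    return byte_ratio(r) if is_lossless else word_ratio(r)


MAX_RATIO = 1.0  # assumed threshold: ratios at or above this are rejected


def gate(r: FoldResult) -> tuple[str, str]:
    """Return (text to emit, outcome label)."""
    if accept_ratio(r) >= MAX_RATIO:
        return r.original, "ratio_too_high"
    return r.compressed, "accepted"


before = "a/b/c.py:1:x = 1\na/b/c.py:2:y = 2"
after = "a/b/c.py\n1:x = 1\n2:y = 2"

lossless = FoldResult(before, after, ["lossless_search"])
unmarked = FoldResult(before, after, ["search"])

print(gate(lossless)[1])
print(gate(unmarked)[1])
```

The same pair of strings is accepted when the chain marks it lossless and rejected otherwise, because only the first case is measured in bytes.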

Testing

Unit tests pass (pytest)
Linting passes (ruff check)
Type checking passes (mypy headroom)
New tests added for new functionality
Manual testing performed

Test Output

tests/test_lossless_mode.py::test_router_apply_accepts_lossless_search_byte_measured PASSED
tests/test_content_router_tool_role_reversibility.py .......... (10 passed)
# broader (pre-move) sweep on the same change:
tests/test_lossless_mode.py / test_transforms/test_content_router.py /
test_lossless_excluded_compaction.py / test_bash_search_lossless_fold.py — 121 passed
ruff check headroom/transforms/content_router.py  -> All checks passed!
mypy headroom/transforms/content_router.py         -> Success: no issues found

Real Behavior Proof

Environment: local worktree, Python 3.12, PYTHONPATH pinned to the branch.
Exact command / steps: new regression test constructs a single-file grep result, runs it through ContentRouter(lossless=True).apply(...), and asserts the tool output is byte-smaller and recovers exactly (search_unheading(out) == original).
Observed result: before this fix the fold was rejected (out == original, counted ratio_too_high); after, it's applied (len(out) < len(original), marker-free, byte-exact recovery). The test also asserts that the fold's word count is ≥ the original's, so the case cannot pass under a word-count measure and the test would be meaningless if the gate were "fixed" by word count.
Not tested: no live end-to-end proxy run; validated via the full apply() path in unit tests.
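The shape of the three assertions described above, shown against toy stand-ins. `fold` and `unfold` here are hypothetical helpers written for this sketch; the real test goes through ContentRouter(lossless=True).apply(...) and search_unheading, whose signatures are not reproduced here.

```python
def fold(text: str) -> str:
    """Toy fold: hoist a path prefix shared by every line into one heading."""
    lines = text.split("\n")
    path, sep, _ = lines[0].partition(":")
    prefix = path + sep
    if not sep or len(lines) < 2 or not all(line.startswith(prefix) for line in lines):
        return text  # nothing to fold
    return "\n".join([path] + [line[len(prefix):] for line in lines])


def unfold(folded: str) -> str:
    """Toy inverse of fold: re-attach the heading to every body line."""
    heading, *body = folded.split("\n")
    return "\n".join(f"{heading}:{line}" for line in body)


def test_fold_is_smaller_and_recoverable() -> None:
    original = "\n".join(
        [
            "src/headroom/router.py:10:def apply(self):",
            "src/headroom/router.py:42:    return self.apply(x)",
            "src/headroom/router.py:77:# apply again",
        ]
    )
    out = fold(original)

    assert len(out) < len(original)                   # byte-smaller
    assert unfold(out) == original                    # exact recovery
    assert len(out.split()) >= len(original.split())  # a word gate would reject this


test_fold_is_smaller_and_recoverable()
```

The last assertion is what keeps the test honest: it pins the input to a case that a word-count gate cannot accept.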

Review Readiness

I have performed a self-review
This PR is ready for human review

Checklist

My code follows the project's style guidelines
I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective
New and existing unit tests pass locally with my changes
I have updated the CHANGELOG.md if applicable (handled at release time)

Additional Notes

Why prior tests missed it: compress() and _apply_strategy_to_content return the folded result directly and never touch the apply() acceptance gate, so the existing lossless-mode unit tests (which call those) passed while the real proxy path silently discarded the fold. The new test exercises apply() end-to-end.
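A toy model of that control flow, to show how a test on compress() can pass while apply() still discards the result. `ToyRouter` is a hypothetical miniature written for this sketch, not the real ContentRouter, and its apply() models the pre-fix word-ratio gate.

```python
class ToyRouter:
    """Hypothetical miniature: compress() has no gate, apply() does."""

    def __init__(self, path: str) -> None:
        self.path = path

    def compress(self, text: str) -> str:
        # Returns the fold directly; no acceptance check happens here.
        return self.path + "\n" + text.replace(self.path + ":", "")

    def apply(self, text: str) -> str:
        out = self.compress(text)
        # Pre-fix gate: word ratio, rejected at >= 1.0 (assumed threshold).
        ratio = len(out.split()) / max(1, len(text.split()))
        return text if ratio >= 1.0 else out


router = ToyRouter("a/b/c.py")
original = "a/b/c.py:1:x = 1\na/b/c.py:2:y = 2"

# A unit test at this level sees a smaller result and passes.
print(len(router.compress(original)) < len(original))

# The gated path hands back the input unchanged.
print(router.apply(original) == original)
```

Both checks hold in this toy, which mirrors the gap: only a test that goes through apply() observes the rejection.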

Unit mismatch: the apply() acceptance gate computes compression_ratio from len(text.split()) (word count), but a lossless search/log fold cuts TOKENS by collapsing a repeated path prefix into one heading, so word count stays flat or rises (the heading adds a word). The gate therefore saw ratio >= 1.0 and discarded every free, recoverable win as ratio_too_high. Raising the floor to 1.0 (#1771) didn't help; the word-ratio was already >= 1.0.

Measure lossless results (strategy_chain has a lossless_* entry) by REAL TOKEN count via the tokenizer already in scope (not words, not bytes), so a fold is accepted iff it genuinely reduces tokens. Gate + result cache use this ratio.

Lossy strategies are unchanged (word count tracks their savings) and the reversibility gate is untouched (LOG/SEARCH/DIFF aren't lossy-unmarked). The excluded and bash-search paths already bypass this gate; this fixes the main strategy dispatch.

Regression test drives the full router.apply() path and asserts fewer TOKENS (compress()/_apply_strategy_to_content bypass the gate, which is why prior unit tests missed it).
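To make the token-based rule concrete, here is a small sketch with a deliberately crude regex tokenizer. `count_tokens` is a stand-in assumption; the router's actual tokenizer is not shown in this PR and will produce different counts.

```python
import re


def count_tokens(text: str) -> int:
    """Crude stand-in tokenizer: runs of word characters, or single punctuation marks."""
    return len(re.findall(r"\w+|[^\w\s]", text))


# Hypothetical two-line grep result and an assumed folded layout.
original = "pkg/util/io.py:3:import os\npkg/util/io.py:9:import sys"
folded = "pkg/util/io.py\n3:import os\n9:import sys"

words_before, words_after = len(original.split()), len(folded.split())
tokens_before, tokens_after = count_tokens(original), count_tokens(folded)

# Accept the fold iff it genuinely reduces tokens.
accepted = tokens_after < tokens_before

print(f"words:  {words_before} -> {words_after}")
print(f"tokens: {tokens_before} -> {tokens_after}")
print(f"accepted: {accepted}")
```

Under this stand-in the word count goes up by one while the token count drops, so the fold is accepted on tokens even though a word ratio would reject it.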

github-actions · 2026-07-03T21:52:08Z

PR governance

This PR follows the template and is marked ready for human review.

chopratejas requested review from DevanshiVyas and JerrettDavis as code owners July 3, 2026 21:44

chopratejas force-pushed the tejas/lossless-gate-byte-ratio branch from eec2f73 to 3e6cb34 Compare July 3, 2026 21:50

chopratejas changed the title ~~fix(content-router): byte-measure lossless folds at the acceptance gate~~ fix(content-router): token-measure lossless folds at the acceptance gate Jul 3, 2026

github-actions Bot added the "status: ready for review" label (label description: "Pull request body is complete and the author marked it ready for human review") Jul 3, 2026

chopratejas merged commit c5493ea into main Jul 3, 2026
28 of 31 checks passed

chopratejas merged 1 commit into main from tejas/lossless-gate-byte-ratio
