fix(content-router): token-measure lossless folds at the acceptance gate#1772
Merged
Conversation
Unit mismatch: the apply() acceptance gate computes compression_ratio from len(text.split()) (word count), but a lossless search/log fold cuts TOKENS by collapsing a repeated path prefix into one heading — word count stays flat or rises (the heading adds a word). So the gate saw ratio >= 1.0 and discarded every free, recoverable win as ratio_too_high. Raising the floor to 1.0 (#1771) didn't help; the word-ratio was already >= 1.0. Measure lossless results (strategy_chain has a lossless_* entry) by REAL TOKEN count via the tokenizer already in scope — not words, not bytes — so a fold is accepted iff it genuinely reduces tokens. Gate + result cache use this ratio. Lossy strategies are unchanged (word count tracks their savings) and the reversibility gate is untouched (LOG/SEARCH/DIFF aren't lossy-unmarked). The excluded and bash-search paths already bypass this gate; this fixes the main strategy dispatch. Regression test drives the full router.apply() path and asserts fewer TOKENS (compress()/_apply_strategy_to_content bypass the gate, which is why prior unit tests missed it).
eec2f73 to
3e6cb34
Compare
Contributor
PR governanceThis PR follows the template and is marked ready for human review. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Description
Unit-mismatch bug in the compression acceptance gate.
router.apply()computescompression_ratiofromlen(text.split())(word count), but a lossless search/log fold (compact_lossless) saves bytes by collapsing a repeated path prefix into a single heading — word count stays flat or even rises (the heading adds a word). So the gate sawratio ≥ 1.0and discarded every free, byte-recoverable win asratio_too_high. (Raising the floor to 1.0 in #1771 did not fix this — the word-ratio was already ≥ 1.0.)Measure lossless results (those whose
strategy_chaincarries alossless_*entry) by byte ratio at the gate and in the result cache — the real saving. Lossy strategies are unchanged (word count tracks their token savings), and the reversibility gate is untouched (LOG/SEARCH/DIFFaren't inLOSSY_UNMARKED_STRATEGIES). The excluded-tool and bash-search paths already bypass this gate viacontinue; this fixes the main strategy dispatch (the lossless-modeLOG/SEARCH/DIFFpath).Follow-up to #1771.
Closes #
Type of Change
Changes Made
apply()acceptance gate: computeaccept_ratio= byte ratio for lossless results (strategy_chainhaslossless_*), else the existing word ratio. Gate + result-cache entry now useaccept_ratio.router.apply()path.Testing
pytest)ruff check)mypy headroom)Test Output
Real Behavior Proof
PYTHONPATHpinned to the branch.ContentRouter(lossless=True).apply(...), and asserts the tool output is byte-smaller and recovers exactly (search_unheading(out) == original).out == original, countedratio_too_high); after, it's applied (len(out) < len(original), marker-free, byte-exact recovery). The test also asserts the fold's word count is ≥ the original's, so the test is meaningless if "fixed" by word count.apply()path in unit tests.Review Readiness
Checklist
Additional Notes
Why prior tests missed it:
compress()and_apply_strategy_to_contentreturn the folded result directly and never touch theapply()acceptance gate, so the existing lossless-mode unit tests (which call those) passed while the real proxy path silently discarded the fold. The new test exercisesapply()end-to-end.