fix(router): stop protecting passing build/test output as error traces #1740
PR governance
This PR does not yet satisfy the required template fields.
Please update the PR body, or move the PR back to draft while it is still in progress.
content_has_strong_error_indicators() (error_detection.py) protects any message/content-block from compression when it contains 2+ distinct indicator keywords (error, fail, exception, traceback, fatal, panic, crash), on the theory that genuine failure output nearly always pairs two of these while benign mentions rarely do.

That assumption breaks on ordinary passing build/test tool output: tsc's "Found 0 errors" plus a passing test run's "0 failures" trips both "error" and "fail" (2 distinct hits) despite nothing failing. In a long JS/TS coding session this fires on nearly every request (confirmed against the stats.json attached to issue headroomlabs-ai#1696), permanently protecting large amounts of legitimate tool output from ever being compressed and explaining the reported 0.3% savings vs. the advertised 60-95%.

Fix: strip common zero-result phrases ("0 errors", "no failures", etc.) before the keyword scan, so a clean tool run no longer masquerades as a failure trace. A genuine second distinct indicator elsewhere in the same text still triggers protection correctly.

No prior test coverage existed for this function; added tests/test_error_detection.py covering real errors, single-keyword mentions, tsc/eslint passing summaries, and the case where a real error coexists with a stripped zero-result phrase in the same blob.

Fixes headroomlabs-ai#1696.
Force-pushed from 5795fbc to 8413a5f.
What happens when those keywords appear in prompts, code files or code blocks? like
To build on what @AbelVM is asking, what about other summary formats like these?
The proposed fix appears to handle the 0/no forms such as 0 errors and 0 failures, which makes sense for JavaScript/TypeScript-style output. But would it catch formats like X: 0 or X=0, or would those still leave both fail and error visible to the scan? This seems like it would affect more than just JavaScript and TypeScript, since a good number of test and CI tools report results in summary formats like those. It may be worth adding extra detection for broader coverage! :)
Summary
- content_has_strong_error_indicators() (headroom/transforms/error_detection.py) protects any message/content-block from compression when it contains 2+ distinct indicator keywords (error, fail, exception, traceback, fatal, panic, crash), the assumption being that genuine failure output nearly always pairs two of these, while benign mentions rarely do.
- That assumption breaks on ordinary passing build/test tool output: tsc's "Found 0 errors" plus a passing test run's "0 failures" trips both error and fail (2 distinct hits) despite nothing failing.
- In a long JS/TS coding session this fires on nearly every request (confirmed against the stats.json attached to [BUG] [PROXY] Long multi-turns coding task: only 0.3% compression #1696: router:protected:error_output shows up in almost every transforms_applied list in the request log), permanently protecting large amounts of legitimate tool output from ever being compressed. This explains the reported 0.3% savings vs. the advertised 60-95%.
- Fix: strip common zero-result phrases ("0 errors", "no failures", etc.) before the keyword scan, so a clean tool run no longer masquerades as a failure trace. A genuine second distinct indicator elsewhere in the same text still triggers protection correctly.
- No prior test coverage existed for this function; added tests/test_error_detection.py.

Fixes #1696.
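For readers who have not opened the diff, the mechanism described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in error_detection.py: the function name has_strong_error_indicators, the ZERO_RESULT pattern, and the substring-based matching are all hypothetical stand-ins, and only the keyword list and the "2+ distinct hits" rule come from the description.

```python
import re

# Indicator keywords as listed in the PR description.
INDICATORS = ("error", "fail", "exception", "traceback", "fatal", "panic", "crash")

# Hypothetical zero-result pattern covering phrases like "0 errors" and
# "no failures". The leading \b keeps "10 errors" from matching.
ZERO_RESULT = re.compile(
    r"\b(?:0|no)\s+(?:errors?|failures?|failed|warnings?)\b", re.IGNORECASE
)


def has_strong_error_indicators(text: str, min_distinct: int = 2) -> bool:
    """Return True when at least `min_distinct` distinct indicator keywords
    remain after zero-result phrases have been stripped."""
    scrubbed = ZERO_RESULT.sub(" ", text).lower()
    hits = {kw for kw in INDICATORS if kw in scrubbed}
    return len(hits) >= min_distinct


# A clean tsc run plus a passing test summary: nothing left to flag.
passing = "Found 0 errors. Watching for file changes.\nTests: 12 passed, 0 failures"

# A real traceback: "traceback" and "error" (inside "ValueError") both hit.
real = "Traceback (most recent call last):\nValueError: bad input"

# A real failure next to a stripped zero-result phrase is still flagged.
mixed = "Found 0 errors.\nFATAL: worker crashed"
```

Stripping before scanning, rather than special-casing tools, keeps the original two-keyword heuristic intact for everything that is not an explicit zero count.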
Investigation trail
Ruled out first: the proxy's protect_recent_reads_fraction dataclass default (0.0) looked like a candidate but is a red herring for the proxy path: server.py already overrides it to 0.3 whenever mode="token" (the default). The real mechanism is the error-protection false positive above, confirmed against the reporter's own stats.json.
Test plan
- tests/test_error_detection.py: 5 tests (real error/traceback still flagged, single-keyword mention not flagged, passing tsc/eslint summaries not flagged, real error not masked by a stripped zero-result phrase elsewhere in the same blob)
- content_router suite unaffected: tests/test_transforms/test_content_router.py + tests/test_transforms_content_router.py + new file, 86 passed
- Built headroom._core locally via maturin develop --release to actually run the above (this repo's Rust extension isn't prebuilt in a fresh checkout)