Skip to content

fix(router): stop protecting passing build/test output as error traces#1740

Open
ashishpatel26 wants to merge 1 commit into
headroomlabs-ai:mainfrom
ashishpatel26:fix/1696-error-protection-false-positive
Open

fix(router): stop protecting passing build/test output as error traces#1740
ashishpatel26 wants to merge 1 commit into
headroomlabs-ai:mainfrom
ashishpatel26:fix/1696-error-protection-false-positive

Conversation

@ashishpatel26

@ashishpatel26 ashishpatel26 commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

Summary

ChatGPT Image Jul 3, 2026, 03_04_36 PM
  • content_has_strong_error_indicators() (headroom/transforms/error_detection.py) protects any message/content-block from compression when it contains 2+ distinct indicator keywords (error, fail, exception, traceback, fatal, panic, crash) — the assumption being genuine failure output nearly always pairs two of these, while benign mentions rarely do.
  • That assumption breaks on ordinary passing build/test tool output: tsc's "Found 0 errors" plus a passing test run's "0 failures" trips both error and fail — 2 distinct hits — despite nothing failing.
  • In a long JS/TS coding session this fires on nearly every request (confirmed against the stats.json attached to [BUG] [PROXY] Long multi-turns coding task: only 0.3% compression #1696router:protected:error_output shows up in almost every transforms_applied list in the request log), permanently protecting large amounts of legitimate tool output from ever being compressed. This explains the reported 0.3% savings vs. the advertised 60-95%.
  • Fix: strip common zero-result phrases ("0 errors", "no failures", etc.) before the keyword scan, so a clean tool run no longer masquerades as a failure trace. A genuine second distinct indicator elsewhere in the same text still triggers protection correctly.

No prior test coverage existed for this function — added tests/test_error_detection.py.

Fixes #1696.

Investigation trail

Ruled out first: the proxy's protect_recent_reads_fraction dataclass default (0.0) looked like a candidate but is a red herring for the proxy path — server.py already overrides it to 0.3 whenever mode="token" (the default). The real mechanism is the error-protection false-positive above, confirmed against the reporter's own stats.json.

Test plan

  • New: tests/test_error_detection.py — 5 tests (real error/traceback still flagged, single-keyword mention not flagged, passing tsc/eslint summaries not flagged, real error not masked by a stripped zero-result phrase elsewhere in the same blob)
  • Full content_router suite unaffected: tests/test_transforms/test_content_router.py + tests/test_transforms_content_router.py + new file — 86 passed
  • Built headroom._core locally via maturin develop --release to actually run the above (this repo's Rust extension isn't prebuilt in a fresh checkout)

@github-actions

github-actions Bot commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

PR governance

This PR does not yet satisfy the required template fields:

  • Missing required section Description.
  • Missing required section Type of Change.
  • Missing required section Changes Made.
  • Missing required section Testing.
  • Missing required section Real Behavior Proof.
  • Missing required section Review Readiness.
  • Check I have performed a self-review before requesting human review.
  • Check This PR is ready for human review or convert the PR back to draft.

Please update the PR body, or move the PR back to draft while it is still in progress.

@github-actions github-actions Bot added status: needs author action Pull request body or readiness checklist still needs author updates status: has conflicts Pull request has merge conflicts with the base branch labels Jul 3, 2026
content_has_strong_error_indicators() (error_detection.py) protects any
message/content-block from compression when it contains 2+ distinct
indicator keywords (error, fail, exception, traceback, fatal, panic,
crash), on the theory that genuine failure output nearly always pairs
two of these while benign mentions rarely do.

That assumption breaks on ordinary passing build/test tool output: tsc's
"Found 0 errors" plus a passing test run's "0 failures" trips both
"error" and "fail" — 2 distinct hits — despite nothing failing. In a
long JS/TS coding session this fires on nearly every request (confirmed
against the stats.json attached to issue headroomlabs-ai#1696), permanently protecting
large amounts of legitimate tool output from ever being compressed and
explaining the reported 0.3% savings vs. the advertised 60-95%.

Fix: strip common zero-result phrases ("0 errors", "no failures", etc.)
before the keyword scan, so a clean tool run no longer masquerades as a
failure trace. A genuine second distinct indicator elsewhere in the same
text still triggers protection correctly.

No prior test coverage existed for this function; added
tests/test_error_detection.py covering real errors, single-keyword
mentions, tsc/eslint passing summaries, and the case where a real error
coexists with a stripped zero-result phrase in the same blob.

Fixes headroomlabs-ai#1696.
@ashishpatel26 ashishpatel26 force-pushed the fix/1696-error-protection-false-positive branch from 5795fbc to 8413a5f Compare July 3, 2026 09:40
@github-actions github-actions Bot removed the status: has conflicts Pull request has merge conflicts with the base branch label Jul 3, 2026
@AbelVM

AbelVM commented Jul 3, 2026

Copy link
Copy Markdown

What happens when those keywords appear in prompts, code files or code blocks? like

  • Fix the errors in the code.
  • console.error(...)
  • There are failures in the tests that need fixing.
  • ...

@sparkbugz

Copy link
Copy Markdown

To build on what @AbelVM is asking, what about other cases of other summary formats like these?

  • Failures: 0, Errors: 0
  • failed: 0, errors: 0
  • Errors=0 Failures=0

The current proposed fix looks to handle 0/no forms such as 0 errors and 0 failures, which would make sense for JavaScript/Typescript-style output, but I'm wondering if it would catch formats like X: 0 or X=0, would it still leave both fail and error visible to the scan?

This seems like it would affect more than just JavaScript and TypeScript, too, since a good amount of test and CI tools report results in summary formats like those. It may be worth adding extra detection for broader coverage! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

status: needs author action Pull request body or readiness checklist still needs author updates

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] [PROXY] Long multi-turns coding task: only 0.3% compression

3 participants