feat(proxy): opt-in compression for catch-all passthrough routes#1699
feat(proxy): opt-in compression for catch-all passthrough routes#1699tenderdeve wants to merge 1 commit into
Conversation
PR governanceThis PR follows the template and is marked ready for human review. |
JerrettDavis
left a comment
There was a problem hiding this comment.
Thanks for keeping this opt-in and fail-open. The shape of the feature makes sense, but this currently regresses the existing passthrough path when the handler/proxy object does not have a config attribute.
CI is failing in tests/test_proxy_handler_helpers.py with multiple existing passthrough tests raising:
AttributeError: 'OpenAIHandlerMixin' object has no attribute 'config'
The new guard uses getattr(self.config, "compress_passthrough", False), but Python has to resolve self.config before getattr can return the default. Please make the feature flag lookup tolerant of a missing config object, for example by first reading config = getattr(self, "config", None) and then checking getattr(config, "compress_passthrough", False). That keeps the default-off behavior truly no-op for the existing passthrough paths and should make the old tests meaningful again.
Requests whose path doesn't match a built-in API route fall through to `handle_passthrough`, which forwarded the body verbatim — bypassing the whole compression pipeline. Wrapper-proxy setups that front Headroom on custom paths (e.g. `/api/codex-proxy/<key>/v1/responses`) therefore got zero compression on coding-agent traffic and hit context-limit 400s in long sessions (headroomlabs-ai#1546). Add `--compress-passthrough` (env `HEADROOM_COMPRESS_PASSTHROUGH=1`), off by default. When enabled, POST passthrough requests whose path ends in `/responses` and carry an OpenAI Responses-shaped body are routed through the same `_compress_openai_responses_payload_in_executor` path the native `/v1/responses` handler uses, then forwarded to the unknown upstream. Stale Content-Length is dropped so httpx recomputes it. Fail-open by construction: non-JSON, non-Responses payloads, unmodified results, and any compressor error forward the original body unchanged, so opting in can never drop a catch-all request. Anthropic `/messages` and `/chat/completions` passthrough compression are left as follow-ups. Closes headroomlabs-ai#1546
41bce60 to
389fc80
Compare
|
@JerrettDavis updated in 389fc80 — the flag lookup now resolves |
Description
Requests whose path doesn't match a built-in API route fall through to
handle_passthrough, which forwarded the body verbatim — bypassing ContentRouter/Kompress/TOIN entirely. Wrapper-proxy setups that front Headroom on custom paths (e.g./api/codex-proxy/<key>/v1/responses) got zero compression on coding-agent traffic and hit context-limit 400s in long sessions. This adds an opt-in flag that routes OpenAI Responses-shaped passthrough bodies through the same compression path the native/v1/responseshandler uses.Closes #1546
Type of Change
Changes Made
ProxyConfig.compress_passthrough(defaultFalse) +--compress-passthroughCLI flag +HEADROOM_COMPRESS_PASSTHROUGH=1env.handle_passthrough: when enabled, POST requests whose path ends in/responseswith an OpenAI Responses-shaped body are compressed via the existing_compress_openai_responses_payload_in_executorbefore forwarding; staleContent-Lengthis dropped so httpx recomputes it._maybe_compress_passthrough_responseshelper — fail-open: non-JSON, non-Responses payloads, unmodified results, and any compressor error forward the original body unchanged.docs/content/docs/proxy.mdx.Testing
pytest)ruff check .)mypy headroom)Test Output
Real Behavior Proof
.venv..venv/bin/python -m pytest tests/test_compress_passthrough.py -q— covers a Responses-shaped body being compressed, non-JSON passthrough, non-Responses (messages) payload untouched, unmodified-result short-circuit, compressor-error fail-open, andProxyConfig().compress_passthrough is Falsedefault. Plus import smoke:ProxyConfig(compress_passthrough=True), server/handler modules import, helper present./v1/responsesalready exercises in CI.Review Readiness
Checklist
Additional Notes
Scoped to OpenAI Responses-shaped bodies (the reporter's exact case). Anthropic
/messagesand OpenAI/chat/completionspassthrough compression are natural follow-ups — deliberately left out to keep this change focused and fail-safe. CHANGELOG is release-managed, left unchecked.