feat(proxy): make off-path compression pool size configurable#1633
Open
gglucass wants to merge 3 commits into
The Phase 3 (headroomlabs-ai#1171) off-path background compression executor was hardcoded to max_workers=1. Under sustained multi-session load, a burst of concurrent cold-start (frozen=0, large) requests all enqueue onto that single thread and drain one at a time (~one Kompress pass each), so token savings dip for the sessions at the back of the queue until it clears.
Add HEADROOM_BACKGROUND_COMPRESSION_WORKERS (default 1, so behavior is unchanged unless set) to size the pool. Values < 1 and non-integers clamp to 1.
Guidance keeps it small (2-3): these workers run CPU-bound Kompress in parallel, so a wide pool reintroduces the CPU contention the off-path design exists to avoid.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Contributor
PR governance
This PR follows the template and is marked ready for human review.
JerrettDavis (Collaborator) approved these changes on Jul 1, 2026 and left a comment:
This looks good. The executor sizing remains single-worker by default, clamps invalid values safely, and the tests cover the new env parsing. I pushed two maintainer cleanup commits: one to remove unrelated uv.lock drift, and one to document the new env var in the config docs/changelog.
Description
The Phase 3 (#1171) off-path background compression executor is hardcoded to max_workers=1. Background compression fires only on cold-start requests (frozen_message_count == 0 and context above HEADROOM_BACKGROUND_COMPRESSION_MIN_TOKENS), forwarding uncompressed immediately and compressing off the request path. That single worker is fine at low concurrency, but under sustained multi-session load a burst of concurrent cold-starts all enqueue onto one thread and drain one Kompress pass at a time, so sessions at the back of the queue forward uncompressed (lower token savings) until it clears.
This adds an env knob to widen that pool for heavy-concurrency deployments, defaulting to the current behavior.
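The parsing and clamping rule described in this PR can be sketched as follows. This is a minimal illustration, not the PR's actual diff: the helper names resolve_background_workers and make_background_executor and the thread name prefix are hypothetical, and only the env var name, the default of 1, and the clamp-to-1 behavior come from the description.

```python
import os
from concurrent.futures import ThreadPoolExecutor

# Env var name taken from the PR description.
ENV_VAR = "HEADROOM_BACKGROUND_COMPRESSION_WORKERS"


def resolve_background_workers(environ=None) -> int:
    """Return the configured pool size, clamped to at least 1.

    Hypothetical helper: unset, non-integer, and < 1 values all yield 1,
    so the single-worker default is preserved unless the variable is
    explicitly set to 2 or more.
    """
    env = os.environ if environ is None else environ
    raw = env.get(ENV_VAR, "1")
    try:
        workers = int(raw)
    except (TypeError, ValueError):
        # Non-integer input such as "notanint" falls back to the default.
        return 1
    return max(1, workers)


def make_background_executor(environ=None) -> ThreadPoolExecutor:
    """Build the background pool from the resolved size (hypothetical wiring)."""
    workers = resolve_background_workers(environ)
    # thread_name_prefix is an illustrative choice, not taken from the PR.
    return ThreadPoolExecutor(max_workers=workers, thread_name_prefix="bg-compress")
```

Passing a plain dict in place of os.environ keeps the helper easy to test without mutating the process environment. The clamp floors the value at 1 but sets no upper bound, so keeping the pool small (2-3) remains a matter of operator guidance.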
Closes #
Type of Change
Changes Made
- Add HEADROOM_BACKGROUND_COMPRESSION_WORKERS (default 1) and size the background ThreadPoolExecutor from it. Non-integer and < 1 values clamp to 1, so behavior is unchanged unless the var is explicitly set to >= 2.
- Store the resolved size on self._background_compression_workers for observability/testing.
Testing
- Unit tests (pytest)
- Lint (ruff check .)
- Type check (mypy headroom)
Test Output
Real Behavior Proof
- With HEADROOM_BACKGROUND_COMPRESSION_WORKERS unset / =3 / junk, assert the executor's _max_workers.
- unset -> 1; =3 -> 3; 0 / -4 / notanint -> clamped to 1. Existing background-compression byte-identity tests still pass.
Review Readiness
Checklist
Additional Notes
N/A on docs/CHANGELOG: default behavior is unchanged, so this is an additive opt-in knob; happy to add a CHANGELOG entry and a line to the env-var reference if you'd prefer it documented. The default stays 1 deliberately so no existing deployment changes behavior on upgrade.
🤖 Generated with Claude Code