feat(proxy): make off-path compression pool size configurable #1633

Open
gglucass wants to merge 3 commits into headroomlabs-ai:main from gglucass:pr/configurable-bg-compression-pool

Conversation

gglucass commented Jul 1, 2026

Contributor

Description

The Phase 3 (#1171) off-path background compression executor is hardcoded to max_workers=1. Background compression fires only on cold-start requests (frozen_message_count == 0 and context above HEADROOM_BACKGROUND_COMPRESSION_MIN_TOKENS): the proxy forwards the request uncompressed immediately and compresses off the request path. A single worker is fine at low concurrency, but under sustained multi-session load a burst of concurrent cold-starts all enqueue onto one thread and drain one Kompress pass at a time, so sessions at the back of the queue keep forwarding uncompressed (with lower token savings) until the queue clears.

This adds an env knob to widen that pool for heavy-concurrency deployments, defaulting to the current behavior.
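
To make the queueing argument concrete, here is a small back-of-the-envelope sketch. It is not code from this PR: the function name `queue_wait_passes` is hypothetical, and it assumes every cold-start arrives at the same instant and every Kompress pass takes the same time. It also ignores CPU contention between workers, which is exactly the cost a wider pool brings back.

```python
def queue_wait_passes(n_cold_starts: int, workers: int) -> list[int]:
    """Return, per queued job, how many full passes it waits before starting.

    Idealized model (assumption): all jobs arrive together, each pass has
    the same duration, and workers do not slow each other down.
    """
    if n_cold_starts < 0:
        raise ValueError("n_cold_starts must be >= 0")
    if workers < 1:
        raise ValueError("workers must be >= 1")
    # Jobs are picked up in FIFO order, `workers` at a time.
    return [i // workers for i in range(n_cold_starts)]


# Five concurrent cold-starts on the current single-worker pool:
single = queue_wait_passes(5, workers=1)   # [0, 1, 2, 3, 4]
# The same burst on a three-worker pool:
widened = queue_wait_passes(5, workers=3)  # [0, 0, 0, 1, 1]
```

Under these assumptions the last session waits four passes with one worker and one pass with three. Real gains would be smaller, since the passes are CPU-bound and compete for cores.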

Closes #

Type of Change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update
  • Performance improvement
  • Code refactoring (no functional changes)

Changes Made

  • Read HEADROOM_BACKGROUND_COMPRESSION_WORKERS (default 1) and size the background ThreadPoolExecutor from it. Non-integer values and values < 1 clamp to 1, so behavior is unchanged unless the var is explicitly set to >= 2.
  • Store the resolved value on self._background_compression_workers for observability/testing.
  • The code comment recommends keeping the pool small (2-3): these workers run CPU-bound Kompress in parallel, so a wide pool reintroduces the CPU contention the off-path design exists to avoid.
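
The parsing and clamping rules above can be sketched roughly as follows. This is an illustration under assumptions, not the code in headroom/proxy/server.py: the helper names `resolve_background_workers` and `build_background_executor` and the thread name prefix are hypothetical. Only the env var name, the default of 1, and the clamp-to-1 behavior come from this PR.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Mapping, Optional, Tuple

ENV_VAR = "HEADROOM_BACKGROUND_COMPRESSION_WORKERS"


def resolve_background_workers(environ: Optional[Mapping[str, str]] = None) -> int:
    """Parse the worker count; fall back to 1 on missing or invalid values."""
    env = os.environ if environ is None else environ
    raw = env.get(ENV_VAR, "1")
    try:
        workers = int(raw)
    except (TypeError, ValueError):
        # Non-integer input such as "notanint" or "" clamps to 1.
        return 1
    # Zero and negative values also clamp to 1.
    return max(1, workers)


def build_background_executor(
    environ: Optional[Mapping[str, str]] = None,
) -> Tuple[ThreadPoolExecutor, int]:
    """Hypothetical helper: size the pool from the resolved worker count."""
    workers = resolve_background_workers(environ)
    executor = ThreadPoolExecutor(
        max_workers=workers,
        thread_name_prefix="bg-compress",  # illustrative name, an assumption
    )
    # Returning the resolved value mirrors storing it for observability.
    return executor, workers
```

Passing the environment as an argument is a choice made for this sketch so that the function can be exercised without mutating os.environ; the PR itself may read the variable directly.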

Testing

  • Unit tests pass (pytest)
  • Linting passes (ruff check .)
  • Type checking passes (mypy headroom)
  • New tests added for new functionality
  • Manual testing performed

Test Output

$ pytest tests/test_background_compression_pool.py -q
5 passed in 0.83s

$ pytest tests/test_proxy/test_background_compression.py tests/test_proxy/test_phase3_byte_identity.py tests/test_proxy_compression_executor.py -q
18 passed

$ ruff check headroom/proxy/server.py tests/test_background_compression_pool.py
All checks passed!

$ mypy headroom/proxy/server.py
Success (only pre-existing annotation-unchecked notes on untyped defs)

Real Behavior Proof

  • Environment: macOS, Python 3.10.18, branch off upstream/main @ 0.28.0
  • Exact command / steps: construct the proxy with HEADROOM_BACKGROUND_COMPRESSION_WORKERS unset / =3 / junk, assert the executor's _max_workers.
  • Observed result: unset -> 1; =3 -> 3; 0/-4/notanint -> clamped to 1. Existing background-compression byte-identity tests still pass.
  • Not tested: wall-clock queue-drain improvement under real 5-session load (would need a live multi-session harness).

Review Readiness

  • I have performed a self-review
  • This PR is ready for human review

Checklist

  • My code follows the project's style guidelines
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have updated the CHANGELOG.md if applicable

Additional Notes

N/A on docs/CHANGELOG: default behavior is unchanged, so this is an additive opt-in knob; happy to add a CHANGELOG entry and a line to the env-var reference if you'd prefer it documented. The default stays 1 deliberately so no existing deployment changes behavior on upgrade.

🤖 Generated with Claude Code

The Phase 3 (headroomlabs-ai#1171) off-path background compression executor was hardcoded to
max_workers=1. Under sustained multi-session load, a burst of concurrent
cold-start (frozen=0, large) requests all enqueue onto that single thread and
drain one at a time (~one Kompress pass each), so token savings dip for the
sessions at the back of the queue until it clears.

Add HEADROOM_BACKGROUND_COMPRESSION_WORKERS (default 1, so behavior is
unchanged unless set) to size the pool. Values < 1 and non-integers clamp to 1.
Guidance keeps it small (2-3): these workers run CPU-bound Kompress in
parallel, so a wide pool reintroduces the CPU contention the off-path design
exists to avoid.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

github-actions Bot commented Jul 1, 2026

Contributor

PR governance

This PR follows the template and is marked ready for human review.

github-actions Bot added the "status: ready for review" label (pull request body is complete and the author marked it ready for human review) Jul 1, 2026

JerrettDavis left a comment

Collaborator

This looks good. The executor sizing remains single-worker by default, clamps invalid values safely, and the tests cover the new env parsing. I pushed two maintainer cleanup commits: one to remove unrelated uv.lock drift, and one to document the new env var in the config docs/changelog.
