Built on Headroom: CodeAgent Mobile (drive cloud coding agents from your phone) #1224

edgar-durand · 2026-06-21T05:10:49Z

edgar-durand
Jun 21, 2026

Hey 👋 — wanted to share that we shipped Headroom into CodeAgent Mobile, a tool for supervising AI coding agents (Claude Code, etc.) running in cloud codespaces / self-hosted boxes from your phone.

How we use it: every managed workspace boots with headroom-ai[proxy,code] and routes the agent through headroom proxy (ANTHROPIC_BASE_URL). We pre-download kompress-v2-base + the ModernBERT tokenizer at provision time so the Kompress ONNX backend is warm before the first prompt — no cold-start stall. Running CPU-only via onnxruntime (no torch), which kept it light and rock-solid across arm64/x86.

Real numbers from our boxes: ~28% context compression on representative coding traffic (preload from cache ~2–4s, no per-request latency hit). The ONNX path was a big win for us — lighter footprint and none of the fragility we hit on the torch path.

Zero-setup for our users — it's just on. Thanks for building this; the [proxy]-extra + local-first model story made it a clean integration. Happy to answer anything if it's useful to others wiring agents through the proxy.

Integration runbook (so nobody has to ask 🙂)

Exactly what we do per workspace, end to end.

1. Install the ONNX engines — not torch

pip install "headroom-ai[proxy,code]"

[proxy] pulls onnxruntime + transformers — everything Kompress needs on the ONNX backend.
[code] adds the tree-sitter AST compressor.
We deliberately do not install [ml]/torch. torch is multi-GB, and a half-broken torch (we hit a corrupted arm64 build) makes the proxy hang every request when Kompress lazy-loads it. ONNX is lighter and solid. On CPU with torch absent, Kompress auto-selects onnx_cpu.

2. Pre-download the model so the first prompt isn't a cold start

The proxy eager-preloads at startup with allow_download=False and defers the download to the first request on a cache miss. That deferred ~840 MB download blew our agent's idle timeout → the first prompt hung ("Thinking…" forever), then recovered on reconnect. Fix: warm the cache at provision time. Two separate repos are required (this bit us — the tokenizer lives in a different repo):

from huggingface_hub import snapshot_download
# 1) the ONNX model — skip the .pt/.safetensors torch artifacts (~600 MB unused)
snapshot_download("chopratejas/kompress-v2-base",
                  allow_patterns=["*.json", "onnx/*.onnx", "kompress-int8-wo.onnx"])
# 2) the TOKENIZER only — Kompress loads it from answerdotai/ModernBERT-base.
#    Skip its model weights (~2.7 GB unused).
snapshot_download("answerdotai/ModernBERT-base",
                  allow_patterns=["*.json", "tokenizer*", "*.txt", "vocab*", "merges*"])

Minimal cache footprint ≈ 840 MB (model) + 2 MB (tokenizer).

3. Wire the agent + start the proxy

headroom init --global claude          # points ANTHROPIC_BASE_URL at the local proxy
HEADROOM_KOMPRESS_BACKEND=onnx_cpu \
  headroom proxy --port 8787           # eager-preloads from the warm cache

(The backend pin is belt-and-suspenders — auto-detection already picks ONNX when torch is absent.)

4. Verify — cache-only, no upstream call, no token cost

from headroom.transforms.kompress_compressor import KompressCompressor, KompressConfig
kc = KompressCompressor(KompressConfig(device="auto"))
print(kc.preload(allow_download=False))   # -> "onnx" in ~2-4s, no download, no hang
# kc.compress(<a ~500+ token coding sample>) -> ~28% saved on our boxes

Gotchas, condensed

Use ONNX, not torch — lighter, and no hang-on-broken-torch failure mode.
The tokenizer is in a separate repo (answerdotai/ModernBERT-base) — pre-download both, not just kompress-v2-base.
Pre-download before the proxy serves — otherwise the first prompt pays the cold download and can time out against the agent's idle window.
Prune the torch artifacts from the HF cache (.pt / .safetensors) — unused on the ONNX path, they're 80%+ of the download.
Kompress passes through messages under ~10 words, so trivial prompts show 0% — the savings show up on real, context-heavy turns.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Built on Headroom: CodeAgent Mobile (drive cloud coding agents from your phone) #1224

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Replies: 0 comments

Select a reply

Uh oh!

Uh oh!

Built on Headroom: CodeAgent Mobile (drive cloud coding agents from your phone) #1224

Uh oh!

Uh oh!

edgar-durand Jun 21, 2026

Integration runbook (so nobody has to ask 🙂)

1. Install the ONNX engines — not torch

2. Pre-download the model so the first prompt isn't a cold start

3. Wire the agent + start the proxy

4. Verify — cache-only, no upstream call, no token cost

Gotchas, condensed

Replies: 0 comments

edgar-durand
Jun 21, 2026