Built on Headroom: CodeAgent Mobile (drive cloud coding agents from your phone) #1224
edgar-durand
started this conversation in
General
Replies: 0 comments
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
Hey 👋 — wanted to share that we shipped Headroom into CodeAgent Mobile, a tool for supervising AI coding agents (Claude Code, etc.) running in cloud codespaces / self-hosted boxes from your phone.
How we use it: every managed workspace boots with
headroom-ai[proxy,code]and routes the agent throughheadroom proxy(ANTHROPIC_BASE_URL). We pre-downloadkompress-v2-base+ the ModernBERT tokenizer at provision time so the Kompress ONNX backend is warm before the first prompt — no cold-start stall. Running CPU-only via onnxruntime (no torch), which kept it light and rock-solid across arm64/x86.Real numbers from our boxes: ~28% context compression on representative coding traffic (preload from cache ~2–4s, no per-request latency hit). The ONNX path was a big win for us — lighter footprint and none of the fragility we hit on the torch path.
Zero-setup for our users — it's just on. Thanks for building this; the
[proxy]-extra + local-first model story made it a clean integration. Happy to answer anything if it's useful to others wiring agents through the proxy.Integration runbook (so nobody has to ask 🙂)
Exactly what we do per workspace, end to end.
1. Install the ONNX engines — not torch
pip install "headroom-ai[proxy,code]"[proxy]pullsonnxruntime+transformers— everything Kompress needs on the ONNX backend.[code]adds the tree-sitter AST compressor.[ml]/torch. torch is multi-GB, and a half-broken torch (we hit a corrupted arm64 build) makes the proxy hang every request when Kompress lazy-loads it. ONNX is lighter and solid. On CPU with torch absent, Kompress auto-selectsonnx_cpu.2. Pre-download the model so the first prompt isn't a cold start
The proxy eager-preloads at startup with
allow_download=Falseand defers the download to the first request on a cache miss. That deferred ~840 MB download blew our agent's idle timeout → the first prompt hung ("Thinking…" forever), then recovered on reconnect. Fix: warm the cache at provision time. Two separate repos are required (this bit us — the tokenizer lives in a different repo):Minimal cache footprint ≈ 840 MB (model) + 2 MB (tokenizer).
3. Wire the agent + start the proxy
(The backend pin is belt-and-suspenders — auto-detection already picks ONNX when torch is absent.)
4. Verify — cache-only, no upstream call, no token cost
Gotchas, condensed
answerdotai/ModernBERT-base) — pre-download both, not justkompress-v2-base..pt/.safetensors) — unused on the ONNX path, they're 80%+ of the download.Beta Was this translation helpful? Give feedback.
All reactions