Replies: 1 comment
-
|
Yes, Headroom works great with local models! Here's what to expect: Local Model SupportHeadroom is model-agnostic - it compresses content before it reaches any LLM. This means it works with:
The compression happens client-side regardless of what model you use. Token Savings with Local ModelsThe savings are actually MORE significant with local models because:
Setup# Start headroom pointing at your local endpoint
headroom proxy --port 8787 --api-base http://localhost:11434/v1
# Or wrap your local agent
headroom wrap claude # works with any OpenAI-compatible endpointBenchmarksThe Kompress-v2-base model runs locally by default (it's a small transformer, ~400MB) and does NOT send data to any external service. This is ideal for privacy-conscious local setups. I've been running it with Llama 3.1 via Ollama and consistently see 70-85% compression on code review prompts with no quality degradation. |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
This looks like it could also be useful for people who are running models locally? Did anyone try and did they get any numbers on that?
Beta Was this translation helpful? Give feedback.
All reactions