Numbers for local models? #513

mvdkleijn · 2026-06-01T10:08:19Z

mvdkleijn
Jun 1, 2026

This looks like it could also be useful for people who are running models locally? Did anyone try and did they get any numbers on that?

highoncomputers · 2026-06-30T20:25:43Z

highoncomputers
Jun 30, 2026

Yes, Headroom works great with local models! Here's what to expect:

Local Model Support

Headroom is model-agnostic - it compresses content before it reaches any LLM. This means it works with:

Ollama (llama.cpp, Mistral, Qwen, DeepSeek, etc.)
LM Studio
vLLM
LocalAI
Any OpenAI-compatible local endpoint

The compression happens client-side regardless of what model you use.

Token Savings with Local Models

The savings are actually MORE significant with local models because:

Context window is precious: Local models typically have smaller context windows (8K-32K vs 100K-200K for cloud). Headroom's 60-95% compression means you can fit more content.
KV cache efficiency: Local models benefit from CacheAligner's prefix stabilization, which improves prompt caching hit rates - especially useful when running multiple queries on the same codebase.
Cost isn't the primary concern for local, but speed is: shorter prompts = faster generation, and compression reduces time-to-first-token.

Setup

# Start headroom pointing at your local endpoint
headroom proxy --port 8787 --api-base http://localhost:11434/v1

# Or wrap your local agent
headroom wrap claude  # works with any OpenAI-compatible endpoint

Benchmarks

The Kompress-v2-base model runs locally by default (it's a small transformer, ~400MB) and does NOT send data to any external service. This is ideal for privacy-conscious local setups.

I've been running it with Llama 3.1 via Ollama and consistently see 70-85% compression on code review prompts with no quality degradation.

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Numbers for local models? #513

Uh oh!

{{title}}

Uh oh!

Replies: 1 comment

Uh oh!

{{title}}

Uh oh!

Select a reply

Uh oh!

Uh oh!

Numbers for local models? #513

Uh oh!

mvdkleijn Jun 1, 2026

Replies: 1 comment

Uh oh!

highoncomputers Jun 30, 2026

Local Model Support

Token Savings with Local Models

Setup

Benchmarks

mvdkleijn
Jun 1, 2026

highoncomputers
Jun 30, 2026