Description
When using the Headroom proxy for a long, multi-turn agentic coding task in a large JavaScript repository, the token savings are far lower than advertised: I get 0.3% versus the expected 60%-95%.
To Reproduce
Full installation:
pip install "headroom-ai[all]" --break-system-packages
pip install --upgrade litellm --break-system-packages
Running the proxy with the command:
headroom proxy --code-aware --openai-api-url http://localhost:13305/api
With KiloCode pointed at the proxy endpoint, I get 0.3% token savings over a long coding session.
Environment
- Headroom version: 0.28.0
- Python version: 3.14.4
- OS: Ubuntu 26.04
- LLM Provider: OpenAI compatible Lemonade Server
Additional Context
Dump of stats endpoint: stats.json
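For reference, a minimal sketch of how a savings percentage like the 0.3% above could be computed, assuming it means the share of input tokens removed before the request reaches the provider. The function name and the sample numbers are hypothetical. They are not taken from Headroom's code or from stats.json.

```python
def savings_percent(tokens_before: int, tokens_after: int) -> float:
    """Return the percentage of input tokens removed by compression.

    Hypothetical helper for illustration; not part of Headroom.
    """
    if tokens_before <= 0:
        # Nothing was sent, so there is nothing to save.
        return 0.0
    return 100.0 * (tokens_before - tokens_after) / tokens_before


# Illustrative numbers only, not read from stats.json.
print(f"{savings_percent(1_000_000, 997_000):.1f}%")  # prints "0.3%"
```

Under this reading, a 60% saving would mean that 1,000,000 input tokens shrink to 400,000, which is far from what I observe.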