[BUG] [PROXY] Long multi-turn coding task: only 0.3% compression #1696

@AbelVM

Description

Using the Headroom proxy in a long, multi-turn agentic coding task on a large JavaScript repository, the token savings are far lower than advertised: I get 0.3% versus the expected 60%-95%.

To Reproduce

Full installation:

pip install "headroom-ai[all]" --break-system-packages 
pip install --upgrade litellm --break-system-packages

Running the proxy with the command:

headroom proxy --code-aware --openai-api-url http://localhost:13305/api

Using the proxy endpoint with KiloCode, I get only 0.3% token savings over a long coding session.

Environment

  • Headroom version: 0.28.0
  • Python version: 3.14.4
  • OS: Ubuntu 26.04
  • LLM Provider: OpenAI-compatible Lemonade Server

Additional Context

Dump of stats endpoint: stats.json
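
For reference, here is a minimal sketch of how a savings figure like the 0.3% above can be sanity-checked from token counts before and after compression. The function name and the sample numbers are hypothetical. They are not taken from stats.json or from Headroom's code, and the proxy may compute its figure differently.

```python
def savings_percent(tokens_before: int, tokens_after: int) -> float:
    """Return the percentage of input tokens removed by compression."""
    if tokens_before <= 0:
        raise ValueError("tokens_before must be positive")
    return 100.0 * (tokens_before - tokens_after) / tokens_before


# Hypothetical counts chosen only to illustrate the scale of the gap.
observed = savings_percent(1_000_000, 997_000)   # roughly what this report sees
advertised = savings_percent(1_000_000, 400_000)  # low end of the advertised range
print(f"observed: {observed:.1f}%, advertised (low end): {advertised:.1f}%")
```

With these made-up counts, a 0.3% saving means only 3,000 of 1,000,000 input tokens were removed, whereas the low end of the advertised range would remove 600,000.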
