147 changes: 147 additions & 0 deletions docs/claude-code-bedrock-headroom.md
@@ -0,0 +1,147 @@
# Claude Code + AWS Bedrock, with Headroom compression

*Validated end-to-end on 2026-06-26 (Claude Code 2.1, Headroom 0.27.0, ap-southeast-2).*

This is the **working, tested** way to run **Claude Code** against **Claude models on
AWS Bedrock** with **Headroom compressing the context** in the middle.

## TL;DR

Run Claude Code in **normal Anthropic mode** (NOT Bedrock mode) pointed at a local
Headroom proxy, and let **Headroom** be the thing that talks to Bedrock:

```
Claude Code ──ANTHROPIC_BASE_URL──▶ Headroom proxy ──LiteLLM (bedrock)──▶ AWS Bedrock
(normal mode)     (plain http)       (compresses)     (your AWS creds)      (Claude)
```

One non-obvious requirement makes the difference between "works" and "silently bypasses
the proxy":

1. **`CLAUDE_CODE_USE_BEDROCK=0`**: set this explicitly. If `CLAUDE_CODE_USE_BEDROCK=1`
is still set anywhere (shell profile, `~/.claude/settings.json`), Claude Code calls
Bedrock directly via the AWS SDK, completely bypassing `ANTHROPIC_BASE_URL` and the proxy.

## Why not "just set CLAUDE_CODE_USE_BEDROCK=1"?

That approach **does not work** with Headroom. When `CLAUDE_CODE_USE_BEDROCK=1` is set,
Claude Code calls Bedrock directly using the AWS SDK — `ANTHROPIC_BASE_URL` is ignored
entirely and the proxy never receives a byte. Use the Anthropic-mode path below.

## Prerequisites

- **AWS credentials** configured for your environment (env vars, `~/.aws/credentials`,
instance profile, or SSO via `aws sso login`). Confirm direct access works before
involving Headroom:
```bash
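# AWS CLI v2 needs --cli-binary-format raw-in-base64-out to accept a raw JSON body;
# drop that flag on AWS CLI v1.
aws bedrock-runtime invoke-model \
  --region us-east-1 \
  --cli-binary-format raw-in-base64-out \
  --model-id anthropic.claude-3-haiku-20240307-v1:0 \
  --body '{"anthropic_version":"bedrock-2023-05-31","max_tokens":20,"messages":[{"role":"user","content":"hi"}]}' \
  /tmp/out.json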
```
- **boto3** in the proxy's Python environment (for dynamic inference profile discovery):
```bash
pip install boto3
```
- **IAM permissions** for the models you intend to use — at minimum
`bedrock:InvokeModel` and `bedrock:InvokeModelWithResponseStream`. For application
inference profiles, scope to the specific profile ARN:
```json
{
  "Effect": "Allow",
  "Action": ["bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream"],
  "Resource": ["arn:aws:bedrock:<region>:<account>:application-inference-profile/<id>"]
}
```
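
The direct-access check from the first prerequisite can also be run from Python, which is convenient because the proxy depends on boto3 anyway. The following is a minimal sketch, assuming boto3 is installed and reusing the model ID, region, and request body from the CLI example. `build_body`, `smoke_test`, and `main` are hypothetical helper names, not part of Headroom.

```python
import json


def build_body(prompt: str = "hi", max_tokens: int = 20) -> str:
    """Build the same Anthropic-on-Bedrock request body as the CLI example."""
    return json.dumps(
        {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": max_tokens,
            "messages": [{"role": "user", "content": prompt}],
        }
    )


def smoke_test(client, model_id: str = "anthropic.claude-3-haiku-20240307-v1:0") -> dict:
    """Invoke the model once and return the decoded JSON response.

    `client` is a boto3 "bedrock-runtime" client, or any object exposing a
    compatible invoke_model method (which keeps the helper testable offline).
    """
    response = client.invoke_model(modelId=model_id, body=build_body())
    return json.loads(response["body"].read())


def main(region: str = "us-east-1") -> None:
    """Run the check against real AWS credentials (not called automatically)."""
    import boto3  # imported lazily so the helpers above work without boto3

    runtime = boto3.Session().client("bedrock-runtime", region_name=region)
    print(smoke_test(runtime))
```

Call `main()` from a Python shell once credentials are in place. If it raises an access or validation error, fix that before starting the proxy.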

## Terminal 1 — start the Headroom proxy (Bedrock backend)

```bash
headroom proxy --port 8787 \
  --backend bedrock \
  --region us-east-1
```

With a named AWS SSO profile:

```bash
headroom proxy --port 8787 \
  --backend bedrock \
  --region us-east-1 \
  --bedrock-profile my-sso-profile
```

On startup the proxy calls `list_inference_profiles` to build a model map. Confirm it
is routing correctly by checking the LiteLLM log lines — you should see:

```
LiteLLM completion() model= converse/arn:aws:... provider = bedrock
```
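
When the model map looks wrong or empty, it can help to run the same discovery call by hand. The sketch below is not Headroom's implementation: `collect_profile_ids` and `main` are hypothetical helpers, and the `nextToken` pagination loop is an addition that the proxy may or may not perform. It assumes boto3 is installed and that the credentials may call `bedrock:ListInferenceProfiles`.

```python
from __future__ import annotations

from typing import Any


def collect_profile_ids(client: Any, profile_type: str = "SYSTEM_DEFINED") -> list[str]:
    """Return every inference profile ID visible to the given Bedrock client."""
    ids: list[str] = []
    kwargs: dict[str, Any] = {"typeEquals": profile_type}
    while True:
        response = client.list_inference_profiles(**kwargs)
        for summary in response.get("inferenceProfileSummaries", []):
            ids.append(summary["inferenceProfileId"])
        token = response.get("nextToken")
        if not token:
            return ids
        kwargs["nextToken"] = token  # request the next page


def main(region: str = "us-east-1", profile_name: str | None = None) -> None:
    """List profiles with real credentials (not called automatically)."""
    import boto3  # imported lazily so collect_profile_ids works without boto3

    # Same session pattern as the proxy: named profile if given, else ambient.
    session = boto3.Session(profile_name=profile_name) if profile_name else boto3.Session()
    client = session.client("bedrock", region_name=region)
    for profile_id in collect_profile_ids(client):
        print(profile_id)
```

Passing `profile_type="APPLICATION"` lists account-specific application inference profiles instead of the system-defined ones.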

## Terminal 2 — run Claude Code (normal Anthropic mode) against the proxy

```bash
export CLAUDE_CODE_USE_BEDROCK=0 # REQUIRED — prevents Claude Code bypassing the proxy
export ANTHROPIC_BASE_URL=http://127.0.0.1:8787
export ANTHROPIC_API_KEY=headroom # Claude Code needs *a* key to start; value is ignored
export ANTHROPIC_MODEL=claude-opus-4-6
export ANTHROPIC_DEFAULT_SONNET_MODEL=claude-sonnet-4-6
export ANTHROPIC_DEFAULT_OPUS_MODEL=claude-opus-4-6
export ANTHROPIC_DEFAULT_HAIKU_MODEL=claude-haiku-4-5-20251001

claude
```

Or via `~/.claude/settings.json`:

```json
{
  "env": {
    "CLAUDE_CODE_USE_BEDROCK": "0",
    "ANTHROPIC_BASE_URL": "http://127.0.0.1:8787",
    "ANTHROPIC_API_KEY": "headroom",
    "ANTHROPIC_MODEL": "claude-opus-4-6",
    "ANTHROPIC_DEFAULT_SONNET_MODEL": "claude-sonnet-4-6",
    "ANTHROPIC_DEFAULT_OPUS_MODEL": "claude-opus-4-6",
    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "claude-haiku-4-5-20251001"
  }
}
```

Claude Code now talks plain Anthropic `/v1/messages` to Headroom; Headroom compresses
and forwards to Bedrock via LiteLLM, then translates the answer back.
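
Because a stale `CLAUDE_CODE_USE_BEDROCK=1` fails silently, a small preflight check before launching `claude` can save time. This is a minimal sketch with a hypothetical `preflight` helper. It inspects only the mapping it is given (for example `os.environ`), so values set in `~/.claude/settings.json` are not covered, and it treats anything other than `"0"` as a problem, which mirrors the requirement above rather than Claude Code's exact parsing.

```python
from __future__ import annotations

import os
from collections.abc import Mapping


def preflight(env: Mapping[str, str]) -> list[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems: list[str] = []
    if env.get("CLAUDE_CODE_USE_BEDROCK", "") != "0":
        problems.append("CLAUDE_CODE_USE_BEDROCK must be set to 0")
    if not env.get("ANTHROPIC_BASE_URL"):
        problems.append("ANTHROPIC_BASE_URL is not set (expected the proxy URL)")
    if not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY is not set (any placeholder value works)")
    return problems


def main() -> None:
    """Check the current shell environment (not called automatically)."""
    for problem in preflight(os.environ):
        print("WARNING:", problem)
```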

## Application inference profiles (account-specific ARNs)

If your IAM policy only permits **application inference profiles** (account-specific
ARNs) rather than system-defined cross-region profiles, pass the ARN directly as the
model value in `ANTHROPIC_DEFAULT_*_MODEL`. The proxy detects model IDs that start with
`arn:aws:` and routes them via `bedrock/converse/<arn>` automatically, so no extra
configuration is required.
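
The routing rule can be sketched in a few lines. This is a simplified, standalone version of the ARN branch in `map_model_id`. `to_litellm_model` is a hypothetical name, the account ID and profile ID in the usage note are placeholders, and the `ValueError` stands in for the further normalization the real proxy attempts.

```python
def to_litellm_model(model: str, model_map: dict[str, str]) -> str:
    """Map a requested model value to a LiteLLM model string."""
    # Direct hits in the discovered model map win.
    if model in model_map:
        return model_map[model]
    # ARNs are opaque: send them through the converse route unchanged.
    if model.startswith("arn:aws:"):
        return f"bedrock/converse/{model}"
    # Assumption: this sketch stops here instead of normalizing other formats.
    raise ValueError(f"no mapping for model {model!r}")
```

For example, `to_litellm_model("arn:aws:bedrock:us-east-1:123456789012:application-inference-profile/example", {})` returns the same ARN prefixed with `bedrock/converse/`.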

## Region prefix notes

| AWS region | Cross-region inference prefix |
|---|---|
| `us-*` | `us.` |
| `eu-*` | `eu.` |
| `ap-*` (except `ap-southeast-2`) | `apac.` |
| `ap-southeast-2` (Sydney) | `au.` |

The proxy uses the correct prefix automatically when constructing fallback model IDs.
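
The table can be expressed as an ordered prefix lookup. The sketch below reuses the entries of `_BEDROCK_REGION_PREFIXES` from this change, but `region_prefix` is a hypothetical helper and the proxy's actual lookup code may differ. The important detail is that `ap-southeast-2` is checked before the generic `ap` entry.

```python
# Order matters: the specific Sydney entry must precede the generic "ap" entry.
REGION_PREFIXES: dict[str, str] = {
    "eu": "eu",
    "ap-southeast-2": "au",
    "ap": "apac",
}


def region_prefix(region: str) -> str:
    """Return the cross-region inference prefix for an AWS region name."""
    for start, prefix in REGION_PREFIXES.items():
        if region.startswith(start):
            return prefix
    return "us"  # US regions and everything else


def fallback_profile_id(region: str, model_id: str) -> str:
    """Build a fallback profile ID such as 'us.anthropic.<model>'."""
    return f"{region_prefix(region)}.{model_id}"
```

Relying on dict insertion order keeps the table readable, at the cost of making the ordering a silent invariant that new entries must respect.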

## Verify compression is happening

- Dashboard: <http://localhost:8787/dashboard> — "tokens saved" climbs as you work.
- `curl -s localhost:8787/stats` → `tokens.saved` and `request_logs[].transforms_applied`.
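
The same check can be scripted. This is a minimal sketch that assumes `/stats` returns JSON with a numeric `tokens.saved` field and a `request_logs` list whose entries carry a `transforms_applied` list of strings. Only the field names come from this guide; the value types are assumptions. `summarize` and `fetch_stats` are hypothetical helpers.

```python
from __future__ import annotations

import json
from urllib.request import urlopen


def summarize(stats: dict) -> tuple[int, list[str]]:
    """Return (tokens saved, sorted unique transform names) from a stats payload."""
    saved = stats.get("tokens", {}).get("saved", 0)
    transforms: set[str] = set()
    for log in stats.get("request_logs", []):
        transforms.update(log.get("transforms_applied") or [])
    return saved, sorted(transforms)


def fetch_stats(base_url: str = "http://localhost:8787") -> dict:
    """Fetch the stats payload from a running proxy (performs a network call)."""
    with urlopen(f"{base_url}/stats", timeout=5) as response:
        return json.load(response)
```

With the proxy running, `summarize(fetch_stats())` should show a non-zero saved count and at least one transform after a few Claude Code turns.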

## Troubleshooting

| Symptom | Cause | Fix |
|---|---|---|
| Proxy receives no requests | Claude Code is in Bedrock mode, bypassing proxy | Set `CLAUDE_CODE_USE_BEDROCK=0` |
| `400 The provided model identifier is invalid` | Bedrock rejected the model name format | Use a standard model name that the proxy can map to a cross-region profile (e.g. `claude-sonnet-4-6`) or a valid application inference profile ARN |
| `403 AccessDeniedException` on system-defined profiles | IAM policy only permits application profiles | Use `--bedrock-profile` with an authorized profile and pass application inference profile ARNs as model values |
| `400 … Try calling via converse route` | Old proxy version routing ARNs to invoke path | Upgrade to headroom ≥ 0.27.1 |
| Model map empty at startup | boto3 not installed or wrong AWS profile | `pip install boto3`; check `--bedrock-profile` / `AWS_PROFILE` |
59 changes: 49 additions & 10 deletions headroom/backends/litellm.py
@@ -69,8 +69,10 @@ class ProviderConfig:

# Region prefix used in cross-region Bedrock inference profile IDs.
# EU regions use "eu.", AP regions use "apac.", US (and everything else) use "us.".
+# ap-southeast-2 (Sydney/Australia) uses "au." — distinct from the rest of APAC.
_BEDROCK_REGION_PREFIXES: dict[str, str] = {
"eu": "eu",
+"ap-southeast-2": "au",
"ap": "apac",
}

@@ -135,7 +137,9 @@ def _build_bedrock_fallback_map(region: str) -> dict[str, str]:
return {name: f"bedrock/{prefix}.{model_id}" for name, model_id in _CLAUDE_MODELS}


-def _fetch_bedrock_inference_profiles(region: str | None) -> dict[str, str]:
+def _fetch_bedrock_inference_profiles(
+region: str | None, profile_name: str | None = None
+) -> dict[str, str]:
"""Fetch available Bedrock inference profiles from AWS API.

Uses boto3 list_inference_profiles() to get all available profiles
@@ -147,15 +151,21 @@ def _fetch_bedrock_inference_profiles(region: str | None) -> dict[str, str]:

Args:
region: AWS region (e.g., "us-east-1", "eu-central-1")
+profile_name: AWS named profile (e.g., "my-sso-profile"). When set,
+a boto3.Session is created with this profile name so
+the correct SSO or credential file is used. Falls back
+to ambient credentials (AWS_PROFILE env var, instance
+metadata, etc.) when not provided.

Returns:
Model map: anthropic_model_name -> bedrock inference profile ID
"""
region = region or "us-east-1"

-# Check cache first
-if region in _bedrock_profiles_cache:
-return _bedrock_profiles_cache[region]
+# Cache key includes profile_name so different profiles don't collide
+cache_key = f"{region}:{profile_name or ''}"
+if cache_key in _bedrock_profiles_cache:
+return _bedrock_profiles_cache[cache_key]

model_map: dict[str, str] = {}

@@ -167,11 +177,12 @@ def _fetch_bedrock_inference_profiles(region: str | None) -> dict[str, str]:
"Install boto3 for dynamic model discovery: pip install boto3"
)
model_map = _build_bedrock_fallback_map(region)
-_bedrock_profiles_cache[region] = model_map
+_bedrock_profiles_cache[cache_key] = model_map
return model_map

try:
-bedrock_client = boto3.client("bedrock", region_name=region)
+session = boto3.Session(profile_name=profile_name) if profile_name else boto3.Session()
+bedrock_client = session.client("bedrock", region_name=region)
response = bedrock_client.list_inference_profiles(typeEquals="SYSTEM_DEFINED")

for profile in response.get("inferenceProfileSummaries", []):
@@ -209,7 +220,7 @@ def _fetch_bedrock_inference_profiles(region: str | None) -> dict[str, str]:
model_map = _build_bedrock_fallback_map(region)

# Cache the result
-_bedrock_profiles_cache[region] = model_map
+_bedrock_profiles_cache[cache_key] = model_map
return model_map


@@ -220,18 +231,23 @@ def _normalize_bedrock_profile_id(profile_id: str) -> str | None:
profile_id: e.g., "us.anthropic.claude-sonnet-4-20250514-v1:0"
or "anthropic.claude-sonnet-4-20250514-v1:0"
or "claude-sonnet-4-20250514"
+or "arn:aws:bedrock:...:application-inference-profile/..."

Returns:
Normalized name like "claude-sonnet-4-20250514", or None if not parseable
"""
import re

+# ARNs are opaque identifiers — cannot be normalized to a standard model name
+if profile_id.startswith("arn:aws:"):
+return None

# Strip "bedrock/" prefix if present
if profile_id.startswith("bedrock/"):
profile_id = profile_id[8:]

-# Strip region prefix (us., eu., apac.)
-for prefix in ["us.", "eu.", "apac."]:
+# Strip region prefix (us., eu., apac., au.)
+for prefix in ["us.", "eu.", "apac.", "au."]:
if profile_id.startswith(prefix):
profile_id = profile_id[len(prefix) :]
break
@@ -400,13 +416,17 @@ def __init__(
self,
provider: str = "bedrock",
region: str | None = None,
+profile_name: str | None = None,
**kwargs: Any,
):
"""Initialize LiteLLM backend.

Args:
provider: LiteLLM provider prefix (bedrock, vertex_ai, openrouter, etc.)
region: Cloud region (provider-specific)
+profile_name: AWS named profile for credential resolution (bedrock only).
+When set, boto3 uses this profile (e.g. an SSO profile) instead
+of the ambient credentials. Ignored for non-bedrock providers.
**kwargs: Additional provider-specific config
"""
if not LITELLM_AVAILABLE:
@@ -416,14 +436,15 @@ def __init__(

self.provider = provider
self.region = region
+self.profile_name = profile_name
self.kwargs = kwargs

# Get provider config from registry
self._config = get_provider_config(provider)

# For Bedrock, fetch model map dynamically from AWS API
if provider == "bedrock":
-self._model_map = _fetch_bedrock_inference_profiles(region)
+self._model_map = _fetch_bedrock_inference_profiles(region, profile_name=profile_name)
litellm.set_verbose = False # Reduce noise
else:
self._model_map = self._config.model_map
@@ -442,13 +463,19 @@ def map_model_id(self, anthropic_model: str) -> str:
- "anthropic.claude-sonnet-4-20250514-v1:0" (Bedrock without region)
- "us.anthropic.claude-sonnet-4-20250514-v1:0" (Bedrock with region)
- "bedrock/us.anthropic.claude-sonnet-4-20250514-v1:0" (LiteLLM format)
+- "arn:aws:bedrock:...:application-inference-profile/..." (application inference profile)
"""
# Check direct mapping first
if anthropic_model in self._model_map:
return self._model_map[anthropic_model]

# For Bedrock, try to normalize various input formats
if self.provider == "bedrock":
+# Application inference profile ARNs must use the converse route —
+# the invoke route rejects ARNs with HTTP 400.
+if anthropic_model.startswith("arn:aws:"):
+return f"bedrock/converse/{anthropic_model}"

normalized = _normalize_bedrock_profile_id(anthropic_model)
if normalized and normalized in self._model_map:
return self._model_map[normalized]
@@ -681,6 +708,9 @@ async def send_message(
elif self.provider in ("vertex_ai", "vertex_ai_beta"):
kwargs["vertex_location"] = self.region

+if self.provider == "bedrock" and self.profile_name:
+kwargs["aws_profile_name"] = self.profile_name

# Forward API key from request headers if present.
# Skip for Bedrock/Vertex: they use env-based auth (AWS SigV4 / Google ADC).
# Forwarding x-api-key (e.g. sk-ant-dummy) would override their credentials.
@@ -785,6 +815,9 @@ async def stream_message(
elif self.provider in ("vertex_ai", "vertex_ai_beta"):
kwargs["vertex_location"] = self.region

+if self.provider == "bedrock" and self.profile_name:
+kwargs["aws_profile_name"] = self.profile_name

# Forward API key from request headers if present.
# Skip for Bedrock/Vertex: they use env-based auth (AWS SigV4 / Google ADC).
# Forwarding x-api-key (e.g. sk-ant-dummy) would override their credentials.
@@ -1009,6 +1042,9 @@ async def send_openai_message(
elif self.provider in ("vertex_ai", "vertex_ai_beta"):
kwargs["vertex_location"] = self.region

+if self.provider == "bedrock" and self.profile_name:
+kwargs["aws_profile_name"] = self.profile_name

# Forward API key from request headers if present.
# Skip for Bedrock/Vertex: they use env-based auth (AWS SigV4 / Google ADC).
# Forwarding x-api-key (e.g. sk-ant-dummy) would override their credentials.
@@ -1184,6 +1220,9 @@ async def stream_openai_message(
elif self.provider in ("vertex_ai", "vertex_ai_beta"):
kwargs["vertex_location"] = self.region

+if self.provider == "bedrock" and self.profile_name:
+kwargs["aws_profile_name"] = self.profile_name

# Forward API key from request headers if present.
# Skip for Bedrock/Vertex: they use env-based auth (AWS SigV4 / Google ADC).
# Forwarding x-api-key (e.g. sk-ant-dummy) would override their credentials.
6 changes: 5 additions & 1 deletion headroom/providers/registry.py
@@ -148,6 +148,7 @@ def create_proxy_backend(
backend: str,
anyllm_provider: str,
bedrock_region: str | None,
+bedrock_profile: str | None = None,
logger: logging.Logger,
openai_api_url: str | None = None,
anyllm_backend_cls: Any | None = None,
@@ -181,7 +182,10 @@ def create_proxy_backend(
provider = "vertex_ai"
try:
backend_cls = litellm_backend_cls or _load_litellm_backend()
-instance = cast("Backend", backend_cls(provider=provider, region=bedrock_region))
+instance = cast(
+"Backend",
+backend_cls(provider=provider, region=bedrock_region, profile_name=bedrock_profile),
+)
logger.info("LiteLLM backend enabled (provider=%s, region=%s)", provider, bedrock_region)
return instance
except ImportError as exc:
1 change: 1 addition & 0 deletions headroom/proxy/server.py
@@ -933,6 +933,7 @@ def _router_config_for(kompress_disabled: bool) -> ContentRouterConfig:
backend=config.backend,
anyllm_provider=config.anyllm_provider,
bedrock_region=config.bedrock_region,
+bedrock_profile=config.bedrock_profile,
logger=logger,
openai_api_url=config.openai_api_url,
anyllm_backend_cls=AnyLLMBackend,