headroomlabs-ai · JerrettDavis · Jul 3, 2026 · Jun 26, 2026
@@ -0,0 +1,147 @@
+# Claude Code + AWS Bedrock, with Headroom compression
+
+*Validated end-to-end on 2026-06-26 (Claude Code 2.1, Headroom 0.27.0, ap-southeast-2).*
+
+This is the **working, tested** way to run **Claude Code** against **Claude models on
+AWS Bedrock** with **Headroom compressing the context** in the middle.
+
+## TL;DR
+
+Run Claude Code in **normal Anthropic mode** (NOT Bedrock mode) pointed at a local
+Headroom proxy, and let **Headroom** be the thing that talks to Bedrock:
+
+```
+Claude Code  ──ANTHROPIC_BASE_URL──▶  Headroom proxy  ──LiteLLM (bedrock)──▶  AWS Bedrock
+ (normal mode)     (plain http)        (compresses)         (your AWS creds)      (Claude)
+```
+
+One non-obvious requirement makes the difference between "works" and "silently bypasses
+the proxy":
+
+1. **`CLAUDE_CODE_USE_BEDROCK=0`**: set this explicitly. If `CLAUDE_CODE_USE_BEDROCK=1`
+   is still set anywhere (for example in `~/.claude/settings.json`), Claude Code calls
+   Bedrock directly via the AWS SDK, completely bypassing `ANTHROPIC_BASE_URL` and
+   the proxy.
+
+## Why not "just set CLAUDE_CODE_USE_BEDROCK=1"?
+
+That approach **does not work** with Headroom. When `CLAUDE_CODE_USE_BEDROCK=1` is set,
+Claude Code calls Bedrock directly using the AWS SDK — `ANTHROPIC_BASE_URL` is ignored
+entirely and the proxy never receives a byte. Use the Anthropic-mode path below.
+
+## Prerequisites
+
+- **AWS credentials** configured for your environment (env vars, `~/.aws/credentials`,
+  instance profile, or SSO via `aws sso login`). Confirm direct access works before
+  involving Headroom:
+  ```bash
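+  # AWS CLI v2 needs --cli-binary-format to accept a raw JSON --body; drop it on CLI v1.
+  aws bedrock-runtime invoke-model \
+    --region us-east-1 \
+    --cli-binary-format raw-in-base64-out \
+    --model-id anthropic.claude-3-haiku-20240307-v1:0 \
+    --body '{"anthropic_version":"bedrock-2023-05-31","max_tokens":20,"messages":[{"role":"user","content":"hi"}]}' \
+    /tmp/out.json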
+  ```
+- **boto3** in the proxy's Python environment (for dynamic inference profile discovery):
+  ```bash
+  pip install boto3
+  ```
+- **IAM permissions** for the models you intend to use — at minimum
+  `bedrock:InvokeModel` and `bedrock:InvokeModelWithResponseStream`. For application
+  inference profiles, scope to the specific profile ARN:
+  ```json
+  {
+    "Effect": "Allow",
+    "Action": ["bedrock:InvokeModel", "bedrock:InvokeModelWithResponseStream"],
+    "Resource": ["arn:aws:bedrock:<region>:<account>:application-inference-profile/<id>"]
+  }
+  ```
+
+## Terminal 1 — start the Headroom proxy (Bedrock backend)
+
+```bash
+headroom proxy --port 8787 \
+  --backend bedrock \
+  --region us-east-1
+```
+
+With a named AWS SSO profile:
+
+```bash
+headroom proxy --port 8787 \
+  --backend bedrock \
+  --region us-east-1 \
+  --bedrock-profile my-sso-profile
+```
+
+On startup the proxy calls `list_inference_profiles` to build a model map. Confirm it
+is routing correctly by checking the LiteLLM log lines — you should see:
+
+```
+LiteLLM completion() model= converse/arn:aws:... provider = bedrock
+```
+
+## Terminal 2 — run Claude Code (normal Anthropic mode) against the proxy
+
+```bash
+export CLAUDE_CODE_USE_BEDROCK=0               # REQUIRED — prevents Claude Code bypassing the proxy
+export ANTHROPIC_BASE_URL=http://127.0.0.1:8787
+export ANTHROPIC_API_KEY=headroom              # Claude Code needs *a* key to start; value is ignored
+export ANTHROPIC_MODEL=claude-opus-4-6
+export ANTHROPIC_DEFAULT_SONNET_MODEL=claude-sonnet-4-6
+export ANTHROPIC_DEFAULT_OPUS_MODEL=claude-opus-4-6
+export ANTHROPIC_DEFAULT_HAIKU_MODEL=claude-haiku-4-5-20251001
+
+claude
+```
+
+Or via `~/.claude/settings.json`:
+
+```json
+{
+  "env": {
+    "CLAUDE_CODE_USE_BEDROCK": "0",
+    "ANTHROPIC_BASE_URL": "http://127.0.0.1:8787",
+    "ANTHROPIC_API_KEY": "headroom",
+    "ANTHROPIC_MODEL": "claude-opus-4-6",
+    "ANTHROPIC_DEFAULT_SONNET_MODEL": "claude-sonnet-4-6",
+    "ANTHROPIC_DEFAULT_OPUS_MODEL": "claude-opus-4-6",
+    "ANTHROPIC_DEFAULT_HAIKU_MODEL": "claude-haiku-4-5-20251001"
+  }
+}
+```
+
+Claude Code now talks plain Anthropic `/v1/messages` to Headroom; Headroom compresses
+and forwards to Bedrock via LiteLLM, then translates the answer back.
+
+## Application inference profiles (account-specific ARNs)
+
+If your IAM policy only permits **application inference profiles** (account-specific
+ARNs) rather than system-defined cross-region profiles, pass the ARN directly as the
+model value in `ANTHROPIC_DEFAULT_*_MODEL`. The proxy detects model IDs that start
+with `arn:aws:` and routes them via `bedrock/converse/<arn>` automatically, with no
+extra configuration required.
+
+## Region prefix notes
+
+| AWS region | Cross-region inference prefix |
+|---|---|
+| `us-*` | `us.` |
+| `eu-*` | `eu.` |
+| `ap-*` (except `ap-southeast-2`) | `apac.` |
+| `ap-southeast-2` (Sydney) | `au.` |
+
+The proxy uses the correct prefix automatically when constructing fallback model IDs.
+
+## Verify compression is happening
+
+- Dashboard: <http://localhost:8787/dashboard> — "tokens saved" climbs as you work.
+- `curl -s localhost:8787/stats` → `tokens.saved` and `request_logs[].transforms_applied`.
+
+## Troubleshooting
+
+| Symptom | Cause | Fix |
+|---|---|---|
+| Proxy receives no requests | Claude Code is in Bedrock mode, bypassing proxy | Set `CLAUDE_CODE_USE_BEDROCK=0` |
+| `400 The provided model identifier is invalid` | Bedrock rejected the model name format | Use a standard model name that the proxy can map to an inference profile (for example `claude-sonnet-4-6`) or a valid application inference profile ARN |
+| `403 AccessDeniedException` on system-defined profiles | IAM policy only permits application profiles | Use `--bedrock-profile` with an authorized profile and pass application inference profile ARNs as model values |
+| `400 … Try calling via converse route` | Old proxy version routing ARNs to invoke path | Upgrade to headroom ≥ 0.27.1 |
+| Model map empty at startup | boto3 not installed or wrong AWS profile | `pip install boto3`; check `--bedrock-profile` / `AWS_PROFILE` |
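
A small preflight check can catch the Bedrock-mode bypass from the first troubleshooting row before `claude` is launched. The sketch below is illustrative only: `check_claude_env` is a hypothetical helper, not part of Headroom or Claude Code, and it encodes just the three variables the guide calls out. It treats an unset `CLAUDE_CODE_USE_BEDROCK` as a problem because the guide marks the explicit `0` as required.

```python
from collections.abc import Mapping
from urllib.parse import urlparse


def check_claude_env(env: Mapping[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means the env matches the guide."""
    problems: list[str] = []

    # The guide marks an explicit "0" as required, so unset counts as a problem too.
    if env.get("CLAUDE_CODE_USE_BEDROCK") != "0":
        problems.append("CLAUDE_CODE_USE_BEDROCK must be set to 0")

    # The base URL has to be an http(s) URL with a host, e.g. http://127.0.0.1:8787.
    parsed = urlparse(env.get("ANTHROPIC_BASE_URL", ""))
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("ANTHROPIC_BASE_URL must point at the Headroom proxy")

    # Claude Code needs some key to start; the guide says the value is ignored.
    if not env.get("ANTHROPIC_API_KEY"):
        problems.append("ANTHROPIC_API_KEY must be non-empty")

    return problems
```

Calling `check_claude_env(os.environ)` in a wrapper script and refusing to start when the list is non-empty is one way to use it; variables set through `~/.claude/settings.json` are not visible to this check.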
@@ -69,8 +69,10 @@ class ProviderConfig:
 
 # Region prefix used in cross-region Bedrock inference profile IDs.
 # EU regions use "eu.", AP regions use "apac.", US (and everything else) use "us.".
+# ap-southeast-2 (Sydney/Australia) uses "au." — distinct from the rest of APAC.
 _BEDROCK_REGION_PREFIXES: dict[str, str] = {
     "eu": "eu",
+    "ap-southeast-2": "au",
     "ap": "apac",
 }
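
The lookup that consumes this table is not shown in the diff, so the following is only a sketch of one way the prefix could be resolved. It assumes keys are matched against the start of the region name, and it tries longer keys first so that `ap-southeast-2` wins over the generic `ap` regardless of dict ordering. `resolve_region_prefix` is a hypothetical name.

```python
REGION_PREFIXES: dict[str, str] = {
    "eu": "eu",
    "ap-southeast-2": "au",
    "ap": "apac",
}


def resolve_region_prefix(region: str, default: str = "us") -> str:
    """Map an AWS region to its cross-region inference profile prefix."""
    # Longest key first, so the Sydney-specific entry beats the generic "ap" entry.
    for key in sorted(REGION_PREFIXES, key=len, reverse=True):
        if region.startswith(key):
            return REGION_PREFIXES[key]
    # US regions and anything unrecognised fall back to the default prefix.
    return default
```

If the real lookup instead iterates the dict in insertion order, the position of `"ap-southeast-2"` before `"ap"` in the table above is what keeps Sydney from resolving to `apac`.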
 
@@ -135,7 +137,9 @@ def _build_bedrock_fallback_map(region: str) -> dict[str, str]:
     return {name: f"bedrock/{prefix}.{model_id}" for name, model_id in _CLAUDE_MODELS}
 
 
-def _fetch_bedrock_inference_profiles(region: str | None) -> dict[str, str]:
+def _fetch_bedrock_inference_profiles(
+    region: str | None, profile_name: str | None = None
+) -> dict[str, str]:
     """Fetch available Bedrock inference profiles from AWS API.
 
     Uses boto3 list_inference_profiles() to get all available profiles
@@ -147,15 +151,21 @@ def _fetch_bedrock_inference_profiles(region: str | None) -> dict[str, str]:
 
     Args:
         region: AWS region (e.g., "us-east-1", "eu-central-1")
+        profile_name: AWS named profile (e.g., "my-sso-profile"). When set,
+                      a boto3.Session is created with this profile name so
+                      the correct SSO or credential file is used. Falls back
+                      to ambient credentials (AWS_PROFILE env var, instance
+                      metadata, etc.) when not provided.
 
     Returns:
         Model map: anthropic_model_name -> bedrock inference profile ID
     """
     region = region or "us-east-1"
 
-    # Check cache first
-    if region in _bedrock_profiles_cache:
-        return _bedrock_profiles_cache[region]
+    # Cache key includes profile_name so different profiles don't collide
+    cache_key = f"{region}:{profile_name or ''}"
+    if cache_key in _bedrock_profiles_cache:
+        return _bedrock_profiles_cache[cache_key]
 
     model_map: dict[str, str] = {}
 
@@ -167,11 +177,12 @@ def _fetch_bedrock_inference_profiles(region: str | None) -> dict[str, str]:
             "Install boto3 for dynamic model discovery: pip install boto3"
         )
         model_map = _build_bedrock_fallback_map(region)
-        _bedrock_profiles_cache[region] = model_map
+        _bedrock_profiles_cache[cache_key] = model_map
         return model_map
 
     try:
-        bedrock_client = boto3.client("bedrock", region_name=region)
+        session = boto3.Session(profile_name=profile_name) if profile_name else boto3.Session()
+        bedrock_client = session.client("bedrock", region_name=region)
         response = bedrock_client.list_inference_profiles(typeEquals="SYSTEM_DEFINED")
 
         for profile in response.get("inferenceProfileSummaries", []):
@@ -209,7 +220,7 @@ def _fetch_bedrock_inference_profiles(region: str | None) -> dict[str, str]:
         model_map = _build_bedrock_fallback_map(region)
 
     # Cache the result
-    _bedrock_profiles_cache[region] = model_map
+    _bedrock_profiles_cache[cache_key] = model_map
     return model_map
 
 
@@ -220,18 +231,23 @@ def _normalize_bedrock_profile_id(profile_id: str) -> str | None:
         profile_id: e.g., "us.anthropic.claude-sonnet-4-20250514-v1:0"
                     or "anthropic.claude-sonnet-4-20250514-v1:0"
                     or "claude-sonnet-4-20250514"
+                    or "arn:aws:bedrock:...:application-inference-profile/..."
 
     Returns:
         Normalized name like "claude-sonnet-4-20250514", or None if not parseable
     """
     import re
 
+    # ARNs are opaque identifiers — cannot be normalized to a standard model name
+    if profile_id.startswith("arn:aws:"):
+        return None
+
     # Strip "bedrock/" prefix if present
     if profile_id.startswith("bedrock/"):
         profile_id = profile_id[8:]
 
-    # Strip region prefix (us., eu., apac.)
-    for prefix in ["us.", "eu.", "apac."]:
+    # Strip region prefix (us., eu., apac., au.)
+    for prefix in ["us.", "eu.", "apac.", "au."]:
         if profile_id.startswith(prefix):
             profile_id = profile_id[len(prefix) :]
             break
@@ -400,13 +416,17 @@ def __init__(
         self,
         provider: str = "bedrock",
         region: str | None = None,
+        profile_name: str | None = None,
         **kwargs: Any,
     ):
         """Initialize LiteLLM backend.
 
         Args:
             provider: LiteLLM provider prefix (bedrock, vertex_ai, openrouter, etc.)
             region: Cloud region (provider-specific)
+            profile_name: AWS named profile for credential resolution (bedrock only).
+                          When set, boto3 uses this profile (e.g. an SSO profile) instead
+                          of the ambient credentials. Ignored for non-bedrock providers.
             **kwargs: Additional provider-specific config
         """
         if not LITELLM_AVAILABLE:
@@ -416,14 +436,15 @@ def __init__(
 
         self.provider = provider
         self.region = region
+        self.profile_name = profile_name
         self.kwargs = kwargs
 
         # Get provider config from registry
         self._config = get_provider_config(provider)
 
         # For Bedrock, fetch model map dynamically from AWS API
         if provider == "bedrock":
-            self._model_map = _fetch_bedrock_inference_profiles(region)
+            self._model_map = _fetch_bedrock_inference_profiles(region, profile_name=profile_name)
             litellm.set_verbose = False  # Reduce noise
         else:
             self._model_map = self._config.model_map
@@ -442,13 +463,19 @@ def map_model_id(self, anthropic_model: str) -> str:
         - "anthropic.claude-sonnet-4-20250514-v1:0" (Bedrock without region)
         - "us.anthropic.claude-sonnet-4-20250514-v1:0" (Bedrock with region)
         - "bedrock/us.anthropic.claude-sonnet-4-20250514-v1:0" (LiteLLM format)
+        - "arn:aws:bedrock:...:application-inference-profile/..." (application inference profile)
         """
         # Check direct mapping first
         if anthropic_model in self._model_map:
             return self._model_map[anthropic_model]
 
         # For Bedrock, try to normalize various input formats
         if self.provider == "bedrock":
+            # Application inference profile ARNs must use the converse route —
+            # the invoke route rejects ARNs with HTTP 400.
+            if anthropic_model.startswith("arn:aws:"):
+                return f"bedrock/converse/{anthropic_model}"
+
             normalized = _normalize_bedrock_profile_id(anthropic_model)
             if normalized and normalized in self._model_map:
                 return self._model_map[normalized]
@@ -681,6 +708,9 @@ async def send_message(
                 elif self.provider in ("vertex_ai", "vertex_ai_beta"):
                     kwargs["vertex_location"] = self.region
 
+            if self.provider == "bedrock" and self.profile_name:
+                kwargs["aws_profile_name"] = self.profile_name
+
             # Forward API key from request headers if present.
             # Skip for Bedrock/Vertex: they use env-based auth (AWS SigV4 / Google ADC).
             # Forwarding x-api-key (e.g. sk-ant-dummy) would override their credentials.
@@ -785,6 +815,9 @@ async def stream_message(
                 elif self.provider in ("vertex_ai", "vertex_ai_beta"):
                     kwargs["vertex_location"] = self.region
 
+            if self.provider == "bedrock" and self.profile_name:
+                kwargs["aws_profile_name"] = self.profile_name
+
             # Forward API key from request headers if present.
             # Skip for Bedrock/Vertex: they use env-based auth (AWS SigV4 / Google ADC).
             # Forwarding x-api-key (e.g. sk-ant-dummy) would override their credentials.
@@ -1009,6 +1042,9 @@ async def send_openai_message(
                 elif self.provider in ("vertex_ai", "vertex_ai_beta"):
                     kwargs["vertex_location"] = self.region
 
+            if self.provider == "bedrock" and self.profile_name:
+                kwargs["aws_profile_name"] = self.profile_name
+
             # Forward API key from request headers if present.
             # Skip for Bedrock/Vertex: they use env-based auth (AWS SigV4 / Google ADC).
             # Forwarding x-api-key (e.g. sk-ant-dummy) would override their credentials.
@@ -1184,6 +1220,9 @@ async def stream_openai_message(
                 elif self.provider in ("vertex_ai", "vertex_ai_beta"):
                     kwargs["vertex_location"] = self.region
 
+            if self.provider == "bedrock" and self.profile_name:
+                kwargs["aws_profile_name"] = self.profile_name
+
             # Forward API key from request headers if present.
             # Skip for Bedrock/Vertex: they use env-based auth (AWS SigV4 / Google ADC).
             # Forwarding x-api-key (e.g. sk-ant-dummy) would override their credentials.

@@ -148,6 +148,7 @@ def create_proxy_backend(
     backend: str,
     anyllm_provider: str,
     bedrock_region: str | None,
+    bedrock_profile: str | None = None,
     logger: logging.Logger,
     openai_api_url: str | None = None,
     anyllm_backend_cls: Any | None = None,
@@ -181,7 +182,10 @@ def create_proxy_backend(
         provider = "vertex_ai"
     try:
         backend_cls = litellm_backend_cls or _load_litellm_backend()
-        instance = cast("Backend", backend_cls(provider=provider, region=bedrock_region))
+        instance = cast(
+            "Backend",
+            backend_cls(provider=provider, region=bedrock_region, profile_name=bedrock_profile),
+        )
         logger.info("LiteLLM backend enabled (provider=%s, region=%s)", provider, bedrock_region)
         return instance
     except ImportError as exc:

@@ -933,6 +933,7 @@ def _router_config_for(kompress_disabled: bool) -> ContentRouterConfig:
             backend=config.backend,
             anyllm_provider=config.anyllm_provider,
             bedrock_region=config.bedrock_region,
+            bedrock_profile=config.bedrock_profile,
             logger=logger,
             openai_api_url=config.openai_api_url,
             anyllm_backend_cls=AnyLLMBackend,