fix(bedrock): route ARNs via converse, named AWS profiles, and au. re… #1456
PR governance: This PR follows the template and is marked ready for human review.
JerrettDavis left a comment:
The ARN-to-converse mapping and `au.` prefix pieces look useful, but I found two blockers to fix before merge.

First, `--bedrock-profile` is only applied to `fetch_bedrock_inference_profiles()`. The actual LiteLLM request paths still pass only `aws_region_name` into `acompletion`. They do not pass a profile, client, or credentials value, and this PR does not set `AWS_PROFILE` for the process. As a result, startup discovery can use the named SSO profile while the real Bedrock completion still uses ambient/default credentials. Please wire the profile through the actual Bedrock LiteLLM calls (or explicitly set and document the environment mechanism and test it). Please also add a regression test that captures the completion kwargs for a named profile.

Second, the new Claude Code Bedrock guide contradicts itself. The TL;DR says `ANTHROPIC_DEFAULT_*_MODEL` must use standard Claude names and not ARNs, because ARNs make Claude Code bypass `ANTHROPIC_BASE_URL`. Later, the application-inference-profile section says to pass ARN values directly in `ANTHROPIC_DEFAULT_*_MODEL`. Those cannot both be safe. Please make the guide consistent with the verified path so users do not accidentally bypass Headroom while believing their requests are compressed.
JerrettDavis left a comment:
The follow-up resolves the blockers I raised. `profile_name` now persists on the backend and is passed as `aws_profile_name` through the Anthropic, streaming Anthropic, OpenAI, and streaming OpenAI LiteLLM paths, with regression coverage around the completion kwargs. The guide is also internally consistent now about running Claude Code in normal Anthropic mode and letting Headroom handle Bedrock. I do not see a remaining blocker.
…erence-profile-arn-support
## Description

`pip install headroom-ai[bedrock]` cannot serve users who authenticate with `aws login` (IAM Identity Provider / console-login, DPoP). Resolving those credentials requires the AWS Common Runtime (CRT). Without `awscrt`, botocore raises `MissingDependencyException`. The AWS docs state the requirement as **"Boto3 version 1.41.0 or later with AWS Common Runtime (CRT)"**. That means both a modern boto3 floor and CRT (installed via the `[crt]` extra).

## Type of Change

- [x] Bug fix (non-breaking)

## Changes Made

- `pyproject.toml` `bedrock` extra: bump `boto3>=1.28.0` → `boto3>=1.41.0`, add `botocore[crt]>=1.41.0` (installs `awscrt`).
- `uv.lock`: regenerated; adds `awscrt` and resolves `boto3` to 1.42.x.

No code changes: the bedrock backend already passes `aws_profile_name` through to the LiteLLM calls (via #1456). This change only makes the installed dependencies able to resolve `aws login` credentials.

## Impact

- **`aws login` (IAM Identity Provider / DPoP):** now works, because `awscrt` is present.
- **`aws sso login` (classic Identity Center):** unaffected (already worked).
- **static keys (`~/.aws/credentials`):** unaffected.
- Bumping the boto3 floor only affects the optional `[bedrock]` extra. Bedrock users benefit from a current boto3 regardless.

## Testing

Dependency-only change. `uv lock` resolves cleanly (257 packages, awscrt 0.29.2, boto3 1.42.38). No runtime code path is altered, so existing bedrock tests are unaffected.

## Checklist

- [x] Self-review performed
- [x] No new warnings
- [x] Linting passes

## Additional Notes

This change focuses on the dependency gap only. ARN routing, named-profile wiring, and docs are handled in #1456; pricing is handled in #1485.
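One way to check, before starting the proxy, whether an environment can resolve `aws login` credentials is to test that `awscrt` is importable. The sketch below is a hypothetical preflight. `has_module` and `check_bedrock_crt` are illustrative names, not headroom or botocore APIs. The check only confirms that `awscrt` is present; it does not verify the boto3/botocore version floor.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if the top-level module `name` can be imported here."""
    return importlib.util.find_spec(name) is not None


def check_bedrock_crt() -> None:
    """Hypothetical preflight: fail early with an actionable message if CRT is missing."""
    if not has_module("awscrt"):
        raise RuntimeError(
            "awscrt is not installed, so `aws login` credentials cannot be resolved; "
            "reinstall with: pip install 'headroom-ai[bedrock]'"
        )
```

Failing at startup with an install hint is usually easier to act on than a `MissingDependencyException` raised on the first Bedrock request.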
JerrettDavis left a comment:
Reviewed the Bedrock ARN/profile changes. ARNs route through `bedrock/converse`, and the named AWS profile is used for both discovery and LiteLLM calls. The cache key includes the profile, so credentials do not cross-contaminate model maps. This will need a careful rebase with the newer botocore preflight work in #1553, but I do not see a code blocker in this PR itself.
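The cache-isolation point can be sketched as a small profile-scoped cache. The key combines region and profile (the PR describes it as `region:profile_name`), so a model map fetched with one profile's credentials is never served to another profile. `ModelMapCache` and the loader callback are hypothetical. Rendering a missing profile as an empty segment is also an assumption.

```python
from typing import Callable, Dict, Optional

Loader = Callable[[str, Optional[str]], Dict[str, str]]


class ModelMapCache:
    """Hypothetical cache of Bedrock model maps, isolated per (region, profile)."""

    def __init__(self, loader: Loader) -> None:
        self._loader = loader
        self._entries: Dict[str, Dict[str, str]] = {}

    @staticmethod
    def key(region: str, profile_name: Optional[str]) -> str:
        # Assumption: an empty profile segment stands for the default credential chain.
        return f"{region}:{profile_name or ''}"

    def get(self, region: str, profile_name: Optional[str] = None) -> Dict[str, str]:
        k = self.key(region, profile_name)
        if k not in self._entries:
            # Only call the (potentially slow, credentialed) loader on a cache miss.
            self._entries[k] = self._loader(region, profile_name)
        return self._entries[k]
```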
JerrettDavis left a comment:
Applied a style-only formatter commit to the Bedrock profile tests (`tests/test_bedrock_region.py`, `tests/test_vertex_claude_compression.py`) to address the current `ruff format --check` failure. The other red test shard was canceled by a runner shutdown signal before any assertion failure. I do not see a code-level blocker in the Bedrock ARN/profile changes.
# Conflicts: # headroom/backends/litellm.py
JerrettDavis left a comment:
Reviewed the latest Bedrock head after the formatter, boto3 Session test fix, and merge-from-main commits. The implementation still routes Bedrock ARNs through bedrock/converse, forwards aws_profile_name on the real LiteLLM call paths, keeps profile-specific model-map caching isolated, and the updated tests now mock the same boto3.Session(...).client(...) shape used by the code.
The current CI/security/docs checks are green; no additional code changes requested.
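The sketch below illustrates the `boto3.Session(...).client(...)` shape that the tests mock, using profile-aware discovery with an injectable session factory. `fetch_inference_profile_ids` is a hypothetical helper, not headroom's `fetch_bedrock_inference_profiles()`. It assumes the Bedrock `list_inference_profiles` response shape (`inferenceProfileSummaries[].inferenceProfileId`) and ignores pagination.

```python
from typing import Any, Callable, List, Optional
from unittest import mock


def fetch_inference_profile_ids(
    region: str,
    profile_name: Optional[str] = None,
    session_factory: Optional[Callable[..., Any]] = None,
) -> List[str]:
    """Hypothetical discovery helper: list inference profile IDs for one profile."""
    if session_factory is None:
        import boto3  # imported lazily so the mocked path below runs without boto3

        session_factory = boto3.Session
    # The named profile scopes credentials to this session only; no AWS_PROFILE env var.
    session = session_factory(profile_name=profile_name, region_name=region)
    client = session.client("bedrock")
    response = client.list_inference_profiles()  # pagination (nextToken) ignored
    return [p["inferenceProfileId"] for p in response.get("inferenceProfileSummaries", [])]


# Mock the Session(...).client(...) chain the code actually uses, not boto3.client(...).
fake_session_cls = mock.MagicMock()
fake_client = fake_session_cls.return_value.client.return_value
fake_client.list_inference_profiles.return_value = {
    "inferenceProfileSummaries": [{"inferenceProfileId": "au.example-model-id"}]
}
ids = fetch_inference_profile_ids("ap-southeast-2", "my-sso-profile", fake_session_cls)
fake_session_cls.assert_called_once_with(profile_name="my-sso-profile", region_name="ap-southeast-2")
fake_session_cls.return_value.client.assert_called_once_with("bedrock")
```

If a test instead patched `boto3.client`, it would silently miss the session path and could end up touching real credentials.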
🤖 I have created a release *beep* *boop*

---

<details><summary>0.29.0</summary>

## [0.29.0](v0.28.0...v0.29.0) (2026-07-03)

### Features

* **proxy:** add --lossless no-CCR mode with format-native compaction ([#1721](#1721)) ([c75ebde](c75ebde))
* **stats:** surface Codex WS compression counters in /stats summary ([#1680](#1680)) ([2fe19c3](2fe19c3))
* **transforms:** adaptive Otsu KEEP/DROP threshold (+ land relevance split on main) ([#1726](#1726)) ([eea667a](eea667a))

### Bug Fixes

* **bedrock:** fail fast when session-token auth lacks botocore ([#1553](#1553)) ([54cfa36](54cfa36))
* **bedrock:** route ARNs via converse, named AWS profiles, and au. re… ([#1456](#1456)) ([7d87aa2](7d87aa2))
* **ccr:** honor workspace dir for sqlite store ([#1564](#1564)) ([96e1dfe](96e1dfe))
* **claude:** surface Remote Control proxy incompatibility ([#1610](#1610)) ([4bf7f92](4bf7f92))
* **cli:** stop advertising unwired compression tuning env vars in banner ([#1634](#1634)) ([d5bf98d](d5bf98d))
* **codex:** avoid duplicate headroom provider config ([#1431](#1431)) ([ddd4adf](ddd4adf))
* **compression:** reject lossy unmarked tool output in unit router path ([#1479](#1479)) ([de24cd5](de24cd5))
* **cortex-code:** migrate to current Cortex REST API endpoints + add e2e benchmarks ([#1474](#1474)) ([f00ace6](f00ace6))
* **dashboard:** align token savings headline denominator ([#1653](#1653)) ([646e705](646e705))
* **dashboard:** derive per-project setup URL from live origin ([#1511](#1511)) ([e035aef](e035aef))
* **detection:** contain unidiff panic on orphaned +++ target line ([#1548](#1548)) ([e386c09](e386c09))
* **evals:** CJK-aware F1 tokenization + token estimation ([#1527](#1527)) ([99a8540](99a8540))
* **install:** close parent log fd in start_detached_agent ([#1576](#1576)) ([816cb85](816cb85))
* **install:** use Windows-safe PID liveness probe in runtime_status ([#1544](#1544)) ([#1560](#1560)) ([6b227b9](6b227b9))
* **learn:** aggregate verbosity baselines across projects instead of overwriting ([#1288](#1288)) ([27a5468](27a5468))
* **mcp:** show lifetime totals and label rolling session scope in headroom_stats ([#1428](#1428)) ([1c0e152](1c0e152))
* **memory:** cap local embedder CPU thread oversubscription ([#198](#198)) ([#1559](#1559)) ([b84afbf](b84afbf))
* **memory:** singleflight LocalBackend init to stop cold-start races ([#1691](#1691)) ([bec47a1](bec47a1))
* **openclaw:** detect uv-installed headroom binary in ~/.local/bin ([#1459](#1459)) ([adaeb88](adaeb88))
* **opencode:** preserve custom OpenAI gateway paths ([#1596](#1596)) ([c19347c](c19347c))
* **opencode:** route native providers + load transport plugin, fix Serena context ([#1573](#1573)) ([ad0034f](ad0034f))
* preserve anthropic passthrough tool order ([#1427](#1427)) ([a932247](a932247))
* **proxy/auth:** match real Anthropic OAuth token prefix (sk-ant-oat) ([#1672](#1672)) ([8cddf9b](8cddf9b))
* **proxy:** expose persistent savings metrics ([#1647](#1647)) ([5fe4e7b](5fe4e7b))
* **proxy:** fail open when kompress saturation would exhaust pre-upstream budget ([#1430](#1430)) ([15ac650](15ac650))
* **proxy:** handle streaming CCR retrieval ([#1451](#1451)) ([d337e3b](d337e3b))
* **proxy:** include system/tools/sampling in cache key ([#1473](#1473)) ([312129a](312129a))
* **proxy:** preserve Responses passthrough bytes ([#1598](#1598)) ([2a34a82](2a34a82))
* **proxy:** strip Codex lite header on the HTTP /responses path ([#1663](#1663)) ([9fbd47b](9fbd47b))
* **proxy:** wire --compression-max-workers / HEADROOM_COMPRESSION_MAX_WORKERS ([#1632](#1632)) ([814ffa3](814ffa3))
* **savings:** count cache-read tokens in input cost estimate ([#1429](#1429)) ([72ade37](72ade37))
* skip Magika backend on x86 CPUs without AVX2 ([#1162](#1162)) ([64783d8](64783d8))
* **transforms/content-router:** route grep/log output away from HTML extractor ([#1719](#1719)) ([0d18ef2](0d18ef2))
* **transforms:** bound native content detection with a Windows watchdog ([#575](#575)) ([#1563](#1563)) ([95abca3](95abca3))
* Vertex AI support for Claude Code with ANTHROPIC_VERTEX_BASE_URL ([#1393](#1393)) ([cff7247](cff7247))
* **wrap:** detach the shared proxy on Windows so it survives an ungraceful agent close ([#1464](#1464)) ([6cba441](6cba441))
* **wrap:** preserve custom Vertex base URL ([#1477](#1477)) ([75427bb](75427bb))
* **wrap:** remove rtk instructions from Codex AGENTS.md on unwrap ([#1604](#1604)) ([c9d717c](c9d717c))

</details>

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Description
Fix three related gaps in Bedrock support that prevented headroom from working with Claude Code when `CLAUDE_CODE_USE_BEDROCK=0` and `ANTHROPIC_BASE_URL` is pointed at the proxy:

1. **ARN passthrough used the wrong LiteLLM route.** Application inference profile ARNs (e.g. `arn:aws:bedrock:ap-southeast-2:<account>:application-inference-profile/<id>`) were forwarded as `bedrock/<arn>`, which LiteLLM rejects with HTTP 400 "Try calling via converse route". Fixed to `bedrock/converse/<arn>`.
2. **Named AWS profile not forwarded to completion calls.** `--bedrock-profile` was wired through the CLI → config → `LiteLLMBackend.__init__` and used to fetch the model map at startup, but it was never stored on `self`. All four `acompletion()` call sites (`send_message`, `stream_message`, `send_openai_message`, `stream_openai_message`) passed only `aws_region_name`. As a result, the actual Bedrock calls used ambient credentials regardless of the flag. Fixed by storing `self.profile_name` and passing `aws_profile_name=` to every `acompletion()` call.
3. **`ap-southeast-2` used the wrong region prefix.** Australia should use `au.` for cross-region inference profile IDs, not `apac.`. Added `ap-southeast-2 → "au"` to `_BEDROCK_REGION_PREFIXES` and `"au."` to the strip list in `_normalize_bedrock_profile_id`.

Closes #
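The routing and prefix rules above can be sketched as plain functions. This is an illustration under assumptions, not headroom's `map_model_id` or `_normalize_bedrock_profile_id`. The prefix table contains only the `ap-southeast-2` entry from this PR, the strip list is a hypothetical subset, and the non-ARN branch simply prepends `bedrock/`.

```python
# Hypothetical subset of the region -> cross-region prefix table;
# only the ap-southeast-2 -> "au" entry comes from this PR.
BEDROCK_REGION_PREFIXES = {"ap-southeast-2": "au"}

# Hypothetical subset of geography prefixes to strip; "au." is the one this PR adds.
STRIP_PREFIXES = ("au.", "apac.")

ARN_PREFIX = "arn:aws:"


def map_model_id(model_id: str) -> str:
    """Route ARNs via LiteLLM's converse path (bedrock/<arn> is rejected with HTTP 400)."""
    if model_id.startswith(ARN_PREFIX):
        return f"bedrock/converse/{model_id}"
    # Assumption: non-ARN IDs just get the plain bedrock/ prefix.
    return f"bedrock/{model_id}"


def normalize_profile_id(profile_id: str) -> str:
    """Strip a cross-region geography prefix, e.g. 'au.x' -> 'x'."""
    for prefix in STRIP_PREFIXES:
        if profile_id.startswith(prefix):
            return profile_id[len(prefix):]
    return profile_id


def regional_profile_id(region: str, model_id: str) -> str:
    """Re-prefix a model ID with its region's geography, if one is known."""
    base = normalize_profile_id(model_id)
    prefix = BEDROCK_REGION_PREFIXES.get(region)
    return f"{prefix}.{base}" if prefix else base
```

Normalizing before re-prefixing means a stale `apac.` ID configured for `ap-southeast-2` is rewritten to `au.` instead of gaining a second prefix.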
Type of Change
Changes Made
- `backends/litellm.py`: route `arn:aws:` model IDs via `bedrock/converse/<arn>` in `map_model_id`
- `backends/litellm.py`: store `profile_name` as `self.profile_name` in `LiteLLMBackend.__init__`; pass `aws_profile_name=` to `acompletion()` in all four call sites; use `boto3.Session(profile_name=...)` for startup discovery; use `region:profile_name` as the cache key to prevent cross-profile collisions
- `backends/litellm.py`: add `ap-southeast-2 → "au"` to `_BEDROCK_REGION_PREFIXES`; add `"au."` to the prefix strip list in `_normalize_bedrock_profile_id`
- `providers/registry.py`: pass `profile_name=bedrock_profile` to `LiteLLMBackend`
- `proxy/server.py`: pass `config.bedrock_profile` to `create_proxy_backend`
- `docs/claude-code-bedrock-headroom.md`: remove the false claim that ARNs in `ANTHROPIC_DEFAULT_*_MODEL` bypass the proxy; fix the troubleshooting table
- `tests/test_bedrock_region.py`: update `test_arn_passthrough` to expect `bedrock/converse/<arn>`; update the cache key format; add `test_profile_cache_isolation`, `test_ap_southeast_2_uses_au_prefix`, and `TestBedrockProfileForwardedToCompletion` (3 async tests asserting that `aws_profile_name` appears in `acompletion()` kwargs for named profiles and is absent for the no-profile case)
- `tests/test_provider_registry*.py`, `test_vertex_claude_compression.py`: update `litellm_backend_cls` stubs to accept `profile_name=None`

Testing
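The `TestBedrockProfileForwardedToCompletion` idea can be sketched without LiteLLM installed: capture the `acompletion()` kwargs and assert on `aws_profile_name`. `MiniBackend` is a hypothetical stand-in for `LiteLLMBackend` that takes the completion function as a constructor argument. The real tests may patch `litellm.acompletion` instead.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List, Optional

Completion = Callable[..., Awaitable[Dict[str, Any]]]


class MiniBackend:
    """Hypothetical stand-in for LiteLLMBackend with an injectable acompletion."""

    def __init__(self, region: str, acompletion: Completion, profile_name: Optional[str] = None):
        self.region = region
        self.profile_name = profile_name  # stored on self so every call site can use it
        self._acompletion = acompletion

    async def send_message(self, model: str, messages: List[Dict[str, Any]]) -> Dict[str, Any]:
        kwargs: Dict[str, Any] = {"model": model, "messages": messages, "aws_region_name": self.region}
        if self.profile_name:
            kwargs["aws_profile_name"] = self.profile_name
        return await self._acompletion(**kwargs)


def make_recorder(calls: List[Dict[str, Any]]) -> Completion:
    async def fake_acompletion(**kwargs: Any) -> Dict[str, Any]:
        calls.append(kwargs)  # capture exactly what the backend sent
        return {"choices": []}

    return fake_acompletion


def captured_kwargs(profile_name: Optional[str]) -> Dict[str, Any]:
    """Run one send_message call and return the kwargs the fake received."""
    calls: List[Dict[str, Any]] = []
    backend = MiniBackend("ap-southeast-2", make_recorder(calls), profile_name)
    asyncio.run(backend.send_message("bedrock/converse/arn:aws:bedrock:example", []))
    return calls[0]
```

Recording the kwargs, rather than only asserting that the fake was called, catches the original bug, in which discovery used the profile but completion silently did not.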
`pytest`

Test Output
Note: 3 deselected tests use `patch("builtins.__import__")`, which hangs under Python 3.13. This is a pre-existing issue unrelated to these changes.
Real Behavior Proof
- Environment: `CLAUDE_CODE_USE_BEDROCK=0`, `ANTHROPIC_BASE_URL=http://127.0.0.1:8787`, AWS `ap-southeast-2`, application inference profile ARNs in `ANTHROPIC_DEFAULT_*_MODEL`
- Command: `headroom proxy --port 8787 --backend bedrock --region ap-southeast-2 --bedrock-profile "my-sso-profile"`
- Observed route: `bedrock/converse/arn:aws:bedrock:ap-southeast-2:...:application-inference-profile/<id>`, as confirmed in LiteLLM logs

Review Readiness
Checklist
Additional Notes
The 3 skipped tests (`test_fallback_when_boto3_import_fails`, `test_fallback_when_api_call_fails`, `test_successful_fetch`) pre-exist in the repo and use `patch("builtins.__import__")`, which hangs under Python 3.13. They are not affected by these changes.
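For the boto3-import-failure case, one alternative to patching `builtins.__import__` is to map the module to `None` in `sys.modules`. Python then raises `ModuleNotFoundError` (an `ImportError` subclass) on `import boto3`, without the import machinery being replaced. This is a suggestion, not what the repo does, and it has not been verified against the Python 3.13 hang. `load_model_map` and `FALLBACK_MODEL_MAP` are hypothetical.

```python
import sys
from typing import Dict
from unittest import mock

# Hypothetical static fallback used when boto3 is unavailable.
FALLBACK_MODEL_MAP: Dict[str, str] = {"claude-sonnet": "fallback-model-id"}


def load_model_map() -> Dict[str, str]:
    """Hypothetical loader: use boto3 discovery if importable, else a fallback."""
    try:
        import boto3  # noqa: F401  (real discovery would use this)
    except ImportError:
        return dict(FALLBACK_MODEL_MAP)
    return {"source": "boto3"}  # placeholder for real discovery results


def test_fallback_when_boto3_import_fails() -> None:
    # A None entry in sys.modules makes `import boto3` raise ModuleNotFoundError;
    # patch.dict restores sys.modules when the block exits.
    with mock.patch.dict(sys.modules, {"boto3": None}):
        assert load_model_map() == FALLBACK_MODEL_MAP
```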