Skip to content

fix(bedrock): route ARNs via converse, named AWS profiles, and au. re…#1456

Merged
JerrettDavis merged 8 commits into
headroomlabs-ai:mainfrom
mhaitana:feat/bedrock-application-inference-profile-arn-support
Jul 3, 2026
Merged

fix(bedrock): route ARNs via converse, named AWS profiles, and au. re…#1456
JerrettDavis merged 8 commits into
headroomlabs-ai:mainfrom
mhaitana:feat/bedrock-application-inference-profile-arn-support

Conversation

@mhaitana

@mhaitana mhaitana commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

Description

Fix three related gaps in Bedrock support that prevented headroom from working with Claude Code when CLAUDE_CODE_USE_BEDROCK=0 and ANTHROPIC_BASE_URL is pointed at the proxy:

  1. ARN passthrough used the wrong LiteLLM route — application inference profile ARNs (e.g. arn:aws:bedrock:ap-southeast-2:<account>:application-inference-profile/<id>) were forwarded as bedrock/<arn>, which LiteLLM rejects with HTTP 400 "Try calling via converse route". Fixed to bedrock/converse/<arn>.

  2. Named AWS profile not forwarded to completion calls--bedrock-profile was wired through the CLI → config → LiteLLMBackend.__init__ and used to fetch the model map at startup,
    but never stored on self. All four acompletion() call sites (send_message, stream_message, send_openai_message, stream_openai_message) passed only aws_region_name — the
    actual Bedrock calls used ambient credentials regardless of the flag. Fixed by storing self.profile_name and passing aws_profile_name= to every acompletion() call.

  3. ap-southeast-2 used the wrong region prefix — Australia should use au. for cross-region inference profile IDs, not apac.. Added ap-southeast-2 → "au" to _BEDROCK_REGION_PREFIXES and "au." to the strip list in _normalize_bedrock_profile_id.

Closes #

Type of Change

  • Bug fix (non-breaking change that fixes an issue)

Changes Made

  • backends/litellm.py: route arn:aws: model IDs via bedrock/converse/<arn> in map_model_id
  • backends/litellm.py: store profile_name as self.profile_name in LiteLLMBackend.__init__; pass aws_profile_name= to acompletion() in all four call sites; use
    boto3.Session(profile_name=...) for startup discovery; cache key is region:profile_name to prevent cross-profile collisions
  • backends/litellm.py: add ap-southeast-2 → "au" to _BEDROCK_REGION_PREFIXES; add "au." to prefix strip list in _normalize_bedrock_profile_id
  • providers/registry.py: pass profile_name=bedrock_profile to LiteLLMBackend
  • proxy/server.py: pass config.bedrock_profile to create_proxy_backend
  • docs/claude-code-bedrock-headroom.md: remove false claim that ARNs in ANTHROPIC_DEFAULT_*_MODEL bypass the proxy; fix troubleshooting table
  • tests/test_bedrock_region.py: update test_arn_passthrough to expect bedrock/converse/<arn>; update cache key format; add test_profile_cache_isolation,
    test_ap_southeast_2_uses_au_prefix, and TestBedrockProfileForwardedToCompletion (3 async tests asserting aws_profile_name appears in acompletion() kwargs for named profiles and is
    absent for the no-profile case)
  • tests/test_provider_registry*.py, test_vertex_claude_compression.py: update litellm_backend_cls stubs to accept profile_name=None

Testing

  • Unit tests pass (pytest)
  • New tests added for new functionality
  • Manual testing performed

Test Output

$ pytest tests/test_bedrock_region.py tests/test_provider_registry.py tests/test_provider_registry_extended.py \
    -k "not test_fallback_when_boto3_import_fails and not test_fallback_when_api_call_fails and not test_successful_fetch" -q
collected 51 items / 3 deselected / 48 selected

tests/test_bedrock_region.py ...........................
tests/test_provider_registry.py ...........
tests/test_provider_registry_extended.py .......

48 passed, 3 deselected in 2.00s

Note: 3 deselected tests use patch("builtins.import") which hangs under Python 3.13 — pre-existing issue unrelated to these changes.

Real Behavior Proof

  • Environment: macOS, Python 3.13, Claude Code with CLAUDE_CODE_USE_BEDROCK=0, ANTHROPIC_BASE_URL=http://127.0.0.1:8787, AWS ap-southeast-2, application inference profile ARNs in ANTHROPIC_DEFAULT_*_MODEL
  • Exact command / steps: headroom proxy --port 8787 --backend bedrock --region ap-southeast-2 --bedrock-profile "my-sso-profile"
  • Observed result: Requests routed correctly to bedrock/converse/arn:aws:bedrock:ap-southeast-2:...:application-inference-profile/<id> as confirmed in LiteLLM logs
  • Not tested: EU/APAC region ARN passthrough (logic is identical); non-SSO credential flows
15:29:44 - LiteLLM:INFO: utils.py:4090 - 
LiteLLM completion() model= converse/arn:aws:bedrock:ap-southeast-2:<account>:application-inference-profile/<id>; provider = bedrock
2026-06-26 15:29:44,322 - LiteLLM - INFO - 
LiteLLM completion() model= converse/arn:aws:bedrock:ap-southeast-2:<account>:application-inference-profile/<id>; provider = bedrock
15:31:09 - LiteLLM:INFO: utils.py:4090 - 
LiteLLM completion() model= converse/arn:aws:bedrock:ap-southeast-2:<account>:application-inference-profile/<id>; provider = bedrock
2026-06-26 15:31:09,928 - LiteLLM - INFO - 
LiteLLM completion() model= converse/arn:aws:bedrock:ap-southeast-2:<account>:application-inference-profile/<id>; provider = bedrock
15:34:26 - LiteLLM:INFO: utils.py:4090 - 
LiteLLM completion() model= converse/arn:aws:bedrock:ap-southeast-2:<account>:application-inference-profile/<id>; provider = bedrock
2026-06-26 15:34:26,811 - LiteLLM - INFO - 
LiteLLM completion() model= converse/arn:aws:bedrock:ap-southeast-2:<account>:application-inference-profile/<id>; provider = bedrock

Review Readiness

  • I have performed a self-review
  • This PR is ready for human review

Checklist

  • My code follows the project's style guidelines
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes

Additional Notes

The 3 skipped tests (test_fallback_when_boto3_import_fails, test_fallback_when_api_call_fails, test_successful_fetch) pre-exist in the repo and use patch("builtins.__import__") which hangs under Python 3.13. Not affected by these changes.

@github-actions

github-actions Bot commented Jun 26, 2026

Copy link
Copy Markdown
Contributor

PR governance

This PR follows the template and is marked ready for human review.

@github-actions github-actions Bot added status: needs author action Pull request body or readiness checklist still needs author updates status: ready for review Pull request body is complete and the author marked it ready for human review and removed status: needs author action Pull request body or readiness checklist still needs author updates labels Jun 26, 2026

@JerrettDavis JerrettDavis left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ARN-to-converse mapping and �u. prefix pieces look useful, but I found two blockers to fix before merge.\n\nFirst, --bedrock-profile is only applied to fetch_bedrock_inference_profiles(). The actual LiteLLM request paths still only pass �ws_region_name into �completion; they do not pass a profile/client/credentials value and this PR does not set AWS_PROFILE for the process. That means startup discovery can use the named SSO profile, while the real Bedrock completion still uses ambient/default credentials. Please wire the profile through the actual Bedrock LiteLLM calls (or explicitly set/document the environment mechanism and test it), and add a regression that captures the completion kwargs for a named profile.\n\nSecond, the new Claude Code Bedrock guide contradicts itself. The TL;DR says ANTHROPIC_DEFAULTMODEL must use standard Claude names and not ARNs because ARNs make Claude Code bypass ANTHROPIC_BASE_URL; later the application-inference-profile section says to pass ARN values directly in ANTHROPIC_DEFAULT_MODEL. Those cannot both be safe. Please make the guide consistent with the verified path so users do not accidentally bypass Headroom while thinking they are compressed.

@JerrettDavis JerrettDavis left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The follow-up resolves the blockers I raised. profile_name now persists on the backend and is passed as aws_profile_name through the Anthropic, streaming Anthropic, OpenAI, and streaming OpenAI LiteLLM paths, with regression coverage around the completion kwargs. The guide is also internally consistent now about running Claude Code in normal Anthropic mode and letting Headroom handle Bedrock. I do not see a remaining blocker.

chopratejas pushed a commit that referenced this pull request Jun 28, 2026
## Description

`pip install headroom-ai[bedrock]` cannot serve users who authenticate
with `aws login` (IAM Identity Provider / console-login, DPoP).
Resolving those credentials requires the AWS Common Runtime (CRT);
without `awscrt`, botocore raises `MissingDependencyException`.

The AWS docs state the requirement as: **"Boto3 version 1.41.0 or later
with AWS Common Runtime (CRT)"** — i.e. both a modern boto3 floor and
CRT (installed via the `[crt]` extra).

## Type of Change

- [x] Bug fix (non-breaking)

## Changes Made

- `pyproject.toml` `bedrock` extra: bump `boto3>=1.28.0` →
`boto3>=1.41.0`, add `botocore[crt]>=1.41.0` (installs `awscrt`).
- `uv.lock`: regenerated — adds `awscrt`, resolves `boto3` to 1.42.x.

No code changes — the bedrock backend already passes `aws_profile_name`
through to the LiteLLM calls (via #1456); this just makes the installed
dependencies actually able to resolve `aws login` credentials.

## Impact

- **`aws login` (IAM Identity Provider / DPoP):** now works — awscrt
present.
- **`aws sso login` (classic Identity Center):** unaffected (already
worked).
- **static keys (`~/.aws/credentials`):** unaffected.
- Bumping the boto3 floor only affects the optional `[bedrock]` extra;
bedrock users benefit from a current boto3 regardless.

## Testing

Dependency-only change. `uv lock` resolves cleanly (257 packages, awscrt
0.29.2, boto3 1.42.38). No runtime code path altered, so existing
bedrock tests are unaffected.

## Checklist

- [x] Self-review performed
- [x] No new warnings
- [x] Linting passes

## Additional Notes

Focused on the dependency gap only. ARN routing / named-profile wiring /
docs are handled in #1456; pricing in #1485.

@JerrettDavis JerrettDavis left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the Bedrock ARN/profile changes. ARNs route through �edrock/converse, the named AWS profile is used both for discovery and LiteLLM calls, and the cache key includes profile so credentials do not cross-contaminate model maps. This will need a careful rebase with the newer botocore preflight work in #1553, but I do not see a code blocker in this PR itself.

@github-actions github-actions Bot added status: ci failing Required or reported CI checks are failing and removed status: ready for review Pull request body is complete and the author marked it ready for human review labels Jul 1, 2026

@JerrettDavis JerrettDavis left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Applied a style-only formatter commit to the Bedrock profile tests ( ests/test_bedrock_region.py, ests/test_vertex_claude_compression.py) to address the current
uff format --check failure. The other red test shard was canceled by a runner shutdown signal before any assertion failure; I do not see a code-level blocker in the Bedrock ARN/profile changes.

@github-actions github-actions Bot added status: ready for review Pull request body is complete and the author marked it ready for human review and removed status: ci failing Required or reported CI checks are failing labels Jul 2, 2026

@JerrettDavis JerrettDavis left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed the latest Bedrock head after the formatter, boto3 Session test fix, and merge-from-main commits. The implementation still routes Bedrock ARNs through bedrock/converse, forwards aws_profile_name on the real LiteLLM call paths, keeps profile-specific model-map caching isolated, and the updated tests now mock the same boto3.Session(...).client(...) shape used by the code.

The current CI/security/docs checks are green; no additional code changes requested.

@JerrettDavis JerrettDavis merged commit 7d87aa2 into headroomlabs-ai:main Jul 3, 2026
31 checks passed
@github-actions github-actions Bot mentioned this pull request Jul 3, 2026
chopratejas pushed a commit that referenced this pull request Jul 3, 2026
🤖 I have created a release *beep* *boop*
---


<details><summary>0.29.0</summary>

##
[0.29.0](v0.28.0...v0.29.0)
(2026-07-03)


### Features

* **proxy:** add --lossless no-CCR mode with format-native compaction
([#1721](#1721))
([c75ebde](c75ebde))
* **stats:** surface Codex WS compression counters in /stats summary
([#1680](#1680))
([2fe19c3](2fe19c3))
* **transforms:** adaptive Otsu KEEP/DROP threshold (+ land relevance
split on main)
([#1726](#1726))
([eea667a](eea667a))


### Bug Fixes

* **bedrock:** fail fast when session-token auth lacks botocore
([#1553](#1553))
([54cfa36](54cfa36))
* **bedrock:** route ARNs via converse, named AWS profiles, and au. re…
([#1456](#1456))
([7d87aa2](7d87aa2))
* **ccr:** honor workspace dir for sqlite store
([#1564](#1564))
([96e1dfe](96e1dfe))
* **claude:** surface Remote Control proxy incompatibility
([#1610](#1610))
([4bf7f92](4bf7f92))
* **cli:** stop advertising unwired compression tuning env vars in
banner
([#1634](#1634))
([d5bf98d](d5bf98d))
* **codex:** avoid duplicate headroom provider config
([#1431](#1431))
([ddd4adf](ddd4adf))
* **compression:** reject lossy unmarked tool output in unit router path
([#1479](#1479))
([de24cd5](de24cd5))
* **cortex-code:** migrate to current Cortex REST API endpoints + add
e2e benchmarks
([#1474](#1474))
([f00ace6](f00ace6))
* **dashboard:** align token savings headline denominator
([#1653](#1653))
([646e705](646e705))
* **dashboard:** derive per-project setup URL from live origin
([#1511](#1511))
([e035aef](e035aef))
* **detection:** contain unidiff panic on orphaned +++ target line
([#1548](#1548))
([e386c09](e386c09))
* **evals:** CJK-aware F1 tokenization + token estimation
([#1527](#1527))
([99a8540](99a8540))
* **install:** close parent log fd in start_detached_agent
([#1576](#1576))
([816cb85](816cb85))
* **install:** use Windows-safe PID liveness probe in runtime_status
([#1544](#1544))
([#1560](#1560))
([6b227b9](6b227b9))
* **learn:** aggregate verbosity baselines across projects instead of
overwriting
([#1288](#1288))
([27a5468](27a5468))
* **mcp:** show lifetime totals and label rolling session scope in
headroom_stats
([#1428](#1428))
([1c0e152](1c0e152))
* **memory:** cap local embedder CPU thread oversubscription
([#198](#198))
([#1559](#1559))
([b84afbf](b84afbf))
* **memory:** singleflight LocalBackend init to stop cold-start races
([#1691](#1691))
([bec47a1](bec47a1))
* **openclaw:** detect uv-installed headroom binary in ~/.local/bin
([#1459](#1459))
([adaeb88](adaeb88))
* **opencode:** preserve custom OpenAI gateway paths
([#1596](#1596))
([c19347c](c19347c))
* **opencode:** route native providers + load transport plugin, fix
Serena context
([#1573](#1573))
([ad0034f](ad0034f))
* preserve anthropic passthrough tool order
([#1427](#1427))
([a932247](a932247))
* **proxy/auth:** match real Anthropic OAuth token prefix (sk-ant-oat)
([#1672](#1672))
([8cddf9b](8cddf9b))
* **proxy:** expose persistent savings metrics
([#1647](#1647))
([5fe4e7b](5fe4e7b))
* **proxy:** fail open when kompress saturation would exhaust
pre-upstream budget
([#1430](#1430))
([15ac650](15ac650))
* **proxy:** handle streaming CCR retrieval
([#1451](#1451))
([d337e3b](d337e3b))
* **proxy:** include system/tools/sampling in cache key
([#1473](#1473))
([312129a](312129a))
* **proxy:** preserve Responses passthrough bytes
([#1598](#1598))
([2a34a82](2a34a82))
* **proxy:** strip Codex lite header on the HTTP /responses path
([#1663](#1663))
([9fbd47b](9fbd47b))
* **proxy:** wire --compression-max-workers /
HEADROOM_COMPRESSION_MAX_WORKERS
([#1632](#1632))
([814ffa3](814ffa3))
* **savings:** count cache-read tokens in input cost estimate
([#1429](#1429))
([72ade37](72ade37))
* skip Magika backend on x86 CPUs without AVX2
([#1162](#1162))
([64783d8](64783d8))
* **transforms/content-router:** route grep/log output away from HTML
extractor
([#1719](#1719))
([0d18ef2](0d18ef2))
* **transforms:** bound native content detection with a Windows watchdog
([#575](#575))
([#1563](#1563))
([95abca3](95abca3))
* Vertex AI support for Claude Code with ANTHROPIC_VERTEX_BASE_URL
([#1393](#1393))
([cff7247](cff7247))
* **wrap:** detach the shared proxy on Windows so it survives an
ungraceful agent close
([#1464](#1464))
([6cba441](6cba441))
* **wrap:** preserve custom Vertex base URL
([#1477](#1477))
([75427bb](75427bb))
* **wrap:** remove rtk instructions from Codex AGENTS.md on unwrap
([#1604](#1604))
([c9d717c](c9d717c))
</details>

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

status: ready for review Pull request body is complete and the author marked it ready for human review

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants