fix(proxy/openai): translate max_tokens -> max_completion_tokens on chat path #1774
Open
chopratejas wants to merge 1 commit into
GPT-5 / o-series chat models reject the legacy `max_tokens` ("Unsupported
parameter: 'max_tokens' is not supported with this model. Use
'max_completion_tokens' instead."); gpt-4o/4.1 accept `max_completion_tokens`
too. openai-compatible clients (opencode via @ai-sdk/openai-compatible, older
SDKs) still send `max_tokens`, so requests for GPT-5 models fail at the proxy's
OpenAI upstream — which is exactly what blocked a live opencode run.
The proxy already owns the outbound chat/completions body (it rewrites messages
to compress them), so translate the token param there: rename `max_tokens` ->
`max_completion_tokens` when the newer form isn't already set, then drop the
rejected legacy key. Safe one-way shim for current OpenAI models; no-op when the
client already sends max_completion_tokens. Responses path (max_output_tokens)
is unaffected.
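The translation described above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual helper: the function name `normalize_max_tokens_sketch` is hypothetical, returning a copy rather than mutating in place is an arbitrary choice, and "already set" is assumed to mean the `max_completion_tokens` key is present in the body.

```python
from typing import Any


def normalize_max_tokens_sketch(body: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a chat/completions body with `max_tokens` translated.

    Hypothetical stand-in for the PR's helper; the real implementation may differ.
    """
    out = dict(body)  # shallow copy so the caller's dict is left untouched
    if "max_tokens" not in out:
        return out  # no legacy key: nothing to translate
    legacy = out.pop("max_tokens")  # always drop the rejected legacy key
    # Assumption: the newer form counts as "already set" when the key is present.
    if "max_completion_tokens" not in out:
        out["max_completion_tokens"] = legacy
    return out


# Legacy-only request, as an openai-compatible client might send it.
print(normalize_max_tokens_sketch({"model": "gpt-5.3-chat-latest", "max_tokens": 256}))
# → {'model': 'gpt-5.3-chat-latest', 'max_completion_tokens': 256}
```

When both keys are present, the explicit `max_completion_tokens` value wins and the legacy key is still removed, so the outbound body never carries `max_tokens`.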
Contributor
PR governance: This PR follows the template and is marked ready for human review.
JerrettDavis
approved these changes
Jul 3, 2026
JerrettDavis
left a comment
Collaborator
Reviewed the chat-path shim and tests. The normalization is scoped to the outbound OpenAI chat body, preserves an explicit max_completion_tokens value, drops the unsupported legacy key, and has focused coverage for the edge cases. CI is green and this looks ready to merge.
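The edge cases named in this review could be checked along these lines. The tests below are a sketch only and are not the PR's test suite: `shim` is a hypothetical stand-in for the real helper, and the test names and sample bodies are invented for illustration.

```python
from typing import Any


def shim(body: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical stand-in for the PR's helper, not its actual code."""
    # Copy every field except the legacy key, which is always dropped.
    out = {k: v for k, v in body.items() if k != "max_tokens"}
    # Carry the legacy value over only when the newer key is absent.
    if "max_tokens" in body and "max_completion_tokens" not in out:
        out["max_completion_tokens"] = body["max_tokens"]
    return out


def test_renames_legacy_key() -> None:
    assert shim({"max_tokens": 64}) == {"max_completion_tokens": 64}


def test_preserves_explicit_value() -> None:
    body = {"max_tokens": 64, "max_completion_tokens": 32}
    assert shim(body) == {"max_completion_tokens": 32}


def test_noop_without_legacy_key() -> None:
    body = {"model": "gpt-4o", "max_completion_tokens": 32}
    assert shim(body) == body


def test_other_fields_untouched() -> None:
    body = {"model": "gpt-4o", "messages": [], "max_tokens": 8}
    assert shim(body) == {"model": "gpt-4o", "messages": [], "max_completion_tokens": 8}
```

The functions follow pytest's `test_*` naming so they would be collected by a plain `pytest` run, but they use only bare `assert` statements and need no pytest import.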
Description
GPT-5 / o-series chat models reject the legacy `max_tokens` (`AI_APICallError: Unsupported parameter: 'max_tokens' is not supported with this model. Use 'max_completion_tokens' instead.`), while gpt-4o/4.1 accept `max_completion_tokens` too. openai-compatible clients (opencode via `@ai-sdk/openai-compatible`, older SDKs) still send `max_tokens`, so requests for GPT-5 models fail at the proxy's OpenAI upstream. This is a blocker for any such client pointed at a GPT-5 model through Headroom.

The proxy already owns the outbound `/v1/chat/completions` body (it rewrites `messages` to compress them), so translate the token param there: rename `max_tokens` → `max_completion_tokens` when the newer form isn't already set, then drop the rejected legacy key. One-way, safe for current OpenAI models; no-op when the client already sends `max_completion_tokens`. The Responses path (`max_output_tokens`) is unaffected.

Closes #
Type of Change
Changes Made
`_normalize_openai_max_tokens(body)` helper + call in `handle_openai_chat` after body finalization, before upstream forward.
Testing
- `pytest`
- `ruff check`
- `mypy headroom`

Test Output
Real Behavior Proof
`@ai-sdk/openai-compatible` → Headroom proxy) targeting `gpt-5.3-chat-latest` failed with `Unsupported parameter: 'max_tokens' ... Use 'max_completion_tokens'` in the DEBUG stream log. The shim renames the param on the outbound body.
`run` stalls for unrelated reasons in this env, separate from this param fix).
Review Readiness
Additional Notes
Discovered while debugging why opencode wouldn't run through the proxy. There were three layered blockers: (1) missing `models` map in the injected provider config [PR #1716], (2) no `apiKey` in the injected config / HTTP path doesn't inject `OPENAI_API_KEY` like the WS path does, (3) this `max_tokens` vs `max_completion_tokens` mismatch. This PR addresses (3).