Skip to content

feat: #325 add invalid_final_output error handler#3728

Open
pablopupo wants to merge 1 commit into
openai:mainfrom
pablopupo:feat/325-retry-model-behavior-error
Open

feat: #325 add invalid_final_output error handler#3728
pablopupo wants to merge 1 commit into
openai:mainfrom
pablopupo:feat/325-retry-model-behavior-error

Conversation

@pablopupo

Copy link
Copy Markdown

Summary

The tool-not-found case from the original report on #325 is already fixed by RunConfig.tool_not_found_behavior, merged in #3461 and released in v0.17.4. What is still broken is the other failure reported later in that thread. When the model's final message does not validate against output_type, output_schema.validate_json in turn_resolution.py raises ModelBehaviorError straight through and the whole run is lost. Several users on the issue hit exactly this with structured outputs.

This PR adds an invalid_final_output key to the existing RunErrorHandlers hook, the same shape as max_turns and model_refusal. When final output validation fails and a handler is registered, the handler's fallback is validated against the output schema and the run completes with it, optionally recorded in history. With no handler registered, the original raise is unchanged.

This is deliberately not a retry mechanism. #3587 was closed because retrying the turn can duplicate tool side effects and let streamed callers observe events from a discarded attempt. Nothing here replays a model call or re-executes anything. The validation error surfaces at the very end of the turn, after tools have already run, and the handler only decides what the final output should be. Callers who want to retry can do that from their own handler on their own terms.

docs/running_agents.md gets a matching section next to the existing model_refusal entry. I flagged this scope in an intent comment on the issue before opening this PR.

Test plan

Four new tests in tests/test_max_turns.py alongside the existing model_refusal handler tests. They cover the default raise staying unchanged, a handler returning a structured fallback through Runner.run, include_in_history=False keeping the fallback message out of history, and the streaming path through Runner.run_streamed. make format, make lint, make typecheck, and make tests all pass, and make build-docs builds clean for the docs change.

Issue number

Related to #325. The original tool-not-found report there was fixed by #3461. This covers the remaining invalid JSON against output_type case.

Checks

  • I've added new tests, if relevant
  • I've run .agents/skills/code-change-verification/scripts/run.sh
  • I've confirmed all verification steps pass
  • If using Codex, I've run /review before submitting this PR

output_schema.validate_json raises ModelBehaviorError uncaught when the
model's final message does not match output_type, ending the run with no
way to recover. Wire it through the existing RunErrorHandlers extension
point (already used for max_turns and model_refusal) so callers can
opt in to a handler-supplied fallback instead of a hard crash. Default
behavior is unchanged. With no handler registered, the run still raises.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3f0f5a0fb0

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines 809 to +811
if output_schema and not output_schema.is_plain_text() and potential_final_output_text:
final_output = output_schema.validate_json(potential_final_output_text)
try:
final_output = output_schema.validate_json(potential_final_output_text)

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Route empty structured messages to the handler

When a structured-output agent receives a final message whose output_text is "" (or all text parts are empty), ItemHelpers.extract_text() returns None, so the if ... potential_final_output_text guard is false and this new try/handler path is skipped. The run then issues another model call and can end as MaxTurnsExceeded, even though a configured invalid_final_output handler should be able to replace that invalid final message. Please distinguish no message from an empty message and dispatch the handler for empty text as well.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I checked this and the observation is right, an empty structured message skips the new path because extract_text returns text or None. That is the pre-existing behavior though. Empty messages never reach validate_json, so nothing raises there today and the run proceeds to the next turn, which can recover naturally on the next model call, and if it exhausts turns the max_turns handler already gives callers a recovery hook. Firing invalid_final_output on empty text would change behavior on a path that raises nothing today and could end runs with a fallback where the next call would have succeeded. I kept the diff to the existing raise site and left that semantic call to the maintainers. I'm happy to extend the guard to empty text if that is the preferred behavior.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant