feat: #325 add invalid_final_output error handler#3728
Conversation
output_schema.validate_json raises ModelBehaviorError uncaught when the model's final message does not match output_type, ending the run with no way to recover. Wire it through the existing RunErrorHandlers extension point (already used for max_turns and model_refusal) so callers can opt in to a handler-supplied fallback instead of a hard crash. Default behavior is unchanged. With no handler registered, the run still raises.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3f0f5a0fb0
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if output_schema and not output_schema.is_plain_text() and potential_final_output_text: | ||
| final_output = output_schema.validate_json(potential_final_output_text) | ||
| try: | ||
| final_output = output_schema.validate_json(potential_final_output_text) |
There was a problem hiding this comment.
Route empty structured messages to the handler
When a structured-output agent receives a final message whose output_text is "" (or all text parts are empty), ItemHelpers.extract_text() returns None, so the if ... potential_final_output_text guard is false and this new try/handler path is skipped. The run then issues another model call and can end as MaxTurnsExceeded, even though a configured invalid_final_output handler should be able to replace that invalid final message. Please distinguish no message from an empty message and dispatch the handler for empty text as well.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
I checked this and the observation is right, an empty structured message skips the new path because extract_text returns text or None. That is the pre-existing behavior though. Empty messages never reach validate_json, so nothing raises there today and the run proceeds to the next turn, which can recover naturally on the next model call, and if it exhausts turns the max_turns handler already gives callers a recovery hook. Firing invalid_final_output on empty text would change behavior on a path that raises nothing today and could end runs with a fallback where the next call would have succeeded. I kept the diff to the existing raise site and left that semantic call to the maintainers. I'm happy to extend the guard to empty text if that is the preferred behavior.
Summary
The tool-not-found case from the original report on #325 is already fixed by
RunConfig.tool_not_found_behavior, merged in #3461 and released in v0.17.4. What is still broken is the other failure reported later in that thread. When the model's final message does not validate againstoutput_type,output_schema.validate_jsoninturn_resolution.pyraisesModelBehaviorErrorstraight through and the whole run is lost. Several users on the issue hit exactly this with structured outputs.This PR adds an
invalid_final_outputkey to the existingRunErrorHandlershook, the same shape asmax_turnsandmodel_refusal. When final output validation fails and a handler is registered, the handler's fallback is validated against the output schema and the run completes with it, optionally recorded in history. With no handler registered, the original raise is unchanged.This is deliberately not a retry mechanism. #3587 was closed because retrying the turn can duplicate tool side effects and let streamed callers observe events from a discarded attempt. Nothing here replays a model call or re-executes anything. The validation error surfaces at the very end of the turn, after tools have already run, and the handler only decides what the final output should be. Callers who want to retry can do that from their own handler on their own terms.
docs/running_agents.mdgets a matching section next to the existingmodel_refusalentry. I flagged this scope in an intent comment on the issue before opening this PR.Test plan
Four new tests in
tests/test_max_turns.pyalongside the existingmodel_refusalhandler tests. They cover the default raise staying unchanged, a handler returning a structured fallback throughRunner.run,include_in_history=Falsekeeping the fallback message out of history, and the streaming path throughRunner.run_streamed.make format,make lint,make typecheck, andmake testsall pass, andmake build-docsbuilds clean for the docs change.Issue number
Related to #325. The original tool-not-found report there was fixed by #3461. This covers the remaining invalid JSON against
output_typecase.Checks
.agents/skills/code-change-verification/scripts/run.sh/reviewbefore submitting this PR