feat(realtime): add input guardrails for RealtimeAgent and RealtimeRunConfig #3721
Skyline-9 wants to merge 4 commits into
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1b2c6fc6b1
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
```python
for guardrail in input_guardrails:
    try:
        result = await guardrail.run(
            # TODO (rm) Remove this cast, it's wrong
            cast(Agent[Any], self._current_agent),
            text,
            self._context_wrapper,
        )
        if result.output.tripwire_triggered:
            triggered_results.append(result)
```
Run realtime input guardrails concurrently
When more than one input guardrail is configured, this loop awaits them serially and only cancels after all earlier guardrails have completed. If a slow/model-backed guardrail comes before one that would trip, the unsafe user transcript can continue generating for that guardrail's entire latency, which largely defeats the forced response cancellation. Please run the input guardrails concurrently or interrupt as soon as the first tripwire result is available.
```python
combined_guardrails = self._current_agent.input_guardrails + self._run_config.get(
    "input_guardrails", []
)
```
Snapshot the agent for queued input guardrails
Because this background task re-reads self._current_agent when it eventually runs, a session that calls update_agent() or completes a handoff before the task gets CPU can check agent A's transcript using agent B's input guardrails, or no agent-level guardrails at all. That silently bypasses the guardrails configured on the agent that received the transcribed input; capture the agent/guardrail list when handling the transcription event and pass that snapshot into the task.
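A minimal sketch of the race described above, using a hypothetical `Session` class and plain strings in place of real agents and guardrails (none of these names come from the PR). The background task receives the guardrail list captured when the transcript event is handled, rather than re-reading the current agent when it eventually runs.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakeAgent:
    # Hypothetical stand-in for RealtimeAgent; guardrails are just names here.
    name: str
    input_guardrails: list = field(default_factory=list)


class Session:
    # Hypothetical stand-in for the realtime session, not the SDK class.
    def __init__(self, agent: FakeAgent) -> None:
        self.current_agent = agent
        self.checked_with: list = []

    async def _run_guardrails(self, guardrails: list, text: str) -> None:
        await asyncio.sleep(0)  # yield, as a real guardrail call would
        self.checked_with.extend(guardrails)

    def on_transcript(self, text: str) -> asyncio.Task:
        # Snapshot at event time: copy the list before scheduling the task.
        snapshot = list(self.current_agent.input_guardrails)
        return asyncio.create_task(self._run_guardrails(snapshot, text))


async def demo() -> list:
    session = Session(FakeAgent("A", ["guard_a"]))
    task = session.on_transcript("hello")
    # A handoff lands before the background task gets to run.
    session.current_agent = FakeAgent("B", ["guard_b"])
    await task
    return session.checked_with


result = asyncio.run(demo())
```

In this sketch the transcript is still checked with agent A's guardrail even though the session has already switched to agent B by the time the task runs.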
@codex review
1b2c6fc to bedd7a9
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bedd7a9c37
```python
# Run the guardrails concurrently so a slow guardrail cannot delay the forced cancel behind
# unrelated guardrails, which would let the unsafe turn keep generating.
results = await asyncio.gather(*(_run_one(guardrail) for guardrail in input_guardrails))
```
Cancel realtime input on the first tripped guardrail
Fresh evidence: this version now uses asyncio.gather, but gather still waits for every input guardrail to finish before inspecting triggered_results. When one guardrail trips quickly but another model-backed guardrail is slow, the unsafe audio turn keeps generating until the slowest guardrail returns, delaying the forced response.cancel that this feature relies on. Use asyncio.as_completed/task cancellation like the streamed input guardrail path so the session interrupts as soon as the first tripwire result is available.
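The difference between `asyncio.gather` and `asyncio.as_completed` can be sketched as follows. The `guardrail` coroutine and `first_tripwire` helper are hypothetical stand-ins, not the PR's code or the streamed input-guardrail path; they only illustrate reacting to the first trip and cancelling the rest.

```python
import asyncio
from typing import Optional


async def guardrail(name: str, delay: float, trips: bool) -> tuple:
    # Hypothetical guardrail: sleeps to simulate latency, then reports a result.
    await asyncio.sleep(delay)
    return name, trips


async def first_tripwire(specs: list) -> Optional[str]:
    tasks = [asyncio.create_task(guardrail(*spec)) for spec in specs]
    tripped = None
    try:
        for finished in asyncio.as_completed(tasks):
            name, trips = await finished
            if trips:
                tripped = name
                # A real session would request the interrupt at this point,
                # without waiting for the slower guardrails.
                break
    finally:
        for task in tasks:
            if not task.done():
                task.cancel()
        # Collect cancellations so no task is left pending.
        await asyncio.gather(*tasks, return_exceptions=True)
    return tripped


async def demo() -> tuple:
    loop = asyncio.get_running_loop()
    start = loop.time()
    name = await first_tripwire([("slow_ok", 5.0, False), ("fast_trip", 0.01, True)])
    return name, loop.time() - start


tripped_name, elapsed = asyncio.run(demo())
```

With `gather`, the same inputs would take roughly five seconds before the trip could be inspected; here the fast guardrail's result is handled as soon as it completes.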
@codex review
bedd7a9 to 7e2676a
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 7e2676af1d
```python
)

# Interrupt the model, forcing a cancel of any in-progress response.
await self._model.send_event(RealtimeModelSendInterrupt(force_response_cancel=True))
```
Avoid cancelling later turns for stale input guardrails
When a model-backed input guardrail finishes after the guarded audio turn has already ended and a later user turn is active, this sends an untargeted forced response.cancel; _send_interrupt cancels whatever response is currently ongoing, and these input guardrail tasks are not cancelled on turn_ended. In that latency scenario a trip for item A can interrupt item B's unrelated response and enqueue the guardrail notification into the wrong turn, so the trip should be ignored/cancelled once the guarded turn is over or correlated to the response for this item_id before cancelling.
Avoid cancelling a later response from stale guardrails
When an input guardrail is slower than the response it is checking (for example, the guarded turn reaches turn_ended and the user starts another turn before the guardrail finishes), this unscoped forced interrupt cancels whatever response is active at completion time; the model interrupt path is not tied to the guarded item_id. A trip for an earlier transcript can therefore interrupt an unrelated later answer and enqueue the guardrail follow-up into the wrong turn, so stale input-guardrail tasks should be ignored/cancelled after their turn ends or correlated to the response they are meant to cancel.
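One way the correlation suggested here could look, as a sketch only: the `TurnTracker` class and its `active_item_id` attribute are hypothetical and do not reflect how the session or model layer tracks turns. Each guardrail task remembers the item it was started for and drops its trip if that item is no longer the active turn.

```python
import asyncio
from typing import Optional


class TurnTracker:
    # Hypothetical session fragment; the names are illustrative, not the SDK's.
    def __init__(self) -> None:
        self.active_item_id: Optional[str] = None
        self.interrupted: list = []

    async def check_input(self, item_id: str, delay: float) -> None:
        await asyncio.sleep(delay)  # simulated guardrail latency; always "trips"
        if self.active_item_id != item_id:
            return  # the guarded turn is over, so drop the stale trip
        self.interrupted.append(item_id)


async def demo() -> list:
    tracker = TurnTracker()
    tracker.active_item_id = "item_A"
    slow = asyncio.create_task(tracker.check_input("item_A", 0.05))
    # The user starts a new turn before the slow guardrail for item_A finishes.
    tracker.active_item_id = "item_B"
    fast = asyncio.create_task(tracker.check_input("item_B", 0.01))
    await asyncio.gather(slow, fast)
    return tracker.interrupted


interrupted = asyncio.run(demo())
```

Only the trip for the turn that is still active results in an interrupt; the late trip for `item_A` is ignored instead of cancelling `item_B`'s response.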
Fair point, but this is how the output guardrail path behaves too. For the sake of parity, I will acknowledge this as a drawback but not change it here; if a fix is desired, it should be raised in a separate PR.
Snapshot the active agent and its input guardrails when the transcription event is handled so a concurrent update_agent()/handoff cannot run a different agent's guardrails, and run the input guardrails concurrently so a slow guardrail cannot delay the forced response cancel.
Address Codex review: replace asyncio.gather (which waits for every guardrail before inspecting results) with asyncio.as_completed so the forced response cancel fires as soon as the first tripwire is available, then cancel the remaining guardrail tasks. Mirrors the streamed input-guardrail path so a slow guardrail can no longer delay the interrupt.
7e2676a to 058565e
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 058565e6fd
```python
for task in guardrail_tasks:
    if not task.done():
        task.cancel()
await asyncio.gather(*guardrail_tasks, return_exceptions=True)
```
Interrupt before awaiting cancelled guardrails
Fresh evidence: this version now uses as_completed, but this await asyncio.gather(...) still runs before the forced cancel at line 1348. When one input guardrail trips quickly and another model-backed guardrail is slow to acknowledge cancellation or does cleanup, the session waits here before sending response.cancel, so the unsafe realtime response can continue generating for that latency; request the interrupt and mark the item interrupted before awaiting sibling task cleanup.
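The ordering the reviewer asks for can be illustrated with a small sketch. `stubborn_guardrail` and `on_trip` are hypothetical; appending to a log stands in for sending the forced cancel. The point is only that the interrupt is requested before the session waits on sibling cleanup.

```python
import asyncio


async def stubborn_guardrail(log: list) -> None:
    # Hypothetical guardrail whose cancellation cleanup takes a while.
    try:
        await asyncio.sleep(10)
    except asyncio.CancelledError:
        await asyncio.sleep(0.05)  # simulated slow cleanup
        log.append("cleanup finished")
        raise


async def on_trip(log: list, siblings: list) -> None:
    # 1. Request the interrupt first (stand-in for sending the forced cancel).
    log.append("interrupt sent")
    # 2. Only then cancel the remaining guardrail tasks and wait for them.
    for task in siblings:
        if not task.done():
            task.cancel()
    await asyncio.gather(*siblings, return_exceptions=True)


async def demo() -> list:
    log: list = []
    sibling = asyncio.create_task(stubborn_guardrail(log))
    await asyncio.sleep(0)  # let the sibling start running
    await on_trip(log, [sibling])
    return log


log = asyncio.run(demo())
```

If the two steps in `on_trip` were swapped, the interrupt would be delayed by however long the slowest sibling takes to acknowledge cancellation.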
Summary
Adds input guardrails to the realtime API, bringing it closer to parity with the non-realtime `Agent`/`Runner`, which already supports `input_guardrails`. Realtime today only supports output guardrails (`RealtimeAgent.output_guardrails` / `RealtimeRunConfig["output_guardrails"]`); there is no first-class way to screen the user's transcribed input.
What changed:
- `RealtimeAgent.input_guardrails` (appended at the end of the dataclass, `default_factory=list`) and `RealtimeRunConfig["input_guardrails"]` (`NotRequired` TypedDict key).
- `RealtimeInputGuardrailTripped` session event (appended at the end of the `RealtimeSessionEvent` union), mirroring `RealtimeGuardrailTripped` field-for-field but typed to `InputGuardrailResult`.
- `RealtimeSession` runs the combined agent + run-config input guardrails on the completed user transcript (`input_audio_transcription_completed`), de-duped by `id()`. It reuses the existing output-guardrail machinery (shared `_guardrail_tasks` set, `_on_guardrail_task_done`, `_cleanup_guardrail_tasks`), so `close()` cancels in-flight tasks. On a trip it emits `input_guardrail_tripped`, forces `response.cancel`, and sends a follow-up user message naming the guardrail.
- `agents.realtime.__init__` (`__all__`) with an import regression test.
- `docs/ref/realtime/events.md` renders the new event; `docs/realtime/guide.md` documents the feature and disambiguates it from the existing tool-level "input guardrails on function-tool calls".
The design deliberately mirrors `_run_output_guardrails` (argument order verified against `InputGuardrail.run(self, agent, input, context)`) so the behavior and lifecycle are consistent with what maintainers already review.
Known limitation (documented, not hidden)
The forced cancel reliably interrupts a response that is already in flight. If a guardrail resolves in the narrow window before any response has been created for the tripped turn, the cancel is a no-op and that response may proceed. Eliminating this window cleanly requires response<->user-item correlation at the model layer (for example a `response_id` on turn-started / response-created) so the session can cancel only the tripped turn's response without also cancelling the intentional guardrail-notification response. This limitation is documented in the `RealtimeInputGuardrailTripped` docstring, `RealtimeAgent.input_guardrails`, and the guide rather than papered over with a heuristic that would cancel the wrong response. Scope is also documented: input guardrails run on transcribed audio only; text sent via `send_message` is not screened. Happy to pursue the model-layer correlation as a follow-up if maintainers prefer.
Test plan
- `tests/realtime/test_session.py::TestInputGuardrailFunctionality`, including edge cases:
- `make format`, `make lint`, `make typecheck`: pass
- `make tests` (full): pass (4797 passed, 2 skipped; serial 27 passed, 5 skipped)
- `make build-docs`: pass (new `RealtimeInputGuardrailTripped` reference resolves clean)
Issue number
Realtime parity with the non-realtime input-guardrail support. Happy to link the relevant tracking issue.
Checks
- `.agents/skills/code-change-verification/scripts/run.sh` (`make format`, `make lint`, `make typecheck`, `make tests`, and `make build-docs`)
- `/review` before submitting this PR
Compatibility notes
Additive. The new field is appended at the end of `RealtimeAgent` (preserving positional compatibility), and the run-config entry is a `NotRequired` key; the new event is appended at the end of the `RealtimeSessionEvent` union. Sessions with no input guardrails configured create no extra tasks per utterance.
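For illustration, the de-duplication by `id()` mentioned in the summary could look like the sketch below. The `combine_guardrails` helper and the two example guardrail functions are hypothetical and are not taken from the PR; the sketch only shows how the same guardrail object registered at both the agent and run-config level ends up running once.

```python
from typing import Callable

Guardrail = Callable[[str], bool]  # hypothetical guardrail type for this sketch


def combine_guardrails(agent_level: list, run_level: list) -> list:
    seen = set()
    combined = []
    for guardrail in [*agent_level, *run_level]:
        if id(guardrail) in seen:
            continue  # same object registered at both levels
        seen.add(id(guardrail))
        combined.append(guardrail)
    return combined


def no_secrets(text: str) -> bool:
    return "password" in text.lower()


def no_shouting(text: str) -> bool:
    return text.isupper()


combined = combine_guardrails([no_secrets, no_shouting], [no_secrets])
```

Keying on `id()` compares object identity, so it works even when guardrail objects are not hashable; two distinct objects wrapping the same function would still both run.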