feat(telemetry): send tool-call duration_ms with server_call_tool#492
Conversation
Move the server_call_tool event from before execution into a finally block so it can carry duration_ms and is_error. The finally guarantees it fires on success, handled-error, and hard-crash paths. Only a true hang (never returns/throws) is uninstrumented. No new event added. - Hoist telemetryData/result/isError above the try so finally can read them - duration_ms = Date.now() - startTime, is_error sent as string (proxy convention) - Keeps local toolHistory duration tracking unchanged
📝 WalkthroughWalkthroughThe PR refactors telemetry emission in the ChangesTelemetry Emission Restructuring
Sequence DiagramsequenceDiagram
participant Handler as CallToolRequestSchema
participant Tool as Tool Execution
participant Telemetry as Telemetry
Handler->>Handler: Hoist telemetryData, result, isError vars
Handler->>Handler: trackToolCall setup
Handler->>Tool: Execute tool
alt Tool succeeds
Tool-->>Handler: result with isError=false
Handler->>Handler: Set isError from result.isError
else Tool throws
Tool-->>Handler: Exception
Handler->>Handler: Catch: set isError=true
end
Handler->>Telemetry: Finally: capture_call_tool with telemetryData + duration_ms + is_error
Telemetry-->>Handler: Event emitted
Estimated code review effort🎯 2 (Simple) | ⏱️ ~10 minutes Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@src/server.ts`:
- Around line 1575-1583: In the finally block where
capture_call_tool('server_call_tool', {...}) is invoked, wrap that call in a
local try/catch so any exception from the telemetry/analytics emission cannot
escape and affect the handler's normal return/error flow; inside the catch, log
the telemetry error (do not rethrow). Keep existing telemetry payload fields
(telemetryData, duration_ms: Date.now() - startTime, is_error: String(isError))
when calling capture_call_tool and ensure only logging happens on failure.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
| } finally { | ||
| // Single tool-call telemetry event, fired AFTER execution so it can carry | ||
| // timing. In a finally so it still fires on the hard-crash path (the catch | ||
| // above). Only missed if a tool never returns or throws (a true hang). | ||
| capture_call_tool('server_call_tool', { | ||
| ...telemetryData, | ||
| duration_ms: Date.now() - startTime, | ||
| is_error: String(isError), | ||
| }); |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Guard telemetry emission in finally so analytics failures can't fail the tool call.
A throw from capture_call_tool(...) inside finally will override the normal return/error response path and turn successful tool executions into handler failures. Wrap this emit in a local try/catch (log-only on failure).
Proposed hardening patch
} finally {
// Single tool-call telemetry event, fired AFTER execution so it can carry
// timing. In a finally so it still fires on the hard-crash path (the catch
// above). Only missed if a tool never returns or throws (a true hang).
- capture_call_tool('server_call_tool', {
- ...telemetryData,
- duration_ms: Date.now() - startTime,
- is_error: String(isError),
- });
+ try {
+ capture_call_tool('server_call_tool', {
+ ...telemetryData,
+ duration_ms: Date.now() - startTime,
+ is_error: String(isError),
+ });
+ } catch (telemetryError) {
+ logToStderr('warn', `server_call_tool telemetry failed: ${telemetryError}`);
+ }
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| } finally { | |
| // Single tool-call telemetry event, fired AFTER execution so it can carry | |
| // timing. In a finally so it still fires on the hard-crash path (the catch | |
| // above). Only missed if a tool never returns or throws (a true hang). | |
| capture_call_tool('server_call_tool', { | |
| ...telemetryData, | |
| duration_ms: Date.now() - startTime, | |
| is_error: String(isError), | |
| }); | |
| } finally { | |
| // Single tool-call telemetry event, fired AFTER execution so it can carry | |
| // timing. In a finally so it still fires on the hard-crash path (the catch | |
| // above). Only missed if a tool never returns or throws (a true hang). | |
| try { | |
| capture_call_tool('server_call_tool', { | |
| ...telemetryData, | |
| duration_ms: Date.now() - startTime, | |
| is_error: String(isError), | |
| }); | |
| } catch (telemetryError) { | |
| logToStderr('warn', `server_call_tool telemetry failed: ${telemetryError}`); | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/server.ts` around lines 1575 - 1583, In the finally block where
capture_call_tool('server_call_tool', {...}) is invoked, wrap that call in a
local try/catch so any exception from the telemetry/analytics emission cannot
escape and affect the handler's normal return/error flow; inside the catch, log
the telemetry error (do not rethrow). Keep existing telemetry payload fields
(telemetryData, duration_ms: Date.now() - startTime, is_error: String(isError))
when calling capture_call_tool and ensure only logging happens on failure.
| @@ -1453,6 +1454,7 @@ server.setRequestHandler(CallToolRequestSchema, async (request: CallToolRequest) | |||
|
|
|||
| // Add tool call to history (exclude only get_recent_tool_calls to prevent recursion) | |||
| const duration = Date.now() - startTime; | |||
There was a problem hiding this comment.
I think this is the duration to use for the tool call duration without overhead calculations. Currently, on line 1581 duration is recalculated in finally, but there is a bit of an overhead between this point and that point, I think this duration is more precise for describing tool call duration. Should maybe recalculate sooner than finally.
There was a problem hiding this comment.
In the end decided to leave as is.
Duration you point to is in try block.
Due to error it may never get calculated.
What
Makes per-tool-call latency observable in telemetry so we can detect MCP performance regressions/improvements across versions and clients.
Previously
server_call_toolwas fired before the handler ran, so it could never carry timing. Duration was computed after execution but only fed into the local in-memorytoolHistory(surfaced viaget_recent_tool_calls) and never sent anywhere.Change
Moves the single
server_call_toolcapture from before theswitchinto afinallyblock at the end of theCallToolRequestSchemahandler, enriched with two new fields:duration_ms—Date.now() - startTimeis_error— sent as a string (consistent with howremoteis sent)To make this possible,
telemetryData,result, and a newisErrorflag are hoisted above thetryso thefinallycan read them.isErroris taken fromresult.isErroron the normal path and set totruein the catch.No new event, no volume change — just two new params on the existing
server_call_tool.Why
finallyIt fires on all exit paths: success, handled-error (
isErrorresult), and hard crash (the top-level catch). The event is only missed if a tool never returns and never throws — a true hang. That's an accepted, known limitation (and the price of not adding a second start-fired event).Local
toolHistoryduration tracking is unchanged.Testing
npm run build— clean, no TS errorstest-telemetry-handling.js,test-ui-event-tracking.js— passtest-file-handlers.jsTest 9,read_fileimage-content host compatibility) reproduces on cleanorigin/mainand is unrelated to this change.Follow-up for reviewer
Once this is collecting, we can build BigQuery views on
mcp.mergedfor p50/p95duration_mspertool_nameperapp_versionto watch for regressions.Summary by CodeRabbit