executor multiplicative latency fix #1229
Open
mchain0 wants to merge 2 commits into
carte7000
approved these changes
Jul 3, 2026
RensR
approved these changes
Jul 3, 2026
Executor Multiplicative Latency Fix
tl;dr
Multi-CCV messages that previously waited 15–60s after the first CCV was indexed now retry in ~1–4s once remaining verifier data lands, while post-transmit duplicate-execution protection is unchanged.
Problem description
A message flows: source chain -> Verifier -> Aggregator -> Indexer -> Executor -> destination OffRamp.
The indexer publishes a message record to its messages endpoint as soon as the FIRST verification is discovered (from the aggregator), not when the full CCV set is indexed. The executor's streamer polls that endpoint every 1s (indexerPollingInterval = 1 * time.Second, cmd/executor/service.go) and picks the message up immediately.
For any message needing more than one CCV (essentially every token transfer: committee + CCTP/Lombard), the leader executor attempts it before the second CCV is indexed and hits a "not enough verifier data" error or ErrInsufficientVerifiers. It returns shouldRetry=true.
The coordinator then re-schedules using a SINGLE retry delay for ALL retries. GetRetryDelay = len(pool) * executionInterval from executor/pkg/leaderelector/hash_based_elector.go:167-177, with executionInterval = 15s (devenv) or executionInterval = 60s (default). So a message whose data lands milliseconds later still idles 15-60s per pool member.
This delay is CORRECT for post-transmit retries (staggering executors avoids duplicate on-chain execution/gas), but WRONG for pre-transmit "data not ready" waits where no tx was broadcast.
Solution
Classify each retry as either data-not-ready (pre-transmit) or execution-contended (post-transmit), and apply a short (bounded-exponential) backoff to the former while keeping the staggered delay for the latter.
Shortening only pre-transmit retries cannot cause duplicate execution because (a) each executor's INITIAL attempt is still staggered by GetReadyTimestamp, and (b) before transmitting, HandleMessage calls HasHonestAttempt and skips if another executor already executed. We only speed up an executor that is actively waiting on data and has not broadcast anything.
Changes
Minimal changes made to make it work: