Skip to content

[FLINK-40068][tests] Do not assert apply() was called when cancelling KeyedJob window#28634

Open
MartijnVisser wants to merge 1 commit into
apache:masterfrom
MartijnVisser:fix-keyedjob-close-assertion-flake
Open

[FLINK-40068][tests] Do not assert apply() was called when cancelling KeyedJob window#28634
MartijnVisser wants to merge 1 commit into
apache:masterfrom
MartijnVisser:fix-keyedjob-close-assertion-flake

Conversation

@MartijnVisser

Copy link
Copy Markdown
Contributor

What is the purpose of the change

Fixes the recurring KeyedComplexChainTest.testMigrationAndRestore failures (NoResourceAvailableException, Azure builds 76400, 76448, 76627), the follow-up investigation deferred in FLINK-39918. KeyedJob.StatefulWindowFunction.close() asserted applyCalled on every close, but close() also runs on the cancellation path, and the GENERATE/MIGRATE jobs are always stopped via non-draining cancel-with-savepoint, so under CI load a window subtask can be closed before its element was processed. The AssertionFailedError thrown from close() fails the shared static MiniCluster's only TaskManager (NUM_TMS=1, no restart), so the subsequent restore step and every later savepoint parameterization starve with NoResourceAvailableException after the slot-request timeout. There is no slot leak and no data loss: the runtime released all slots correctly, and the window's keyed state is restored from the savepoint independently of apply() having run.

Brief change log

  • Restrict StatefulWindowFunction.close()'s applyCalled assertion to ExecutionMode.RESTORE, the only mode whose job runs to completion.

Verifying this change

This change is already covered by existing tests: KeyedComplexChainTest ran 3x locally, 16/16 green each time. The original failure needs cancel-with-savepoint to beat element delivery under CI load and is not reproducible locally.

Coverage trade-off: MIGRATE loses the applyCalled fail-safe, but the RESTORE run re-validates the migrated state end-to-end via the untouched apply() state-comparison assertions, so migration correctness is still verified.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? no

Was generative AI tooling used to co-author this PR?
  • Yes (Claude Opus 4.8, via Claude Code)

Generated-by: Claude Opus 4.8 (1M context)

… KeyedJob window

StatefulWindowFunction.close() also runs on the cancellation path, and the GENERATE/MIGRATE jobs are always stopped via non-draining cancel-with-savepoint, so a window subtask can be closed before apply() was called; the failing assertion then kills the shared MiniCluster's only TaskManager and starves every later parameterization. Restrict the assertion to RESTORE, the only mode whose job runs to completion.

Generated-by: Claude Opus 4.8 (1M context)
@flinkbot

flinkbot commented Jul 3, 2026

Copy link
Copy Markdown
Collaborator

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build

@spuru9 spuru9 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
Clean and straightforward fix.

@github-actions github-actions Bot added the community-reviewed PR has been reviewed by the community. label Jul 5, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

community-reviewed PR has been reviewed by the community.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants