HDDS-10307. Speed up TestOzoneManagerHAWithStoppedNodes#10658

Merged
adoroszlai merged 4 commits into apache:master from hevinhsu:HDDS-10307 on Jul 4, 2026

@hevinhsu hevinhsu commented Jul 4, 2026

Contributor

What changes were proposed in this pull request?

This patch improves TestOzoneManagerHAWithStoppedNodes by making the
tests more deterministic and reducing unnecessary execution time.

  • Replace Thread.sleep with state-based waiting using
    GenericTestUtils.waitFor and waitForLeaderToBeReady(), removing
    blind delays while preserving the original test behavior.
  • Add a setExtraClusterConfig(Consumer<OzoneConfiguration>) hook to
    AbstractOzoneManagerHATest so subclasses can customize the cluster
    configuration before it is built.
  • Use the new hook to reduce
    OZONE_BLOCK_DELETING_SERVICE_INTERVAL from 10s to 2s, and reduce
    the polling interval in testKeyDeletion from 10000ms to 1000ms
    (timeout unchanged).
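
To illustrate the last bullet: a fixed-interval poller only notices a state change at its next check, so the polling interval bounds the extra wait. The sketch below is a simplified model, not code from the patch. `PollLatency`, `detectedAtMs`, and the 2500 ms example value are hypothetical, and it assumes checks start at time 0 and take no time themselves.

```java
/** Simplified model of fixed-interval polling (hypothetical helper, not part of the patch). */
final class PollLatency {
  private PollLatency() { }

  /**
   * Returns the time of the first poll at or after readyAtMs, assuming polls
   * happen at 0, intervalMs, 2 * intervalMs, and so on.
   */
  static long detectedAtMs(long readyAtMs, long intervalMs) {
    if (readyAtMs < 0 || intervalMs <= 0) {
      throw new IllegalArgumentException("readyAtMs must be >= 0 and intervalMs > 0");
    }
    // Ceiling division: number of whole intervals needed to reach readyAtMs.
    long polls = (readyAtMs + intervalMs - 1) / intervalMs;
    return polls * intervalMs;
  }
}
```

Under this model, a condition that becomes true 2500 ms into the wait is observed at 10000 ms with a 10000 ms interval and at 3000 ms with a 1000 ms interval.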

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-10307

How was this patch tested?

https://github.com/hevinhsu/ozone/actions/runs/28646244276

Test command:

TestOzoneManagerHAWithStoppedNodes

mvn -pl :ozone-integration-test test \
  -Dtest=TestOzoneManagerHAWithStoppedNodes \
  -DskipShade -DskipRecon -DskipDocs

Execution time:

before:

Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 296.8 s -- in org.apache.hadoop.ozone.om.TestOzoneManagerHAWithStoppedNodes

after:

Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 250.8 s -- in org.apache.hadoop.ozone.om.TestOzoneManagerHAWithStoppedNodes

TestOzoneManagerHAFollowerReadWithStoppedNodes

mvn -pl :ozone-integration-test test \
  -Dtest=TestOzoneManagerHAFollowerReadWithStoppedNodes \
  -DskipShade -DskipRecon -DskipDocs

Execution time:

before:

Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 319.6 s -- in org.apache.hadoop.ozone.om.TestOzoneManagerHAFollowerReadWithStoppedNodes

after:

Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 274.4 s -- in org.apache.hadoop.ozone.om.TestOzoneManagerHAFollowerReadWithStoppedNodes

hevinhsu and others added 2 commits July 3, 2026 15:40
Replace blind Thread.sleep calls in TestOzoneManagerHAWithStoppedNodes
with state-based waits using GenericTestUtils.waitFor and the existing
waitForLeaderToBeReady() helper.

- oneOMDown, twoOMDown, testMultipartUploadWithOneOmNodeDown,
  testListParts, testListVolumes: replace Thread.sleep(NODE_FAILURE_TIMEOUT * N)
  with waitForLeaderToBeReady(), which polls until a leader is ready
  rather than sleeping for a fixed duration.
- testOMProxyProviderFailoverOnConnectionFailure: remove the pre-request
  sleep (not needed; the client detects failure during the request) and
  replace the post-request sleep with a waitFor that polls until the
  proxy node ID actually changes.
- twoOMDown: remove the post-stop sleep entirely; with no quorum a
  leader cannot be elected, and the subsequent operations already expect
  failure.
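
The sleep-to-poll replacement described above can be sketched roughly as follows. This is a minimal illustration, not the patch itself: `StateBasedWait.waitFor` is a hypothetical stand-in for a polling helper such as `GenericTestUtils.waitFor`, and `simulateFailover` with its node IDs is invented only to give the wait something to observe.

```java
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BooleanSupplier;

final class StateBasedWait {
  private StateBasedWait() { }

  /**
   * Hypothetical stand-in for a polling helper: re-evaluates the check every
   * checkEveryMillis until it passes, or throws once waitForMillis has elapsed.
   */
  static void waitFor(BooleanSupplier check, long checkEveryMillis, long waitForMillis)
      throws TimeoutException, InterruptedException {
    long deadline = System.nanoTime() + waitForMillis * 1_000_000L;
    while (!check.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        throw new TimeoutException("Condition not met within " + waitForMillis + " ms");
      }
      Thread.sleep(checkEveryMillis);
    }
  }

  /** Simulates a failover: after delayMillis the tracked node ID switches to newId. */
  static Thread simulateFailover(AtomicReference<String> currentNodeId,
      String newId, long delayMillis) {
    Thread t = new Thread(() -> {
      try {
        Thread.sleep(delayMillis);
        currentNodeId.set(newId);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    });
    t.setDaemon(true);
    t.start();
    return t;
  }
}
```

A caller would record the node ID before the request and then wait with a condition such as `() -> !before.equals(nodeId.get())`, so the test resumes as soon as the ID changes instead of after a fixed delay.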

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…duced poll intervals

Add a config hook to AbstractOzoneManagerHATest so subclasses can
customize cluster configuration without overriding @BeforeAll.

- Add a private static extraClusterConfig field (Consumer<OzoneConfiguration>)
  with a protected setter setExtraClusterConfig(). Subclasses set this
  in a static {} block, which runs at class-load time before any @BeforeAll.
- Update initCluster(boolean) to apply extraClusterConfig, so the
  single @BeforeAll in TestOzoneManagerHA picks it up automatically.
  This avoids the double-initialization that occurs when a subclass
  defines its own @BeforeAll with the same signature (static methods
  are hidden, not overridden, so JUnit 5 would execute both).
- Use the hook in TestOzoneManagerHAWithStoppedNodes to reduce
  OZONE_BLOCK_DELETING_SERVICE_INTERVAL from 10s to 2s.
- Reduce GenericTestUtils.waitFor checkInterval in testKeyDeletion
  from 10000ms to 1000ms (timeout unchanged at 120s).
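
The hook pattern in this commit might look roughly like the sketch below. It is an assumption-laden illustration rather than the actual Ozone code: `FakeConf` stands in for `OzoneConfiguration`, `BaseHATest` and `StoppedNodesTest` stand in for the real test classes, the configuration key is made up, and `buildClusterConfFor` is a hypothetical helper that forces class initialization in place of whatever the test framework does when it loads the test class.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical stand-in for OzoneConfiguration: a plain string-to-string map. */
final class FakeConf {
  private final Map<String, String> values = new HashMap<>();

  void set(String key, String value) {
    values.put(key, value);
  }

  String get(String key, String defaultValue) {
    return values.getOrDefault(key, defaultValue);
  }
}

/** Base class that owns the single cluster-initialization path. */
abstract class BaseHATest {
  // Illustrative key, not a real Ozone configuration name.
  static final String INTERVAL_KEY = "example.block.deleting.interval";

  // No-op by default, so subclasses that do not set a hook are unaffected.
  private static Consumer<FakeConf> extraClusterConfig = conf -> { };

  protected static void setExtraClusterConfig(Consumer<FakeConf> hook) {
    extraClusterConfig = hook;
  }

  /** Builds the base configuration, then lets the subclass hook adjust it. */
  static FakeConf initCluster() {
    FakeConf conf = new FakeConf();
    conf.set(INTERVAL_KEY, "10s");
    extraClusterConfig.accept(conf);
    return conf;
  }

  /** Hypothetical helper: initializes testClass so its static block runs, then builds. */
  static FakeConf buildClusterConfFor(Class<?> testClass) {
    try {
      Class.forName(testClass.getName(), true, testClass.getClassLoader());
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
    return initCluster();
  }
}

/** Subclass that registers its override in a static initializer. */
final class StoppedNodesTest extends BaseHATest {
  static {
    setExtraClusterConfig(conf -> conf.set(INTERVAL_KEY, "2s"));
  }
}
```

Note that in plain Java, calling an inherited static method through the subclass name does not by itself initialize the subclass, which is why the sketch forces initialization explicitly. Because the hook is a single static field, this sketch also assumes only one subclass registers a hook per JVM.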

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@adoroszlai adoroszlai added the test label Jul 4, 2026

@adoroszlai adoroszlai left a comment

Contributor

Thanks @hevinhsu for the patch.

Can you please make the same change in TestOzoneManagerHAFollowerReadWithStoppedNodes, which is a slightly modified copy of TestOzoneManagerHAWithStoppedNodes?

Also, with that change, MiniOzoneHAClusterImpl.NODE_FAILURE_TIMEOUT can be made private.

@hevinhsu hevinhsu changed the title from "HDDS-10703. Speed up TestOzoneManagerHAWithStoppedNodes" to "HDDS-10307. Speed up TestOzoneManagerHAWithStoppedNodes" Jul 4, 2026

hevinhsu commented Jul 4, 2026

Contributor Author

@adoroszlai Thanks for your feedback.
Sure, no problem. I made the same change in TestOzoneManagerHAFollowerReadWithStoppedNodes as you suggested, and changed MiniOzoneHAClusterImpl.NODE_FAILURE_TIMEOUT back to private.

@adoroszlai adoroszlai left a comment

Contributor

Thanks @hevinhsu for updating the patch.

Before:

Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 233.5 s -- in org.apache.hadoop.ozone.om.TestOzoneManagerHAFollowerReadWithStoppedNodes
Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 204.0 s -- in org.apache.hadoop.ozone.om.TestOzoneManagerHAWithStoppedNodes

After:

Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 189.4 s -- in org.apache.hadoop.ozone.om.TestOzoneManagerHAFollowerReadWithStoppedNodes
Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 160.6 s -- in org.apache.hadoop.ozone.om.TestOzoneManagerHAWithStoppedNodes

@adoroszlai adoroszlai merged commit 10a9437 into apache:master Jul 4, 2026
30 of 31 checks passed

hevinhsu commented Jul 5, 2026

Contributor Author

Thanks @adoroszlai for the reviews and for merging.
