HDDS-10307. Speed up TestOzoneManagerHAWithStoppedNodes #10658
Merged
Replace blind Thread.sleep calls in TestOzoneManagerHAWithStoppedNodes with
state-based waits using GenericTestUtils.waitFor and the existing
waitForLeaderToBeReady() helper.
- oneOMDown, twoOMDown, testMultipartUploadWithOneOmNodeDown, testListParts,
testListVolumes: replace Thread.sleep(NODE_FAILURE_TIMEOUT * N) with
waitForLeaderToBeReady(), which polls until a leader is ready rather than
sleeping for a fixed duration.
- testOMProxyProviderFailoverOnConnectionFailure: remove the pre-request
sleep (not needed; the client detects failure during the request) and
replace the post-request sleep with a waitFor that polls until the proxy
node ID actually changes.
- twoOMDown: remove the post-stop sleep entirely; with no quorum a leader
cannot be elected, and the subsequent operations already expect failure.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
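The difference between a fixed sleep and a state-based wait can be sketched in plain Java. This is only an illustration under stated assumptions: `PollingWaitSketch` and its `waitFor` method are hypothetical stand-ins written for this example, not the real `GenericTestUtils.waitFor` or `waitForLeaderToBeReady()`, and the "leader election" is simulated with a background thread. The sketch also reports timeouts and interruption as unchecked exceptions, which is a simplification.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

public class PollingWaitSketch {

  /**
   * Hypothetical helper: re-evaluates the condition every intervalMs and
   * returns as soon as it holds, failing once timeoutMs has elapsed.
   */
  public static void waitFor(BooleanSupplier check, long intervalMs, long timeoutMs) {
    long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
    while (!check.getAsBoolean()) {
      if (System.nanoTime() - deadline >= 0) {
        throw new IllegalStateException("Condition not met within " + timeoutMs + " ms");
      }
      try {
        Thread.sleep(intervalMs);
      } catch (InterruptedException e) {
        // Restore the interrupt flag and stop waiting.
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while waiting", e);
      }
    }
  }

  public static void main(String[] args) throws Exception {
    // Simulated cluster state: a "leader" becomes ready after about 200 ms.
    AtomicBoolean leaderReady = new AtomicBoolean(false);
    Thread election = new Thread(() -> {
      try {
        Thread.sleep(200);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
      leaderReady.set(true);
    });
    election.start();

    long start = System.nanoTime();
    // Poll every 50 ms with a 5 s upper bound instead of sleeping a fixed 5 s.
    waitFor(leaderReady::get, 50, 5_000);
    long elapsedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    System.out.println("Leader ready after about " + elapsedMs + " ms");
    election.join();
  }
}
```

The wait returns shortly after the condition becomes true, so the timeout only bounds the worst case rather than setting the cost of every run.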
…duced poll intervals
Add a config hook to AbstractOzoneManagerHATest so subclasses can
customize cluster configuration without overriding @BeforeAll.
- Add a private static extraClusterConfig field (Consumer<OzoneConfiguration>)
with a protected setter setExtraClusterConfig(). Subclasses set this
in a static {} block, which runs at class-load time before any @BeforeAll.
- Update initCluster(boolean) to apply extraClusterConfig, so the
single @BeforeAll in TestOzoneManagerHA picks it up automatically.
This avoids the double-initialization that occurs when a subclass
defines its own @BeforeAll with the same signature (static methods
are hidden, not overridden, so JUnit 5 would execute both).
- Use the hook in TestOzoneManagerHAWithStoppedNodes to reduce
OZONE_BLOCK_DELETING_SERVICE_INTERVAL from 10s to 2s.
- Reduce GenericTestUtils.waitFor checkInterval in testKeyDeletion
from 10000ms to 1000ms (timeout unchanged at 120s).
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
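The hook pattern described in this commit message can be sketched roughly as follows. Everything here is an assumption made for illustration: `BaseHATest` and `StoppedNodesTest` are hypothetical stand-ins for the real test classes, a `Map` replaces `OzoneConfiguration`, and the key `demo.block.deleting.interval` is invented. The sketch initializes the subclass explicitly with `Class.forName`, standing in for whatever triggers class initialization in the real test run.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;

public class ConfigHookSketch {

  /** Hypothetical stand-in for the abstract base test class. */
  static class BaseHATest {
    // Default hook does nothing, so the base configuration is used unchanged.
    private static Consumer<Map<String, String>> extraClusterConfig = conf -> { };

    protected static void setExtraClusterConfig(Consumer<Map<String, String>> hook) {
      extraClusterConfig = hook;
    }

    /** Builds the default configuration, then lets the hook adjust it. */
    static Map<String, String> initCluster() {
      Map<String, String> conf = new LinkedHashMap<>();
      conf.put("demo.block.deleting.interval", "10s"); // invented key
      extraClusterConfig.accept(conf);
      return conf;
    }
  }

  /** Hypothetical subclass that registers its hook in a static block. */
  static class StoppedNodesTest extends BaseHATest {
    static {
      setExtraClusterConfig(conf -> conf.put("demo.block.deleting.interval", "2s"));
    }
  }

  /** Forces static initialization of the given class (runs its static block once). */
  static void initialize(Class<?> testClass) {
    try {
      Class.forName(testClass.getName(), true, testClass.getClassLoader());
    } catch (ClassNotFoundException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("before subclass init: " + BaseHATest.initCluster());
    initialize(StoppedNodesTest.class);
    System.out.println("after subclass init:  " + BaseHATest.initCluster());
  }
}
```

Because the hook lives in a single static field on the base class, the subclass needs no setup method of its own, and the one shared initialization path applies the customization.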
adoroszlai (Contributor) reviewed on Jul 4, 2026 and left a comment:
Thanks @hevinhsu for the patch.
Can you please make the same change in TestOzoneManagerHAFollowerReadWithStoppedNodes, which is a slightly modified copy of TestOzoneManagerHAWithStoppedNodes?
Also, with that change, MiniOzoneHAClusterImpl.NODE_FAILURE_TIMEOUT can be made private.
Contributor
Author
@adoroszlai Thanks for your feedback.
adoroszlai (Contributor) approved these changes on Jul 4, 2026 and left a comment:
Thanks @hevinhsu for updating the patch.
Before:
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 233.5 s -- in org.apache.hadoop.ozone.om.TestOzoneManagerHAFollowerReadWithStoppedNodes
Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 204.0 s -- in org.apache.hadoop.ozone.om.TestOzoneManagerHAWithStoppedNodes
After:
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 189.4 s -- in org.apache.hadoop.ozone.om.TestOzoneManagerHAFollowerReadWithStoppedNodes
Tests run: 13, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 160.6 s -- in org.apache.hadoop.ozone.om.TestOzoneManagerHAWithStoppedNodes
Contributor
Author
Thanks @adoroszlai for the reviews and for merging.
What changes were proposed in this pull request?
This patch improves TestOzoneManagerHAWithStoppedNodes by making the tests more deterministic and reducing unnecessary execution time.
- Replace Thread.sleep with state-based waiting using GenericTestUtils.waitFor and waitForLeaderToBeReady(), removing blind delays while preserving the original test behavior.
- Add a setExtraClusterConfig(Consumer<OzoneConfiguration>) hook to AbstractOzoneManagerHATest so subclasses can customize the cluster configuration before it is built.
- Reduce OZONE_BLOCK_DELETING_SERVICE_INTERVAL from 10s to 2s, and reduce the polling interval in testKeyDeletion from 10000ms to 1000ms (timeout unchanged).
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-10307
How was this patch tested?
https://github.com/hevinhsu/ozone/actions/runs/28646244276
Test command:
TestOzoneManagerHAWithStoppedNodes
mvn -pl :ozone-integration-test test \
  -Dtest=TestOzoneManagerHAWithStoppedNodes \
  -DskipShade -DskipRecon -DskipDocs
Execution time:
before:
after:
TestOzoneManagerHAFollowerReadWithStoppedNodes
mvn -pl :ozone-integration-test test \
  -Dtest=TestOzoneManagerHAFollowerReadWithStoppedNodes \
  -DskipShade -DskipRecon -DskipDocs
Execution time:
before:
after: