
HDDS-14524. Add freon test that uses hfs API for both writes and reads with data validation #10651

Open
chihsuan wants to merge 1 commit into apache:master from chihsuan:HDDS-14524

chihsuan commented Jul 2, 2026


What changes were proposed in this pull request?

This PR adds a new Freon workload, dfsrw (dfs-read-write-validator), that exercises a Hadoop-compatible file system (o3fs:// / ofs://) with a mixed read-write workload and per-file data validation.

Each worker thread:

  1. writes a file whose content carries a distinct per-write marker
  2. keeps the latest hash of every path it wrote
  3. reads back a random path it previously wrote and verifies the hash still matches

A digest mismatch fails the run, so the tool surfaces both data corruption (bytes changed) and stale reads (an overwritten path returning older bytes) under concurrent load.
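The write, record, and read-back cycle described above can be sketched roughly as follows. This is only an illustration, not the PR's code: an in-memory map stands in for the file system, SHA-256 is an assumed digest algorithm, and every name here (`ReadBackValidationSketch`, `pathFor`, `contentFor`, the `/dfsrw/thread-N/file-M` layout) is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class ReadBackValidationSketch {
  // Hypothetical stand-in for the file system: path -> stored bytes.
  final Map<String, byte[]> store = new HashMap<>();
  // Latest digest of every path this worker wrote.
  private final Map<String, byte[]> latestDigest = new HashMap<>();
  private final List<String> writtenPaths = new ArrayList<>();
  private final Random random = new Random();
  private final int threadId;
  private final int pathsPerThread;

  ReadBackValidationSketch(int threadId, int pathsPerThread) {
    this.threadId = threadId;
    this.pathsPerThread = pathsPerThread;
  }

  // Content differs for every write because it embeds the write counter.
  static byte[] contentFor(long marker) {
    return ("payload-" + marker).getBytes(StandardCharsets.UTF_8);
  }

  static byte[] sha256(byte[] data) {
    try {
      return MessageDigest.getInstance("SHA-256").digest(data);
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  // The thread id is part of the path, so workers never share a path.
  String pathFor(long counter) {
    return "/dfsrw/thread-" + threadId + "/file-" + (counter % pathsPerThread);
  }

  // Write the content and remember its digest; an overwrite replaces the digest.
  String write(long counter) {
    String path = pathFor(counter);
    byte[] content = contentFor(counter);
    store.put(path, content);
    if (latestDigest.put(path, sha256(content)) == null) {
      writtenPaths.add(path);
    }
    return path;
  }

  // True when the stored bytes still hash to the most recently recorded digest.
  boolean verify(String path) {
    return Arrays.equals(sha256(store.get(path)), latestDigest.get(path));
  }

  boolean readBackRandom() {
    return verify(writtenPaths.get(random.nextInt(writtenPaths.size())));
  }

  public static void main(String[] args) {
    ReadBackValidationSketch worker = new ReadBackValidationSketch(0, 4);
    for (long counter = 0; counter < 8; counter++) {
      worker.write(counter);
      if (!worker.readBackRandom()) {
        throw new IllegalStateException("unexpected digest mismatch");
      }
    }
    // Simulate a stale read: the reused path returns the bytes of an older write.
    String reused = worker.pathFor(0);
    worker.store.put(reused, contentFor(0));
    System.out.println("stale read detected: " + !worker.verify(reused));
  }
}
```

Because the recorded digest always belongs to the latest write, putting older bytes back under a reused path makes `verify` fail, which is the stale-read case the description mentions.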

Design / robustness notes

  • Streaming digest - the write-side hash is computed on the fly via DigestOutputStream, so large files are never materialized in memory and file size is not capped at Integer.MAX_VALUE.
  • Exact size - the per-write marker is budgeted within --size, so a generated file is exactly --size bytes.
  • Meaningful under --duration - each write gets a distinct marker, so when a path is reused (as it is in time-based runs) its content and hash change; the read-back validates against the most recent write, which is what makes stale reads detectable.
  • Per-thread path namespace - the thread sequence id is part of each path, so one worker never overwrites a file another worker is reading back. Each thread tracks its paths in a per-path map (an overwrite just updates the digest), so the map grows with the number of distinct paths rather than the number of writes, and a hard cap bounds it for very large runs.
  • Input bounds - --size must be at least 8 bytes (the width of the per-write marker) and --buffer / --copy-buffer must be positive; these are functional minimums, not arbitrary caps.
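As a rough illustration of the streaming-digest, exact-size, and input-bounds notes above, the sketch below wraps the target stream in `java.security.DigestOutputStream` and budgets an 8-byte marker within the requested size. It is a sketch under assumptions, not the PR's implementation: SHA-256, the marker being written at the start of the file, the constant filler bytes, and the names `StreamingDigestSketch` and `writeWithDigest` are all hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

class StreamingDigestSketch {
  // Width of the per-write marker; the file must be at least this large.
  static final int MARKER_BYTES = Long.BYTES;

  // Writes exactly `size` bytes to `target` and returns their digest.
  // The digest is updated as bytes pass through, so nothing is buffered whole.
  static byte[] writeWithDigest(OutputStream target, long size, int bufferSize, long marker)
      throws IOException, NoSuchAlgorithmException {
    if (size < MARKER_BYTES) {
      throw new IllegalArgumentException("size must be at least " + MARKER_BYTES + " bytes");
    }
    if (bufferSize <= 0) {
      throw new IllegalArgumentException("buffer size must be positive");
    }
    MessageDigest digest = MessageDigest.getInstance("SHA-256"); // assumed algorithm
    byte[] filler = new byte[bufferSize];
    Arrays.fill(filler, (byte) 'x'); // placeholder content for the sketch
    try (DigestOutputStream out = new DigestOutputStream(target, digest)) {
      // The marker counts toward the size budget (assumed to lead the file).
      out.write(ByteBuffer.allocate(MARKER_BYTES).putLong(marker).array());
      long remaining = size - MARKER_BYTES; // long arithmetic: no int size cap
      while (remaining > 0) {
        int chunk = (int) Math.min(filler.length, remaining);
        out.write(filler, 0, chunk);
        remaining -= chunk;
      }
    }
    return digest.digest();
  }

  public static void main(String[] args) throws Exception {
    ByteArrayOutputStream first = new ByteArrayOutputStream();
    ByteArrayOutputStream second = new ByteArrayOutputStream();
    byte[] d1 = writeWithDigest(first, 1024, 100, 1L);
    byte[] d2 = writeWithDigest(second, 1024, 100, 2L);
    System.out.println("bytes written: " + first.size());
    System.out.println("digests differ: " + !Arrays.equals(d1, d2));
  }
}
```

Two writes of the same size with different markers produce different digests, which is what lets a read-back distinguish the latest write from an older one on a reused path.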

The workload extends HadoopBaseFreonGenerator (thread-local FileSystem with proper close handling) and reuses ContentGenerator, getDigest, and the runTests task runner.

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-14524

How was this patch tested?

  • New integration test TestHadoopFsReadWriteValidator, added as a @Nested case to the existing FreonTests suite (shared MiniOzoneCluster), parameterized over FILE_SYSTEM_OPTIMIZED and LEGACY bucket layouts. It runs dfsrw end to end and independently verifies the expected files were written with the requested size.
  • Ran locally against a MiniOzoneCluster: FreonTests$HadoopFsReadWriteValidator 2/2 passing; full TestFreon green (31 tests, 0 failures).
  • mvn checkstyle:check clean on the changed modules.

Generated-by: Claude Code (Claude Opus 4.8)

…s with data validation

Add a Freon workload (dfsrw) that uses the Hadoop FS API to write files
with per-file content, keep each file's hash, then read back a random
file previously written by the thread and validate the hash matches.

Covered by a new integration test running against FSO and LEGACY bucket
layouts, wired into the existing FreonTests suite.
chihsuan marked this pull request as ready for review on July 3, 2026 14:48
