Classify write-batch failures as must_reload_view in RocksDB and storage-service backends#6515
Classify write-batch failures as must_reload_view in RocksDB and storage-service backends#6515ndr-ds wants to merge 1 commit into
Conversation
…age-service backends
Instruction Count Benchmark Results
Deterministic metrics — reproducible across runs (34 benchmarks)
Cache-dependent metrics — expect fluctuations between runs (34 benchmarks)
|
| /// RocksDB error while writing a batch. Unlike [`RocksDbStoreInternalError::RocksDb`] | ||
| /// (which also covers read failures), this error means a batch write may or may not | ||
| /// have been applied, so the in-memory view must be reloaded from storage. |
There was a problem hiding this comment.
Because it runs in the same process, I don't think rocksdb has ambiguous writes. Please link the doc if you found something.
There was a problem hiding this comment.
Sorry, this was supposed to be a draft. A lot of this still needs to be verified
There was a problem hiding this comment.
Np. (Probably) just revert this part
| // error) the server may or may not have applied the batch, so the in-memory view | ||
| // must be reloaded from storage. These variants can also surface on read RPCs, | ||
| // where a reload is unnecessary but harmless; we err on the side of reloading. | ||
| matches!(self, Self::GrpcError(_) | Self::TransportError(_)) |
Motivation
KeyValueStoreError::must_reload_view()tells the chain worker whether a storageerror may have left the in-memory view inconsistent with durable storage (a write
that may-or-may-not have landed). When it does, the worker is evicted and reloaded
from storage. Misclassifying a write-uncertainty error as benign causes a silent
in-memory↔storage desync — the root cause of the recent "Corrupted chain state"
incident, fixed for the journaling layer in #6507 / #6508.
Auditing the rest of the storage stack, two backends never override the trait's
default
must_reload_view() == false, so an ambiguous write is silently treated assafe:
rocks_db.rs): awrite_batchfailure fromdb.write()surfaces asthe generic
RocksDbvariant →false. RocksDB is the storage backend used by theworkers, which is exactly where the recent payout divergence appeared, so this
is a production silent-desync vector.
common.rs): awrite_batchis sent over gRPC; aDEADLINE_EXCEEDED/UNAVAILABLE/transport failure after the server has committedthe batch but before the client receives the ack →
GrpcError/TransportError→false.Proposal
WriteBatchError(rocksdb::Error)variant (mirroringScyllaDB's
WriteBatchExecutionError), mapdb.write()failures to it, and returnmust_reload_view() == trueonly for it. Read failures keep using the sharedRocksDbvariant and are unaffected (no reload churn on reads).must_reload_view()to returntrueforGrpcErrorand
TransportError. These variants can also surface on read RPCs (there is nodedicated read/write error split), so this conservatively errs toward reloading;
documented inline.
Test Plan
cargo clippy -p linera-views --features rocksdbandcargo clippy -p linera-storage-service --all-targetsare clean.rocksdb::Errorhas no public constructor, soit cannot be instantiated in a test. The classification is a trivial
matches!mirroring the proven ScyllaDB pattern; the abstract delegation is covered by Fix JournalingError::must_reload_view to delegate to the inner store error #6508's
journaling test, and worker-level behavior will be covered by a follow-up
fault-injection test.
Release Plan
These changes should be backported to the latest
testnetbranch, then(RocksDB is the worker storage backend, so the workers benefit directly.)
Links
JournalingError::must_reload_view).WriteBatchExecutionError(linera-views/src/backends/scylla_db.rs).must_reload_viewtrait default so future backendsmust classify explicitly.