Implement the execution of the streams, messages and operations in persistent instances over the block.#6032
Conversation
|
This looks promising but it'd be good to have some metric that we can use to confirm/gauge how much of the improvement we're getting. |
976da24 to
8858863
Compare
During a simulation for the matching engine with one user creating 50 messages, another user 50 messages, and then measuring the However, that implementation over the block does not work for the checkpointing. This is because if we use persistent instances, then when accessing an older state, we have to access the older state. But we would lose the modification to the view. I leave it here as a draft since it is possible to run without auto_retry with |
8858863 to
be8a4ac
Compare
29c6138 to
e7b3e5b
Compare
f1d9930 to
356f869
Compare
| .expect("Failed to restore Wasm global from snapshot"); | ||
| } | ||
| } | ||
| _ => {} |
There was a problem hiding this comment.
Should this be an error? Is it expected that we'd encounter multiple Extern::Memory instances?
There was a problem hiding this comment.
For Linera, no since we have just one (the heap). However, there are use-cases where we have several, and maybe it would be better not to restruct ourselves initially.
| for (name, ext) in exports { | ||
| match ext { | ||
| Extern::Memory(mem) if !memory_restored => { | ||
| mem.data_mut(&mut self.store)[..snapshot.memory_data.len()] |
There was a problem hiding this comment.
Do we need to check here if the memory data is large enough?
There was a problem hiding this comment.
Yes, added a correction.
| .copy_to_vec() | ||
| .expect("Failed to copy Wasm memory") | ||
| }) | ||
| .unwrap_or_default(); |
There was a problem hiding this comment.
Should it be an error if we don't have exactly one memory?
There was a problem hiding this comment.
I am not sure.
I added error handling to the snapshot operations. However, a priori we could have several memory on Wasm, so this looks like an unnecessary check. There are plenty of other possible failure scenarios.
|
|
||
| /// Creates a snapshot of the Wasm instance's mutable state (memory and globals). | ||
| /// | ||
| /// Returns `None` for non-Wasm contract implementations. |
There was a problem hiding this comment.
I think this really shouldn't have a default implementation: It needs to work for all VMs.
(If we want to postpone e.g. the EVM implementation I'd rather add an error and a TODO comment in the EVM-specific code.)
There was a problem hiding this comment.
I agree that we should not have a default implementation for the snapshots.
Well execution can fail for a block that contains REVM message. That is true as it is for Wasm.
But the difference is that REVM does not have snapshots. So, we save to storage even if we are not at the end of the block. So, the snapshots functionality becomes a noop.
| } | ||
|
|
||
| /// Restores the Wasm instance's mutable state from a snapshot. | ||
| fn restore_snapshot(&mut self, _snapshot: &(dyn std::any::Any + Send)) {} |
There was a problem hiding this comment.
Yes, done similarly.
| /// Creates a snapshot of the Wasm instance's mutable state (memory and globals). | ||
| /// | ||
| /// Returns `None` for non-Wasm contract implementations. | ||
| fn create_snapshot(&mut self) -> Option<Box<dyn std::any::Any + Send>> { |
There was a problem hiding this comment.
And why not make the snapshot an associated type instead of Any?
There was a problem hiding this comment.
It is not easy.
The problem is that we have to store the snapshots of many possible execution engines: Wasmer, Wasmtime, REVM (that one is trivial).
We could have a type that encapsulates the Box<dyn Any+Send> but I am afraid there is no totally nice solutions.
| /// Creates a new BlockExecutionTracker. | ||
| /// | ||
| /// The `runtime_channels` argument is `Some` when a block-level contract runtime thread | ||
| /// is available (native). On web, WASM modules cannot be shipped to the runtime worker |
There was a problem hiding this comment.
| /// is available (native). On web, WASM modules cannot be shipped to the runtime worker | |
| /// is available (native). On web, Wasm modules cannot be shipped to the runtime worker |
There was a problem hiding this comment.
Ok, corrected.
Actually, the code somehow assumes that the execution engine is Wasm, which is not accurate. But that is already visible with the metrics.
| /// The `runtime_channels` argument is `Some` when a block-level contract runtime thread | ||
| /// is available (native). On web, WASM modules cannot be shipped to the runtime worker | ||
| /// via shared memory, so the block-level runtime is not spawned and `runtime_channels` | ||
| /// is `None` — per-transaction fallback runtimes are used instead. |
There was a problem hiding this comment.
I don't understand; does that mean that in the web client, the staging loop will produce a different outcome than the final block execution?
There was a problem hiding this comment.
That was the big problem. But I think I found a solution.
There are now indeed two execution paths:
- One for native where we have shared memory.
- One for the web.
This is forced because of the lack of shared memory in wasmer. The same problem that forces us to preload the contracts being used. There is a test that those two paths are coherent.
For the Native, we have persistent Wasm instance running over the whole execution of the block. For the web we create the Wasm instance during the scope of the execution. We work with snapshots in order to reinstate the Wasm instance when needed.
So, snapshots are used for both code execution paths, just in different ways.
There was a problem hiding this comment.
And using the web approach for all would be less efficient, would it? (Because we'd needlessly snapshot-and-restore.)
3b49879 to
a8b725d
Compare
Add the motion for the REVM.
28476cd to
5dd5eb8
Compare
Motivation
When executing a message, an operation, or a stream then we are instantiating, processing, and then finalizing.
So, the finalization is done many times, which is inefficient.
Proposal
By having persistent instances for contracts, we can address many issues with the execution model: The contract are running, and when we finish the block, we save. That covers many scenario:
However, that framework fails on the problem of checkpointing:
IncomingBundle. That execution could fail.ChainStateViewis not enough. One needs to checkpoint the state of the Wasm machine.Snapshots:
wasmerandwasmtime.savein thelinera-sdk.load/savecost 25k for acountercontract.The execution on the web does not work because of the lack of shared memory (the same problem that forces the preloading of all relevant applications before running). So, we use a different scheme for the web:
With the changes:
loadis called once inside a block, andstoreis called once. Slight break of the SDK.testnet_conway.The flash loan application:
terminatefunction of the bank fails to terminate correctly then it means that the loans was not repaid correctly and the block cannot be created.Test Plan
CI should cover all scenarios.
A benchmark function for the
matching_engineis introduced: 50 bids, 50 asks, all sent to the same application. The processing of the inbox is benchmarked. The result is 760ms for the old code and 413ms for the new code.If accepted, then the documentation has to be updated since the
counterapplication is changing.Release Plan
It cannot be backported to
testnet_conwayfor the following reasons:Links
None.