
Implement the execution of the streams, messages and operations in persistent instances over the block.#6032

Open
MathieuDutSik wants to merge 32 commits into linera-io:main from MathieuDutSik:user_action_bundle_messages_c

Conversation

MathieuDutSik commented Apr 15, 2026

Contributor

Motivation

Each time we execute a message, an operation, or a stream, we instantiate the contract, process the action, and then finalize.
As a result, the finalization is done many times within a block, which is inefficient.

Proposal

By keeping persistent instances for contracts, we can address many issues with the execution model: the contracts keep running, and we save only when we finish the block. That covers many scenarios:

  • Contract A executes a message, then Contract B executes another message, which leads to executing A.
  • Multiple operations are grouped.

However, that framework fails on the problem of checkpointing:

  • We need to be able to checkpoint before executing an IncomingBundle. That execution could fail.
  • Checkpointing the ChainStateView is not enough. One needs to checkpoint the state of the Wasm machine.
  • So we use snapshots of the Wasm running instance.
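
The checkpointing idea in the list above can be sketched as follows. This is a minimal illustration, not the code of this PR: `ToyInstance`, `Snapshot`, `execute_bundle`, and `apply_message` are hypothetical names, and the "instance" is just a byte vector plus a list of globals standing in for a running Wasm instance.

```rust
/// Toy stand-in for a running Wasm instance: only its mutable state.
#[derive(Clone, Debug, PartialEq)]
struct ToyInstance {
    memory: Vec<u8>,
    globals: Vec<i64>,
}

/// Snapshot of the mutable state (hypothetical layout).
#[derive(Clone, Debug, PartialEq)]
struct Snapshot {
    memory: Vec<u8>,
    globals: Vec<i64>,
}

impl ToyInstance {
    fn create_snapshot(&self) -> Snapshot {
        Snapshot {
            memory: self.memory.clone(),
            globals: self.globals.clone(),
        }
    }

    fn restore_snapshot(&mut self, snapshot: &Snapshot) {
        self.memory = snapshot.memory.clone();
        self.globals = snapshot.globals.clone();
    }
}

/// Hypothetical message handler: negative values are rejected.
fn apply_message(instance: &mut ToyInstance, message: i64) -> Result<(), String> {
    if message < 0 {
        return Err(format!("rejected message {}", message));
    }
    instance.globals[0] += message;
    instance.memory.push(message as u8);
    Ok(())
}

/// Checkpoints the instance, executes the bundle, and rolls back on failure.
fn execute_bundle(instance: &mut ToyInstance, bundle: &[i64]) -> Result<(), String> {
    let checkpoint = instance.create_snapshot();
    for &message in bundle {
        if let Err(error) = apply_message(instance, message) {
            // Undo the partial effects of the failed bundle.
            instance.restore_snapshot(&checkpoint);
            return Err(error);
        }
    }
    Ok(())
}

fn main() {
    let mut instance = ToyInstance { memory: vec![], globals: vec![0] };
    assert!(execute_bundle(&mut instance, &[1, 2]).is_ok());
    // The second bundle fails midway; its partial effects are rolled back.
    assert!(execute_bundle(&mut instance, &[5, -1]).is_err());
    println!("{:?}", instance);
}
```

In the real setting the view (`ChainStateView`) would have to be checkpointed alongside the instance; the sketch only covers the instance side.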

Snapshots:

  • We implement them for wasmer and wasmtime.
  • They are small: something like 1.12 M for the matching engine.
  • They are better than software solutions such as introducing a save in the linera-sdk.
  • They are super fast and do not cost any fuel. Doing a load / save costs 25k for a counter contract.
  • They help enforce a strict alignment of the execution for the staging loop.

The execution on the web does not work because of the lack of shared memory (the same problem that forces the preloading of all relevant applications before running). So, we use a different scheme for the web:

  • We implement the serialization for the Wasm snapshots.
  • We use those snapshots when we execute a transaction.
  • After a transaction, we wind down the Wasm execution instance.
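
To illustrate what serializing a snapshot could involve, here is a minimal sketch using a length-prefixed little-endian encoding. The `Snapshot` layout, the `to_bytes` / `from_bytes` methods, and the encoding itself are assumptions made for the example; the PR may well use a different format.

```rust
/// Hypothetical snapshot layout: linear memory plus integer globals.
#[derive(Clone, Debug, PartialEq)]
struct Snapshot {
    memory: Vec<u8>,
    globals: Vec<i64>,
}

/// Splits off the first 8 bytes, or returns `None` if the input is too short.
fn take_8(bytes: &[u8]) -> Option<([u8; 8], &[u8])> {
    if bytes.len() < 8 {
        return None;
    }
    let (head, tail) = bytes.split_at(8);
    let mut buffer = [0u8; 8];
    buffer.copy_from_slice(head);
    Some((buffer, tail))
}

impl Snapshot {
    /// Encodes as: memory length, memory bytes, global count, globals.
    fn to_bytes(&self) -> Vec<u8> {
        let mut bytes = Vec::new();
        bytes.extend_from_slice(&(self.memory.len() as u64).to_le_bytes());
        bytes.extend_from_slice(&self.memory);
        bytes.extend_from_slice(&(self.globals.len() as u64).to_le_bytes());
        for global in &self.globals {
            bytes.extend_from_slice(&global.to_le_bytes());
        }
        bytes
    }

    /// Decodes the format above; returns `None` on truncated or trailing data.
    fn from_bytes(bytes: &[u8]) -> Option<Snapshot> {
        let (raw_len, rest) = take_8(bytes)?;
        let memory_len = u64::from_le_bytes(raw_len);
        if (rest.len() as u64) < memory_len {
            return None;
        }
        let (memory, rest) = rest.split_at(memory_len as usize);
        let (raw_count, mut rest) = take_8(rest)?;
        let mut globals = Vec::new();
        for _ in 0..u64::from_le_bytes(raw_count) {
            let (raw_value, tail) = take_8(rest)?;
            globals.push(i64::from_le_bytes(raw_value));
            rest = tail;
        }
        if !rest.is_empty() {
            return None;
        }
        Some(Snapshot { memory: memory.to_vec(), globals })
    }
}

fn main() {
    let snapshot = Snapshot { memory: vec![1, 2, 3], globals: vec![-7, 42] };
    let bytes = snapshot.to_bytes();
    assert_eq!(Snapshot::from_bytes(&bytes), Some(snapshot));
}
```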

With the changes:

  • For a given application, load is called once inside a block, and store is called once. This is a slight breaking change for the SDK.
  • This breaks compatibility with testnet_conway.

The flash loan application:

  • Contract A calls the bank in order to get tokens.
  • Contract B repays the bank in tokens.
  • If the terminate function of the bank fails to terminate correctly, then the loan was not repaid correctly and the block cannot be created.
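
The flash loan flow above can be modeled with a toy bank whose end-of-block check rejects the block when a loan is still outstanding. The names `Action`, `Bank`, `terminate`, and `execute_block` are hypothetical and only mirror the description; they are not the SDK API.

```rust
/// Hypothetical actions that contracts can send to the bank within one block.
enum Action {
    Borrow(u64),
    Repay(u64),
}

/// Toy bank that lives for the whole block and tracks the outstanding loans.
#[derive(Default)]
struct Bank {
    outstanding: u64,
}

impl Bank {
    fn apply(&mut self, action: &Action) {
        match action {
            Action::Borrow(amount) => self.outstanding += *amount,
            Action::Repay(amount) => {
                self.outstanding = self.outstanding.saturating_sub(*amount)
            }
        }
    }

    /// Called once at the end of the block; fails if a loan is still open.
    fn terminate(&self) -> Result<(), String> {
        if self.outstanding == 0 {
            Ok(())
        } else {
            Err(format!("{} tokens were not repaid", self.outstanding))
        }
    }
}

/// Runs all actions against one persistent bank, then performs the final check.
fn execute_block(actions: &[Action]) -> Result<(), String> {
    let mut bank = Bank::default();
    for action in actions {
        bank.apply(action);
    }
    bank.terminate()
}

fn main() {
    // Contract A borrows, Contract B repays: the block can be created.
    assert!(execute_block(&[Action::Borrow(100), Action::Repay(100)]).is_ok());
    // The loan is only partially repaid: the block is rejected.
    assert!(execute_block(&[Action::Borrow(100), Action::Repay(40)]).is_err());
}
```

The point of the sketch is that the check only makes sense because the bank instance persists across the actions of the block and is finalized once.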

Test Plan

CI should cover all scenarios.

A benchmark function for the matching_engine is introduced: 50 bids, 50 asks, all sent to the same application. The processing of the inbox is benchmarked. The result is 760ms for the old code and 413ms for the new code.

If accepted, then the documentation has to be updated since the counter application is changing.

Release Plan

It cannot be backported to testnet_conway for the following reasons:

  • The API of the contract is changed.
  • The cost structure is changed.

Links

None.

deuszx commented Apr 16, 2026

Contributor

This looks promising but it'd be good to have some metric that we can use to confirm/gauge how much of the improvement we're getting.

@MathieuDutSik MathieuDutSik force-pushed the user_action_bundle_messages_c branch 2 times, most recently from 976da24 to 8858863 Compare April 22, 2026 07:35
MathieuDutSik commented Apr 22, 2026

Contributor Author

> This looks promising but it'd be good to have some metric that we can use to confirm/gauge how much of the improvement we're getting.

In a matching-engine simulation where one user creates 50 messages, another user creates 50 messages, and we then measure process_inbox, we achieve a 2x speedup: 760.286667ms vs 380.791458ms.

However, that implementation over the block does not work for checkpointing. This is because, with persistent instances, going back to an older state requires access to that older state, but we would lose the modifications to the view.

I am leaving it here as a draft, since it is possible to run without auto_retry with BundleFailurePolicy::Abort. So it could be interesting in some contexts, given the massive potential benefit.

@MathieuDutSik MathieuDutSik force-pushed the user_action_bundle_messages_c branch from 8858863 to be8a4ac Compare April 22, 2026 09:14
@MathieuDutSik MathieuDutSik changed the title Implement the execution of the streams, messages and operations in persistent instances. Implement the execution of the streams, messages and operations in persistent instances over the block. Apr 22, 2026
@MathieuDutSik MathieuDutSik force-pushed the user_action_bundle_messages_c branch 4 times, most recently from 29c6138 to e7b3e5b Compare April 28, 2026 20:44
@MathieuDutSik MathieuDutSik marked this pull request as ready for review May 2, 2026 10:57
@MathieuDutSik MathieuDutSik requested review from afck and ma2bd May 2, 2026 10:58
@MathieuDutSik MathieuDutSik force-pushed the user_action_bundle_messages_c branch 2 times, most recently from f1d9930 to 356f869 Compare May 4, 2026 05:50
Comment thread linera-witty/src/runtime/wasmtime/mod.rs Outdated
Comment thread linera-witty/src/runtime/wasmtime/mod.rs Outdated
.expect("Failed to restore Wasm global from snapshot");
}
}
_ => {}

Contributor

Should this be an error? Is it expected that we'd encounter multiple Extern::Memory instances?

Contributor Author

For Linera, no, since we have just one (the heap). However, there are use cases with several memories, and it may be better not to restrict ourselves initially.

for (name, ext) in exports {
match ext {
Extern::Memory(mem) if !memory_restored => {
mem.data_mut(&mut self.store)[..snapshot.memory_data.len()]

Contributor

Do we need to check here if the memory data is large enough?

Contributor Author

Yes, added a correction.
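
For illustration, a bounds-checked restore could look like the sketch below. It is written against plain byte slices rather than the wasmtime API, and `RestoreError` and `restore_memory` are hypothetical names. Whether the actual correction returns an error or grows the memory is not shown in this thread; the sketch assumes an error.

```rust
/// Hypothetical error for a snapshot that does not fit into the target memory.
#[derive(Debug, PartialEq)]
enum RestoreError {
    SnapshotTooLarge { snapshot_len: usize, memory_len: usize },
}

/// Copies the snapshot into the start of `memory`, checking the length first
/// instead of panicking on an out-of-range slice.
fn restore_memory(memory: &mut [u8], snapshot_data: &[u8]) -> Result<(), RestoreError> {
    let memory_len = memory.len();
    let target = memory
        .get_mut(..snapshot_data.len())
        .ok_or(RestoreError::SnapshotTooLarge {
            snapshot_len: snapshot_data.len(),
            memory_len,
        })?;
    target.copy_from_slice(snapshot_data);
    Ok(())
}

fn main() {
    let mut memory = vec![0u8; 4];
    assert!(restore_memory(&mut memory, &[1, 2]).is_ok());
    assert!(restore_memory(&mut memory, &[0; 8]).is_err());
    println!("{:?}", memory);
}
```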

Comment thread linera-witty/src/runtime/wasmer/mod.rs Outdated
.copy_to_vec()
.expect("Failed to copy Wasm memory")
})
.unwrap_or_default();

Contributor

Should it be an error if we don't have exactly one memory?

Contributor Author

I am not sure.
I added error handling to the snapshot operations. However, a priori we could have several memories in Wasm, so this looks like an unnecessary check. There are plenty of other possible failure scenarios.

Comment thread linera-execution/src/lib.rs Outdated

/// Creates a snapshot of the Wasm instance's mutable state (memory and globals).
///
/// Returns `None` for non-Wasm contract implementations.

Contributor

I think this really shouldn't have a default implementation: It needs to work for all VMs.

(If we want to postpone e.g. the EVM implementation I'd rather add an error and a TODO comment in the EVM-specific code.)

Contributor Author

I agree that we should not have a default implementation for the snapshots.

Execution can fail for a block that contains a REVM message, just as it can for Wasm.

The difference is that REVM does not have snapshots. So we save to storage even if we are not at the end of the block, and the snapshot functionality becomes a no-op.

Comment thread linera-execution/src/lib.rs Outdated
}

/// Restores the Wasm instance's mutable state from a snapshot.
fn restore_snapshot(&mut self, _snapshot: &(dyn std::any::Any + Send)) {}

Contributor

Same here.

Contributor Author

Yes, done similarly.

Comment thread linera-execution/src/lib.rs Outdated
/// Creates a snapshot of the Wasm instance's mutable state (memory and globals).
///
/// Returns `None` for non-Wasm contract implementations.
fn create_snapshot(&mut self) -> Option<Box<dyn std::any::Any + Send>> {

Contributor

And why not make the snapshot an associated type instead of Any?

Contributor Author

It is not easy.
The problem is that we have to store the snapshots of many possible execution engines: Wasmer, Wasmtime, REVM (that one is trivial).
We could have a type that encapsulates the Box<dyn Any + Send>, but I am afraid there is no entirely clean solution.
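
To make the trade-off concrete, here is a small sketch of the type-erased approach: each engine returns its own snapshot type behind `Box<dyn Any + Send>` and downcasts it on restore. The trait `ContractInstance` and the types `WasmLike`, `WasmLikeSnapshot`, and `EvmLike` are hypothetical stand-ins, not the traits of linera-execution, and the `Result` return type is an assumption.

```rust
use std::any::Any;

/// Hypothetical trait: no default implementations, every engine must answer.
trait ContractInstance {
    fn create_snapshot(&mut self) -> Box<dyn Any + Send>;
    fn restore_snapshot(&mut self, snapshot: &(dyn Any + Send)) -> Result<(), String>;
}

/// Stand-in for a Wasm engine whose mutable state is its memory.
struct WasmLike {
    memory: Vec<u8>,
}

struct WasmLikeSnapshot {
    memory: Vec<u8>,
}

impl ContractInstance for WasmLike {
    fn create_snapshot(&mut self) -> Box<dyn Any + Send> {
        Box::new(WasmLikeSnapshot { memory: self.memory.clone() })
    }

    fn restore_snapshot(&mut self, snapshot: &(dyn Any + Send)) -> Result<(), String> {
        // The concrete type is only known at runtime, so a mismatch is an error.
        let snapshot = snapshot
            .downcast_ref::<WasmLikeSnapshot>()
            .ok_or_else(|| "snapshot type mismatch".to_string())?;
        self.memory = snapshot.memory.clone();
        Ok(())
    }
}

/// Stand-in for an engine without snapshots: both operations are no-ops.
struct EvmLike;

impl ContractInstance for EvmLike {
    fn create_snapshot(&mut self) -> Box<dyn Any + Send> {
        Box::new(())
    }

    fn restore_snapshot(&mut self, _snapshot: &(dyn Any + Send)) -> Result<(), String> {
        Ok(())
    }
}

fn main() {
    // Heterogeneous engines and their snapshots can share one collection type.
    let mut instances: Vec<Box<dyn ContractInstance>> =
        vec![Box::new(WasmLike { memory: vec![1, 2] }), Box::new(EvmLike)];
    let snapshots: Vec<Box<dyn Any + Send>> =
        instances.iter_mut().map(|i| i.create_snapshot()).collect();
    for (instance, snapshot) in instances.iter_mut().zip(&snapshots) {
        instance.restore_snapshot(&**snapshot).expect("matching snapshot type");
    }
}
```

An associated type would move the mismatch check to compile time, but the collection holding snapshots of different engines would then need an enum or a similar wrapper, which is the difficulty described above.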

Comment thread linera-chain/src/block_tracker.rs Outdated
/// Creates a new BlockExecutionTracker.
///
/// The `runtime_channels` argument is `Some` when a block-level contract runtime thread
/// is available (native). On web, WASM modules cannot be shipped to the runtime worker

Contributor

Suggested change
- /// is available (native). On web, WASM modules cannot be shipped to the runtime worker
+ /// is available (native). On web, Wasm modules cannot be shipped to the runtime worker

Contributor Author

Ok, corrected.
Actually, the code somehow assumes that the execution engine is Wasm, which is not accurate. But that is already visible with the metrics.

Comment thread linera-chain/src/block_tracker.rs Outdated
/// The `runtime_channels` argument is `Some` when a block-level contract runtime thread
/// is available (native). On web, WASM modules cannot be shipped to the runtime worker
/// via shared memory, so the block-level runtime is not spawned and `runtime_channels`
/// is `None` — per-transaction fallback runtimes are used instead.

Contributor

I don't understand; does that mean that in the web client, the staging loop will produce a different outcome than the final block execution?

Contributor Author

That was the big problem. But I think I found a solution.

There are now indeed two execution paths:

  • One for native where we have shared memory.
  • One for the web.

This is forced by the lack of shared memory in wasmer, the same problem that forces us to preload the contracts being used. There is a test checking that the two paths are coherent.

For native, we have a persistent Wasm instance running over the whole execution of the block. For the web, we create the Wasm instance only for the scope of the execution, and we work with snapshots in order to reinstate the Wasm instance when needed.

So, snapshots are used for both code execution paths, just in different ways.
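
The relation between the two paths can be sketched with a toy counter: the native path keeps one instance alive for the whole block, while the web path creates a fresh instance per transaction and reinstates it from the latest snapshot. All names here (`CounterInstance`, `run_native`, `run_web`) are hypothetical, and the coherence check is only a toy version of the test mentioned above.

```rust
/// Serialized snapshot of the toy instance (8 little-endian bytes).
#[derive(Clone, Debug, PartialEq)]
struct Snapshot(Vec<u8>);

/// Toy contract instance whose whole mutable state is one counter.
struct CounterInstance {
    value: u64,
}

impl CounterInstance {
    fn instantiate() -> Self {
        CounterInstance { value: 0 }
    }

    fn execute(&mut self, increment: u64) {
        self.value += increment;
    }

    fn create_snapshot(&self) -> Snapshot {
        Snapshot(self.value.to_le_bytes().to_vec())
    }

    fn restore_snapshot(&mut self, snapshot: &Snapshot) {
        let mut buffer = [0u8; 8];
        buffer.copy_from_slice(&snapshot.0);
        self.value = u64::from_le_bytes(buffer);
    }
}

/// Native path: one persistent instance for the whole block.
fn run_native(transactions: &[u64]) -> u64 {
    let mut instance = CounterInstance::instantiate();
    for &increment in transactions {
        instance.execute(increment);
    }
    instance.value
}

/// Web path: a fresh instance per transaction, reinstated from the last snapshot.
fn run_web(transactions: &[u64]) -> u64 {
    let mut latest: Option<Snapshot> = None;
    for &increment in transactions {
        let mut instance = CounterInstance::instantiate();
        if let Some(snapshot) = &latest {
            instance.restore_snapshot(snapshot);
        }
        instance.execute(increment);
        latest = Some(instance.create_snapshot());
        // The instance is dropped here, i.e. wound down after the transaction.
    }
    let mut instance = CounterInstance::instantiate();
    if let Some(snapshot) = &latest {
        instance.restore_snapshot(snapshot);
    }
    instance.value
}

fn main() {
    let transactions = [1, 2, 3];
    // Both paths must agree on the final state.
    assert_eq!(run_native(&transactions), run_web(&transactions));
}
```

The web path does strictly more work per transaction (one restore and one snapshot), which is why it is only used where the persistent instance is unavailable.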

Contributor

And using the web approach for all would be less efficient, would it? (Because we'd needlessly snapshot-and-restore.)

Contributor Author

Yes.

@MathieuDutSik MathieuDutSik force-pushed the user_action_bundle_messages_c branch 2 times, most recently from 3b49879 to a8b725d Compare May 5, 2026 05:34
@MathieuDutSik MathieuDutSik force-pushed the user_action_bundle_messages_c branch from 28476cd to 5dd5eb8 Compare May 6, 2026 10:13

3 participants