MegaETH: Dissecting a Real-Time Ethereum L2
MegaETH bills itself as “real-time Ethereum” — an L2 targeting sub-millisecond block latency and 100k+ TPS. Bold claims aside, the engineering is genuinely interesting. I spent a week reading their open-source repos, and what I found is a methodical, well-executed attack on every bottleneck in Ethereum execution. This post is a deep technical walkthrough of how it works.
The Performance Problem with Ethereum
To understand what MegaETH is doing, you need to understand what’s slow about Ethereum execution today.
An Ethereum node (geth, reth, etc.) processes transactions sequentially. For each transaction, the EVM interpreter:
- Fetches the next opcode from bytecode
- Dispatches to the handler (switch/jump table)
- Manipulates a 256-bit stack machine
- Reads/writes state from a Merkle Patricia Trie backed by LevelDB/MDBX
- Computes gas costs
- Updates the state root (re-hashing the trie)
Each of these steps has performance pathologies:
- Interpretation overhead: Even the fastest interpreters (evmone’s baseline) pay dispatch costs per opcode — branch misprediction, icache misses, no cross-opcode optimization.
- 256-bit arithmetic on a 64-bit machine: Every ADD is 4 limb additions with carry propagation. MUL is 16 partial products. The EVM’s native word size is hostile to modern hardware.
- State access: The Merkle Patricia Trie requires O(log n) random disk reads per state access. Each node is a separate database key. This is catastrophically slow for SSDs (random 4KB reads) and doesn’t parallelize.
- Sequential execution: No transaction-level parallelism. The state root depends on execution order.
- State root computation: After every block, you rehash the entire modified trie path. Each level is a keccak256 over up to 17 children. This is CPU-bound and latency-critical.
MegaETH attacks all five.
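To make the arithmetic pathology concrete, here is a toy Python sketch (not MegaETH code) of what a single 256-bit ADD costs an interpreter on 64-bit hardware: four limb additions plus carry propagation, all done in software:

```python
# Toy sketch: a 256-bit ADD as four 64-bit limb additions with carry
# propagation -- the software work an interpreter does for every EVM ADD
# on a 64-bit machine.
MASK64 = (1 << 64) - 1

def to_limbs(x: int) -> list[int]:
    """Split a 256-bit integer into four 64-bit limbs, least significant first."""
    return [(x >> (64 * i)) & MASK64 for i in range(4)]

def from_limbs(limbs: list[int]) -> int:
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))

def add256(a: int, b: int) -> int:
    """256-bit addition via limb-by-limb add; overflow wraps, as in the EVM."""
    la, lb = to_limbs(a), to_limbs(b)
    out, carry = [], 0
    for x, y in zip(la, lb):
        s = x + y + carry      # one 64-bit add (plus carry-in)
        out.append(s & MASK64)
        carry = s >> 64        # carry out to the next limb
    return from_limbs(out)     # the final carry beyond 256 bits is discarded

assert add256((1 << 256) - 1, 1) == 0   # wraps modulo 2**256
```

A MUL is worse still: 16 of these partial products, plus the carry bookkeeping between them.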
Architecture Overview
MegaETH’s architecture is sequencer-centric. The sequencer is a beefy machine (hundreds of GB RAM) optimized for throughput. It’s where the innovative execution happens. Validators use a fundamentally different (and much cheaper) verification path.
The codebase is organized as:
| Repository | Language | Purpose |
|---|---|---|
| megaeth-reth | Rust | Fork of reth — the execution client |
| megaeth-evmone-compiler | C++ | EVM bytecode → native code AOT compiler |
| megaeth-mega-evm | Rust | Custom EVM implementation (wraps revm) |
| megaeth-salt | Rust | SALT: novel authenticated trie structure |
| megaeth-stateless-validator | Rust | Lightweight stateless block validator |
The key insight: fork reth, replace the hot paths with hyper-optimized versions, keep compatibility everywhere else. This is pragmatic engineering — they didn’t rewrite a consensus client from scratch (like Firedancer did for Solana). They surgically replaced the performance-critical components.
The Star: evmone-compiler (EVM → Native Code AOT Compiler)
This is the most technically interesting piece. It’s an ahead-of-time compiler that translates EVM bytecode into native x86-64 machine code via C++ as an intermediate representation.
The Compilation Pipeline
The pipeline is elegantly simple:
EVM bytecode → C++ function → Native .so (via clang) → dlopen at runtime
From lib/compiler/README.md:
┌───────────────┐
│ │
│ evmone opcode ├────┐
│ handlers │ │
│ │ │
└───────────────┘ │
│
┌──────────────┐ ┌──────────────┐ │ ┌───────────────┐
│ │ │ │ │ │ │
│ EVM bytecode ├────────►│ C++ function ├────┴────►│ Native code │
│ │ Our │ │ C++ │ (.so file) │
└──────────────┘ compiler└──────────────┘ compiler └───────────────┘
The clever trick: instead of writing a full code generator (register allocation, instruction selection, etc.), they generate C++ code that calls evmone’s existing opcode handler functions, then let clang’s optimization pipeline do the heavy lifting. This is a form of partial evaluation: the EVM program structure is known at compile time, but the operand values are known only at runtime.
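To make the translation idea concrete, here is a hypothetical Python sketch of such a translator. The INVOKE/PUSHnJUMP output format mirrors the macros described later in this post, but the opcode table and emission logic are simplified illustrations, not the real compiler:

```python
# Toy EVM-bytecode-to-C++ translator sketch (illustrative, not the real
# compiler): emit one INVOKE(...) line per opcode and fuse PUSH+JUMP
# pairs into the PUSHnJUMP super-instruction.
OPCODES = {0x01: "ADD", 0x50: "POP", 0x56: "JUMP", 0x5B: "JUMPDEST"}

def codegen(bytecode: bytes) -> list[str]:
    lines, pc = [], 0
    while pc < len(bytecode):
        op = bytecode[pc]
        if 0x60 <= op <= 0x7F:                       # PUSH1..PUSH32
            n = op - 0x5F
            imm = int.from_bytes(bytecode[pc + 1:pc + 1 + n], "big")
            # Fuse adjacent PUSH+JUMP into the super-instruction.
            if pc + 1 + n < len(bytecode) and bytecode[pc + 1 + n] == 0x56:
                lines.append(f"PUSHnJUMP({imm})")
                pc += n + 2
                continue
            lines.append(f"INVOKE(PUSH{n}, {imm})")
            pc += n + 1
        else:
            lines.append(f"INVOKE({OPCODES.get(op, hex(op))})")
            pc += 1
    return lines

# PUSH1 0x05; PUSH1 0x0B; JUMP -> the trailing pair becomes PUSHnJUMP(11)
assert codegen(bytes([0x60, 0x05, 0x60, 0x0B, 0x56])) == \
    ["INVOKE(PUSH1, 5)", "PUSHnJUMP(11)"]
```

The emitted lines would then be wrapped in a C++ function body and handed to clang, which is where all the real optimization happens.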
How Compilation Works
The compiler in lib/compiler/compiler.cpp performs three phases:
Phase 1: Basic Block Analysis
The bytecode is split into basic blocks at control flow boundaries. Each block’s gas cost, stack requirements, and stack growth are precomputed:
// From lib/compiler/compiler.cpp
struct BasicBlockAnalysis {
bool valid = true;
size_t start_offset;
const baseline::CostTable& cost_table;
std::vector<Opcode> opcodes {};
std::vector<std::optional<uint256>> imm_values {};
std::vector<bool> push_n_jump;
int64_t base_gas_cost {};
int stack_required {};
int stack_max_growth {};
};
A key optimization detected here: push_n_jump identifies consecutive PUSH; JUMP pairs, which are extremely common in compiled Solidity (every function call and return). These get fused into a super-instruction.
Phase 2: Block Summary Computation
For each basic block, the total gas cost and stack bounds are pre-computed:
// From lib/compiler/compiler.cpp
for (auto& bb : basic_blks) {
int stack_change = 0;
bb.push_n_jump.assign(bb.opcodes.size(), false);
for (size_t i = 0; i < bb.opcodes.size(); ++i) {
const auto op = bb.opcodes[i];
bb.base_gas_cost += cost_table[op];
auto current_stack_required =
instr::traits[op].stack_height_required - stack_change;
bb.stack_required = std::max(bb.stack_required, current_stack_required);
stack_change += instr::traits[op].stack_height_change;
bb.stack_max_growth = std::max(bb.stack_max_growth, stack_change);
}
}
This means gas checking and stack bounds checking happen once per basic block, not per opcode. In an interpreter, every single opcode dispatch includes these checks.
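The same summary loop reads naturally in Python. This sketch uses toy opcode traits (not evmone’s real cost table) to show exactly what gets precomputed per block:

```python
# Toy block-summary pass: per basic block, precompute total gas, the
# minimum stack depth required on entry, and the maximum stack growth.
TRAITS = {  # opcode: (gas_cost, stack_height_required, stack_height_change)
    "PUSH1": (3, 0, +1),
    "DUP1":  (3, 1, +1),
    "ADD":   (3, 2, -1),
    "POP":   (2, 1, -1),
}

def summarize_block(opcodes: list[str]) -> dict:
    base_gas, stack_required, stack_max_growth, stack_change = 0, 0, 0, 0
    for op in opcodes:
        gas, required, change = TRAITS[op]
        base_gas += gas
        # How deep must the stack be on block entry for this op, given the
        # net change produced by the ops before it in the block?
        stack_required = max(stack_required, required - stack_change)
        stack_change += change
        stack_max_growth = max(stack_max_growth, stack_change)
    return {"gas": base_gas, "required": stack_required, "growth": stack_max_growth}

# PUSH1; DUP1; ADD needs nothing on entry and grows the stack by at most 2.
assert summarize_block(["PUSH1", "DUP1", "ADD"]) == \
    {"gas": 9, "required": 0, "growth": 2}
```

With these three numbers in hand, one check at block entry replaces a check at every opcode.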
Phase 3: C++ Code Generation
Each opcode becomes a call to evmone’s handler functions. The generated code uses GNU computed gotos for control flow:
// Generated output for a basic block (from compile_cxx)
BLOCK_START(6, 18, 3, 0)
INVOKE(DUP3)
INVOKE(ISZERO)
PUSHnJUMP(27) // fused PUSH+JUMPI super-instruction
The BLOCK_START macro expands to (from lib/compiler/aot_compiler.hpp):
#define BLOCK_START(ofs, ...) \
L_OFFSET_##ofs: \
static constexpr BasicBlock bb_##ofs {__VA_ARGS__}; \
status = check_block_requirements(bb_##ofs, gas, &stack.top(), stack_bottom); \
if ((!GAS_CHECK_OFF && (GAS_CHECK_LOC == 0) && (gas < 0)) || \
status != EVMC_SUCCESS) [[unlikely]] \
goto label_final;
And the INVOKE macro (the heart of the compiler):
#define INVOKE(Op, ...) \
instr::core::impl<OP_##Op>(stack, gas, status, jump_addr, state \
__VA_OPT__(,) __VA_ARGS__); \
if constexpr (evmone::instr::has_extra_error_cases(OP_##Op)) { \
if (status != EVMC_SUCCESS) [[unlikely]] \
goto label_final; \
} \
if constexpr (OP_##Op == OP_JUMP) \
goto *jump_addr;
Notice: if constexpr means the error checks are compile-time eliminated for opcodes that can’t fail (like ADD, DUP1, etc.). In an interpreter, every opcode dispatch has the full error-handling overhead.
The PUSHnJUMP Super-Instruction
This is a critical optimization. In Solidity-compiled bytecode, PUSH <addr>; JUMP is the most common pattern (function calls, returns, dispatcher). The compiler fuses these into a direct goto:
// From lib/compiler/aot_compiler.hpp
#define PUSHnJUMP(ofs) \
if ((GAS_CHECK_OFF || (GAS_CHECK_LOC != 1) || (gas >= 0)) && \
jumpdest_map.is_jumpdest(ofs)) [[likely]] \
goto L_OFFSET_##ofs; \
else \
goto label_final;
Because the jump target is a compile-time constant (from the PUSH immediate), jumpdest_map.is_jumpdest(ofs) is evaluated at compile time via consteval:
// From lib/compiler/aot_execution_state.hpp
struct JumpdestMap {
static constexpr auto limit = 1 << 10;
const size_t size;
std::array<uint256, limit> keys;
std::array<native_jumpdest, limit> vals;
[[nodiscard]] consteval bool is_jumpdest(const uint256& offset) const noexcept {
return std::binary_search(keys.begin(), keys.begin() + size, offset);
}
};
So the entire PUSH+JUMP sequence compiles down to a single unconditional goto (plus a gas check). Compare this to an interpreter, which must: decode PUSH, push the value, decode JUMP, pop the value, validate the jumpdest, and dispatch.
The Opcode Handlers
The opcode implementations in lib/compiler/aot_instructions.hpp are C++ inline functions that operate directly on the stack. Here’s ADD:
// From lib/compiler/aot_instructions.hpp
inline void add(PARAMS) noexcept
{
stack.top() += stack.pop();
}
Where PARAMS expands to:
#define PARAMS StackTop& stack, int64_t& gas_left, evmc_status_code& status, \
native_jumpdest& jump_addr, ExecutionState& state
The StackTop class uses raw pointer arithmetic for zero-overhead stack access:
class StackTop {
uint256* m_top;
public:
[[nodiscard]] uint256& operator[](int index) noexcept { return m_top[-index]; }
[[nodiscard]] uint256& pop() noexcept { return *m_top--; }
void push(const uint256& value) noexcept { *++m_top = value; }
};
The 256-bit integers use clang’s _BitInt(256) extension (from aot_execution_state.hpp):
using uint256 = unsigned _BitInt(256);
using int256 = signed _BitInt(256);
This lets clang natively understand 256-bit arithmetic and optimize accordingly — it can use SSE/AVX instructions, constant folding, strength reduction, etc. Standard intx (used in the interpreter) does multi-limb arithmetic in software.
Gas Check Location Optimization
The compiler explores where to place gas checks — not just at block entry:
// From lib/compiler/aot_compiler.hpp
/// Optimization 2: choose where to insert the out-of-gas check
/// Possible values:
/// - 0: BLOCK_START
/// - 1: Before JUMP (appears to be the best option in most cases)
/// - 2: JUMPDEST
#ifndef GAS_CHECK_LOC
#define GAS_CHECK_LOC 1
#endif
Placing the gas check before JUMP (option 1) instead of at BLOCK_START (option 0) helps the C++ compiler because it separates the gas metering (subtraction, always executed) from the gas checking (branch, rarely taken). This allows the compiler’s basic block layout and branch prediction heuristics to work better.
Performance Results
From the README, using a fibonacci benchmark (fib(10^8)) on AMD EPYC 7543:
| Method | Time (ms) | vs. Interpreter | vs. Native C |
|---|---|---|---|
| evmone interpreter | 4230 | 1.0x | 141x slower |
| evmone compiler | 403 | 10.5x faster | 13.3x slower |
| + Loop inversion | 134 | 31.6x faster | 4.4x slower |
| + Elide gas checks | 43 | 98.4x faster | 1.4x slower |
| Native C | 30 | 141x faster | 1.0x |
The current prototype achieves 5-10x speedup over the fastest EVM interpreter. With loop inversion (a classic compiler optimization that hasn’t been implemented yet), this goes to 30x. With gas check elision (feasible for trusted execution environments), you’re within 40% of native C.
Limitations
The “one function per contract” approach hits a wall for large contracts. The snailtracer benchmark generates a single C++ function with 10k+ lines — too complex for clang’s optimizer to handle efficiently. The solution would be to split contracts into multiple functions (one per Solidity function), but this requires understanding the contract’s ABI dispatch structure.
mega-evm: The Custom EVM
While the AOT compiler targets the sequencer’s hot path, MegaETH also needs a production EVM for the general case. The mega-evm crate wraps revm (the Rust EVM) with MegaETH-specific modifications.
Multidimensional Gas Model
This is MegaETH’s most significant protocol-level innovation. Standard Ethereum has a single gas dimension. MegaETH introduces three:
- Compute gas (1B limit) — CPU-bound work (arithmetic, memory, etc.)
- Data size (3.125 MB limit) — transaction calldata + logs
- KV updates (125K limit) — storage slot modifications
From crates/mega-evm/src/evm/limit.rs:
pub struct EvmTxRuntimeLimits {
pub tx_data_size_limit: u64,
pub tx_kv_updates_limit: u64,
pub tx_compute_gas_limit: u64,
pub tx_state_growth_limit: u64,
pub block_env_access_compute_gas_limit: u64,
pub oracle_access_compute_gas_limit: u64,
}
Why? Because in standard Ethereum, a single gas dimension means compute-heavy and storage-heavy transactions compete unfairly. A loop that runs 1M iterations of ADD costs very different hardware resources than an SSTORE to a cold slot, but they’re priced in the same unit. Multidimensional gas allows the sequencer to price each resource independently.
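A toy Python sketch of the idea, using the limits quoted above (1B compute gas, 3.125 MB data, 125K KV updates). TxResources and charge() are illustrative names, not mega-evm’s API:

```python
# Three independent resource meters per transaction; whichever limit
# trips first rejects the transaction, regardless of the other two.
LIMITS = {
    "compute_gas": 1_000_000_000,
    "data_bytes": 3_276_800,      # 3.125 MB
    "kv_updates": 125_000,
}

class TxResources:
    def __init__(self):
        self.used = {dim: 0 for dim in LIMITS}

    def charge(self, dimension: str, amount: int) -> None:
        self.used[dimension] += amount
        if self.used[dimension] > LIMITS[dimension]:
            raise RuntimeError(f"out of {dimension}")

tx = TxResources()
tx.charge("compute_gas", 500_000)   # arithmetic-heavy work
tx.charge("kv_updates", 3)          # three storage writes
tx.charge("data_bytes", 1024)       # 1 KiB of calldata
# A storage-heavy tx can exhaust kv_updates while compute_gas is barely touched.
```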
Dynamic Gas Costs (SALT-Aware)
Storage gas costs scale with the SALT bucket capacity, preventing state bloat. From crates/mega-evm/src/external/gas.rs:
pub fn sstore_set_gas(
&mut self,
address: Address,
key: U256,
) -> Result<u64, SaltEnvImpl::Error> {
let bucket_id = SaltEnvImpl::bucket_id_for_slot(address, key);
let multiplier = self.load_bucket_cost_multiplier(bucket_id)?;
let gas = if self.spec.is_enabled(MegaSpecId::REX) {
constants::rex::SSTORE_SET_STORAGE_GAS_BASE * (multiplier - 1)
} else {
constants::mini_rex::SSTORE_SET_STORAGE_GAS * multiplier
};
Ok(gas)
}
The bigger your SALT bucket gets (more items hashed to the same prefix), the more expensive it is to write to it. This creates economic pressure to spread storage across the address space and prevents pathological bucket growth.
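Here is a toy Python model of bucket-aware pricing in the Mini-Rex style (base cost times multiplier). The doubling-based multiplier and the constants are assumptions for illustration, not MegaETH’s actual schedule:

```python
# Toy bucket-aware SSTORE pricing: each time a bucket doubles past its
# base capacity, the multiplier rises, so writes to overgrown buckets
# cost more. Constants are illustrative.
BASE_CAPACITY = 256
SSTORE_SET_BASE = 20_000

def bucket_multiplier(num_items: int) -> int:
    """1 for a bucket within base capacity, +1 for each doubling beyond it."""
    mult, capacity = 1, BASE_CAPACITY
    while num_items > capacity:
        capacity *= 2
        mult += 1
    return mult

def sstore_set_gas(num_items_in_bucket: int) -> int:
    return SSTORE_SET_BASE * bucket_multiplier(num_items_in_bucket)

assert sstore_set_gas(100) == 20_000    # healthy bucket: base price
assert sstore_set_gas(1000) == 60_000   # 256 -> 512 -> 1024: multiplier 3
```

Whatever the exact schedule, the shape is the point: writes get monotonically more expensive as a bucket outgrows its base capacity.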
Gas Detention Mechanism
When a contract accesses “volatile” data (block environment, oracle, or beneficiary), its remaining gas is immediately capped:
// Block environment access → 20M gas cap
// Oracle access → 1M gas cap (20M in Rex3+)
This is a DoS prevention mechanism. Without it, a contract could read block.timestamp, then do 1B gas worth of computation that depends on it — but the timestamp changes every block, so none of that computation can be cached or predicted. By capping gas after volatile access, MegaETH bounds the “damage” from volatile-data-dependent computation.
The detained gas is refunded at transaction end — users only pay for actual work.
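A minimal Python sketch of detention and refund. The GasMeter name and structure are illustrative, not mega-evm’s code:

```python
# Toy gas detention: reading volatile data caps the gas still usable;
# the detained portion is not burned, just set aside and refunded.
VOLATILE_CAP = 20_000_000  # cap after a block-environment access

class GasMeter:
    def __init__(self, gas_limit: int):
        self.remaining = gas_limit
        self.detained = 0

    def on_volatile_access(self) -> None:
        """Cap remaining gas; detain (do not burn) the excess."""
        if self.remaining > VOLATILE_CAP:
            self.detained += self.remaining - VOLATILE_CAP
            self.remaining = VOLATILE_CAP

    def refund(self) -> int:
        """At transaction end, detained gas flows back to the user."""
        back, self.detained = self.detained, 0
        return back

meter = GasMeter(gas_limit=1_000_000_000)
meter.on_volatile_access()             # e.g. the contract reads block.timestamp
assert meter.remaining == 20_000_000   # computation after the read is bounded
assert meter.refund() == 980_000_000   # unused detained gas is returned
```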
Custom Instruction Table
The MegaInstructions struct replaces the standard EVM instruction table. From crates/mega-evm/src/evm/instructions.rs:
pub struct MegaInstructions<DB: Database, ExtEnvs: ExternalEnvTypes> {
spec: MegaSpecId,
inner: EthInstructions<EthInterpreter, MegaContext<DB, ExtEnvs>>,
}
impl<DB: Database, ExtEnvs: ExternalEnvTypes> MegaInstructions<DB, ExtEnvs> {
pub fn new(spec: MegaSpecId) -> Self {
let instruction_table = match spec {
MegaSpecId::EQUIVALENCE => EthInstructions::new_mainnet(),
MegaSpecId::MINI_REX => EthInstructions::new(
mini_rex::instruction_table::<EthInterpreter, MegaContext<DB, ExtEnvs>>()
),
MegaSpecId::REX | MegaSpecId::REX1 => EthInstructions::new(
rex::instruction_table::<EthInterpreter, MegaContext<DB, ExtEnvs>>()
),
// ...
};
}
}
Each spec version has its own instruction table with different gas costs, different opcode behavior (e.g., SELFDESTRUCT disabled in Mini-Rex but re-enabled in Rex2 with EIP-6780 semantics), and different limit enforcement.
Spec Evolution
MegaETH has gone through several spec versions, each refining the gas model:
- EQUIVALENCE: Pure Optimism compatibility (baseline)
- MINI_REX: Introduces multidimensional gas, 512KB contracts, gas detention
- REX: Refined storage gas (20K-32K base vs. MiniRex’s 2M), zero cost for fresh storage
- REX1: Limit reset fix
- REX2: SELFDESTRUCT restored with EIP-6780, keyless deploy
- REX3: Oracle detection moved to SLOAD-based, raised oracle gas limit to 20M
This rapid iteration is only possible because the EVM modifications are cleanly separated from the execution client via the mega-evm crate.
SALT: Small Authentication Large Trie
SALT is MegaETH’s replacement for Ethereum’s Merkle Patricia Trie. This is arguably the most impactful component for real-world performance, because state access is the dominant cost in block execution.
The Problem with MPT
Ethereum’s MPT has O(log₁₆ n) depth, where n is the number of accounts (~250M on mainnet). That’s about 7 levels. Each level is a separate database read. With cold storage (SSD), each read is ~100μs. That’s 700μs per state access — already exceeding MegaETH’s target block time.
Verkle trees improve this with wider branching (256-ary), reducing depth to ~3-4 levels. But they still require random disk I/O for interior nodes.
SALT’s Design
SALT uses a fundamentally different approach: a static, memory-resident main trie with dynamic hash table buckets at the leaves.
[ Level 1: Root (1 node) ]
|
[ Level 2 (256 nodes) ]
|
[ Level 3 (65,536 nodes) ]
|
[ Level 4 (16,777,216 leaf nodes) ]
|
Commitment to Bucket (SHI hash table)
The main trie is a 4-level complete 256-ary trie — that’s 16.8 million leaf nodes. Each leaf points to a bucket that holds the actual key-value pairs. For 3 billion items with 256-slot buckets, the entire authentication layer fits in ~1 GB of RAM.
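These sizing claims are easy to check. A quick Python back-of-the-envelope, assuming one 64-byte commitment per trie node:

```python
# Back-of-the-envelope check of the SALT sizing claims above.
FANOUT = 256
leaves = FANOUT ** 3                       # level-4 leaf nodes
assert leaves == 16_777_216                # ~16.8M buckets

interior = 1 + FANOUT + FANOUT ** 2        # levels 1-3
total_nodes = interior + leaves
footprint_gb = total_nodes * 64 / 2**30    # 64-byte commitment per node
assert 1.0 < footprint_gb < 1.1            # the whole auth layer in ~1 GB

# 3B items spread over 16.8M buckets averages well under 256 slots each.
assert 3_000_000_000 / leaves < 256
```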
From salt/src/state/state.rs, the state uses Strongly History-Independent (SHI) hash tables for buckets:
/// SHI Hash Table Implementation
///
/// The SHI hash table ensures that the same set of key-value pairs always
/// produces the same bucket layout, regardless of insertion order.
///
/// Key Features:
/// - History Independence: Final layout depends only on key-value pairs
/// - Linear Probing: Uses linear probing with key swapping
/// - Dynamic Resizing: Bucket expansion when load factor exceeds threshold
The SHI property is critical — it means the commitment to a bucket is deterministic regardless of insertion order, which is necessary for consensus.
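To see why a deterministic layout is achievable at all, here is a minimal history-independent table in Python using ordered linear probing (the Amble-Knuth discipline). This illustrates the SHI property only; SALT’s actual table also handles deletion, commitments, and resizing:

```python
# Ordered linear probing: along any probe chain, the key with the smaller
# (home slot, key) rank sits first, so the final layout does not depend
# on insertion order.
SLOTS = 16

def rank(key: int) -> tuple[int, int]:
    return (key % SLOTS, key)           # home slot first, key as tie-break

def insert(table: list, key: int) -> None:
    i = key % SLOTS
    while table[i] is not None:
        if table[i] == key:
            return                       # already present
        if rank(table[i]) > rank(key):   # resident key has lower priority:
            table[i], key = key, table[i]  # evict it, keep probing with it
        i = (i + 1) % SLOTS
    table[i] = key

def build(keys) -> list:
    table = [None] * SLOTS
    for k in keys:
        insert(table, k)
    return table

# 3, 19, 35 all hash to slot 3; 5 and 21 hash to slot 5.
assert build([3, 19, 35, 5, 21]) == build([21, 5, 35, 19, 3])
```

Whatever order the keys arrive in, the displaced keys settle into the same canonical slots, so a commitment over the slots is well-defined.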
State Root Updates: O(1) Amortized
SALT uses IPA (Inner Product Argument) with Pedersen commitments — a homomorphic vector commitment scheme. When a slot value changes, the commitment update is a single elliptic curve multiplication (ECMul) on the delta:
new_commitment = old_commitment + (new_value - old_value) * generator[slot_index]
Propagating the change up the three interior levels of the main trie costs 3 more ECMuls. Total: 4 ECMuls per key update.
But it gets better. Updates from multiple keys in different child buckets are batched at each parent level. From the README:
Updating 200,000 random keys requires approximately 460,000 ECMul operations, or an amortized cost of about 2.3 ECMuls per key.
Compare this to MPT, where updating a single key requires rehashing the entire path (7+ keccak256 operations, each over up to 17 children × 32 bytes = 544 bytes). And these hashes are not batching-friendly.
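The homomorphic-update trick can be demonstrated with a toy Pedersen-style commitment in Python, using a multiplicative group modulo a prime as a stand-in for Banderwagon. All parameters here are illustrative:

```python
# Toy homomorphic vector commitment: C = prod(g_i ** v_i) mod p.
# Changing one value needs a single exponentiation on the delta,
# not a recomputation over the whole vector.
P = 2**127 - 1                                # a Mersenne prime
GENS = [pow(7, 3**i, P) for i in range(8)]    # fixed per-slot generators

def commit(values: list[int]) -> int:
    c = 1
    for g, v in zip(GENS, values):
        c = c * pow(g, v, P) % P
    return c

def update(commitment: int, slot: int, old: int, new: int) -> int:
    """O(1) update: fold in g_slot raised to the delta."""
    return commitment * pow(GENS[slot], new - old, P) % P

vals = [10, 20, 30, 40, 50, 60, 70, 80]
c = commit(vals)
vals[3] = 99
assert update(c, 3, 40, 99) == commit(vals)   # incremental == from scratch
```

Note that three-argument pow with a negative exponent (a modular inverse, needed when a value decreases) requires Python 3.8+.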
Bucket Growth
When a bucket exceeds 256 slots, it’s partitioned into 256-slot segments with a bucket tree built on top:
[ Bucket Tree Root ] ← new commitment
|
+-----+-----+
| | |
Seg 0 Seg 1 Seg N ← 256 slots each
This is elegant — the main trie structure never changes, only the bucket internals grow. No global rebalancing, no migrations.
Types
From salt/src/types.rs, the addressing is compact:
/// 24-bit bucket identifier (up to ~16M buckets).
pub type BucketId = u32;
/// 40-bit slot identifier within a bucket (up to ~1T slots).
pub type SlotId = u64;
/// 64-byte uncompressed group element for cryptographic commitments.
pub type CommitmentBytes = [u8; 64];
/// 32-byte scalar field element for cryptographic commitments.
pub type ScalarBytes = [u8; 32];
SALT uses the Banderwagon curve (the same curve as Verkle trees) for its commitments:
pub fn hash_commitment(commitment: CommitmentBytes) -> ScalarBytes {
use banderwagon::{CanonicalSerialize, Element};
let mut bytes = [0u8; 32];
Element::from_bytes_unchecked_uncompressed(commitment)
.map_to_scalar_field()
.serialize_compressed(&mut bytes[..])
.expect("Failed to serialize scalar to bytes");
bytes
}
Stateless Validator
The stateless validator is the complement to the high-performance sequencer. While the sequencer maintains full state and uses every optimization, validators don’t need full state at all — they just need to verify that the sequencer’s claimed state transitions are correct.
How It Works
From crates/validator-core/src/executor.rs, the validation process is:
- Receive witness from sequencer (contains only the state slots touched by the block)
- Verify witness proof against previous state root (cryptographic proof from SALT)
- Replay transactions using witness data as the database
- Compute new state root from execution results
- Compare computed root with claimed root
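The steps above can be sketched schematically in Python, with toy transfer transactions and a hash over sorted items standing in for SALT’s state root. None of this is the real executor API, and the witness proof check (step 2) is elided:

```python
# Schematic stateless validation: replay the block against the witness,
# recompute the "root", and compare with the sequencer's claim.
import hashlib

def state_root(state: dict) -> str:
    return hashlib.sha256(repr(sorted(state.items())).encode()).hexdigest()

def execute_block(txs, witness: dict) -> dict:
    state = dict(witness)               # the witness *is* the database
    for sender, receiver, amount in txs:
        state[sender] -= amount
        state[receiver] = state.get(receiver, 0) + amount
    return state

def validate(txs, witness: dict, claimed_root: str) -> bool:
    return state_root(execute_block(txs, witness)) == claimed_root

witness = {"alice": 100, "bob": 5}      # only the slots this block touches
txs = [("alice", "bob", 30)]
assert validate(txs, witness, state_root({"alice": 70, "bob": 35}))
```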
The WitnessDatabase from crates/validator-core/src/database.rs implements revm’s DatabaseRef trait backed entirely by witness data:
pub struct WitnessDatabase<'a, W> {
pub header: &'a Header,
pub witness: &'a W,
pub contracts: &'a HashMap<B256, Bytecode>,
}
Partial Statelessness
A key design decision: contract bytecode is not included in the witness. It’s fetched on-demand from RPC and cached locally. This is because:
- Bytecode is immutable (changes only on deployment)
- Bytecode is large (up to 24KB per contract, 512KB in MegaETH)
- Witness data should only contain things that change per block
This “partial stateless” approach dramatically reduces witness size.
Light Witness Optimization
For the debug-trace-server (used in development), there’s a LightWitness that skips expensive cryptographic point validation during deserialization. From crates/validator-core/src/light_witness.rs:
/// Standard SaltWitness deserialization: ~240ms (due to EC point validation)
/// LightWitness deserialization: ~10-20ms (skips EC point validation)
pub struct LightWitness {
pub kvs: BTreeMap<SaltKey, Option<SaltValue>>,
pub levels: FxHashMap<BucketId, u8>,
};
A 12x speedup just by skipping proof validation you don’t need in trusted contexts. This is the kind of pragmatic optimization that matters in production.
Embarrassingly Parallel Validation
From the README:
Validation workers operate independently on different blocks with no coordination overhead. Throughput scales linearly with the number of CPU cores available.
Each block can be validated independently because the witness contains everything needed — there’s no shared mutable state between validation workers. This is a massive advantage over full nodes, which must process blocks sequentially (each block depends on the previous state).
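The property is easy to sketch: each job carries its own witness, so workers share nothing. A toy Python illustration, where validate_block stands in for the real witness-replay executor:

```python
# Each job is (block_id, touched_values, claimed_sum): everything a worker
# needs travels with the job, so blocks validate concurrently with no
# shared mutable state and no ordering constraints.
from concurrent.futures import ThreadPoolExecutor

def validate_block(job: tuple) -> tuple:
    block_id, touched_values, claimed = job
    return block_id, sum(touched_values) == claimed

jobs = [(1, [1, 2, 3], 6), (2, [10, 20], 30), (3, [5, 5], 11)]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(validate_block, jobs))
assert results == {1: True, 2: True, 3: False}
```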
Multi-Client Philosophy
The stateless validator supports pluggable execution engines:
Beyond the default Revm-based executor, the validator also supports an executor based on the formal K semantics of the EVM (developed with Pi²). Combined with the hyper-optimized, parallel, JIT-compiled executor on sequencer nodes, this creates three distinct MegaETH client implementations.
Three independent EVM implementations for cross-validation:
- Sequencer: AOT-compiled, parallel, hyper-optimized
- Validator: Vanilla revm, single-threaded, simple
- Formal verifier: K semantics (mathematically specified)
If all three agree, you have very high confidence in correctness. This is the same philosophy behind Ethereum’s multi-client approach (geth, Nethermind, Besu, Erigon), but applied within a single L2.
The reth Fork
MegaETH’s execution client is a fork of reth (Paradigm’s Rust Ethereum implementation). The fork strategy is surgical — replace the EVM and state trie while keeping everything else (networking, consensus, RPC, etc.) from upstream reth.
The key integration points are:
- EVM Factory: reth’s modular EvmFactory trait is implemented by MegaEvmFactory, which creates mega-evm instances instead of standard revm instances
- Block Executor: The MegaBlockExecutor handles MegaETH-specific block processing (multidimensional gas limits, system transactions, etc.)
- State Provider: SALT replaces the MPT as the state trie backend
This fork-and-replace approach means MegaETH inherits all of reth’s battle-tested infrastructure (peer discovery, sync, JSON-RPC, etc.) while only modifying the performance-critical execution layer.
Performance Techniques Summary
Let’s step back and catalog every performance technique across the codebase:
Eliminating Interpretation Overhead (evmone-compiler)
- AOT compilation: EVM bytecode → C++ → native code via clang
- Super-instructions: PUSH+JUMP fused into direct goto
- Computed gotos: GNU label-as-value extension for zero-cost dispatch
- Compile-time dispatch: if constexpr eliminates dead error checks
- Block-level gas metering: Gas checked once per basic block, not per opcode
- Configurable gas check location: Optimize branch prediction
Optimizing 256-bit Arithmetic
- _BitInt(256) extension: Lets clang use native multi-precision instructions
- Compiler constant folding: Known-at-compile-time values propagated through the generated code
Eliminating State I/O (SALT)
- Memory-resident trie: 1GB footprint for 3B items → no disk reads
- O(1) amortized state root updates: Homomorphic commitments (ECMul, not rehash)
- Flat bucket storage: SHI hash tables instead of sparse sub-tries
- Batched commitment propagation: Width shrinkage at higher trie levels
Reducing Validation Cost
- Stateless validation: No full state needed, just witness data
- Partial statelessness: Bytecode excluded from witness (fetched separately)
- Light witness: Skip EC point validation in trusted contexts (12x faster deser)
- Embarrassingly parallel: Validators scale linearly with cores
Protocol-Level Optimizations (mega-evm)
- Multidimensional gas: Separate pricing for compute, storage, data
- Dynamic gas costs: SALT bucket-aware pricing prevents state bloat
- Gas detention: Cap computation after volatile data access (DoS prevention)
- 512KB contracts: 21x larger than standard Ethereum (24KB)
Comparison with Other Approaches
vs. Firedancer (Solana)
Firedancer is a from-scratch C reimplementation of Solana’s validator, optimized with SIMD, kernel bypass (io_uring, XDP), and cache-line-aware data structures. MegaETH takes a different approach:
- Firedancer: Rewrite everything in C, optimize every system call
- MegaETH: Fork reth, replace hot paths with specialized implementations
Firedancer operates on a fundamentally different chain (Solana has native parallelism, no EVM). MegaETH’s challenge is harder in some ways — they must maintain EVM compatibility while achieving similar throughput.
vs. Monad (Parallel EVM)
Monad focuses primarily on parallel EVM execution with optimistic concurrency control. MegaETH’s approach is broader — AOT compilation, custom state trie, multidimensional gas, stateless validation. Monad keeps the standard EVM/state trie and parallelizes at the transaction level.
The key philosophical difference: Monad bets that parallelism alone is sufficient. MegaETH bets that you need to optimize every layer of the stack.
vs. Standard geth/reth
| Aspect | geth/reth | MegaETH |
|---|---|---|
| EVM execution | Interpreted | AOT compiled (5-10x) |
| State trie | MPT (7 levels, disk-bound) | SALT (4 levels, memory-resident) |
| State root update | Rehash path (keccak256) | ECMul (homomorphic) |
| Gas model | 1 dimension | 3 dimensions |
| Validation | Full state replay | Stateless (witness-based) |
| Block latency | ~12 seconds | Sub-millisecond target |
What’s Missing / Future Work
A few things I noticed that aren’t in the open-source repos:
- Parallel execution: The mega-evm crate has block environment access tracking (for conflict detection), and the README mentions parallel execution, but the actual parallel execution engine isn’t in the public repos. The stateless validator explicitly uses a “single-threaded executor based on vanilla Revm interpreter” for simplicity.
- AOT compiler integration with reth: The compiler produces .so files, but the runtime dlopen/hot-swapping infrastructure isn’t in the public repos.
- Compilation speed and code size: The compiler README acknowledges these aren’t optimized yet. For production, you’d want incremental compilation and perhaps function-level splitting for large contracts.
- Loop optimizations: The benchmark shows 30x speedup with manual loop inversion, but automatic loop optimization isn’t implemented yet. Getting LLVM’s LoopRotation pass to work on the generated code would be the natural next step.
Conclusion
MegaETH is doing something rare in the blockchain space: attacking performance with genuine systems engineering depth across every layer of the stack. The AOT compiler is clever (using C++ as IR to leverage clang’s optimizer), SALT is well-designed (memory-resident, O(1) amortized updates, compact proofs), and the multidimensional gas model is a thoughtful protocol-level innovation.
The separation of concerns is clean — the sequencer uses every optimization available (AOT, parallel execution, full state in RAM), while validators use a simple, auditable, stateless approach. Three independent EVM implementations provide cross-validation confidence.
Whether MegaETH achieves its 100k TPS target in production remains to be seen. But the engineering approach is sound: identify every bottleneck, replace it with something better, and don’t rewrite what doesn’t need rewriting.