MegaETH: Dissecting a Real-Time Ethereum L2
MegaETH bills itself as “real-time Ethereum” — an L2 targeting sub-millisecond block latency and 100k+ TPS. Bold claims aside, the engineering is genuinely interesting. I spent a week reading their open-source repos, and what I found is a methodical, well-executed attack on every bottleneck in Ethereum execution. This post is a deep technical walkthrough of how it works.
The Performance Problem with Ethereum
To understand what MegaETH is doing, you need to understand what’s slow about Ethereum execution today.
An Ethereum node (geth, reth, etc.) processes transactions sequentially. For each transaction, the EVM interpreter:
- Fetches the next opcode from bytecode
- Dispatches to the handler (switch/jump table)
- Manipulates a 256-bit stack machine
- Reads/writes state from a Merkle Patricia Trie backed by LevelDB/MDBX
- Computes gas costs
- Updates the state root (re-hashing the trie)
Each of these steps has performance pathologies:
- Interpretation overhead: Even the fastest interpreters (evmone’s baseline) pay dispatch costs per opcode — branch misprediction, icache misses, no cross-opcode optimization.
- 256-bit arithmetic on a 64-bit machine: Every ADD is 4 limb additions with carry propagation. MUL is 16 partial products. The EVM’s native word size is hostile to modern hardware.
- State access: The Merkle Patricia Trie requires O(log n) random disk reads per state access. Each node is a separate database key. This is catastrophically slow for SSDs (random 4KB reads) and doesn’t parallelize.
- Sequential execution: No transaction-level parallelism. The state root depends on execution order.
- State root computation: After every block, you rehash the entire modified trie path. Each level is a keccak256 over up to 17 children. This is CPU-bound and latency-critical.
MegaETH attacks all five.
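To make the arithmetic pathology concrete, here is a toy Python sketch (not MegaETH code) of what a single 256-bit ADD costs an interpreter on 64-bit hardware: four limb additions plus carry propagation, all done in software:

```python
# Toy sketch: a 256-bit ADD as four 64-bit limb additions with carry
# propagation -- the software work an interpreter does for every EVM ADD
# on a 64-bit machine.
MASK64 = (1 << 64) - 1

def to_limbs(x: int) -> list[int]:
    """Split a 256-bit integer into four 64-bit limbs, least significant first."""
    return [(x >> (64 * i)) & MASK64 for i in range(4)]

def from_limbs(limbs: list[int]) -> int:
    return sum(limb << (64 * i) for i, limb in enumerate(limbs))

def add256(a: int, b: int) -> int:
    """256-bit addition via limb-by-limb add; overflow wraps, as in the EVM."""
    la, lb = to_limbs(a), to_limbs(b)
    out, carry = [], 0
    for x, y in zip(la, lb):
        s = x + y + carry      # one 64-bit add (plus carry-in)
        out.append(s & MASK64)
        carry = s >> 64        # carry out to the next limb
    return from_limbs(out)     # the final carry beyond 256 bits is discarded

assert add256((1 << 256) - 1, 1) == 0   # wraps modulo 2**256
```

A MUL is worse still: 16 of these partial products, plus the carry bookkeeping between them.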
Architecture Overview
MegaETH’s architecture is sequencer-centric. The sequencer is a beefy machine (hundreds of GB RAM) optimized for throughput. It’s where the innovative execution happens. Validators use a fundamentally different (and much cheaper) verification path.
The codebase is organized as:
| Repository | Language | Purpose |
|---|---|---|
| megaeth-reth | Rust | Fork of reth — the execution client |
| megaeth-evmone-compiler | C++ | EVM bytecode → native code AOT compiler |
| megaeth-mega-evm | Rust | Custom EVM implementation (wraps revm) |
| megaeth-salt | Rust | SALT: novel authenticated trie structure |
| megaeth-stateless-validator | Rust | Lightweight stateless block validator |
The key insight: fork reth, replace the hot paths with hyper-optimized versions, keep compatibility everywhere else. This is pragmatic engineering — they didn’t rewrite a consensus client from scratch (like Firedancer did for Solana). They surgically replaced the performance-critical components.
The Star: evmone-compiler (EVM → Native Code AOT Compiler)
This is the most technically interesting piece. It’s an ahead-of-time compiler that translates EVM bytecode into native x86-64 machine code via C++ as an intermediate representation.
The Compilation Pipeline
The pipeline is elegantly simple:
EVM bytecode → C++ function → Native .so (via clang) → dlopen at runtime
From lib/compiler/README.md:
┌───────────────┐
│ │
│ evmone opcode ├────┐
│ handlers │ │
│ │ │
└───────────────┘ │
│
┌──────────────┐ ┌──────────────┐ │ ┌───────────────┐
│ │ │ │ │ │ │
│ EVM bytecode ├────────►│ C++ function ├────┴────►│ Native code │
│ │ Our │ │ C++ │ (.so file) │
└──────────────┘ compiler└──────────────┘ compiler └───────────────┘
The clever trick: instead of writing a full code generator (register allocation, instruction selection, etc.), they generate C++ code that calls evmone’s existing opcode handler functions, then let clang’s optimization pipeline do the heavy lifting. This is a form of partial evaluation: the EVM program structure is known at compile time, but the operand values are known only at runtime.
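To make the translation idea concrete, here is a hypothetical Python sketch of such a translator. The INVOKE/PUSHnJUMP output format mirrors the macros described later in this post, but the opcode table and emission logic are simplified illustrations, not the real compiler:

```python
# Toy EVM-bytecode-to-C++ translator sketch (illustrative, not the real
# compiler): emit one INVOKE(...) line per opcode and fuse PUSH+JUMP
# pairs into the PUSHnJUMP super-instruction.
OPCODES = {0x01: "ADD", 0x50: "POP", 0x56: "JUMP", 0x5B: "JUMPDEST"}

def codegen(bytecode: bytes) -> list[str]:
    lines, pc = [], 0
    while pc < len(bytecode):
        op = bytecode[pc]
        if 0x60 <= op <= 0x7F:                       # PUSH1..PUSH32
            n = op - 0x5F
            imm = int.from_bytes(bytecode[pc + 1:pc + 1 + n], "big")
            # Fuse adjacent PUSH+JUMP into the super-instruction.
            if pc + 1 + n < len(bytecode) and bytecode[pc + 1 + n] == 0x56:
                lines.append(f"PUSHnJUMP({imm})")
                pc += n + 2
                continue
            lines.append(f"INVOKE(PUSH{n}, {imm})")
            pc += n + 1
        else:
            lines.append(f"INVOKE({OPCODES.get(op, hex(op))})")
            pc += 1
    return lines

# PUSH1 0x05; PUSH1 0x0B; JUMP -> the trailing pair becomes PUSHnJUMP(11)
assert codegen(bytes([0x60, 0x05, 0x60, 0x0B, 0x56])) == \
    ["INVOKE(PUSH1, 5)", "PUSHnJUMP(11)"]
```

The emitted lines would then be wrapped in a C++ function body and handed to clang, which is where all the real optimization happens.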
How Compilation Works
The compiler in lib/compiler/compiler.cpp performs three phases:
Phase 1: Basic Block Analysis
The bytecode is split into basic blocks at control flow boundaries. Each block’s gas cost, stack requirements, and stack growth are precomputed:
// From lib/compiler/compiler.cpp
struct BasicBlockAnalysis {
bool valid = true;
size_t start_offset;
const baseline::CostTable& cost_table;
std::vector<Opcode> opcodes {};
std::vector<std::optional<uint256>> imm_values {};
std::vector<bool> push_n_jump;
int64_t base_gas_cost {};
int stack_required {};
int stack_max_growth {};
};
A key optimization detected here: push_n_jump identifies consecutive PUSH; JUMP pairs, which are extremely common in compiled Solidity (every function call and return). These get fused into a super-instruction.
Phase 2: Block Summary Computation
For each basic block, the total gas cost and stack bounds are pre-computed:
// From lib/compiler/compiler.cpp
for (auto& bb : basic_blks) {
int stack_change = 0;
bb.push_n_jump.assign(bb.opcodes.size(), false);
for (size_t i = 0; i < bb.opcodes.size(); ++i) {
const auto op = bb.opcodes[i];
bb.base_gas_cost += cost_table[op];
auto current_stack_required =
instr::traits[op].stack_height_required - stack_change;
bb.stack_required = std::max(bb.stack_required, current_stack_required);
stack_change += instr::traits[op].stack_height_change;
bb.stack_max_growth = std::max(bb.stack_max_growth, stack_change);
}
}
This means gas checking and stack bounds checking happen once per basic block, not per opcode. In an interpreter, every single opcode dispatch includes these checks.
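The same summary loop reads naturally in Python. This sketch uses toy opcode traits (not evmone’s real cost table) to show exactly what gets precomputed per block:

```python
# Toy block-summary pass: per basic block, precompute total gas, the
# minimum stack depth required on entry, and the maximum stack growth.
TRAITS = {  # opcode: (gas_cost, stack_height_required, stack_height_change)
    "PUSH1": (3, 0, +1),
    "DUP1":  (3, 1, +1),
    "ADD":   (3, 2, -1),
    "POP":   (2, 1, -1),
}

def summarize_block(opcodes: list[str]) -> dict:
    base_gas, stack_required, stack_max_growth, stack_change = 0, 0, 0, 0
    for op in opcodes:
        gas, required, change = TRAITS[op]
        base_gas += gas
        # How deep must the stack be on block entry for this op, given the
        # net change produced by the ops before it in the block?
        stack_required = max(stack_required, required - stack_change)
        stack_change += change
        stack_max_growth = max(stack_max_growth, stack_change)
    return {"gas": base_gas, "required": stack_required, "growth": stack_max_growth}

# PUSH1; DUP1; ADD needs nothing on entry and grows the stack by at most 2.
assert summarize_block(["PUSH1", "DUP1", "ADD"]) == \
    {"gas": 9, "required": 0, "growth": 2}
```

With these three numbers in hand, one check at block entry replaces a check at every opcode.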
Phase 3: C++ Code Generation
Each opcode becomes a call to evmone’s handler functions. The generated code uses GNU computed gotos for control flow:
// Generated output for a basic block (from compile_cxx)
BLOCK_START(6, 18, 3, 0)
INVOKE(DUP3)
INVOKE(ISZERO)
PUSHnJUMP(27) // fused PUSH+JUMPI super-instruction
The BLOCK_START macro expands to (from lib/compiler/aot_compiler.hpp):
#define BLOCK_START(ofs, ...) \
L_OFFSET_##ofs: \
static constexpr BasicBlock bb_##ofs {__VA_ARGS__}; \
status = check_block_requirements(bb_##ofs, gas, &stack.top(), stack_bottom); \
if ((!GAS_CHECK_OFF && (GAS_CHECK_LOC == 0) && (gas < 0)) || \
status != EVMC_SUCCESS) [[unlikely]] \
goto label_final;
And the INVOKE macro (the heart of the compiler):
#define INVOKE(Op, ...) \
instr::core::impl<OP_##Op>(stack, gas, status, jump_addr, state \
__VA_OPT__(,) __VA_ARGS__); \
if constexpr (evmone::instr::has_extra_error_cases(OP_##Op)) { \
if (status != EVMC_SUCCESS) [[unlikely]] \
goto label_final; \
} \
if constexpr (OP_##Op == OP_JUMP) \
goto *jump_addr;
Notice: if constexpr means the error checks are compile-time eliminated for opcodes that can’t fail (like ADD, DUP1, etc.). In an interpreter, every opcode dispatch has the full error-handling overhead.
The PUSHnJUMP Super-Instruction
This is a critical optimization. In Solidity-compiled bytecode, PUSH <addr>; JUMP is the most common pattern (function calls, returns, dispatcher). The compiler fuses these into a direct goto:
// From lib/compiler/aot_compiler.hpp
#define PUSHnJUMP(ofs) \
if ((GAS_CHECK_OFF || (GAS_CHECK_LOC != 1) || (gas >= 0)) && \
jumpdest_map.is_jumpdest(ofs)) [[likely]] \
goto L_OFFSET_##ofs; \
else \
goto label_final;
Because the jump target is a compile-time constant (from the PUSH immediate), jumpdest_map.is_jumpdest(ofs) is evaluated at compile time via consteval:
// From lib/compiler/aot_execution_state.hpp
struct JumpdestMap {
static constexpr auto limit = 1 << 10;
const size_t size;
std::array<uint256, limit> keys;
std::array<native_jumpdest, limit> vals;
[[nodiscard]] consteval bool is_jumpdest(const uint256& offset) const noexcept {
return std::binary_search(keys.begin(), keys.begin() + size, offset);
}
};
So the entire PUSH+JUMP sequence compiles down to a single unconditional goto (plus a gas check). Compare this to an interpreter, which must: decode PUSH, push the value, decode JUMP, pop the value, validate the jumpdest, and dispatch.
The Opcode Handlers
The opcode implementations in lib/compiler/aot_instructions.hpp are C++ inline functions that operate directly on the stack. Here’s ADD:
// From lib/compiler/aot_instructions.hpp
inline void add(PARAMS) noexcept
{
stack.top() += stack.pop();
}
Where PARAMS expands to:
#define PARAMS StackTop& stack, int64_t& gas_left, evmc_status_code& status, \
native_jumpdest& jump_addr, ExecutionState& state
The StackTop class uses raw pointer arithmetic for zero-overhead stack access:
class StackTop {
uint256* m_top;
public:
[[nodiscard]] uint256& operator[](int index) noexcept { return m_top[-index]; }
[[nodiscard]] uint256& pop() noexcept { return *m_top--; }
void push(const uint256& value) noexcept { *++m_top = value; }
};
The 256-bit integers use clang’s _BitInt(256) extension (from aot_execution_state.hpp):
using uint256 = unsigned _BitInt(256);
using int256 = signed _BitInt(256);
This lets clang natively understand 256-bit arithmetic and optimize accordingly — it can use SSE/AVX instructions, constant folding, strength reduction, etc. Standard intx (used in the interpreter) does multi-limb arithmetic in software.
Gas Check Location Optimization
The compiler explores where to place gas checks — not just at block entry:
// From lib/compiler/aot_compiler.hpp
/// Optimization 2: choose where to insert the out-of-gas check
/// Possible values:
/// - 0: BLOCK_START
/// - 1: Before JUMP (appears to be the best option in most cases)
/// - 2: JUMPDEST
#ifndef GAS_CHECK_LOC
#define GAS_CHECK_LOC 1
#endif
Placing the gas check before JUMP (option 1) instead of at BLOCK_START (option 0) helps the C++ compiler because it separates the gas metering (subtraction, always executed) from the gas checking (branch, rarely taken). This allows the compiler’s basic block layout and branch prediction heuristics to work better.
Performance Results
From the README, using a fibonacci benchmark (fib(10^8)) on AMD EPYC 7543:
| Method | Time (ms) | vs. Interpreter | vs. Native C |
|---|---|---|---|
| evmone interpreter | 4230 | 1.0x | 141x slower |
| evmone compiler | 403 | 10.5x faster | 13.3x slower |
| + Loop inversion | 134 | 31.6x faster | 4.4x slower |
| + Elide gas checks | 43 | 98.4x faster | 1.4x slower |
| Native C | 30 | 141x faster | 1.0x |
The current prototype achieves 5-10x speedup over the fastest EVM interpreter. With loop inversion (a classic compiler optimization that hasn’t been implemented yet), this goes to 30x. With gas check elision (feasible for trusted execution environments), you’re within 40% of native C.
Limitations
The “one function per contract” approach hits a wall for large contracts. The snailtracer benchmark generates a single C++ function with 10k+ lines — too complex for clang’s optimizer to handle efficiently. The solution would be to split contracts into multiple functions (one per Solidity function), but this requires understanding the contract’s ABI dispatch structure.
mega-evm: The Custom EVM
While the AOT compiler targets the sequencer’s hot path, MegaETH also needs a production EVM for the general case. The mega-evm crate wraps revm (the Rust EVM) with MegaETH-specific modifications.
Multidimensional Gas Model
This is MegaETH’s most significant protocol-level innovation. Standard Ethereum has a single gas dimension. MegaETH introduces three:
- Compute gas (1B limit) — CPU-bound work (arithmetic, memory, etc.)
- Data size (3.125 MB limit) — transaction calldata + logs
- KV updates (125K limit) — storage slot modifications
From crates/mega-evm/src/evm/limit.rs:
pub struct EvmTxRuntimeLimits {
pub tx_data_size_limit: u64,
pub tx_kv_updates_limit: u64,
pub tx_compute_gas_limit: u64,
pub tx_state_growth_limit: u64,
pub block_env_access_compute_gas_limit: u64,
pub oracle_access_compute_gas_limit: u64,
}
Why? Because in standard Ethereum, a single gas dimension means compute-heavy and storage-heavy transactions compete unfairly. A loop that runs 1M iterations of ADD costs very different hardware resources than an SSTORE to a cold slot, but they’re priced in the same unit. Multidimensional gas allows the sequencer to price each resource independently.
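A toy Python sketch of the idea, using the limits quoted above (1B compute gas, 3.125 MB data, 125K KV updates). TxResources and charge() are illustrative names, not mega-evm’s API:

```python
# Three independent resource meters per transaction; whichever limit
# trips first rejects the transaction, regardless of the other two.
LIMITS = {
    "compute_gas": 1_000_000_000,
    "data_bytes": 3_276_800,      # 3.125 MB
    "kv_updates": 125_000,
}

class TxResources:
    def __init__(self):
        self.used = {dim: 0 for dim in LIMITS}

    def charge(self, dimension: str, amount: int) -> None:
        self.used[dimension] += amount
        if self.used[dimension] > LIMITS[dimension]:
            raise RuntimeError(f"out of {dimension}")

tx = TxResources()
tx.charge("compute_gas", 500_000)   # arithmetic-heavy work
tx.charge("kv_updates", 3)          # three storage writes
tx.charge("data_bytes", 1024)       # 1 KiB of calldata
# A storage-heavy tx can exhaust kv_updates while compute_gas is barely touched.
```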
Dynamic Gas Costs (SALT-Aware)
Storage gas costs scale with the SALT bucket capacity, preventing state bloat. From crates/mega-evm/src/external/gas.rs:
pub fn sstore_set_gas(
&mut self,
address: Address,
key: U256,
) -> Result<u64, SaltEnvImpl::Error> {
let bucket_id = SaltEnvImpl::bucket_id_for_slot(address, key);
let multiplier = self.load_bucket_cost_multiplier(bucket_id)?;
let gas = if self.spec.is_enabled(MegaSpecId::REX) {
constants::rex::SSTORE_SET_STORAGE_GAS_BASE * (multiplier - 1)
} else {
constants::mini_rex::SSTORE_SET_STORAGE_GAS * multiplier
};
Ok(gas)
}
The bigger your SALT bucket gets (more items hashed to the same prefix), the more expensive it is to write to it. This creates economic pressure to spread storage across the address space and prevents pathological bucket growth.
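Here is a toy Python model of bucket-aware pricing in the Mini-Rex style (base cost times multiplier). The doubling-based multiplier and the constants are assumptions for illustration, not MegaETH’s actual schedule:

```python
# Toy bucket-aware SSTORE pricing: each time a bucket doubles past its
# base capacity, the multiplier rises, so writes to overgrown buckets
# cost more. Constants are illustrative.
BASE_CAPACITY = 256
SSTORE_SET_BASE = 20_000

def bucket_multiplier(num_items: int) -> int:
    """1 for a bucket within base capacity, +1 for each doubling beyond it."""
    mult, capacity = 1, BASE_CAPACITY
    while num_items > capacity:
        capacity *= 2
        mult += 1
    return mult

def sstore_set_gas(num_items_in_bucket: int) -> int:
    return SSTORE_SET_BASE * bucket_multiplier(num_items_in_bucket)

assert sstore_set_gas(100) == 20_000    # healthy bucket: base price
assert sstore_set_gas(1000) == 60_000   # 256 -> 512 -> 1024: multiplier 3
```

Whatever the exact schedule, the shape is the point: writes get monotonically more expensive as a bucket outgrows its base capacity.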
Gas Detention Mechanism
When a contract accesses “volatile” data (block environment, oracle, or beneficiary), its remaining gas is immediately capped:
// Block environment access → 20M gas cap
// Oracle access → 1M gas cap (20M in Rex3+)
This is a DoS prevention mechanism. Without it, a contract could read block.timestamp, then do 1B gas worth of computation that depends on it — but the timestamp changes every block, so none of that computation can be cached or predicted. By capping gas after volatile access, MegaETH bounds the “damage” from volatile-data-dependent computation.
The detained gas is refunded at transaction end — users only pay for actual work.
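A minimal Python sketch of detention and refund. The GasMeter name and structure are illustrative, not mega-evm’s code:

```python
# Toy gas detention: reading volatile data caps the gas still usable;
# the detained portion is not burned, just set aside and refunded.
VOLATILE_CAP = 20_000_000  # cap after a block-environment access

class GasMeter:
    def __init__(self, gas_limit: int):
        self.remaining = gas_limit
        self.detained = 0

    def on_volatile_access(self) -> None:
        """Cap remaining gas; detain (do not burn) the excess."""
        if self.remaining > VOLATILE_CAP:
            self.detained += self.remaining - VOLATILE_CAP
            self.remaining = VOLATILE_CAP

    def refund(self) -> int:
        """At transaction end, detained gas flows back to the user."""
        back, self.detained = self.detained, 0
        return back

meter = GasMeter(gas_limit=1_000_000_000)
meter.on_volatile_access()             # e.g. the contract reads block.timestamp
assert meter.remaining == 20_000_000   # computation after the read is bounded
assert meter.refund() == 980_000_000   # unused detained gas is returned
```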
Custom Instruction Table
The MegaInstructions struct replaces the standard EVM instruction table. From crates/mega-evm/src/evm/instructions.rs:
pub struct MegaInstructions<DB: Database, ExtEnvs: ExternalEnvTypes> {
spec: MegaSpecId,
inner: EthInstructions<EthInterpreter, MegaContext<DB, ExtEnvs>>,
}
impl<DB: Database, ExtEnvs: ExternalEnvTypes> MegaInstructions<DB, ExtEnvs> {
pub fn new(spec: MegaSpecId) -> Self {
let instruction_table = match spec {
MegaSpecId::EQUIVALENCE => EthInstructions::new_mainnet(),
MegaSpecId::MINI_REX => EthInstructions::new(
mini_rex::instruction_table::<EthInterpreter, MegaContext<DB, ExtEnvs>>()
),
MegaSpecId::REX | MegaSpecId::REX1 => EthInstructions::new(
rex::instruction_table::<EthInterpreter, MegaContext<DB, ExtEnvs>>()
),
// ...
};
}
}
Each spec version has its own instruction table with different gas costs, different opcode behavior (e.g., SELFDESTRUCT disabled in Mini-Rex but re-enabled in Rex2 with EIP-6780 semantics), and different limit enforcement.
Spec Evolution
MegaETH has gone through several spec versions, each refining the gas model:
- EQUIVALENCE: Pure Optimism compatibility (baseline)
- MINI_REX: Introduces multidimensional gas, 512KB contracts, gas detention
- REX: Refined storage gas (20K-32K base vs. MiniRex’s 2M), zero cost for fresh storage
- REX1: Limit reset fix
- REX2: SELFDESTRUCT restored with EIP-6780, keyless deploy
- REX3: Oracle detection moved to SLOAD-based, raised oracle gas limit to 20M
This rapid iteration is only possible because the EVM modifications are cleanly separated from the execution client via the mega-evm crate.
SALT: Small Authentication Large Trie
SALT is MegaETH’s replacement for Ethereum’s Merkle Patricia Trie. This is arguably the most impactful component for real-world performance, because state access is the dominant cost in block execution.
The Problem with MPT
Ethereum’s MPT has O(log₁₆ n) depth, where n is the number of accounts (~250M on mainnet). That’s about 7 levels. Each level is a separate database read. With cold storage (SSD), each read is ~100μs. That’s 700μs per state access — already exceeding MegaETH’s target block time.
Verkle trees improve this with wider branching (256-ary), reducing depth to ~3-4 levels. But they still require random disk I/O for interior nodes.
SALT’s Design
SALT uses a fundamentally different approach: a static, memory-resident main trie with dynamic hash table buckets at the leaves.
[ Level 1: Root (1 node) ]
|
[ Level 2 (256 nodes) ]
|
[ Level 3 (65,536 nodes) ]
|
[ Level 4 (16,777,216 leaf nodes) ]
|
Commitment to Bucket (SHI hash table)
The main trie is a 4-level complete 256-ary trie — that’s 16.8 million leaf nodes. Each leaf points to a bucket that holds the actual key-value pairs. For 3 billion items with 256-slot buckets, the entire authentication layer fits in ~1 GB of RAM.
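These sizing claims are easy to check. A quick Python back-of-the-envelope, assuming one 64-byte commitment per trie node:

```python
# Back-of-the-envelope check of the SALT sizing claims above.
FANOUT = 256
leaves = FANOUT ** 3                       # level-4 leaf nodes
assert leaves == 16_777_216                # ~16.8M buckets

interior = 1 + FANOUT + FANOUT ** 2        # levels 1-3
total_nodes = interior + leaves
footprint_gb = total_nodes * 64 / 2**30    # 64-byte commitment per node
assert 1.0 < footprint_gb < 1.1            # the whole auth layer in ~1 GB

# 3B items spread over 16.8M buckets averages well under 256 slots each.
assert 3_000_000_000 / leaves < 256
```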
From salt/src/state/state.rs, the state uses Strongly History-Independent (SHI) hash tables for buckets:
/// SHI Hash Table Implementation
///
/// The SHI hash table ensures that the same set of key-value pairs always
/// produces the same bucket layout, regardless of insertion order.
///
/// Key Features:
/// - History Independence: Final layout depends only on key-value pairs
/// - Linear Probing: Uses linear probing with key swapping
/// - Dynamic Resizing: Bucket expansion when load factor exceeds threshold
The SHI property is critical — it means the commitment to a bucket is deterministic regardless of insertion order, which is necessary for consensus.
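To see why a deterministic layout is achievable at all, here is a minimal history-independent table in Python using ordered linear probing (the Amble-Knuth discipline). This illustrates the SHI property only; SALT’s actual table also handles deletion, commitments, and resizing:

```python
# Ordered linear probing: along any probe chain, the key with the smaller
# (home slot, key) rank sits first, so the final layout does not depend
# on insertion order.
SLOTS = 16

def rank(key: int) -> tuple[int, int]:
    return (key % SLOTS, key)           # home slot first, key as tie-break

def insert(table: list, key: int) -> None:
    i = key % SLOTS
    while table[i] is not None:
        if table[i] == key:
            return                       # already present
        if rank(table[i]) > rank(key):   # resident key has lower priority:
            table[i], key = key, table[i]  # evict it, keep probing with it
        i = (i + 1) % SLOTS
    table[i] = key

def build(keys) -> list:
    table = [None] * SLOTS
    for k in keys:
        insert(table, k)
    return table

# 3, 19, 35 all hash to slot 3; 5 and 21 hash to slot 5.
assert build([3, 19, 35, 5, 21]) == build([21, 5, 35, 19, 3])
```

Whatever order the keys arrive in, the displaced keys settle into the same canonical slots, so a commitment over the slots is well-defined.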
State Root Updates: O(1) Amortized
SALT uses IPA (Inner Product Argument) with Pedersen commitments — a homomorphic vector commitment scheme. When a slot value changes, the commitment update is a single elliptic curve multiplication (ECMul) on the delta:
new_commitment = old_commitment + (new_value - old_value) * generator[slot_index]
Propagating the change up the three interior levels of the main trie costs 3 more ECMuls. Total: 4 ECMuls per key update.
But it gets better. Updates from multiple keys in different child buckets are batched at each parent level. From the README:
Updating 200,000 random keys requires approximately 460,000 ECMul operations, or an amortized cost of about 2.3 ECMuls per key.
Compare this to MPT, where updating a single key requires rehashing the entire path (7+ keccak256 operations, each over up to 17 children × 32 bytes = 544 bytes). And these hashes are not batching-friendly.
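The homomorphic-update trick can be demonstrated with a toy Pedersen-style commitment in Python, using a multiplicative group modulo a prime as a stand-in for Banderwagon. All parameters here are illustrative:

```python
# Toy homomorphic vector commitment: C = prod(g_i ** v_i) mod p.
# Changing one value needs a single exponentiation on the delta,
# not a recomputation over the whole vector.
P = 2**127 - 1                                # a Mersenne prime
GENS = [pow(7, 3**i, P) for i in range(8)]    # fixed per-slot generators

def commit(values: list[int]) -> int:
    c = 1
    for g, v in zip(GENS, values):
        c = c * pow(g, v, P) % P
    return c

def update(commitment: int, slot: int, old: int, new: int) -> int:
    """O(1) update: fold in g_slot raised to the delta."""
    return commitment * pow(GENS[slot], new - old, P) % P

vals = [10, 20, 30, 40, 50, 60, 70, 80]
c = commit(vals)
vals[3] = 99
assert update(c, 3, 40, 99) == commit(vals)   # incremental == from scratch
```

Note that three-argument pow with a negative exponent (a modular inverse, needed when a value decreases) requires Python 3.8+.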
Bucket Growth
When a bucket exceeds 256 slots, it’s partitioned into 256-slot segments with a bucket tree built on top:
[ Bucket Tree Root ] ← new commitment
|
+-----+-----+
| | |
Seg 0 Seg 1 Seg N ← 256 slots each
This is elegant — the main trie structure never changes, only the bucket internals grow. No global rebalancing, no migrations.
Types
From salt/src/types.rs, the addressing is compact:
/// 24-bit bucket identifier (up to ~16M buckets).
pub type BucketId = u32;
/// 40-bit slot identifier within a bucket (up to ~1T slots).
pub type SlotId = u64;
/// 64-byte uncompressed group element for cryptographic commitments.
pub type CommitmentBytes = [u8; 64];
/// 32-byte scalar field element for cryptographic commitments.
pub type ScalarBytes = [u8; 32];
SALT uses the Banderwagon curve (the same curve as Verkle trees) for its commitments:
pub fn hash_commitment(commitment: CommitmentBytes) -> ScalarBytes {
use banderwagon::{CanonicalSerialize, Element};
let mut bytes = [0u8; 32];
Element::from_bytes_unchecked_uncompressed(commitment)
.map_to_scalar_field()
.serialize_compressed(&mut bytes[..])
.expect("Failed to serialize scalar to bytes");
bytes
}
Stateless Validator
The stateless validator is the complement to the high-performance sequencer. While the sequencer maintains full state and uses every optimization, validators don’t need full state at all — they just need to verify that the sequencer’s claimed state transitions are correct.
How It Works
From crates/validator-core/src/executor.rs, the validation process is:
- Receive witness from sequencer (contains only the state slots touched by the block)
- Verify witness proof against previous state root (cryptographic proof from SALT)
- Replay transactions using witness data as the database
- Compute new state root from execution results
- Compare computed root with claimed root
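The steps above can be sketched schematically in Python, with toy transfer transactions and a hash over sorted items standing in for SALT’s state root. None of this is the real executor API, and the witness proof check (step 2) is elided:

```python
# Schematic stateless validation: replay the block against the witness,
# recompute the "root", and compare with the sequencer's claim.
import hashlib

def state_root(state: dict) -> str:
    return hashlib.sha256(repr(sorted(state.items())).encode()).hexdigest()

def execute_block(txs, witness: dict) -> dict:
    state = dict(witness)               # the witness *is* the database
    for sender, receiver, amount in txs:
        state[sender] -= amount
        state[receiver] = state.get(receiver, 0) + amount
    return state

def validate(txs, witness: dict, claimed_root: str) -> bool:
    return state_root(execute_block(txs, witness)) == claimed_root

witness = {"alice": 100, "bob": 5}      # only the slots this block touches
txs = [("alice", "bob", 30)]
assert validate(txs, witness, state_root({"alice": 70, "bob": 35}))
```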
The WitnessDatabase from crates/validator-core/src/database.rs implements revm’s DatabaseRef trait backed entirely by witness data:
pub struct WitnessDatabase<'a, W> {
pub header: &'a Header,
pub witness: &'a W,
pub contracts: &'a HashMap<B256, Bytecode>,
}
Partial Statelessness
A key design decision: contract bytecode is not included in the witness. It’s fetched on-demand from RPC and cached locally. This is because:
- Bytecode is immutable (changes only on deployment)
- Bytecode is large (up to 24KB per contract, 512KB in MegaETH)
- Witness data should only contain things that change per block
This “partial stateless” approach dramatically reduces witness size.
Light Witness Optimization
For the debug-trace-server (used in development), there’s a LightWitness that skips expensive cryptographic point validation during deserialization. From crates/validator-core/src/light_witness.rs:
/// Standard SaltWitness deserialization: ~240ms (due to EC point validation)
/// LightWitness deserialization: ~10-20ms (skips EC point validation)
pub struct LightWitness {
pub kvs: BTreeMap<SaltKey, Option<SaltValue>>,
pub levels: FxHashMap<BucketId, u8>,
};
A 12x speedup just by skipping proof validation you don’t need in trusted contexts. This is the kind of pragmatic optimization that matters in production.
Embarrassingly Parallel Validation
From the README:
Validation workers operate independently on different blocks with no coordination overhead. Throughput scales linearly with the number of CPU cores available.
Each block can be validated independently because the witness contains everything needed — there’s no shared mutable state between validation workers. This is a massive advantage over full nodes, which must process blocks sequentially (each block depends on the previous state).
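The property is easy to sketch: each job carries its own witness, so workers share nothing. A toy Python illustration, where validate_block stands in for the real witness-replay executor:

```python
# Each job is (block_id, touched_values, claimed_sum): everything a worker
# needs travels with the job, so blocks validate concurrently with no
# shared mutable state and no ordering constraints.
from concurrent.futures import ThreadPoolExecutor

def validate_block(job: tuple) -> tuple:
    block_id, touched_values, claimed = job
    return block_id, sum(touched_values) == claimed

jobs = [(1, [1, 2, 3], 6), (2, [10, 20], 30), (3, [5, 5], 11)]
with ThreadPoolExecutor(max_workers=4) as pool:
    results = dict(pool.map(validate_block, jobs))
assert results == {1: True, 2: True, 3: False}
```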
Multi-Client Philosophy
The stateless validator supports pluggable execution engines:
Beyond the default Revm-based executor, the validator also supports an executor based on the formal K semantics of the EVM (developed with Pi²). Combined with the hyper-optimized, parallel, JIT-compiled executor on sequencer nodes, this creates three distinct MegaETH client implementations.
Three independent EVM implementations for cross-validation:
- Sequencer: AOT-compiled, parallel, hyper-optimized
- Validator: Vanilla revm, single-threaded, simple
- Formal verifier: K semantics (mathematically specified)
If all three agree, you have very high confidence in correctness. This is the same philosophy behind Ethereum’s multi-client approach (geth, Nethermind, Besu, Erigon), but applied within a single L2.
The reth Fork
MegaETH’s execution client is a fork of reth (Paradigm’s Rust Ethereum implementation). The fork strategy is surgical — replace the EVM and state trie while keeping everything else (networking, consensus, RPC, etc.) from upstream reth.
The key integration points are:
- EVM Factory: reth’s modular EvmFactory trait is implemented by MegaEvmFactory, which creates mega-evm instances instead of standard revm instances
- Block Executor: The MegaBlockExecutor handles MegaETH-specific block processing (multidimensional gas limits, system transactions, etc.)
- State Provider: SALT replaces the MPT as the state trie backend
This fork-and-replace approach means MegaETH inherits all of reth’s battle-tested infrastructure (peer discovery, sync, JSON-RPC, etc.) while only modifying the performance-critical execution layer.
Performance Techniques Summary
Let’s step back and catalog every performance technique across the codebase:
Eliminating Interpretation Overhead (evmone-compiler)
- AOT compilation: EVM bytecode → C++ → native code via clang
- Super-instructions: PUSH+JUMP fused into direct goto
- Computed gotos: GNU label-as-value extension for zero-cost dispatch
- Compile-time dispatch: if constexpr eliminates dead error checks
- Block-level gas metering: Gas checked once per basic block, not per opcode
- Configurable gas check location: Optimize branch prediction
Optimizing 256-bit Arithmetic
- _BitInt(256) extension: Lets clang use native multi-precision instructions
- Compiler constant folding: Known-at-compile-time values propagated through the generated code
Eliminating State I/O (SALT)
- Memory-resident trie: 1GB footprint for 3B items → no disk reads
- O(1) amortized state root updates: Homomorphic commitments (ECMul, not rehash)
- Flat bucket storage: SHI hash tables instead of sparse sub-tries
- Batched commitment propagation: Width shrinkage at higher trie levels
Reducing Validation Cost
- Stateless validation: No full state needed, just witness data
- Partial statelessness: Bytecode excluded from witness (fetched separately)
- Light witness: Skip EC point validation in trusted contexts (12x faster deser)
- Embarrassingly parallel: Validators scale linearly with cores
Protocol-Level Optimizations (mega-evm)
- Multidimensional gas: Separate pricing for compute, storage, data
- Dynamic gas costs: SALT bucket-aware pricing prevents state bloat
- Gas detention: Cap computation after volatile data access (DoS prevention)
- 512KB contracts: 21x larger than standard Ethereum (24KB)
Comparison with Other Approaches
vs. Firedancer (Solana)
Firedancer is a from-scratch C reimplementation of Solana’s validator, optimized with SIMD, kernel bypass (io_uring, XDP), and cache-line-aware data structures. MegaETH takes a different approach:
- Firedancer: Rewrite everything in C, optimize every system call
- MegaETH: Fork reth, replace hot paths with specialized implementations
Firedancer operates on a fundamentally different chain (Solana has native parallelism, no EVM). MegaETH’s challenge is harder in some ways — they must maintain EVM compatibility while achieving similar throughput.
vs. Monad (Parallel EVM)
Monad focuses primarily on parallel EVM execution with optimistic concurrency control. MegaETH’s approach is broader — AOT compilation, custom state trie, multidimensional gas, stateless validation. Monad keeps the standard EVM/state trie and parallelizes at the transaction level.
The key philosophical difference: Monad bets that parallelism alone is sufficient. MegaETH bets that you need to optimize every layer of the stack.
vs. Standard geth/reth
| Aspect | geth/reth | MegaETH |
|---|---|---|
| EVM execution | Interpreted | AOT compiled (5-10x) |
| State trie | MPT (7 levels, disk-bound) | SALT (4 levels, memory-resident) |
| State root update | Rehash path (keccak256) | ECMul (homomorphic) |
| Gas model | 1 dimension | 3 dimensions |
| Validation | Full state replay | Stateless (witness-based) |
| Block latency | ~12 seconds | Sub-millisecond target |
What’s Missing / Future Work
A few things I noticed that aren’t in the open-source repos:
- Parallel execution: The mega-evm crate has block environment access tracking (for conflict detection), and the README mentions parallel execution, but the actual parallel execution engine isn’t in the public repos. The stateless validator explicitly uses a “single-threaded executor based on vanilla Revm interpreter” for simplicity.
- AOT compiler integration with reth: The compiler produces .so files, but the runtime dlopen/hot-swapping infrastructure isn’t in the public repos.
- Compilation speed and code size: The compiler README acknowledges these aren’t optimized yet. For production, you’d want incremental compilation and perhaps function-level splitting for large contracts.
- Loop optimizations: The benchmark shows 30x speedup with manual loop inversion, but automatic loop optimization isn’t implemented yet. Getting LLVM’s LoopRotation pass to work on the generated code would be the natural next step.
Conclusion
MegaETH is doing something rare in the blockchain space: attacking performance with genuine systems engineering depth across every layer of the stack. The AOT compiler is clever (using C++ as IR to leverage clang’s optimizer), SALT is well-designed (memory-resident, O(1) amortized updates, compact proofs), and the multidimensional gas model is a thoughtful protocol-level innovation.
The separation of concerns is clean — the sequencer uses every optimization available (AOT, parallel execution, full state in RAM), while validators use a simple, auditable, stateless approach. Three independent EVM implementations provide cross-validation confidence.
Whether MegaETH achieves its 100k TPS target in production remains to be seen. But the engineering approach is sound: identify every bottleneck, replace it with something better, and don’t rewrite what doesn’t need rewriting.