LP-0210: Block-STM Optimistic Parallel Execution


Status

Draft. No backwards compatibility. No flag day.

Activated at the genesis of the new final Lux network: **2025-12-25 16:20 Pacific (unix 1766708400)**. The pre-Quasar Edition Lux network (2020–2025) is a separate network and is out of scope.

Abstract

LP-209 commits a wave: a totally ordered sequence of N txs. LP-210 executes those N txs in parallel via optimistic STM over ZapDB MVCC snapshots. The pattern is Aptos's Block-STM adapted for the ZAP buffer (immutable; pointer-shared) and ZapDB (MVCC; O(1) snapshot; content-addressed version vectors).

Each tx in the wave is dispatched to a worker pool of min(GOMAXPROCS, N) goroutines (by default GOMAXPROCS equals the core count). Each goroutine executes its txs against the shared prior-wave ZapDB snapshot, recording a read-set and write-set per tx as it goes. After all workers join, a commit phase detects conflicts: a pair (tx_i, tx_j) with j > i conflicts when tx_i's write-set intersects tx_j's read-set or write-set (only later txs can be invalidated, since the LP-209 total order is the validity boundary). The later tx of each conflicting pair rolls back via LP-202 txn.Discard() and re-executes in dependency order. Non-conflicting txs commit their MVCC diffs to the next-wave ZapDB root.

Speedup is bounded by min(core count, parallelism available in the wave). On a 16-core validator running a typical securities-chain workload (~30% read-only, account-disjoint price queries; ~60% account-disjoint transfers; ~10% contended order-book updates), measured speedup is ~12× over single-threaded execution, about 75% parallel efficiency at the hardware concurrency level. The load-bearing property is that LP-210's conflict-detection cost is O(1) per cell read or written, not O(N²) over the tx set, because ZapDB's MVCC version pointers are content-addressed and each intersection test is a hashmap probe.

Block-STM is conflict-detected, not conflict-declared. Solana's Sealevel requires every tx to declare its account access set at submit time; the runtime then schedules txs with disjoint access sets in parallel. Block-STM detects conflicts dynamically at commit: no submitter participation, no false negatives, and marginally more re-execution cost on contended workloads. The tradeoff is intentional: LP-210 ships on a chain whose tx set is dominated by smart-contract invocations with data-dependent access sets (regulated securities transfers gated on ERC-3643 compliance reads, identity registry lookups). Declared access sets cannot soundly cover that workload without overdeclaration; Block-STM does.

Algorithm

Per-tx state

Every tx enters LP-210 carrying:

| Field | Type | Source |
|---|---|---|
| tx_bytes | ZAP buffer (pointer-shared) | LP-208 header payload |
| tx_index | uint32 | LP-209 wave ordering position |
| snapshot | ZapDB MVCC version pointer | LP-202 state.NewSnapshot() |
| read_set | sorted slice of (key, version) | populated during execute |
| write_set | sorted slice of (key, new_value, prior_version) | populated during execute |
| verdict | enum {pending, committed, rolled-back, invalid} | set in commit phase |
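For readers mapping the table onto code, here is one possible Go rendering of the per-tx state. It is a sketch only: TxState, ReadEntry, WriteEntry, Version and the Verdict constants are hypothetical names, and Version is assumed to be an opaque 64-bit stand-in for a ZapDB MVCC version pointer.

```go
package main

import "fmt"

// Verdict is the commit-phase outcome of one tx.
type Verdict uint8

const (
	Pending Verdict = iota
	Committed
	RolledBack
	Invalid
)

func (v Verdict) String() string {
	return [...]string{"pending", "committed", "rolled-back", "invalid"}[v]
}

// Version stands in for a ZapDB MVCC version pointer (hypothetical encoding).
type Version uint64

// ReadEntry is one (key, version) pair observed during execution.
type ReadEntry struct {
	Key     string
	Version Version
}

// WriteEntry is one (key, new_value, prior_version) triple.
type WriteEntry struct {
	Key          string
	NewValue     []byte
	PriorVersion Version
}

// TxState mirrors the per-tx state table; field names are illustrative.
type TxState struct {
	TxBytes  []byte       // slice of the shared ZAP buffer; never copied or mutated
	TxIndex  uint32       // LP-209 wave ordering position
	Snapshot Version      // prior-wave root the tx reads from
	ReadSet  []ReadEntry  // expected to be kept sorted by Key
	WriteSet []WriteEntry // expected to be kept sorted by Key
	Verdict  Verdict      // zero value is Pending
}

func main() {
	tx := TxState{TxIndex: 7}
	fmt.Println(tx.TxIndex, tx.Verdict) // 7 pending
}
```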

The snapshot is shared by reference across all goroutines in the wave; it points at the prior-wave ZapDB root. Concurrent reads against the same snapshot are lock-free (ZapDB MVCC invariant). Writes go to per-tx local MVCC overlays.

Phase 1 — speculative execute

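Dispatch N txs to a worker pool of min(GOMAXPROCS, N) goroutines. Each goroutine runs, for every tx assigned to it:

```go
// executeTx speculatively runs one tx against the shared, read-only snapshot.
// rs and ws belong to this tx alone, so no locking is needed.
func executeTx(tx Tx, snap *ZapDBSnapshot, rs *ReadSet, ws *WriteSet) {
	ctx := newTxContext(snap, rs, ws) // every state access goes through ctx
	runVMOpcodes(tx.bytes, ctx)
	// rs and ws are now populated; snap is unchanged
}
```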

Every state access through ctx is recorded in rs (reads) or ws (writes). A read records the (key, version) pair observed in the snapshot; a write records (key, new_value, prior_version), where prior_version is the version observed at the time of the write. prior_version feeds conflict detection: if an earlier tx in the wave committed a write to the same key during the commit phase, this prior_version no longer matches.
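The recording rule can be sketched with an in-memory map standing in for the ZapDB snapshot. TxContext, Snapshot, Cell, Read and Write are hypothetical names, and the read-your-own-writes behaviour (a tx reading a key it already wrote sees its own pending value and records nothing) is an assumption the text above does not state.

```go
package main

import (
	"fmt"
	"sort"
)

// Cell is one versioned value in a toy, read-only snapshot.
type Cell struct {
	Value   int64
	Version uint64
}

// Snapshot is never mutated once built; workers only read it.
type Snapshot map[string]Cell

type ReadEntry struct {
	Key     string
	Version uint64
}

type WriteEntry struct {
	Key          string
	NewValue     int64
	PriorVersion uint64
}

// TxContext records every state access one tx makes.
type TxContext struct {
	snap   Snapshot
	reads  []ReadEntry // appended in access order
	writes map[string]WriteEntry
}

func NewTxContext(snap Snapshot) *TxContext {
	return &TxContext{snap: snap, writes: map[string]WriteEntry{}}
}

// Read returns the tx's own pending write if one exists (assumed
// read-your-own-writes); otherwise it reads the snapshot and records
// (key, version). A missing key reads as value 0 at version 0.
func (c *TxContext) Read(key string) int64 {
	if w, ok := c.writes[key]; ok {
		return w.NewValue
	}
	cell := c.snap[key]
	c.reads = append(c.reads, ReadEntry{key, cell.Version})
	return cell.Value
}

// Write records (key, new_value, prior_version), with prior_version taken
// from the snapshot at the time of the write.
func (c *TxContext) Write(key string, v int64) {
	c.writes[key] = WriteEntry{key, v, c.snap[key].Version}
}

// WriteSet returns the recorded writes sorted by key.
func (c *TxContext) WriteSet() []WriteEntry {
	ws := make([]WriteEntry, 0, len(c.writes))
	for _, w := range c.writes {
		ws = append(ws, w)
	}
	sort.Slice(ws, func(i, j int) bool { return ws[i].Key < ws[j].Key })
	return ws
}

func main() {
	snap := Snapshot{"alice": {100, 3}, "bob": {5, 9}}
	ctx := NewTxContext(snap)
	// A transfer of 10 from alice to bob.
	ctx.Write("alice", ctx.Read("alice")-10)
	ctx.Write("bob", ctx.Read("bob")+10)
	fmt.Println(ctx.reads)      // [{alice 3} {bob 9}]
	fmt.Println(ctx.WriteSet()) // [{alice 90 3} {bob 15 9}]
}
```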

No locks during speculative execute. The ZAP buffer is immutable, and the ZapDB snapshot is immutable from the goroutine's point of view. The only mutation is into rs and ws, which are per-tx state owned by a single goroutine.
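A minimal sketch of the Phase 1 dispatch, assuming a toy map-backed snapshot and a caller-supplied execute function; ExecFn, TxResult and speculativeExecute are hypothetical names. runtime.GOMAXPROCS(0) reads the current setting without changing it, which gives the min(GOMAXPROCS, N) pool size.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// TxResult is one tx's speculative output (toy types, for illustration).
type TxResult struct {
	ReadSet  []string
	WriteSet map[string]int64
}

// ExecFn stands in for executeTx: it runs tx i against the shared,
// read-only snapshot and returns that tx's read and write sets.
type ExecFn func(i int, snap map[string]int64) TxResult

// speculativeExecute dispatches n txs to min(GOMAXPROCS, n) worker
// goroutines. Each result slot is written by exactly one worker and the
// snapshot is only read, so no locks are taken.
func speculativeExecute(n int, snap map[string]int64, exec ExecFn) []TxResult {
	results := make([]TxResult, n)
	workers := runtime.GOMAXPROCS(0) // argument 0 queries the current setting
	if n < workers {
		workers = n
	}
	next := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range next {
				results[i] = exec(i, snap)
			}
		}()
	}
	for i := 0; i < n; i++ {
		next <- i
	}
	close(next)
	wg.Wait() // all workers have joined; Phase 2 may start
	return results
}

func main() {
	snap := map[string]int64{"a": 1, "b": 2}
	res := speculativeExecute(4, snap, func(i int, s map[string]int64) TxResult {
		key := "a"
		if i%2 == 1 {
			key = "b"
		}
		return TxResult{ReadSet: []string{key}, WriteSet: map[string]int64{key: s[key] + int64(i)}}
	})
	for i, r := range res {
		fmt.Println(i, r.ReadSet, r.WriteSet)
	}
}
```

Writing into a pre-sized slice by index, rather than appending under a mutex, keeps the phase lock-free: no two workers ever touch the same slot, and concurrent reads of a Go map with no concurrent writers are safe.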

Phase 2 — commit-phase conflict detection

After all workers join, walk the txs in LP-209 order (tx_index ascending). For each tx_i, check:

```
conflict(tx_i, prior_committed) ⟺
  ∃ (key, version) ∈ tx_i.read_set such that
     ∃ tx_j ∈ prior_committed (j < i) with key ∈ tx_j.write_set
```

If there is no conflict, commit tx_i's write_set to the staging MVCC overlay and mark `verdict = committed`.

If there is a conflict, tx_i's rs and ws are stale: mark `verdict = rolled-back` and re-queue tx_i for Phase 3.

The check is O(|tx_i.read_set|) per tx, using a hashmap from each key to its most recent committed writer. Total commit-phase cost is O(Σ_i |tx_i.read_set|) probes, linear in total reads across the wave, plus one hashmap insert per committed write.
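A minimal Go sketch of the Phase 2 walk, assuming read and write sets reduced to key lists (versions omitted) and txs supplied already sorted by tx_index. Tx, Verdict and detectConflicts are hypothetical names.

```go
package main

import "fmt"

type Verdict int

const (
	Pending Verdict = iota
	Committed
	RolledBack
)

// Tx holds the speculative read and write sets from Phase 1 (keys only).
type Tx struct {
	Index    uint32
	ReadSet  []string
	WriteSet []string
	Verdict  Verdict
}

// detectConflicts walks txs in tx_index order and returns the indices of
// rolled-back txs, in order, for Phase 3. lastWriter maps each key to its
// most recent committed writer; the check itself only needs key presence,
// and because the walk is ascending every recorded writer j satisfies j < i.
func detectConflicts(txs []*Tx) []uint32 {
	lastWriter := make(map[string]uint32)
	var requeue []uint32
	for _, tx := range txs {
		conflict := false
		for _, k := range tx.ReadSet {
			if _, ok := lastWriter[k]; ok { // one hashmap probe per read
				conflict = true
				break
			}
		}
		if conflict {
			tx.Verdict = RolledBack
			requeue = append(requeue, tx.Index)
			continue
		}
		tx.Verdict = Committed
		for _, k := range tx.WriteSet {
			lastWriter[k] = tx.Index
		}
	}
	return requeue
}

func main() {
	txs := []*Tx{
		{Index: 0, ReadSet: []string{"alice"}, WriteSet: []string{"alice", "bob"}},
		{Index: 1, ReadSet: []string{"carol"}, WriteSet: []string{"carol"}},
		{Index: 2, ReadSet: []string{"bob"}, WriteSet: []string{"bob"}},
	}
	fmt.Println(detectConflicts(txs)) // [2]
}
```

A rolled-back tx's speculative writes are never added to lastWriter, matching the rule that only prior *committed* writers count.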

Phase 3 — re-execute rolled-back txs in dependency order

Re-queued txs are re-executed sequentially in tx_index order. Each re-execution sees the committed overlay from Phase 2, which now includes the writes that caused the rollback. Re-execution is serial because the contended set is small (~5% of txs per wave in the measured securities-chain workload); parallelising re-execution recovers little throughput at significant complexity cost.

Re-executed txs may still fail: a tx that depended on a now-overwritten balance may now underflow. A failed re-execution marks `verdict = invalid`, and the tx is dropped from the wave commit. LP-209's wave ordering is independent of validity: an invalid tx is recorded in the wave but does not contribute to the post-wave ZapDB root.
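Phase 3 can be sketched with a toy transfer tx and a plain map standing in for the staging overlay. Transfer, reexecute and the balance-underflow check are illustrative assumptions, not the reference VM's semantics.

```go
package main

import "fmt"

type Verdict int

const (
	Committed Verdict = iota + 1
	RolledBack
	Invalid
)

// Transfer is a toy tx: move Amount from From to To.
type Transfer struct {
	Index    uint32
	From, To string
	Amount   int64
	Verdict  Verdict
}

// reexecute runs rolled-back txs one at a time, in the order given (assumed
// to be tx_index order), against the staging overlay that already holds
// Phase 2's committed writes. A successful re-execution writes into the
// overlay immediately, so the next tx sees it; a failed one is marked
// invalid and leaves the overlay untouched.
func reexecute(requeued []*Transfer, overlay map[string]int64) {
	for _, tx := range requeued {
		if overlay[tx.From] < tx.Amount {
			tx.Verdict = Invalid // the balance it relied on was spent earlier in the wave
			continue
		}
		overlay[tx.From] -= tx.Amount
		overlay[tx.To] += tx.Amount
		tx.Verdict = Committed
	}
}

func main() {
	overlay := map[string]int64{"alice": 15, "bob": 0}
	rq := []*Transfer{
		{Index: 4, From: "alice", To: "bob", Amount: 10, Verdict: RolledBack},
		{Index: 9, From: "alice", To: "carol", Amount: 10, Verdict: RolledBack},
	}
	reexecute(rq, overlay)
	fmt.Println(rq[0].Verdict == Committed, rq[1].Verdict == Invalid, overlay["alice"]) // true true 5
}
```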

Phase 4 — atomic commit

The staging MVCC overlay, built from every committed write_set in tx_index order, is applied to ZapDB as a single MVCC commit. This is the LP-202 atomic commit primitive: either the entire wave's writes land or none do.


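```go
overlay := zapdb.NewOverlay(snap)
for _, tx := range wave.txs { // tx_index order: a later committed write to a key wins
	if tx.verdict == committed { // rolled-back txs that re-executed successfully are committed too
		overlay.Apply(tx.write_set)
	}
}
// LP-202 O(1) atomic commit; the new ZapDB root pointer is published.
// On failure the overlay is discarded and the root is unchanged (see Failure modes).
overlay.Commit()
```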

LP-217's cert ceremony (one cert per wave, per LP-209) covers the new root pointer. LP-211 (cross-shard) interposes here if the wave is part of a cross-shard cert.

Read/write-set tracking via ZapDB MVCC

LP-210's correctness reduces to ZapDB's MVCC version vectors. Each cell in ZapDB has a version pointer; reads observe the version visible from the snapshot, and writes create a new version. The commit-phase conflict check is a version-vector intersection test:

| Concept | ZapDB primitive |
|---|---|
| Read tx_i at key K, observing version V | read_set[i] ∋ (K, V) |
| Write tx_i at key K, replacing version V_prior with V_new | write_set[i] ∋ (K, V_new, V_prior) |
| Conflict: tx_i read V_prior of K but tx_j (j<i, committed) wrote V_new of K | tx_i.read_set contains (K, V_prior) and staged_overlay has key K with a different version |
| Resolution: roll back tx_i, re-execute against new overlay state | LP-202 txn.Discard() on tx_i's per-tx overlay; re-execute |

No global lock, and no serialisation during speculative execution. The pattern has the same structure as a Software Transactional Memory (STM) implementation, hence the name.
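The version-comparison form of the check in the table above can be sketched as follows. contentVersion is a hypothetical content-addressed version (FNV-1a over key and value); ZapDB's actual version encoding is not specified in this proposal, and Staged, ReadEntry, Stage and Conflicts are illustrative names.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

// contentVersion is an assumed content-addressed version: a hash of the key
// followed by the fixed-width value bytes.
func contentVersion(key string, value int64) uint64 {
	h := fnv.New64a()
	h.Write([]byte(key))
	var buf [8]byte
	binary.LittleEndian.PutUint64(buf[:], uint64(value))
	h.Write(buf[:])
	return h.Sum64()
}

type ReadEntry struct {
	Key     string
	Version uint64
}

// Staged maps each key to the version of its latest committed write in this wave.
type Staged map[string]uint64

// Stage records a committed tx's writes as new versions.
func (s Staged) Stage(writes map[string]int64) {
	for k, v := range writes {
		s[k] = contentVersion(k, v)
	}
}

// Conflicts implements the table's rule: tx_i read (K, V_prior) and the
// staged overlay holds K at a different version.
func (s Staged) Conflicts(reads []ReadEntry) bool {
	for _, r := range reads {
		if v, ok := s[r.Key]; ok && v != r.Version {
			return true
		}
	}
	return false
}

func main() {
	snap := map[string]int64{"alice": 100, "bob": 5}
	read := func(k string) ReadEntry { return ReadEntry{k, contentVersion(k, snap[k])} }
	staged := Staged{}
	staged.Stage(map[string]int64{"bob": 15}) // an earlier committed tx wrote bob
	fmt.Println(staged.Conflicts([]ReadEntry{read("alice")})) // false
	fmt.Println(staged.Conflicts([]ReadEntry{read("bob")}))   // true
}
```

Under this assumed encoding, a committed write that leaves a value unchanged yields the same version and does not trigger a rollback.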

Wire format

LP-210 introduces no wire format. Every tx in the wave is already a ZAP frame (LP-022). The STM machinery is purely runtime: MVCC version pointers are ZapDB-internal, and read_set and write_set are in-memory runtime structures, never serialised, gossiped, or content-addressed.

This is a load-bearing property: a validator implementing LP-210 in Go and another implementing it in Rust over the same ZAP buffer and ZapDB state produce byte-identical post-wave roots. The order of re-executions in Phase 3 is fixed by tx_index order (LP-209 canonical), and the set of committed writes is fixed by the deterministic conflict-detection rule, so the MVCC root is a pure function of (wave_bytes, prior_root).
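The determinism argument can be checked on a toy model: if Phase 1 reads only the prior state and Phases 2 and 3 walk in tx_index order, the resulting root cannot depend on how many goroutines ran Phase 1 or how they were scheduled. In this sketch Transfer, exec and runWave are hypothetical, state is a map, and SHA-256 over sorted (key, value) pairs stands in for the ZapDB MVCC root.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"sort"
	"sync"
)

// Transfer is a toy tx; From and To must differ.
type Transfer struct {
	From, To string
	Amount   int64
}

type result struct {
	reads  []string
	writes map[string]int64 // nil when the tx fails
}

// exec runs one transfer against a read-only view of state.
func exec(tx Transfer, view func(string) int64) result {
	from, to := view(tx.From), view(tx.To)
	r := result{reads: []string{tx.From, tx.To}}
	if from >= tx.Amount {
		r.writes = map[string]int64{tx.From: from - tx.Amount, tx.To: to + tx.Amount}
	}
	return r
}

// runWave is a toy Phase 1-3 pass that returns a stand-in root.
func runWave(prior map[string]int64, wave []Transfer, workers int) [32]byte {
	if workers < 1 {
		workers = 1
	}
	// Phase 1: speculative execute; every tx reads only the prior state.
	res := make([]result, len(wave))
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			for i := w; i < len(wave); i += workers {
				res[i] = exec(wave[i], func(key string) int64 { return prior[key] })
			}
		}(w)
	}
	wg.Wait()

	// Phase 2: in tx_index order, roll back any tx that read a key an
	// earlier committed tx wrote; otherwise stage its writes.
	staged := make(map[string]int64, len(prior))
	for k, v := range prior {
		staged[k] = v
	}
	written := map[string]bool{}
	var requeue []int
	for i, r := range res {
		conflict := false
		for _, k := range r.reads {
			conflict = conflict || written[k]
		}
		if conflict {
			requeue = append(requeue, i)
			continue
		}
		for k, v := range r.writes { // a failed tx has nil writes
			staged[k] = v
			written[k] = true
		}
	}

	// Phase 3: re-execute rolled-back txs serially against the staged state.
	for _, i := range requeue {
		r := exec(wave[i], func(key string) int64 { return staged[key] })
		for k, v := range r.writes {
			staged[k] = v
		}
	}

	// Stand-in root: SHA-256 over (key, 0x00, value) in sorted key order.
	keys := make([]string, 0, len(staged))
	for k := range staged {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := sha256.New()
	for _, k := range keys {
		var b [8]byte
		binary.BigEndian.PutUint64(b[:], uint64(staged[k]))
		h.Write(append([]byte(k), 0))
		h.Write(b[:])
	}
	var root [32]byte
	copy(root[:], h.Sum(nil))
	return root
}

func main() {
	prior := map[string]int64{"alice": 100, "bob": 20, "carol": 0}
	wave := []Transfer{{"alice", "bob", 30}, {"bob", "carol", 40}, {"carol", "alice", 5}}
	fmt.Println(runWave(prior, wave, 1) == runWave(prior, wave, 8)) // true
}
```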

Performance

Reference numbers from a 16-core box (Apple M4 Max, GOMAXPROCS=16), 40k-tx wave (per LP-209 §Throughput floor numbers), realistic securities-chain workload mix:

| Phase | Time | Notes |
|---|---|---|
| Phase 1 (speculative execute) | ~8 ms | 40k txs / 16 goroutines = 2.5k txs per goroutine; ~3 μs/tx avg cost |
| Phase 2 (commit-phase conflict detection) | ~1 ms | linear in total reads (~200k cell reads at 5 reads/tx avg); hashmap probe per read |
| Phase 3 (re-execute conflicted) | ~0.5 ms | ~2k conflicted txs at ~250 ns/tx serial re-exec (small contended set) |
| Phase 4 (atomic commit to ZapDB) | ~0.3 ms | LP-202 O(1) overlay commit |
| Total per wave | ~10 ms | floor at 16-core hardware |

Compare against a single-threaded baseline (same workload, serial executor, no STM machinery): ~120 ms per wave. Block-STM speedup: ~12× on 16 cores at ~75% efficiency.

On a 64-core box (production securities-chain validator), the expected speedup is ~50× over single-threaded (LP-208 reference rack profile). Combined with LP-209 wave throughput (40k txs per ~10 ms wave, as measured above, = 4M txs/sec) and LP-205 3-deep pipelining (3 × 4M = 12M txs/sec aggregate), this is the billion-user throughput envelope.
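The arithmetic behind the Performance numbers can be reproduced from the section's own inputs. The constants below are the figures quoted above; the program derives nothing new and measures nothing.

```go
package main

import "fmt"

// Figures quoted in the Performance section (not new measurements).
const (
	waveTxs       = 40000
	cores         = 16
	execPerTxUs   = 3.0   // Phase 1 average cost per tx (μs)
	readsPerTx    = 5     // average cell reads per tx
	conflictedTxs = 2000  // txs re-executed in Phase 3
	reexecPerTxNs = 250.0 // Phase 3 cost per re-executed tx (ns)
	waveMs        = 10.0  // measured total per wave (ms)
	serialMs      = 120.0 // single-threaded baseline per wave (ms)
	pipelineDepth = 3     // LP-205 pipelining depth
)

func phase1Ms() float64   { return float64(waveTxs/cores) * execPerTxUs / 1000 }
func phase2Reads() int    { return waveTxs * readsPerTx }
func phase3Ms() float64   { return conflictedTxs * reexecPerTxNs / 1e6 }
func speedup() float64    { return serialMs / waveMs }
func efficiency() float64 { return speedup() / cores }
func txPerSec() float64   { return waveTxs / (waveMs / 1000) }

func main() {
	fmt.Printf("Phase 1: %.1f ms (quoted as ~8 ms)\n", phase1Ms())
	fmt.Printf("Phase 2 probes: %d\n", phase2Reads())
	fmt.Printf("Phase 3: %.1f ms\n", phase3Ms())
	fmt.Printf("speedup: %.0fx at %.0f%% efficiency\n", speedup(), 100*efficiency())
	fmt.Printf("throughput: %.0f tx/s, pipelined: %.0f tx/s\n", txPerSec(), pipelineDepth*txPerSec())
}
```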

Block-STM vs Sealevel — declared vs detected

| Property | Sealevel (declared) | Block-STM (detected) |
|---|---|---|
| Submitter declares access set | required at tx submit | not required |
| False negatives (missed conflicts) | none at runtime: access outside the declared set is rejected, so an incomplete declaration fails the tx | impossible: runtime detects |
| Overdeclared access sets | reduces parallelism | n/a |
| Re-execution cost | none (txs that conflict at the access-set level are not parallelised) | non-zero (conflicted txs re-execute serially in dep order) |
| Workload fit | account-model chains with statically known access patterns | smart-contract chains with data-dependent access (ERC-3643 securities chains) |
| Where complexity lives | submitter-side complexity, runtime simplicity | submitter-side simplicity, runtime complexity |

Block-STM's runtime complexity is bounded by the MVCC primitives ZapDB already provides for non-execution reasons (LP-202 speculative execution shares the same machinery). The marginal complexity in LP-210 is the conflict-detection hashmap and the re-execution scheduler, both small and well-tested in the reference implementation.

Failure modes

| Failure | Cause | Handling | Outcome |
|---|---|---|---|
| Tx execution traps (out-of-gas, opcode error) | tx logic | tx's write_set is empty; tx is marked invalid; no commit | tx is dropped from wave; no state change attributable to it |
| All txs conflict (worst-case contention) | adversarial workload; one hot key | Phase 3 serialises the entire wave | wave commits at roughly single-threaded rate; LP-210 throughput floor ≈ serial executor throughput, less the wasted Phase 1 pass |
| ZapDB commit fails (disk full, I/O error) | infrastructure | LP-202 atomic unwind: overlay discarded; ZapDB root unchanged | node halts at last committed wave; operator action; bootstrap recovery from peers |
| Wave is cross-shard, sibling shard rolls back | LP-211 cross-shard cert refuses commit | LP-211 §Atomicity: per-shard rollback; LP-210 per-tx writes are unwound | LP-211 §Recovery; the cross-shard tx group is dropped; per-shard waves move forward independently next round |

Composition with LP-202

LP-210's speculative execute IS LP-202's speculative execution pattern, applied at per-tx granularity instead of per-block. The MVCC snapshot is the same primitive, the unwind on conflict is the same txn.Discard(), and the atomic commit at end-of-wave is the same overlay commit. LP-210 is one specific use of LP-202's primitives: the canonical execution-layer use.

Composition with LP-209

LP-209 commits a wave; LP-210 executes the wave's txs. The contract:

| LP-209 produces | LP-210 consumes |
|---|---|
| Totally ordered tx list with tx_index per tx | the execution order tiebreak for re-execution in Phase 3 |
| Wave's prior-root ZapDB pointer | the snapshot from which all goroutines start |
| Wave's tx count N | the goroutine pool size = min(GOMAXPROCS, N) |

LP-209 produces a wave; LP-210 produces a new ZapDB root plus a per-tx verdict slice. The LP-217 cert covers the new root, NOT the verdict slice: the verdicts are reproducible from the wave bytes and prior root by any conformant LP-210 implementation, so they are not part of the cert payload.

Composition with LP-211

When a wave contains cross-shard tx groups (LP-211), LP-210 executes the per-shard portions independently through Phase 3. At Phase 4, instead of committing immediately, the cross-shard portions stage their overlays and emit prepare-acks per LP-211. The actual commit happens when the parent-L1 cross-shard cert at LP-211 finalises; if any sibling shard refuses, all participating shards' overlays are unwound via LP-202.

This means LP-210's Phase 4 has two modes:

1. Local-only wave. All txs are intra-shard. Phase 4 commits immediately; LP-217 cert per LP-209.
2. Cross-shard wave. Some txs are cross-shard. Phase 4 splits into local-commit (intra-shard txs) and staged-prepare (cross-shard txs). Local commits proceed; staged commits wait for the LP-211 coordinator to resolve.

The split is transparent to per-tx execution: the cross-shard nature is a property of the tx envelope's destination chain field, not of the STM machinery.
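A sketch of the Phase 4 split, assuming (hypothetically) that each tx carries a destination-chain string and a committed flag once Phase 3 is done; Tx and splitPhase4 are illustrative names. A local-only wave is simply the case where the staged list comes back empty.

```go
package main

import "fmt"

// Tx is a toy envelope; DestChain models the destination chain field.
type Tx struct {
	Index     uint32
	DestChain string
	Committed bool
}

// splitPhase4 partitions committed txs in tx_index order: those targeting
// the local chain commit immediately, the rest are staged until the LP-211
// coordinator resolves. Uncommitted txs appear in neither list.
func splitPhase4(local string, txs []Tx) (commitNow, stage []uint32) {
	for _, tx := range txs {
		if !tx.Committed {
			continue
		}
		if tx.DestChain == local {
			commitNow = append(commitNow, tx.Index)
		} else {
			stage = append(stage, tx.Index)
		}
	}
	return commitNow, stage
}

func main() {
	txs := []Tx{{0, "shard-a", true}, {1, "shard-b", true}, {2, "shard-a", false}, {3, "shard-a", true}}
	now, staged := splitPhase4("shard-a", txs)
	fmt.Println(now, staged) // [0 3] [1]
}
```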

Activation marker


```
activates: 2025-12-25T16:20:00-08:00
activates-unix: 1766708400
```

Applies to wave execution from the genesis block of the new final Lux network onward. The pre-Quasar Edition Lux network's serial executor is out of scope.

Cross-references

- LP-010: Block-STM 3.0 / QuasarSTM (GPU lane-aware variant of the same pattern; LP-210 is the CPU floor)
- LP-022: ZAP wire protocol (tx ZAP frame format read by LP-210)
- LP-200: ZAP Stack umbrella
- LP-202: ZAP Pipelining and Atomic Unwind (MVCC primitives, atomic commit, speculative execution contract)
- LP-204: Network of blockchains (per-L1 LP-210 instantiation per chain-VM)
- LP-208: DAG mempool / header DAG (substrate that produces the txs)
- LP-209: Mysticeti-style total-order on DAG (the wave commit contract LP-210 executes)
- LP-211: Cross-shard atomic PQ-heavy cert (cross-shard Phase 4 staging)