QuasarSTM 3.0 (LP-010, activated 2025-12-25) shipped the GPU-native
ordered MVCC substrate for Lux's Nova (linear) and Nebula (DAG) modes —
lanes, wave-tick scheduler, three-tier validation skeleton, deterministic
contention manager, multi-GPU sharding stub. 3.0 was the substrate.
It launched with placeholder cryptographic verifiers (HMAC-keccak across
all three QuasarCert lanes), partial EVM opcode coverage (118 opcodes
out of the targeted 175, no CALL family or CREATE), stubbed Tier 2 / Tier 3
validation paths, and a multi-GPU commit-server scaffold that was only a
sketch.
QuasarSTM 4.0 closes every one of those gaps. It is the production
specification activated on 2026-02-14. It is not research. Every
feature listed in this LP shipped under the v0.41–v0.49 milestone train
between 2026-01-15 and 2026-02-14, lives in the cevm/lib/consensus/quasar/gpu/
substrate, and is covered by the cross-backend determinism harness at
≥ 80% line coverage on both Apple Metal and NVIDIA CUDA.
The 3.0 invariants — deterministic ordered MVCC, lane-aware validation,
deterministic contention, commit horizons — hold unchanged. 4.0 makes
them executable end-to-end on real cryptographic primitives, the full
EVM, and a multi-service GPU pipeline.
> Frame: 3.0 was *get the substrate right*. 4.0 is *make the substrate
> production*.
Versions 3.1 and 3.2 were research codenames that were folded directly
into 4.0; they never shipped as separate releases. The 4.0 production
spec subsumes everything previously labelled 3.1 / 3.2 / 4.0 in the
research-track draft of this LP (see Changelog).
4.0 activation: 2026-02-14, 17:00 UTC. Validators upgraded across
the preceding week (2026-02-07 spec freeze, 2026-02-08–13 staged
mainnet rollout, hard activation 2026-02-14). Activation is governed
by quasar.execution_version >= 4 in the Quasar consensus engine
(LP-020) — wave-tick rounds emitted with execution_version < 4 after
the activation height are rejected.
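The activation gate can be pictured as a per-round check. This is a minimal sketch only. RoundHeader, its fields and accept_round are hypothetical names, not the LP-020 engine's types. Treating the activation height itself as post-activation is also an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical round header; names are illustrative, not LP-020's.
struct RoundHeader {
    uint64_t height;
    uint32_t execution_version;
};

constexpr uint32_t kQuasarStm4ExecutionVersion = 4;

// Only the version gate is modeled here. Rounds at or after the activation
// height (assumed inclusive) must carry execution_version >= 4; earlier
// rounds remain valid as historical 3.0 rounds.
bool accept_round(const RoundHeader& r, uint64_t activation_height) {
    if (r.height < activation_height) return true;
    return r.execution_version >= kQuasarStm4ExecutionVersion;
}
```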
The deliverables that landed for 2026-02-14:

- … __threadfence, layout-byte-identical with Metal
- certificate_subject includes P/Q/Z roots; KnownTotalOrder introduced; Nova/Nebula mode roots separated
- VersioningMode
- DeferredOpKind enum; reducer plan root

Coverage: lib/consensus/quasar/gpu/ reports 82.4% line coverage on the
4.0 line (Metal backend) and 81.1% (CUDA backend) under the cross-backend
determinism harness. Both backends produce identical roots over the full
test suite — block_hash, state_root, receipts_root, execution_root,
mode_root, plus the new hint_roots.
- EVM: 3.0 shipped 118 opcodes (no CALL, no CREATE, no LOG, no EXTCODE*); 4.0 ships 175 opcodes including full CALL/CALLCODE/DELEGATECALL/STATICCALL, CREATE/CREATE2, LOGn, EXTCODE*, RETURNDATA*, TLOAD/TSTORE, MCOPY, BLOBHASH/BLOBBASEFEE
- Tier 3 semantic reducers (DeferredOpKind: Add, Sub, Append, BalanceDelta, FeeAccumulate, OrderAppend, AuctionMatch, MintCounter, NonceAdvance)
- Dynamic per-lane versioning (SingleVersionFast / MultiVersion / Reducer / Serialized)
- Formally specified CPU reference in lib/consensus/quasar/spec/ for visibility, reducers, repair, horizon, Nova / Nebula order; differential fuzzing harness against Metal and CUDA kernels

Tier 1 is the fast validation predicate: a transaction's read set is
consistent if every lane it read shows the same lane_clock at validate
time as it did at execute time, and no later transaction has bumped the
clock with a write that the canonical order places before this transaction.
```cpp
struct LaneClockEntry {
    Hash     lane_id;
    uint64_t clock;
    uint32_t last_writer_tx;   // canonical-order index of last commit
    uint8_t  versioning_mode;  // see VersioningMode
};
```
Validation is O(reads) hash lookups against a per-round lane-clock table
in shared memory, with no MVCC chain walk in the common case. When
Tier 1 passes, the transaction commits without consulting the per-key
version chain. Tier 2 (key-MVCC) and Tier 3 (semantic reducers) only
fire when Tier 1 disagrees.
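The Tier 1 predicate can be sketched as follows. The sketch rests on several assumptions:

- a lane clock advances by exactly one per committed write;
- lanes are keyed by a 64-bit id instead of Hash;
- a host-side std::unordered_map stands in for the shared-memory lane-clock table.

LaneClock, LaneRead and tier1_validate are illustrative names. The sketch tolerates one intervening write by a canonically later transaction. This follows the ordered-MVCC reading of the predicate above, since such a write cannot change the value this transaction should have seen.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Simplified stand-ins (assumptions), not the substrate's types.
struct LaneClock {
    uint64_t clock;           // assumed to advance by one per committed write
    uint32_t last_writer_tx;  // canonical-order index of the last writer
};

struct LaneRead {
    uint64_t lane_id;
    uint64_t clock_at_execute;  // lane clock observed when the tx executed
};

// Tier 1: one hash lookup per read, no per-key MVCC chain walk.
// true  -> commit without consulting the version chain
// false -> fall through to Tier 2 (key-MVCC) / Tier 3 (reducers)
bool tier1_validate(uint32_t tx_index, const std::vector<LaneRead>& reads,
                    const std::unordered_map<uint64_t, LaneClock>& table) {
    for (const LaneRead& r : reads) {
        auto it = table.find(r.lane_id);
        if (it == table.end()) return false;  // unknown lane: be conservative
        const LaneClock& lc = it->second;
        if (lc.clock == r.clock_at_execute) continue;  // lane unchanged
        // Exactly one intervening write, by a tx ordered after this one,
        // cannot change what this tx should have read.
        bool single_later_write = lc.clock == r.clock_at_execute + 1 &&
                                  lc.last_writer_tx > tx_index;
        if (!single_later_write) return false;
    }
    return true;
}
```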
Tier 3 commits commutative operations as DeferredOps rather than as
SSTORE conflicts. The kernel records the operation kind and payload at
execute time and runs deterministic_reduce_at_commit() per lane at
horizon time. Commit order among reducer entries within a single lane
is canonical; across lanes it is horizon-determined.
```cpp
enum class DeferredOpKind : uint8_t {
    Add           = 0,
    Sub           = 1,
    Append        = 2,
    BalanceDelta  = 3,
    FeeAccumulate = 4,
    OrderAppend   = 5,
    AuctionMatch  = 6,
    MintCounter   = 7,
    NonceAdvance  = 8,
};

struct DeferredOp {
    uint32_t       tx_id;
    DeferredOpKind kind;
    Hash           lane_id;
    uint8_t        payload[48];
};
```
DEX hot paths — order append, fee accumulation, volume accumulation,
audit append, per-account net settlement — commit as reducer entries
rather than as same-key SSTORE conflicts. This is the change that makes
the repair_amplification < 1.01 target stick on regulated-DEX
workloads.
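A per-lane reduction shows why reducer entries commit deterministically regardless of execution interleaving. The sketch is not deterministic_reduce_at_commit() itself. It covers only Add, Sub and Append, and assumes the 48-byte payload has already been decoded into an amount or an item. Op, LaneState and reduce_lane are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Subset of DeferredOpKind with a pre-decoded payload (assumption).
enum class OpKind : uint8_t { Add = 0, Sub = 1, Append = 2 };

struct Op {
    uint32_t tx_id;   // canonical-order index of the issuing tx
    OpKind   kind;
    int64_t  amount;  // used by Add / Sub
    uint64_t item;    // used by Append
};

struct LaneState {
    int64_t counter = 0;
    std::vector<uint64_t> log;
};

// Applies one lane's deferred ops in canonical (tx_id) order, so the result
// does not depend on the order in which GPU threads recorded them.
// stable_sort keeps the relative input order of ops from the same tx.
LaneState reduce_lane(std::vector<Op> ops, LaneState state) {
    std::stable_sort(ops.begin(), ops.end(),
                     [](const Op& a, const Op& b) { return a.tx_id < b.tx_id; });
    for (const Op& op : ops) {
        switch (op.kind) {
            case OpKind::Add:    state.counter += op.amount; break;
            case OpKind::Sub:    state.counter -= op.amount; break;
            case OpKind::Append: state.log.push_back(op.item); break;
        }
    }
    return state;
}
```

Two threads recording the same ops in different orders reduce to the same lane state. That is the property that lets DEX hot paths skip same-key SSTORE conflicts.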
Block-STM was blind optimism. ConflictSpec turns it into optimism with
an oracle: the scheduler reads declared/predicted lane sets from
several sources and falls back to speculative STM only on Unknown.
```cpp
struct ConflictSpec {
    uint32_t tx_id;
    uint32_t read_lane_offset;
    uint16_t read_lane_count;
    uint32_t write_lane_offset;
    uint16_t write_lane_count;
    uint32_t commutative_lane_offset;
    uint16_t commutative_lane_count;
    uint8_t  confidence;
    uint8_t  source;
};

enum class ConflictSpecSource : uint8_t {
    Static       = 0,  // compile-time ABI declaration
    ABI          = 1,  // EVM ABI hint (Solidity attribute)
    Historical   = 2,  // observed past traces
    UserDeclared = 3,  // signed declaration from sender
    Precompile   = 4,  // precompile self-declaration
    Learned      = 5,  // device-resident predictor
};
```
Scheduler precedence: **Static > ABI > UserDeclared > Precompile >
Historical > Learned > fallback Block-STM**. ConflictSpec is *advice*;
correctness is preserved by the underlying validation tiers.
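The numeric order of ConflictSpecSource differs from the scheduler precedence. Historical is 2, for example, yet it ranks below UserDeclared and Precompile. A selector therefore needs an explicit rank table. In this sketch, SpecCandidate and select_spec are hypothetical names, and using confidence only as a tie-breaker within one source is an assumption.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

enum class ConflictSpecSource : uint8_t {
    Static = 0, ABI = 1, Historical = 2, UserDeclared = 3, Precompile = 4, Learned = 5,
};

// Rank by scheduler precedence (lower wins), not by enum value:
// Static > ABI > UserDeclared > Precompile > Historical > Learned.
constexpr int precedence_rank(ConflictSpecSource s) {
    switch (s) {
        case ConflictSpecSource::Static:       return 0;
        case ConflictSpecSource::ABI:          return 1;
        case ConflictSpecSource::UserDeclared: return 2;
        case ConflictSpecSource::Precompile:   return 3;
        case ConflictSpecSource::Historical:   return 4;
        case ConflictSpecSource::Learned:      return 5;
    }
    return 6;  // out-of-range value: lowest precedence
}

struct SpecCandidate {
    ConflictSpecSource source;
    uint8_t  confidence;
    uint32_t spec_index;  // index into the round's ConflictSpec array
};

// Returns the spec the scheduler would follow, or std::nullopt meaning
// "no advice: fall back to speculative Block-STM".
std::optional<SpecCandidate> select_spec(const std::vector<SpecCandidate>& cands) {
    std::optional<SpecCandidate> best;
    for (const SpecCandidate& c : cands) {
        if (!best) { best = c; continue; }
        int rc = precedence_rank(c.source);
        int rb = precedence_rank(best->source);
        if (rc < rb || (rc == rb && c.confidence > best->confidence)) best = c;
    }
    return best;
}
```

The selection only picks advice. A wrong choice costs performance, not correctness, because the validation tiers still run.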
Lanes are classified per round. The classification drives validation
policy and dynamic versioning mode.
```cpp
enum class LaneClass : uint8_t {
    Owned       = 0,  // account-local / nonce-local / private state
    Shared      = 1,  // shared contract state, ordinary contention
    HotShared   = 2,  // AMM reserves, order-book level, fee counter
    Commutative = 3,  // additive / append / reducer lane
    Unknown     = 4,  // fall through to ordinary speculative path
};
```
| LaneClass | Validation policy | VersioningMode |
|---|---|---|
| Owned | no validation against unrelated txs | SingleVersionFast |
| Shared | ordered MVCC (Tier 2) | MultiVersion |
| HotShared | semantic reducer / serialized lane / precompile | Reducer or Serialized |
| Commutative | Tier 3 reducer commit | Reducer |
| Unknown | ordinary Block-STM speculative path | MultiVersion |

LaneClass is the architectural reason the Tier 1 / Owned-lane fast path
works at all: skewed regulated-DEX workloads have a heavy Owned and
Commutative tail, exactly where the fast path applies.
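The table maps directly from lane class to default versioning mode. In this sketch, hot_lane_has_reducer is a hypothetical input that chooses between Reducer and Serialized for HotShared lanes. The LP does not say how that choice is made, and default_mode is an illustrative name.

```cpp
#include <cassert>
#include <cstdint>

enum class LaneClass : uint8_t { Owned, Shared, HotShared, Commutative, Unknown };
enum class VersioningMode : uint8_t { SingleVersionFast, MultiVersion, Reducer, Serialized };

// Default versioning mode per the LaneClass table.
VersioningMode default_mode(LaneClass c, bool hot_lane_has_reducer) {
    switch (c) {
        case LaneClass::Owned:       return VersioningMode::SingleVersionFast;
        case LaneClass::Shared:      return VersioningMode::MultiVersion;
        case LaneClass::HotShared:   return hot_lane_has_reducer ? VersioningMode::Reducer
                                                                 : VersioningMode::Serialized;
        case LaneClass::Commutative: return VersioningMode::Reducer;
        case LaneClass::Unknown:     return VersioningMode::MultiVersion;
    }
    return VersioningMode::MultiVersion;  // unreachable for valid enum values
}
```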
Always-on multi-versioning is expensive in low-contention regions and
unsafe in pathological hot regions. 4.0 ships per-lane mode selection:
```cpp
enum class VersioningMode : uint8_t {
    SingleVersionFast = 0,  // low contention; no MVCC chain
    MultiVersion      = 1,  // standard ordered MVCC
    Reducer           = 2,  // commutative fast path
    Serialized        = 3,  // pathological hot lane: strict canonical order
};
```
Mode is published per LaneClockEntry. Mode upgrades / downgrades happen
between rounds based on hot-lane telemetry; in-round mode is fixed.
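Because mode is fixed within a round, mode changes are computed only at a round boundary. The sketch assumes hypothetical per-lane telemetry (writer and abort counts) and an illustrative threshold of 32 aborts. Neither comes from this LP. Stepping down from Serialized one level at a time is a hysteresis choice made for the sketch.

```cpp
#include <cassert>
#include <cstdint>

enum class VersioningMode : uint8_t { SingleVersionFast = 0, MultiVersion = 1, Reducer = 2, Serialized = 3 };

// Hypothetical telemetry collected for one lane during the round just ended.
struct LaneTelemetry {
    uint32_t writers;  // distinct transactions that wrote the lane
    uint32_t aborts;   // validation failures attributed to the lane
};

constexpr uint32_t kSerializeAborts = 32;  // illustrative threshold

// Intended to run at the round boundary: the result is the lane's fixed mode
// for the next round, so no lane changes mode mid-round.
VersioningMode next_round_mode(VersioningMode current, const LaneTelemetry& t) {
    if (current == VersioningMode::Reducer) return current;  // reducer lanes keep their path
    if (t.aborts >= kSerializeAborts) return VersioningMode::Serialized;
    if (current == VersioningMode::Serialized) return VersioningMode::MultiVersion;  // step down gradually
    if (t.writers <= 1) return VersioningMode::SingleVersionFast;
    return VersioningMode::MultiVersion;
}
```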
EVM fibers checkpoint after each CALL-frame return and after each
explicit checkpoint opcode. On Tier 1 / Tier 2 conflict, repair re-executes
from the latest checkpoint that precedes the conflicting read, not from
the transaction start.
```cpp
struct FiberCheckpoint {
    uint32_t fiber_id;
    uint32_t pc;
    uint64_t gas_remaining;
    uint16_t stack_depth;
    uint16_t mem_size;
    Hash     state_digest;  // commitment over MVCC reads since start
};
```
Measured on router / multi-hop swap workloads, repairing from a checkpoint
costs 5–30× less than full re-execution. Storage cost is bounded by a
per-fiber ring of last-N checkpoints (default N=4).
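Here is a sketch of the per-fiber ring and of how repair picks its resume point. It assumes a hypothetical reads_before counter on each checkpoint. The real FiberCheckpoint commits to its reads through state_digest instead. If the ring no longer holds a checkpoint taken before the conflicting read, repair falls back to re-executing from the transaction start. Checkpoint and CheckpointRing are illustrative names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Simplified checkpoint; reads_before (hypothetical) counts the MVCC reads
// the fiber had performed when the checkpoint was taken.
struct Checkpoint {
    uint32_t pc;
    uint64_t gas_remaining;
    uint32_t reads_before;
};

// Per-fiber ring holding the last N checkpoints (N = 4 by default).
template <std::size_t N = 4>
class CheckpointRing {
public:
    void push(const Checkpoint& cp) {
        slots_[next_ % N] = cp;  // overwrites the oldest once full
        ++next_;
    }
    // Latest retained checkpoint taken before read number `conflict_read`
    // (0-based). std::nullopt: re-execute from the transaction start.
    std::optional<Checkpoint> resume_point(uint32_t conflict_read) const {
        std::size_t held = next_ < N ? next_ : N;
        for (std::size_t i = 1; i <= held; ++i) {  // newest first
            const Checkpoint& cp = slots_[(next_ - i) % N];
            if (cp.reads_before <= conflict_read) return cp;
        }
        return std::nullopt;
    }

private:
    std::array<Checkpoint, N> slots_{};
    std::size_t next_ = 0;  // total checkpoints ever pushed
};
```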
Commit horizon finalises contiguous Nova prefixes or valid Nebula causal
cuts in batches, slashing per-transaction certificate overhead. The
horizon advances when the prefix / cut is fully committable: every
transaction in it has passed Tier 1 / 2 / 3, no repair is outstanding,
and reducer entries are reduced.
```cpp
struct CommitHorizon {
    uint64_t prefix_len;          // Nova
    Hash     causal_cut_root;     // Nebula
    Hash     reducer_state_root;  // post-reduce snapshot
    uint64_t round_index;
};
```
Each round, the horizon advance emits one set of root materials
(block_hash, state_root, receipts_root, execution_root,
mode_root, hint_roots) rather than one set per transaction.
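For Nova, advancing the horizon reduces to extending a contiguous prefix while each next transaction meets the three committability conditions above. The sketch covers only the Nova prefix, not Nebula causal cuts. TxStatus and advance_nova_prefix are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical per-tx status flags tracked by the commit stage.
struct TxStatus {
    bool validated;       // passed Tier 1, 2 or 3
    bool repair_pending;  // a repair is still outstanding
    bool reducers_done;   // its deferred ops have been reduced
};

// Extends the Nova prefix over the longest contiguous run of fully
// committable transactions; the first blocked tx stops the horizon.
uint64_t advance_nova_prefix(uint64_t prefix_len, const std::vector<TxStatus>& txs) {
    while (prefix_len < txs.size()) {
        const TxStatus& s = txs[prefix_len];
        if (!s.validated || s.repair_pending || !s.reducers_done) break;
        ++prefix_len;
    }
    return prefix_len;
}
```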
MVCC GC reaps versions that the commit horizon has overwritten and that
no in-flight repair could re-read.
```cpp
void mvcc_gc(VersionBlock* vb, uint64_t horizon_round) {
    // keep the last K versions visible-before(horizon_round)
    // drop the rest into a per-round arena that resets at end_round()
}
```
The per-round arena resets on end_round(). There is no global allocator
on the round path; allocations are stack-bounded.
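The mvcc_gc() stub can be fleshed out for a single key. This sketch makes three simplifications:

- versions are assumed sorted by commit round;
- K is a caller-supplied parameter, since the LP does not fix it;
- dropped versions are erased in place rather than handed to a per-round arena.

Version and mvcc_gc_sketch are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Version {
    uint64_t commit_round;  // round in which this version was committed
    uint64_t value;
};

// Keeps every version committed at or after horizon_round plus the newest
// keep_k versions visible before it; older versions are reaped.
// Precondition: versions sorted by commit_round ascending.
void mvcc_gc_sketch(std::vector<Version>& versions, uint64_t horizon_round, std::size_t keep_k) {
    auto first_at_or_after = std::lower_bound(
        versions.begin(), versions.end(), horizon_round,
        [](const Version& v, uint64_t r) { return v.commit_round < r; });
    std::size_t visible_before =
        static_cast<std::size_t>(first_at_or_after - versions.begin());
    if (visible_before > keep_k)
        versions.erase(versions.begin(), versions.begin() + (visible_before - keep_k));
}
```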
Motor (OSDI 2024) lays versions out as consecutive tuples so that one
RDMA round trip / one GPU memory transaction returns all likely-visible
versions for a key.
```cpp
constexpr int MAX_INLINE_VERSIONS = 8;

struct VersionBlock {
    Hash       key;
    uint16_t   count;
    uint16_t   capacity;
    uint8_t    pad[28];
    StmVersion versions[MAX_INLINE_VERSIONS];  // 4 in 3.0; 8 in 4.0
};
```
VersionBlock is RDMA / multi-GPU friendly. The CSMV commit-server uses
it as the unit of validation message between fiber clients and the
commit server.
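The pad[28] field makes sense if Hash is 32 bytes. The header then fills exactly 64 bytes (32 + 2 + 2 + 28), so the inline versions start at byte offset 64. The sketch checks this with static_assert and shows the one-fetch read pattern. The 32-byte Hash, the StmVersion fields and latest_visible are assumptions, not the substrate's definitions.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Hash = std::array<uint8_t, 32>;  // assumption: 32-byte digest

// Assumed StmVersion shape, for illustration only.
struct StmVersion {
    uint32_t writer_tx;    // canonical-order index of the writing tx
    uint32_t incarnation;
    uint64_t value_ref;    // offset of the value in a value arena
};

constexpr int MAX_INLINE_VERSIONS = 8;

struct VersionBlock {
    Hash       key;
    uint16_t   count;
    uint16_t   capacity;
    uint8_t    pad[28];
    StmVersion versions[MAX_INLINE_VERSIONS];
};

static_assert(offsetof(VersionBlock, versions) == 64, "header is 64 bytes");

// One fetch of the block yields every inline candidate: pick the newest
// version written by a tx ordered before the reader. -1: none inline.
int latest_visible(const VersionBlock& vb, uint32_t reader_tx) {
    int best = -1;
    for (int i = 0; i < vb.count && i < MAX_INLINE_VERSIONS; ++i) {
        const StmVersion& v = vb.versions[i];
        if (v.writer_tx < reader_tx &&
            (best < 0 || v.writer_tx > vb.versions[best].writer_tx))
            best = i;
    }
    return best;
}
```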
4.0 grows the GPU service set from 12 (LP-132) to 16:
```cpp
enum class ServiceId : uint32_t {
    // 3.0 services (12)
    Ingress          = 0,
    Decode           = 1,
    Crypto           = 2,
    DagReady         = 3,
    Exec             = 4,
    Validate         = 5,
    Repair           = 6,
    Commit           = 7,
    StateRequest     = 8,
    StateResp        = 9,
    CertLane         = 10,
    CertOut          = 11,
    // 4.0 services (4 new)
    AdaptiveSchedule = 12,  // ConflictSpec admission, predictor consult
    BridgeAttest     = 13,  // cross-chain attestation drain
    MarketAuction    = 14,  // auction-rule batched matching for DEX hot lanes
    FiberCheckpoint  = 15,  // checkpoint emit / GC
    Count            = 16,
};
```
Each new service has a dedicated ring (header + items arena), the same
back-pressure semantics as the 3.0 services, and the same wave-tick
budget contract.
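A host-side sketch of one service ring follows:

- head/tail counters form the header, and a fixed array is the items arena;
- a full ring refuses the push, which is the back-pressure signal;
- the consumer drains at most a per-tick budget.

ServiceRing and drain_tick are hypothetical, and modeling the wave-tick budget as an item count is an assumption. A GPU ring shared between producer and consumer threads would also need atomic counters and memory fences, which this single-threaded sketch omits.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

template <typename T, std::size_t Capacity>
class ServiceRing {
    static_assert(Capacity > 0 && (Capacity & (Capacity - 1)) == 0,
                  "capacity must be a power of two");
public:
    // Producer side: a full ring refuses the item (back-pressure).
    bool try_push(const T& item) {
        if (tail_ - head_ == Capacity) return false;
        items_[tail_ & (Capacity - 1)] = item;
        ++tail_;
        return true;
    }
    // Consumer side: returns false when the ring is empty.
    bool try_pop(T& out) {
        if (head_ == tail_) return false;
        out = items_[head_ & (Capacity - 1)];
        ++head_;
        return true;
    }
    std::size_t size() const { return static_cast<std::size_t>(tail_ - head_); }

private:
    uint64_t head_ = 0;                // header: next slot to consume
    uint64_t tail_ = 0;                // header: next slot to produce
    std::array<T, Capacity> items_{};  // items arena
};

// Consumes at most `budget` items in one wave tick; returns how many.
template <typename T, std::size_t C, typename Handler>
std::size_t drain_tick(ServiceRing<T, C>& ring, std::size_t budget, Handler&& handle) {
    std::size_t done = 0;
    T item{};
    while (done < budget && ring.try_pop(item)) {
        handle(item);
        ++done;
    }
    return done;
}
```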
The 3.0 cert verifiers were HMAC-keccak with a master secret. This was a
genuine cryptographic check (one-way, with cross-lane domain tags that
reject replay) but not the production primitives, and it was structured so
that swapping in the real primitives is a single function-pointer change.
4.0 makes those swaps:
- lib/consensus/quasar/gpu/crypto/bls12_381.metal / .cu
- lib/consensus/quasar/gpu/crypto/corona.metal / .cu
- lib/consensus/quasar/gpu/crypto/groth16.metal / .cu

The verifiers run on-device. There is no CPU fallback on the round
path. The HMAC-keccak path is preserved as a development-only mode for
deterministic test vectors and is gated behind EVM_DEV_HMAC_VERIFIER=1.
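The function-pointer swap and the development gate can be sketched on the host. CertVerifyFn and the two verifier functions are hypothetical placeholders with elided bodies, and the real verifiers run on-device. Only the EVM_DEV_HMAC_VERIFIER=1 gate comes from this LP.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical verifier signature: message and certificate as byte spans.
using CertVerifyFn = bool (*)(const uint8_t* msg, std::size_t msg_len,
                              const uint8_t* cert, std::size_t cert_len);

// Placeholders for the BLS12-381 verifier and the dev-only HMAC-keccak
// verifier; bodies elided, both reject everything in this sketch.
bool verify_bls12_381(const uint8_t*, std::size_t, const uint8_t*, std::size_t) { return false; }
bool verify_hmac_keccak_dev(const uint8_t*, std::size_t, const uint8_t*, std::size_t) { return false; }

// The swap is a single function-pointer choice: the real primitive by
// default, HMAC-keccak only when EVM_DEV_HMAC_VERIFIER=1 is set.
CertVerifyFn select_bls_lane_verifier() {
    const char* dev = std::getenv("EVM_DEV_HMAC_VERIFIER");
    if (dev != nullptr && std::strcmp(dev, "1") == 0) return &verify_hmac_keccak_dev;
    return &verify_bls12_381;
}
```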
A single harness runs every test against both Metal (Apple M1 Max) and
CUDA (NVIDIA H100). Same input → byte-identical roots over the full
test surface:
| Test | Metal (Apple M1 Max) | CUDA (NVIDIA H100) | Byte-identical roots |
|---|---|---|---|
| empty_round | PASS | PASS | yes |
| single_tx_real_roots | PASS | PASS | yes |
| multi_tx_counters (128 txs) | PASS | PASS | yes |
| bounded_backpressure (1024 txs) | PASS | PASS | yes |
| end_to_end_stress (1024 txs, 8 ticks) | PASS | PASS | yes |
| state_page_fault | PASS | PASS | yes |
| root_determinism | PASS | PASS | yes |
| block_stm_independent_txs | PASS | PASS | yes |
| block_stm_conflict_repair | PASS | PASS | yes |
| evm_full_call_create | PASS | PASS | yes |
| evm_logn_emit | PASS | PASS | yes |
| evm_extcode_returndata | PASS | PASS | yes |
| evm_eip2929_warm_cold | PASS | PASS | yes |
| evm_eip3529_refund | PASS | PASS | yes |
| evm_tload_tstore | PASS | PASS | yes |
| evm_mcopy | PASS | PASS | yes |
| quasar_quorum_real_bls | PASS | PASS | yes |
| quasar_quorum_real_corona | PASS | PASS | yes |
| quasar_quorum_real_groth16 | PASS | PASS | yes |
| reducer_lane_fee_accumulate | PASS | PASS | yes |
| reducer_lane_order_append | PASS | PASS | yes |
| lane_class_owned_fast_path | PASS | PASS | yes |
| lane_class_hot_shared_serialize | PASS | PASS | yes |
| version_block_layout | PASS | PASS | yes |
| commit_horizon_nova_prefix | PASS | PASS | yes |
| commit_horizon_nebula_cut | PASS | PASS | yes |
| mvcc_gc_horizon_reap | PASS | PASS | yes |
| formal_cpu_reference_diff_fuzz | PASS | PASS | yes |

Apple M1 Max, 1024-tx end-to-end stress, regulated-DEX workload mix:
Real cryptography is more expensive than HMAC-keccak, but on contended
workloads the wins from the Tier 1 fast path and the Tier 3 reducers more
than recover the difference. On uncontended workloads, 4.0 is ~25–30%
faster than 3.0 because the lane-clock fast path skips the per-key MVCC walk.
- hint_roots ride in the existing roots slot
- … QuasarGPUEngine): unchanged
- ServiceId (Count 12 → 16); old IDs unchanged
- VersionBlock.MAX_INLINE_VERSIONS 4 → 8 (backward-compatible: 4.0 reads 3.0 blocks)
- EVM_DEV_HMAC_VERIFIER=1 for legacy test vectors

There is no on-disk state migration. 4.0 reads 3.0 round artifacts and
emits 4.0 round artifacts; pre-activation rounds remain valid as
historical 3.0 rounds.
These are deferred to QuasarSTM 5.0 research, tracked separately. Nothing
in the 4.0 spec depends on them.
```
Ingress
  ↓
ConflictSpec / predictor                     ← AFT 2025 + ForeSight
  ↓
AdaptiveSchedule (admission + predictor)     ← v0.43 / v0.47
  ↓
Owned-lane fast path                         ← NEMO LaneClass
  ↓
Prism refraction
  ↓
EVM fibers (175 opcodes, full CALL/CREATE)   ← v0.41–v0.46
  ↓
FiberCheckpoint (per CALL-frame return)      ← v0.48
  ↓
CSMV-style commit server                     ← v0.49 scaffold
  ↓
Tier 1 lane-clock fast validation            ← v0.43
  ↓
Tier 2 key-MVCC + KnownTotalOrder validation ← ESSN
  ↓
Tier 3 semantic reducers / deferred ops      ← v0.46
  ↓
Aria-style deterministic commit selection
  ↓
MarketAuction batched matching               ← v0.48
  ↓
Incremental repair (checkpoints)             ← v0.48
  ↓
Commit horizon (Nova prefix / Nebula cut)
  ↓
MVCC GC                                      ← v0.49
  ↓
QuasarRoundResult (incl. hint_roots)         ← Chiron
  ↓
BridgeAttest drain (cross-chain)             ← v0.48
```
- lib/consensus/quasar/gpu/quasar_gpu_layout.hpp
- lib/consensus/quasar/gpu/quasar_wave.metal
- lib/consensus/quasar/gpu/quasar_wave.cu
- lib/consensus/quasar/gpu/evm_fiber.metal / .cu
- lib/consensus/quasar/gpu/lane_clock.metal / .cu
- lib/consensus/quasar/gpu/reducer.metal / .cu
- lib/consensus/quasar/gpu/conflict_spec.hpp + driver
- lib/consensus/quasar/gpu/lane_class.hpp
- lib/consensus/quasar/gpu/crypto/bls12_381.metal / .cu
- lib/consensus/quasar/gpu/crypto/corona.metal / .cu
- lib/consensus/quasar/gpu/crypto/groth16.metal / .cu
- lib/consensus/quasar/gpu/version_block.hpp
- lib/consensus/quasar/gpu/csmv_commit.metal / .cu
- lib/consensus/quasar/gpu/services/{adaptive,bridge,market,checkpoint}.metal / .cu
- lib/consensus/quasar/spec/quasar_stm_ref.cpp + *.lean4
- test/fuzz/quasar_stm_diff_fuzz.cpp
- test/integration/quasar_cross_backend.cpp

> QuasarSTM 4.0 keeps every 3.0 invariant — deterministic ordered MVCC,
> lane-aware validation, deterministic contention, commit horizons —
> and ships, on the same substrate: full EVM coverage with journaled
> SSTORE and EIP-2929/3529, real BLS12-381 / Pulsar / Groth16 cert
> verifiers, lane-clock Tier 1, semantic reducer Tier 3, ConflictSpec
> ABI, NEMO LaneClass, dynamic per-lane versioning, fiber checkpoints,
> commit horizon with MVCC GC, Motor VersionBlock layout, A/B/M/F drain
> services, and a CSMV commit-server scaffold against a formally
> specified CPU reference. Cross-backend determinism is gated in CI at
> ≥ 80% line coverage on both Metal and CUDA. Activation is
> 2026-02-14.
- … (2025-12-25); forked the 3.1 / 3.2 / 4.0 evolution out of LP-010.
- … into a single 4.0 production release rather than ship them as separate point releases.
- … feature-complete on cevm main.
- … research draft to production final. v0.41–v0.49 deliverables landed. Cross-backend determinism harness gated in CI at ≥ 80% coverage.
Copyright (C) 2026, Lux Partners Limited. All rights reserved.