LP-0208: DAG Mempool — Narwhal/Bullshark-Style Leaderless Transaction Dissemination

Status

Draft. No backwards compatibility. No flag day.

Activated at the genesis of the new final Lux network:
**2025-12-25 16:20 Pacific (unix 1766708400)**. The pre-Quasar Edition
Lux network (2020–2025) is single-mempool-per-validator by construction
and is a separate network, out of scope for this LP.

Abstract

Today's Lux mempool model: each validator maintains its own ranked

heap of pending transactions; a leader at each round picks from its

local mempool when proposing a block. Aggregate throughput is bounded

by the slowest validator — if one validator's mempool can admit

50,000 tx/s, that is the aggregate ceiling no matter how many

validators participate.

The DAG mempool inverts this. Every validator continuously broadcasts
a stream of ZAP headers; each header carries a list of tx hashes,
parent pointers to the most recent headers from peer validators, and
a digest signed by the emitting validator (there is no leader at this
layer). The actual tx bytes live in the Kademlia DHT (LP-201 layer C),
content-addressed by hash. The aggregate structure is a directed
acyclic graph: every header has parents from peers, so the graph
captures the partial order of tx admission across the validator set.

Throughput becomes sum(validators). If each validator emits 50k

tx-hashes/sec, a 64-validator set admits 3.2M tx-hashes/sec into the

DAG. The bottleneck moves from mempool admission to DHT throughput

and signature-verification rate per validator — both linearly

scalable.

DAG total-ordering — the step that picks a single canonical sequence

from the DAG for execution — is deferred to LP-209 (Mysticeti). This

LP defines only the construction surface: the wire format for

headers and bodies, the parent-linking discipline, the DHT-storage

contract for bodies, the garbage-collection rule for old headers,

and the per-validator emission rate.

LP-205 (Pipelined Quasar) is the sibling LP. Pipelined Quasar

specifies how a single rotating leader runs Select-Build-Sign at

pipeline depth 3. LP-208 specifies where the Select stage's txs come

from. The two LPs are orthogonal: a pipelined-Quasar chain can use a

single-mempool model (LP-205 only) or a DAG mempool model (LP-205 +

LP-208); a DAG-mempool chain can use round-blocking Quasar or

pipelined Quasar.

Together with LP-209 (Mysticeti), LP-210 (Block-STM), and LP-211

(cross-shard atomic commit), this LP enables the billion-user

sub-millisecond throughput path.

Motivation

Three concrete failures of the single-mempool-per-validator model:

1. Throughput is min-bounded. Aggregate admission rate at a chain
is min(per-validator-admission). The slowest validator caps the
chain. For a 64-validator chain with one validator on consumer
hardware (10k tx/s) and 63 on Blackwell (200k tx/s), aggregate is
10k tx/s: about 99.9% of the 12.61M tx/s validator-set capacity is
wasted.

2. Tx loss on leader byzantine. A leader at round R picks from

its local mempool. If the leader is byzantine and withholds its

block, all txs in its local mempool are stuck until the next

round; honest validators' versions of those txs may have already

expired. Single-mempool models trade liveness for simplicity here.

3. Latency is leader-rotation-dependent. A tx admitted to

validator V's mempool at time T does not propagate into a block

until V is the leader at some future round. The gap is bounded by

leader-rotation period; for round-robin with N validators and

block period P, the worst-case tx-to-block latency is N × P.

The DAG mempool fixes all three:

1. Throughput is sum-bounded. Aggregate admission rate is

sum(per-validator-emission-rate). Slow validators do not

bottleneck fast ones.

2. Tx durability is byzantine-resilient. A tx is included in

every validator's parent-header stream — at least one of which is

honest by the consensus threshold. The DHT-stored body is

content-addressed, so any honest peer can serve it.

3. Latency is independent of leader rotation. A tx is included

in the next emitted header from the validator that admitted it,

typically within the per-validator emission period (default 50ms).

The leader at the consensus layer reads from the DAG; it does not

wait for its own mempool to fill.

The DAG structure

A continuously-growing directed acyclic graph. Vertices are headers;

edges are parent-pointers. The structure is replicated at every

validator and converges via the DHT.

Vertex (header) properties

Unique identifier: sha256(header.Bytes())
Validator-local: each header is signed by exactly one validator
Self-stamped: contains the validator's local wall-clock at emission
Parent-linked: contains one parent-pointer per other validator

(best-effort — see "Parent-linking discipline" below)

Tx-bearing: carries a list of tx hashes for the txs the

validator admitted since its last header

Edge properties

Cross-validator: a header from validator V has parent-pointers

to recent headers from validators ≠ V

Antiquated-tolerant: a parent-pointer to a header that has aged

out of the active DAG window is treated as a no-op; the DAG

remains acyclic

Quorum-target: each header attempts to point to at least

2N/3 + 1 parents (validator-quorum), but liveness is preserved if

fewer parents are reachable

Emission rate

Each validator emits a new header every K milliseconds.

Default K: 50ms (configurable per chain via dag.emission_ms)
Lower bound K: bounded by network RTT among validators; for an

intra-region validator set, K can be as low as 10ms; for a

globally-distributed set, K is typically 100-200ms

Upper bound K: bounded by the cert timeout in the consumer
consensus (LP-217 mode + LP-205 pipeline); K must be lower than
the cert timeout to avoid serializing on header emission

Wire format

Two new ZAP schema IDs allocated for this LP at 0xE0..0xE1 in the

LP-300 master registry. (Historical draft used 0xF0..0xFF but that

collides with LP-214's light-client envelopes at 0xF0..0xF2; the

canonical Option-B map places consensus-extension primitives

contiguously in 0xE0..0xEF: 0xE0..0xE1 DAG mempool, 0xE2..0xE7

cross-shard atomic (LP-211), 0xE8..0xEF rollup-VM (LP-218).)

Schema 0xE0: DAGHeader

Carried on unidirectional QUIC streams (LP-201) from emitter to

peers; replicated to the DHT (LP-201 layer C) under

sha256(header.Bytes()).

| Field | Type | Notes |
|---|---|---|
| version | uint8 | DAG protocol version; activation initializes to 1 |
| validator | [20]byte | NodeID of the emitter |
| epoch | uint64 | DAG epoch; advances on validator-set change |
| seq | uint64 | Per-validator monotonic sequence number; gaps allowed |
| timestamp | int64 | Emitter wall-clock at emission (Unix nanoseconds) |
| parent_hashes | [][32]byte | Parent header hashes, one per other validator (zero-padded if no parent available); length-prefixed list |
| tx_count | uint16 | Number of tx hashes in this header |
| tx_hashes | [tx_count][32]byte | List of tx hashes for txs admitted since previous header |
| body_hash | [32]byte | sha256(tx_hashes_bytes); separate from header hash to allow body GC without invalidating header references |
| sig | [96]byte | BLS sig over the concatenation, in order, of version, validator, epoch, seq, timestamp, parent_hashes, body_hash; matches LP-022 BLS leg curve |

body_hash is the content-address of the DAGBody (schema 0xE1)

keyed in the DHT.
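
As an illustration of the field table above, here is a minimal Go sketch of the header struct, the body-hash computation, and the signing pre-image. The byte layout (big-endian integers, a uint16 count in front of parent_hashes) is an assumption of this sketch; the actual ZAP framing is defined by LP-022. The BLS signature is treated as opaque bytes because the Go standard library has no BLS implementation.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// DAGHeader mirrors the schema 0xE0 field table.
type DAGHeader struct {
	Version      uint8
	Validator    [20]byte
	Epoch        uint64
	Seq          uint64
	Timestamp    int64      // Unix nanoseconds
	ParentHashes [][32]byte // an all-zero entry means "no parent available"
	TxHashes     [][32]byte
	BodyHash     [32]byte
	Sig          [96]byte // BLS signature; opaque in this sketch
}

// BodyHashOf returns sha256 over the concatenated tx hashes.
func BodyHashOf(txHashes [][32]byte) [32]byte {
	h := sha256.New()
	for _, tx := range txHashes {
		h.Write(tx[:])
	}
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

// SigningPreimage concatenates the signed fields in table order.
// ASSUMPTION: big-endian integers and a uint16 count before parent_hashes.
func (h *DAGHeader) SigningPreimage() []byte {
	b := []byte{h.Version}
	b = append(b, h.Validator[:]...)
	b = binary.BigEndian.AppendUint64(b, h.Epoch)
	b = binary.BigEndian.AppendUint64(b, h.Seq)
	b = binary.BigEndian.AppendUint64(b, uint64(h.Timestamp))
	b = binary.BigEndian.AppendUint16(b, uint16(len(h.ParentHashes)))
	for _, p := range h.ParentHashes {
		b = append(b, p[:]...)
	}
	return append(b, h.BodyHash[:]...)
}

func main() {
	txs := [][32]byte{sha256.Sum256([]byte("tx-a")), sha256.Sum256([]byte("tx-b"))}
	h := DAGHeader{
		Version:      1,
		Epoch:        7,
		Seq:          42,
		ParentHashes: [][32]byte{sha256.Sum256([]byte("peer-1")), {}}, // second slot zero-padded
		TxHashes:     txs,
		BodyHash:     BodyHashOf(txs),
	}
	pre := h.SigningPreimage()
	fmt.Printf("pre-image: %d bytes, digest %x\n", len(pre), sha256.Sum256(pre))
}
```

Because tx_hashes enters the pre-image only through body_hash, the signed portion of a header stays a fixed size for a given validator-set size regardless of tx_count.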

Schema 0xE1: DAGBody

Carried on bidirectional QUIC streams (LP-201) on DHTFindValue

response; stored in the DHT under body_hash.

| Field | Type | Notes |
|---|---|---|
| version | uint8 | DAG protocol version; matches the parent DAGHeader |
| tx_count | uint16 | Number of txs |
| tx_bytes | [tx_count][]byte | Length-prefixed list of full tx bytes; each tx is itself a ZAP frame |

A DAGBody is content-equivalent to a list of tx bytes. There is no

header re-embedded in the body — the body is a pure container so

that the same body bytes can serve multiple equivalent DAGHeaders

that happen to commit to the same tx-set.
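
A sketch of what "pure container" could look like in Go: a body is just a version byte, a count, and length-prefixed tx bytes. The uint32 per-tx length prefix and big-endian byte order are assumptions made for illustration, not the LP-022 encoding.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// EncodeBody writes version(1) || tx_count(2) || {len(4) || tx bytes}*.
// ASSUMPTION: uint32 big-endian length prefix per tx.
func EncodeBody(version uint8, txs [][]byte) ([]byte, error) {
	if len(txs) > 0xFFFF {
		return nil, errors.New("tx_count exceeds uint16")
	}
	b := []byte{version}
	b = binary.BigEndian.AppendUint16(b, uint16(len(txs)))
	for _, tx := range txs {
		b = binary.BigEndian.AppendUint32(b, uint32(len(tx)))
		b = append(b, tx...)
	}
	return b, nil
}

// DecodeBody returns sub-slices of the input, so tx bytes are never copied
// or re-encoded on the read path.
func DecodeBody(b []byte) (uint8, [][]byte, error) {
	if len(b) < 3 {
		return 0, nil, errors.New("short body")
	}
	version := b[0]
	count := int(binary.BigEndian.Uint16(b[1:3]))
	b = b[3:]
	txs := make([][]byte, 0, count)
	for i := 0; i < count; i++ {
		if len(b) < 4 {
			return 0, nil, errors.New("truncated length prefix")
		}
		n := binary.BigEndian.Uint32(b[:4])
		b = b[4:]
		if uint64(len(b)) < uint64(n) {
			return 0, nil, errors.New("truncated tx bytes")
		}
		txs = append(txs, b[:n])
		b = b[n:]
	}
	if len(b) != 0 {
		return 0, nil, errors.New("trailing bytes")
	}
	return version, txs, nil
}

func main() {
	enc, err := EncodeBody(1, [][]byte{[]byte("tx-a"), []byte("tx-bb")})
	if err != nil {
		panic(err)
	}
	v, txs, err := DecodeBody(enc)
	fmt.Println(v, len(txs), string(txs[1]), err)
}
```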

Schema ID allocation

| Schema ID | Type | Notes |
|---|---|---|
| 0xE0 | DAGHeader | This LP |
| 0xE1 | DAGBody | This LP |

Schema ID rationale: per LP-300 master schema registry, 0xD0..0xDF is

LP-201 reserved (P2P transport), 0xE0..0xE1 this LP (DAG mempool),

0xE2..0xE7 LP-211 (cross-shard atomic), 0xE8..0xEF LP-218 (rollup-VM

envelopes, moved here from the original 0xE0..0xEF claim to avoid

overlapping LP-211), 0xF0..0xF2 LP-214 (light client requests/

responses), 0xF3..0xFF reserved future. The DAG-mempool draft

originally claimed 0xF0..0xFF; LP-214 took 0xF0..0xF2 for client

requests so DAG moved to 0xE0..0xE1. Schema IDs do not overlap

across LPs.
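
The allocation described above can be checked mechanically. The following Go sketch encodes the ranges exactly as listed in this section and tests them for pairwise overlap; the type and function names are hypothetical, and LP-300 remains the normative registry.

```go
package main

import "fmt"

// schemaRange is an inclusive range of ZAP schema IDs owned by one LP.
type schemaRange struct {
	Lo, Hi byte
	Owner  string
}

// registry copies the ranges stated in this section.
var registry = []schemaRange{
	{0xD0, 0xDF, "LP-201"},
	{0xE0, 0xE1, "LP-208"},
	{0xE2, 0xE7, "LP-211"},
	{0xE8, 0xEF, "LP-218"},
	{0xF0, 0xF2, "LP-214"},
	{0xF3, 0xFF, "reserved"},
}

// OwnerOf returns the owner of a schema ID, or "" if it is outside the listed ranges.
func OwnerOf(id byte) string {
	for _, r := range registry {
		if id >= r.Lo && id <= r.Hi {
			return r.Owner
		}
	}
	return ""
}

// Overlaps reports whether any two ranges share an ID.
func Overlaps(rs []schemaRange) bool {
	for i := range rs {
		for j := i + 1; j < len(rs); j++ {
			if rs[i].Lo <= rs[j].Hi && rs[j].Lo <= rs[i].Hi {
				return true
			}
		}
	}
	return false
}

func main() {
	fmt.Println(OwnerOf(0xE0), OwnerOf(0xE1), Overlaps(registry))
}
```

Running the same overlap check against the historical 0xF0..0xFF claim and LP-214's 0xF0..0xF2 reports a collision, which is the reason for the move to 0xE0..0xE1.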

Parent-linking discipline

Each header from validator V emitted at time T contains parent

pointers chosen as follows:

1. For each other validator W in the active validator set, V

maintains a "freshest known header from W" pointer.

2. At header-emit time T, V snapshots its freshest-known set and

includes the hashes as parent_hashes.

3. If V has no fresh header from W (e.g., W is partitioned, or T is

within the first K-ms of the epoch), the corresponding slot is

zero-padded.
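
The three steps above can be sketched in Go as follows. This is an illustration of the honest validator's bookkeeping only; it assumes "freshest" means "highest seq seen so far" and that parent slots follow a fixed validator order, neither of which is stated normatively here. All type and function names are hypothetical.

```go
package main

import "fmt"

type NodeID [20]byte
type Hash [32]byte

// FreshestView tracks the freshest known header per peer (step 1).
type FreshestView struct {
	peers []NodeID // active validator set minus self, in a fixed order
	hash  map[NodeID]Hash
	seq   map[NodeID]uint64
}

func NewFreshestView(self NodeID, validators []NodeID) *FreshestView {
	v := &FreshestView{hash: map[NodeID]Hash{}, seq: map[NodeID]uint64{}}
	for _, id := range validators {
		if id != self {
			v.peers = append(v.peers, id)
		}
	}
	return v
}

// Observe records a header from w if it is newer than what is already known.
func (v *FreshestView) Observe(w NodeID, seq uint64, h Hash) {
	if cur, ok := v.seq[w]; ok && seq <= cur {
		return // stale or duplicate
	}
	v.seq[w], v.hash[w] = seq, h
}

// Snapshot returns parent_hashes at emit time (step 2); peers with no
// known header get an all-zero slot (step 3).
func (v *FreshestView) Snapshot() []Hash {
	out := make([]Hash, len(v.peers))
	for i, w := range v.peers {
		out[i] = v.hash[w] // a missing key yields the zero Hash
	}
	return out
}

// QuorumTarget is the 2N/3 + 1 parent target from "Edge properties".
func QuorumTarget(n int) int { return 2*n/3 + 1 }

func main() {
	ids := []NodeID{{0}, {1}, {2}, {3}}
	view := NewFreshestView(ids[0], ids)
	view.Observe(ids[1], 5, Hash{0xAA})
	view.Observe(ids[1], 3, Hash{0xBB}) // older seq, ignored
	linked := 0
	for _, p := range view.Snapshot() {
		if p != (Hash{}) {
			linked++
		}
	}
	fmt.Printf("%d slots linked; quorum target for N=4 is %d\n", linked, QuorumTarget(4))
}
```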

The discipline is honest-best-effort. A byzantine V can omit

parents, lie about freshness, or zero-pad maliciously; the LP-209

total-ordering algorithm handles these adversarial cases. This LP

only specifies the honest validator's behavior; adversarial

robustness is a property of the consumer (LP-209) plus the cert

profile (LP-217 mode).

Why DAG-mempool beats traditional mempool

Four operational properties, each backed by a specific mechanism.

1. Throughput is sum-bounded

A traditional mempool has admission rate

min(validator_admit_rate); aggregate is the slowest validator.

The DAG mempool has emission rate sum(validator_emit_rate). Each

validator emits its own headers; the DAG-wide rate is the sum. The

slowest validator contributes its share; it does not gate the

fastest.

Reference numbers (LP-203 GPU verify + LP-202 pipeline depth):

Per-validator emission: 1 header / 50ms × 1000 tx-hashes/header

= 20,000 tx-hashes/sec

64-validator chain: 1.28M tx-hashes/sec aggregate
256-validator chain: 5.12M tx-hashes/sec aggregate

Headers contain only hashes; bodies are fetched separately via DHT,
so per-validator header bandwidth is tx_count × 32 bytes per emission
period: 32,000 bytes every 50ms, or ~640 KB/sec (~625 KiB/sec) per
validator at the reference rate. Trivial
network overhead.
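
The reference arithmetic in this subsection can be reproduced with a few lines of Go. The function names are hypothetical; the inputs are the figures quoted in this LP (50ms emission period, 1000 tx hashes per header, 32-byte hashes) plus the 64-validator example from the Motivation section.

```go
package main

import "fmt"

// PerValidatorRate is tx hashes per second for one validator.
func PerValidatorRate(emissionMs, txPerHeader int) int {
	return txPerHeader * 1000 / emissionMs
}

// MinBound models the single-mempool ceiling: the slowest validator.
func MinBound(rates []int) int {
	if len(rates) == 0 {
		return 0
	}
	m := rates[0]
	for _, r := range rates[1:] {
		if r < m {
			m = r
		}
	}
	return m
}

// SumBound models the DAG-mempool ceiling: every validator contributes.
func SumBound(rates []int) int {
	s := 0
	for _, r := range rates {
		s += r
	}
	return s
}

// HeaderHashBytesPerSec is the tx-hash payload one validator emits per second.
func HeaderHashBytesPerSec(rate int) int { return rate * 32 }

func main() {
	r := PerValidatorRate(50, 1000)
	fmt.Println("per-validator:", r, "64 validators:", 64*r, "256 validators:", 256*r)
	fmt.Println("hash bytes/sec per validator:", HeaderHashBytesPerSec(r))

	// Motivation example: one consumer-hardware validator, 63 Blackwell validators.
	rates := []int{10_000}
	for i := 0; i < 63; i++ {
		rates = append(rates, 200_000)
	}
	fmt.Println("min-bound:", MinBound(rates), "sum-bound:", SumBound(rates))
}
```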

2. No tx loss on byzantine leader

In a single-mempool model, a tx admitted to a byzantine leader's

mempool may be withheld from blocks indefinitely; honest validators

do not see it until the next leader proposes.

In the DAG mempool, every validator emits its own admitted txs

independently. A tx admitted at any honest validator appears in at

least one honest validator's header within K milliseconds, and the

body is replicated to the DHT under its content-hash. The leader at

the consumer layer reads from the DAG, not from its own mempool, so

byzantine-leader-withholds-tx is mechanically impossible.

The cert-profile (LP-217 mode) is unaffected: the consumer LP-205 /

LP-209 reads the DAG and orders it; the cert is computed over the

ordered output, not the DAG.

3. Content-addressed, no re-encoding

Each header references its body via body_hash. The body is the

tx bytes verbatim. There is no re-encoding step — the bytes that

the validator admitted are the bytes that go in the DHT and that

the consumer reads. This is the LP-200 ZAP stack guarantee

(immutable buffer, content-addressed, no codec re-marshal).

A traditional mempool re-encodes txs on insertion (for storage

layout) and on emission (for wire format). The DAG mempool does

neither; the LP-022 wire format IS the storage format.

4. Latency is leader-independent

In a single-mempool model, tx-to-block latency is
block_period × position_in_rotation. The worst case for a tx that
just missed the current leader is one full rotation (N × P).

In the DAG mempool, tx-to-DAG latency is bounded by the validator's

emission period K (default 50ms). The tx is in the DAG as soon as

the validator that admitted it emits its next header. The

consumer's ordering (LP-209) picks the tx for execution at the

next ordering pass; for a typical chain with cert wall-clock

~5-15ms, the tx-to-block latency is dominated by K.

DAG to total-order

Total-ordering is deferred to LP-209 (Mysticeti). This LP

specifies only the DAG construction.

The rationale for the split:

Multiple ordering algorithms can consume the same DAG: Mysticeti

(LP-209 — committee-based ordering with sub-second finality);

Bullshark (round-robin-anchored ordering with FIFO fairness);

Tusk (asynchronous ordering with worst-case finality).

Each algorithm has a different latency / throughput / fairness
tradeoff. A chain at LP-217 PQ-off mode for HFT might use
Mysticeti for sub-100ms ordering; a chain at PQ-heavy mode for
archival anchoring might use Bullshark for ordering fairness.

The DAG construction is invariant across these choices. Splitting

the construction from the ordering keeps the LP-208 wire format

stable while LP-209 (and future ordering LPs) evolve.

LP-208's contract to LP-209:

1. The DAG is a content-addressed DAG; LP-209 can refer to vertices

by hash and rely on byte-identity across validators.

2. Headers from honest validators carry honest tx-hash lists and

honest-best-effort parent pointers; LP-209 specifies how to

recover ordering against adversarial headers.

3. Bodies are eventually consistent in the DHT under

body_hash; LP-209 specifies when to wait for body availability

versus proceeding with header-only ordering.

Garbage collection

The DAG grows continuously. Without GC, validator memory and DHT

storage would grow without bound.

Header GC

A header at height H is eligible for GC when:

The consumer (LP-209) has total-ordered the DAG past height H, AND
The cert (LP-217 mode) covering all transactions hashed in the

header has finalized.

Eligible headers are removed from the active DAG window. Their

content-hashes remain valid forever (anyone who archives a header

can still reference it), but live validators no longer hold them in

memory. Validator local storage of GC'd headers falls to "cold" —

moved to long-term archival storage; no longer indexed for active

parent-pointer resolution.

Default window: 2 × cert_timeout_ms worth of headers, plus the

maximum LP-209 ordering-delay window. Concrete reference: at

K=50ms, LP-217 PQ-strict mode (cert ~15ms), LP-209 Mysticeti

(ordering delay ~3 rounds) — active window is ~5 × 50ms = 250ms of

headers = 5 headers per validator = 320 headers in a 64-validator

chain. Trivial memory.

Body GC

Bodies in the DHT (LP-201 layer C) age out per the LP-201 TTL rule:

default 24h with refresh-on-access. Bodies for headers that have

been GC'd at all live validators are still served by archival nodes

(LP-201 §"Archival role") until the TTL expires.

Bodies for headers still in the active DAG window are pinned

(LP-201 §"Content pinning") so they cannot be evicted from the DHT.

Restart / catch-up

A validator that restarts re-fetches the active DAG window from
peers. Each peer serves its own freshest headers via the
unidirectional DAGHeader stream (schema 0xE0, carried over LP-201
QUIC); the restarted validator stitches the parent-pointers to
reconstruct the DAG. Bodies are re-fetched from the DHT as needed.

Catch-up latency: bounded by active_window_size × network_RTT.

For the reference 320-header window over a 100ms RTT, catch-up is

~32 seconds — fast enough for routine validator restart.

Throughput target

The DAG mempool's per-validator emission rate sets the per-chain

throughput ceiling at the admission layer. Reference targets:

| Per-validator emit rate | 64-validator chain admit rate | 256-validator chain admit rate |
|---|---|---|
| 10K tx-hashes/sec | 640K tx/sec | 2.56M tx/sec |
| 50K tx-hashes/sec | 3.2M tx/sec | 12.8M tx/sec |
| 100K tx-hashes/sec | 6.4M tx/sec | 25.6M tx/sec |
| 1M tx-hashes/sec | 64M tx/sec | 256M tx/sec |

The per-validator emit rate is bounded by:

GPU verify (LP-203): BLS leg-sig verify on Blackwell at

~500 μs aggregate per N=64; ~2,000 sig verifies/sec at full

precision. For tx sig verify (per-tx, not per-header), GPU bench is

~1M verifies/sec on Blackwell — the per-validator emit rate at

this hardware class is ~1M tx-hashes/sec.

DHT body write throughput (LP-201): bounded by replication
factor k=20 and network bandwidth. At ~1 KiB per tx and 1M
tx/sec emit, body throughput is ~1 GiB/sec into the DHT
per-validator, within Blackwell-class NIC bandwidth (200 Gbps
≈ 25 GB/sec ≈ 23 GiB/sec).

Mempool admission (per-validator local): bounded by tx-sig

verify rate on the local GPU and Block-STM speculator throughput.

Both scale with GPU class.

Projected throughput at billion-user scale: 25M+ tx/sec on a

256-validator chain with PQ-fast mode (LP-217) at depth-3 pipelining

(LP-205). Numbers above are projected from the LP-203 measured

bench rows extrapolated to the DAG emission model; measured

DAG-mempool throughput will land in the LP-203 addendum once the

implementation tag ships.

Composition

With LP-202 (pipelining contract)

DAG header production runs in parallel with the current block's

Sign stage at the leader. The Select stage of LP-205's pipelined

Quasar reads from the DAG, not from a local mempool — but the

Select work is unchanged in shape, only its input source changes.

LP-202 unwind primitives apply to the Select stage exactly as they

do in the single-mempool case.

With LP-203 (GPU verify)

Tx sig verify runs on Blackwell as DAG headers arrive. Each header

carries up to 1000 tx hashes; the bodies for those hashes are

batched into a single CUDA dispatch on the verify path. Verify

throughput tracks the LP-203 bench numbers directly — at ~1M

verifies/sec on Blackwell, the GPU is not the bottleneck up to a

per-validator emit rate of ~1M tx/sec.

With LP-217 (cert profile modes)

The DAG ordering (from LP-209 Mysticeti consuming this LP's DAG) is

committed at the LP-217 mode the consumer chain has configured —

PQ-fast for typical workloads, PQ-strict for custody/treasury,

PQ-heavy for archival anchoring. The DAG itself is mode-agnostic;

only the cert that finalizes the ordering of DAG vertices into

blocks carries the mode posture.

With LP-205 (pipelined Quasar)

The two LPs compose orthogonally:

LP-205 specifies the per-leader pipeline at depth 3.
LP-208 specifies the per-validator emission of headers into a DAG.

A pipelined-Quasar chain using a DAG mempool runs:

1. Every validator emits DAGHeader every K=50ms (this LP).

2. LP-209 Mysticeti totally orders the DAG; the ordered output is the

stream of txs.

3. The current leader's Select stage (LP-205) consumes from the

Mysticeti output instead of a local mempool.

4. Build (LP-205) constructs a block frame referencing the consumed

tx-set.

5. Sign (LP-205) produces a QuasarCert at the LP-217 mode.

The leader's Select stage is the only LP-205 stage that changes; the

other two stages are unchanged. The aggregate throughput becomes

sum(validator_emit_rate) rather than min(validator_mempool_rate).

Activation marker


activates: 2025-12-25T16:20:00-08:00
activates-unix: 1766708400

DAG mempool is opt-in at activation. A chain enables it via genesis

config:


dag_mempool:
  enabled: true
  emission_ms: 50
  active_window_headers: 320
  schema_ids:
    header: 0xE0
    body: 0xE1

Chains that do not enable DAG mempool use the single-mempool-per-

validator model unchanged. LP-205 (pipelined Quasar) works against

either input source.
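
For illustration, the genesis fragment above could map onto a Go struct like the one below. The struct, its field names, and the validation rules are assumptions of this sketch (the LP does not specify how the config is parsed or validated); only the keys and default values come from the fragment.

```go
package main

import (
	"errors"
	"fmt"
)

// DAGMempoolConfig mirrors the dag_mempool genesis keys.
type DAGMempoolConfig struct {
	Enabled             bool
	EmissionMs          int
	ActiveWindowHeaders int
	HeaderSchemaID      byte
	BodySchemaID        byte
}

// Validate applies hypothetical sanity checks. A disabled config is always
// valid because the chain then uses the single-mempool model unchanged.
func (c DAGMempoolConfig) Validate() error {
	if !c.Enabled {
		return nil
	}
	switch {
	case c.EmissionMs <= 0:
		return errors.New("emission_ms must be positive")
	case c.ActiveWindowHeaders <= 0:
		return errors.New("active_window_headers must be positive")
	case c.HeaderSchemaID != 0xE0 || c.BodySchemaID != 0xE1:
		return errors.New("schema_ids must be header=0xE0, body=0xE1")
	}
	return nil
}

func main() {
	cfg := DAGMempoolConfig{
		Enabled:             true,
		EmissionMs:          50,
		ActiveWindowHeaders: 320,
		HeaderSchemaID:      0xE0,
		BodySchemaID:        0xE1,
	}
	fmt.Println("config error:", cfg.Validate())
}
```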

Reference Implementation

The DAG mempool lives under

~/work/lux/consensus/mempool/dag/:

header.go — DAGHeader struct + ZAP encode/decode against schema

0xE0; sig-verify against the validator's BLS pubkey.

body.go — DAGBody struct + ZAP encode/decode against schema 0xE1;

body-hash computation.

emitter.go — Per-validator emission loop; signs a new header

every emission_ms; broadcasts via LP-201 unidirectional QUIC

stream; stores body in DHT via LP-201 layer C.

receiver.go — Per-peer header receive loop; validates sig,

parent-pointers, body-hash; inserts into local DAG view.

dag.go — In-memory DAG view: map[hash]Header + freshest-pointer

per validator; parent-pointer resolution.

gc.go — Active-window GC; cold-archive move; DHT-pinning

release.

dht.go — Body fetch via LP-201 DHTFindValue (0xDE); body store

via LP-201 DHTStore (0xDF) with content-pinning hint.

Integration points:

~/work/lux/consensus/protocol/quasar/select.go (LP-205 Select

stage) — new DAGSource implementing the MempoolSource

interface; chain config picks LocalMempoolSource or DAGSource.

The DAG mempool composes with the existing LP-201 DHT and LP-205

Select stage; no per-stage code rewrites required.
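
A sketch of how the Select-stage integration point might be shaped. The names `MempoolSource`, `LocalMempoolSource`, and `DAGSource` come from this LP; the `NextBatch` method, its signature, and the slice-backed queues are hypothetical stand-ins for whatever select.go actually defines.

```go
package main

import "fmt"

type TxHash [32]byte

// MempoolSource is the input abstraction for the LP-205 Select stage.
// ASSUMPTION: a single pull-style method; the real interface may differ.
type MempoolSource interface {
	NextBatch(max int) []TxHash
}

// LocalMempoolSource stands in for the single-mempool-per-validator model.
type LocalMempoolSource struct{ pending []TxHash }

// DAGSource stands in for the LP-209 ordered output of the DAG.
type DAGSource struct{ ordered []TxHash }

// take removes and returns up to max items from the front of the queue.
func take(q *[]TxHash, max int) []TxHash {
	if max > len(*q) {
		max = len(*q)
	}
	if max < 0 {
		max = 0
	}
	batch := (*q)[:max]
	*q = (*q)[max:]
	return batch
}

func (s *LocalMempoolSource) NextBatch(max int) []TxHash { return take(&s.pending, max) }
func (s *DAGSource) NextBatch(max int) []TxHash          { return take(&s.ordered, max) }

// NewSource picks the input source from chain config.
func NewSource(dagEnabled bool, local, ordered []TxHash) MempoolSource {
	if dagEnabled {
		return &DAGSource{ordered: ordered}
	}
	return &LocalMempoolSource{pending: local}
}

func main() {
	src := NewSource(true, nil, []TxHash{{1}, {2}, {3}})
	fmt.Println(len(src.NextBatch(2)), len(src.NextBatch(2)))
}
```

The Select stage only sees the interface, which is what lets the Build and Sign stages stay untouched when the source changes.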

Test Cases

A conformant implementation MUST:

1. Emit headers at the configured emission_ms rate; gaps in seq

are allowed (e.g., transient partition) but seq is strictly

monotonic per validator.

2. Sign each header with the validator's BLS key over the canonical

pre-image (`version || validator || epoch || seq || timestamp ||

parent_hashes || body_hash`).

3. Reject incoming headers with invalid sig, mismatched body-hash,

non-monotonic seq, or out-of-epoch sender.

4. Resolve parent-pointers against the local DAG view; treat

unreachable parents as no-op (do not block on resolution).

5. Reproduce the GC contract: active-window headers in memory; aged

headers moved to cold archive; bodies pinned in DHT while header

is active.

6. Catch up from peers on restart by re-fetching the active window

and stitching the DAG.

7. Emit per-stage Prometheus metrics: dag_header_emit_total,

dag_header_receive_total, dag_body_fetch_total,

dag_active_window_size, dag_gc_cycles_total.
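
The rejection rules in item 3 can be sketched as a receive-side check in Go. Signature verification is injected as a function because BLS is not in the Go standard library; the `Receiver` type, its fields, and the order of checks are assumptions of this sketch, and body_hash is computed as sha256 over the concatenated tx hashes per the schema 0xE0 table.

```go
package main

import (
	"crypto/sha256"
	"errors"
	"fmt"
)

type NodeID [20]byte

type Header struct {
	Validator NodeID
	Epoch     uint64
	Seq       uint64
	TxHashes  [][32]byte
	BodyHash  [32]byte
}

var (
	ErrSig   = errors.New("invalid signature")
	ErrBody  = errors.New("mismatched body hash")
	ErrSeq   = errors.New("non-monotonic seq")
	ErrEpoch = errors.New("out-of-epoch sender")
)

type Receiver struct {
	Epoch     uint64
	Members   map[NodeID]bool    // validator set for Epoch
	VerifySig func(*Header) bool // injected BLS check (not implemented here)
	lastSeq   map[NodeID]uint64
}

func bodyHash(txs [][32]byte) [32]byte {
	h := sha256.New()
	for _, tx := range txs {
		h.Write(tx[:])
	}
	var out [32]byte
	copy(out[:], h.Sum(nil))
	return out
}

// Accept applies the four rejection rules; gaps in seq are allowed,
// repeats and regressions are not.
func (r *Receiver) Accept(h *Header) error {
	if h.Epoch != r.Epoch || !r.Members[h.Validator] {
		return ErrEpoch
	}
	if !r.VerifySig(h) {
		return ErrSig
	}
	if bodyHash(h.TxHashes) != h.BodyHash {
		return ErrBody
	}
	if last, ok := r.lastSeq[h.Validator]; ok && h.Seq <= last {
		return ErrSeq
	}
	if r.lastSeq == nil {
		r.lastSeq = map[NodeID]uint64{}
	}
	r.lastSeq[h.Validator] = h.Seq
	return nil
}

func main() {
	v := NodeID{1}
	r := &Receiver{Epoch: 7, Members: map[NodeID]bool{v: true}, VerifySig: func(*Header) bool { return true }}
	txs := [][32]byte{{1}}
	h := &Header{Validator: v, Epoch: 7, Seq: 1, TxHashes: txs, BodyHash: bodyHash(txs)}
	fmt.Println(r.Accept(h), r.Accept(h)) // second call replays the same seq
}
```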

Conformance test vectors at

~/work/lux/consensus/test/vectors/dag/dag-mempool.jsonl.

Cross-references

LP-022 — ZAP wire protocol; the framing under schema IDs

0xE0/0xE1.

LP-077 — Round-digest binding; not used by DAG mempool

construction but used by the consumer LP-209 ordering.

LP-182 — Consensus wire / QuasarCert; the cert that finalizes

the LP-209 ordering of this DAG.

LP-200 — ZAP Stack umbrella; the immutable-buffer guarantee

this LP relies on for content-addressing.

LP-201 — ZAP P2P; schema IDs 0xE0..0xE1 do not collide with

LP-201's 0xD0..0xDF reservation; Kademlia DHT is used for body

storage.

LP-300 — Master Schema Registry; LP-208 schema IDs are

registered there. This LP's §"Wire format" is informative; LP-300

is normative for ID assignment.

LP-202 — Pipelining contract; per-stage unwind primitives

apply to the Select stage that consumes from this DAG.

LP-203 — GPU-native verify; saturates the per-header sig-verify

path.

LP-205 — Pipelined Quasar (sibling); consumes from this DAG

via the Select stage.

LP-209 — Mysticeti ordering (future); the consumer that

totally orders this DAG.

LP-217 — Cert profile modes; the consumer's ordering is

committed at the chosen mode.

Future Work

LP-209 — Mysticeti ordering. Total-ordering of the DAG with

sub-second finality; this LP's DAG is its input.

LP-210 — Block-STM speculation. Speculative execute against

the DAG's tx stream ahead of LP-209 ordering; commit on ordering

arrival.

LP-211 — Cross-shard atomic commit. DAG mempools across peer

shards exchange parent-pointers via cross-shard schema IDs

(LP-211 owns 0xE2..0xE7).
