Draft. No backwards compatibility. No flag day.
Activated at the genesis of the new final Lux network: **2025-12-25
16:20 Pacific (unix 1766708400)**. The pre-Quasar Edition Lux
network (2020–2025) is single-mempool-per-validator by construction
and is a separate network out of scope for this LP.
Today's Lux mempool model: each validator maintains its own ranked
heap of pending transactions; a leader at each round picks from its
local mempool when proposing a block. Aggregate throughput is bounded
by the slowest validator: if the slowest validator's mempool can admit
50,000 tx/s, that is the aggregate ceiling no matter how many
validators participate.
The DAG mempool inverts this. Every validator continuously broadcasts
a stream of ZAP headers; each header carries a list of tx hashes,
parent pointers to the most recent headers from peer validators, and
a leader-signed digest. The actual tx bytes live in the Kademlia DHT
(LP-201 layer C), content-addressed by hash. The aggregate structure
is a directed acyclic graph: every header has parents from peers, so
the graph captures the partial order of tx admission across the
validator set.
Throughput becomes the sum of per-validator emission rates. If each validator emits 50k
tx-hashes/sec, a 64-validator set admits 3.2M tx-hashes/sec into the
DAG. The bottleneck moves from mempool admission to DHT throughput
and signature-verification rate per validator — both linearly
scalable.
DAG total-ordering — the step that picks a single canonical sequence
from the DAG for execution — is deferred to LP-209 (Mysticeti). This
LP defines only the construction surface: the wire format for
headers and bodies, the parent-linking discipline, the DHT-storage
contract for bodies, the garbage-collection rule for old headers,
and the per-validator emission rate.
LP-205 (Pipelined Quasar) is the sibling LP. Pipelined Quasar
specifies how a single rotating leader runs Select-Build-Sign at
pipeline depth 3. LP-208 specifies where the Select stage's txs come
from. The two LPs are orthogonal: a pipelined-Quasar chain can use a
single-mempool model (LP-205 only) or a DAG mempool model (LP-205 +
LP-208); a DAG-mempool chain can use round-blocking Quasar or
pipelined Quasar.
Together with LP-209 (Mysticeti), LP-210 (Block-STM), and LP-211
(cross-shard atomic commit), this LP enables the billion-user
sub-millisecond throughput path.
Three concrete failures of the single-mempool-per-validator model:
1. Throughput is min-bounded. The aggregate admission rate of a chain
is min(per-validator-admission). The slowest validator caps the
chain. For a 64-validator chain with one validator on consumer
hardware (10k tx/s) and 63 on Blackwell (200k tx/s), aggregate is
10k tx/s out of 12.61M tx/s of combined capacity, so about 99.9% of
the validator-set capacity is wasted.
2. Tx loss under a byzantine leader. A leader at round R picks from
its local mempool. If the leader is byzantine and withholds its
block, all txs in its local mempool are stuck until the next
round; honest validators' versions of those txs may have already
expired. Single-mempool models trade liveness for simplicity here.
3. Latency is leader-rotation-dependent. A tx admitted to
validator V's mempool at time T does not propagate into a block
until V is the leader at some future round. The gap is bounded by
leader-rotation period; for round-robin with N validators and
block period P, the worst-case tx-to-block latency is N × P.
The DAG mempool fixes all three:
1. Throughput is sum-bounded. Aggregate admission rate is
sum(per-validator-emission-rate). Slow validators do not
bottleneck fast ones.
2. Tx durability is byzantine-resilient. A tx is included in
every validator's parent-header stream — at least one of which is
honest by the consensus threshold. The DHT-stored body is
content-addressed, so any honest peer can serve it.
3. Latency is independent of leader rotation. A tx is included
in the next emitted header from the validator that admitted it,
typically within the per-validator emission period (default 50ms).
The leader at the consensus layer reads from the DAG; it does not
wait for its own mempool to fill.
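The min-versus-sum contrast can be checked with a few lines of arithmetic. The sketch below reuses the 64-validator example from the first list (one validator at 10k tx/s, 63 at 200k tx/s). The function names are illustrative only and are not part of any Lux package.

```go
package main

import "fmt"

// minBound models the single-mempool ceiling: the slowest validator caps the chain.
func minBound(rates []uint64) uint64 {
	if len(rates) == 0 {
		return 0
	}
	m := rates[0]
	for _, r := range rates[1:] {
		if r < m {
			m = r
		}
	}
	return m
}

// sumBound models the DAG-mempool ceiling: every validator contributes its own rate.
func sumBound(rates []uint64) uint64 {
	var s uint64
	for _, r := range rates {
		s += r
	}
	return s
}

// exampleRates builds the example set: one validator at 10k tx/s and 63 at 200k tx/s.
func exampleRates() []uint64 {
	rates := make([]uint64, 0, 64)
	rates = append(rates, 10_000)
	for i := 0; i < 63; i++ {
		rates = append(rates, 200_000)
	}
	return rates
}

func main() {
	rates := exampleRates()
	lo, hi := minBound(rates), sumBound(rates)
	fmt.Printf("single-mempool ceiling: %d tx/s\n", lo)
	fmt.Printf("DAG-mempool ceiling:    %d tx/s\n", hi)
	fmt.Printf("unused under min-bound: %.2f%%\n", 100*(1-float64(lo)/float64(hi)))
}
```

With these inputs the min-bound is 10,000 tx/s and the sum-bound is 12,610,000 tx/s, which puts the unused share under the single-mempool model at roughly 99.92%.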
A continuously-growing directed acyclic graph. Vertices are headers;
edges are parent-pointers. The structure is replicated at every
validator and converges via the DHT.
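- Vertex ID: sha256(header.Bytes()).
- Payload: the hashes of the txs the emitting validator admitted
  since its last header.
- Edges: parent pointers from a header of validator V to recent
  headers from validators ≠ V (best-effort; see "Parent-linking
  discipline" below).
- A parent pointer that falls out of the active DAG window is
  treated as a no-op; the DAG remains acyclic.
- A header targets 2N/3 + 1 parents (validator-quorum), but liveness
  is preserved if fewer parents are reachable.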
Each validator emits a new header every K milliseconds.
- Default K is 50ms (configurable via dag.emission_ms).
- For an intra-region validator set, K can be as low as 10ms; for a
  globally-distributed set, K is typically 100-200ms.
- K is set relative to the cert wall-clock of the consensus (LP-217
  mode + LP-205 pipeline); K must be lower than the cert wall-clock
  to avoid serializing on header emission.
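A periodic emission loop of this kind could look roughly like the sketch below. It only illustrates the timing and batching shape: the `Emitter` type, its methods, and the `publish` callback are hypothetical names, signing and broadcast are left out, and the choice to emit a header even when no txs were admitted is an assumption this LP does not state.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// Hash is a 32-byte tx hash.
type Hash [32]byte

// Emitter batches admitted tx hashes and flushes them once per emission period.
type Emitter struct {
	mu      sync.Mutex
	pending []Hash
	seq     uint64
}

// Admit queues a tx hash for the next header.
func (e *Emitter) Admit(h Hash) {
	e.mu.Lock()
	defer e.mu.Unlock()
	e.pending = append(e.pending, h)
}

// emitOnce drains the pending hashes and returns them with the next sequence number.
// ASSUMPTION: an empty batch still consumes a sequence number.
func (e *Emitter) emitOnce() (uint64, []Hash) {
	e.mu.Lock()
	defer e.mu.Unlock()
	batch := e.pending
	e.pending = nil
	e.seq++
	return e.seq, batch
}

// Run calls publish once every k until ctx is cancelled.
func (e *Emitter) Run(ctx context.Context, k time.Duration, publish func(seq uint64, txs []Hash)) {
	ticker := time.NewTicker(k)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			seq, batch := e.emitOnce()
			publish(seq, batch)
		}
	}
}

func main() {
	e := &Emitter{}
	e.Admit(Hash{1})
	e.Admit(Hash{2})

	// Run for a little over two emission periods at K = 50ms.
	ctx, cancel := context.WithTimeout(context.Background(), 120*time.Millisecond)
	defer cancel()
	e.Run(ctx, 50*time.Millisecond, func(seq uint64, txs []Hash) {
		fmt.Printf("header seq=%d carries %d tx hashes\n", seq, len(txs))
	})
}
```

Keeping the drain step (`emitOnce`) separate from the ticker makes the batching logic testable without depending on wall-clock timing.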
Two new ZAP schema IDs allocated for this LP at 0xE0..0xE1 in the
LP-300 master registry. (Historical draft used 0xF0..0xFF but that
collides with LP-214's light-client envelopes at 0xF0..0xF2; the
canonical Option-B map places consensus-extension primitives
contiguously in 0xE0..0xEF: 0xE0..0xE1 DAG mempool, 0xE2..0xE7
cross-shard atomic (LP-211), 0xE8..0xEF rollup-VM (LP-218).)
Carried on unidirectional QUIC streams (LP-201) from emitter to
peers; replicated to the DHT (LP-201 layer C) under
sha256(header.Bytes()).
| Field | Type | Description |
|---|---|---|
| version | uint8 | DAG protocol version; activation initializes to 1 |
| validator | [20]byte | NodeID of the emitter |
| epoch | uint64 | DAG epoch; advances on validator-set change |
| seq | uint64 | Per-validator monotonic sequence number; gaps allowed |
| timestamp | int64 | Emitter wall-clock at emission (Unix nanoseconds) |
| parent_hashes | []bytes32 | Parent header hashes, one per other validator (zero-padded if no parent available); length-prefixed list |
| tx_count | uint16 | Number of tx hashes in this header |
| tx_hashes | [tx_count][32]byte | List of tx hashes for txs admitted since previous header |
| body_hash | [32]byte | sha256(tx_hashes_bytes); separate from header hash to allow body GC without invalidating header references |
| sig | [96]byte | BLS sig over `version \|\| validator \|\| epoch \|\| seq \|\| timestamp \|\| parent_hashes \|\| body_hash`; matches LP-022 BLS leg curve |

body_hash is the content-address of the DAGBody (schema 0xE1)
keyed in the DHT.
Carried on bidirectional QUIC streams (LP-201) on DHTFindValue
response; stored in the DHT under body_hash.
| Field | Type | Description |
|---|---|---|
| version | uint8 | DAG protocol version; matches the parent DAGHeader |
| tx_count | uint16 | Number of txs |
| tx_bytes | [tx_count][]byte | Length-prefixed list of full tx bytes; each tx is itself a ZAP frame |

A DAGBody is content-equivalent to a list of tx bytes. There is no
header re-embedded in the body; the body is a pure container so
that the same body bytes can serve multiple equivalent DAGHeaders
that happen to commit to the same tx-set.
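A length-prefixed container like this is straightforward to encode and decode. The sketch below is one possible layout, not the normative one: it assumes big-endian integers and a uint32 length prefix per tx, neither of which is specified here, and it treats each tx as opaque bytes.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
)

// DAGBody mirrors the schema 0xE1 fields: a version plus a list of raw txs.
type DAGBody struct {
	Version uint8
	Txs     [][]byte
}

// Encode writes version, a uint16 tx_count, then each tx behind a uint32 length.
// ASSUMPTION: big-endian and a uint32 per-tx length; the ZAP framing may differ.
func (b *DAGBody) Encode() ([]byte, error) {
	if len(b.Txs) > 0xFFFF {
		return nil, errors.New("tx_count exceeds uint16")
	}
	var buf bytes.Buffer
	buf.WriteByte(b.Version)
	var n [4]byte
	binary.BigEndian.PutUint16(n[:2], uint16(len(b.Txs)))
	buf.Write(n[:2])
	for _, tx := range b.Txs {
		binary.BigEndian.PutUint32(n[:], uint32(len(tx)))
		buf.Write(n[:])
		buf.Write(tx)
	}
	return buf.Bytes(), nil
}

// DecodeBody parses the layout produced by Encode and rejects truncated input.
// The returned tx slices alias raw, so no tx bytes are copied or re-encoded.
func DecodeBody(raw []byte) (*DAGBody, error) {
	if len(raw) < 3 {
		return nil, errors.New("body too short")
	}
	body := &DAGBody{Version: raw[0]}
	count := int(binary.BigEndian.Uint16(raw[1:3]))
	rest := raw[3:]
	for i := 0; i < count; i++ {
		if len(rest) < 4 {
			return nil, errors.New("truncated length prefix")
		}
		size := int(binary.BigEndian.Uint32(rest[:4]))
		rest = rest[4:]
		if len(rest) < size {
			return nil, errors.New("truncated tx bytes")
		}
		body.Txs = append(body.Txs, rest[:size:size])
		rest = rest[size:]
	}
	if len(rest) != 0 {
		return nil, errors.New("trailing bytes after last tx")
	}
	return body, nil
}

func main() {
	body := &DAGBody{Version: 1, Txs: [][]byte{[]byte("tx-one"), []byte("tx-two")}}
	raw, err := body.Encode()
	if err != nil {
		panic(err)
	}
	decoded, err := DecodeBody(raw)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d bytes on the wire, %d txs decoded\n", len(raw), len(decoded.Txs))
}
```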
Schema ID rationale: per LP-300 master schema registry, 0xD0..0xDF is
LP-201 reserved (P2P transport), 0xE0..0xE1 this LP (DAG mempool),
0xE2..0xE7 LP-211 (cross-shard atomic), 0xE8..0xEF LP-218 (rollup-VM
envelopes, moved here from the original 0xE0..0xEF claim to avoid
overlapping LP-211), 0xF0..0xF2 LP-214 (light client requests/
responses), 0xF3..0xFF reserved future. The DAG-mempool draft
originally claimed 0xF0..0xFF; LP-214 took 0xF0..0xF2 for client
requests so DAG moved to 0xE0..0xE1. Schema IDs do not overlap
across LPs.
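The allocation described above can be written down as a small lookup table, which also makes the no-overlap claim mechanically checkable. This is only a sketch: the `schemaRange` type and helper names are made up for the example, and LP-300 remains the authority on actual assignments.

```go
package main

import "fmt"

// schemaRange is one contiguous block of ZAP schema IDs and its owner.
type schemaRange struct {
	lo, hi byte
	owner  string
}

// registry lists the allocations described in the rationale paragraph.
var registry = []schemaRange{
	{0xD0, 0xDF, "LP-201 (P2P transport)"},
	{0xE0, 0xE1, "LP-208 (DAG mempool)"},
	{0xE2, 0xE7, "LP-211 (cross-shard atomic)"},
	{0xE8, 0xEF, "LP-218 (rollup-VM)"},
	{0xF0, 0xF2, "LP-214 (light client)"},
	{0xF3, 0xFF, "reserved"},
}

// ownerOf returns the owner of a schema ID, or false if it is unallocated here.
func ownerOf(id byte) (string, bool) {
	for _, r := range registry {
		if id >= r.lo && id <= r.hi {
			return r.owner, true
		}
	}
	return "", false
}

// hasOverlap reports whether any two ranges share at least one schema ID.
func hasOverlap(rs []schemaRange) bool {
	for i := range rs {
		for j := i + 1; j < len(rs); j++ {
			if rs[i].lo <= rs[j].hi && rs[j].lo <= rs[i].hi {
				return true
			}
		}
	}
	return false
}

func main() {
	for _, id := range []byte{0xE0, 0xE1, 0xE2, 0xF0} {
		owner, _ := ownerOf(id)
		fmt.Printf("0x%02X -> %s\n", id, owner)
	}
	fmt.Println("overlap:", hasOverlap(registry))
}
```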
Each header from validator V emitted at time T contains parent
pointers chosen as follows:
1. For each other validator W in the active validator set, V
maintains a "freshest known header from W" pointer.
2. At header-emit time T, V snapshots its freshest-known set and
includes the hashes as parent_hashes.
3. If V has no fresh header from W (e.g., W is partitioned, or T is
within the first K-ms of the epoch), the corresponding slot is
zero-padded.
The discipline is honest-best-effort. A byzantine V can omit
parents, lie about freshness, or zero-pad maliciously; the LP-209
total-ordering algorithm handles these adversarial cases. This LP
only specifies the honest validator's behavior; adversarial
robustness is a property of the consumer (LP-209) plus the cert
profile (LP-217 mode).
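The honest validator's bookkeeping for the three steps above might be sketched as follows. The `FreshestTracker` type is hypothetical, and using the per-validator seq to decide which header is "freshest" is an assumption made for the example; the discipline itself does not prescribe how freshness is measured.

```go
package main

import "fmt"

type NodeID [20]byte
type Hash [32]byte

// headerRef is the latest header seen from one peer.
type headerRef struct {
	seq  uint64
	hash Hash
}

// FreshestTracker keeps the freshest known header per peer validator.
type FreshestTracker struct {
	self     NodeID
	freshest map[NodeID]headerRef
}

func NewFreshestTracker(self NodeID) *FreshestTracker {
	return &FreshestTracker{self: self, freshest: make(map[NodeID]headerRef)}
}

// Observe records a header from peer w if it is newer than the one already held.
// ASSUMPTION: a higher seq means a fresher header.
func (t *FreshestTracker) Observe(w NodeID, seq uint64, h Hash) {
	if w == t.self {
		return
	}
	if cur, ok := t.freshest[w]; ok && cur.seq >= seq {
		return
	}
	t.freshest[w] = headerRef{seq: seq, hash: h}
}

// Snapshot returns one parent slot per other validator, in validator-set order.
// Peers with no known header get the zero hash (the zero-padded slot).
func (t *FreshestTracker) Snapshot(validators []NodeID) []Hash {
	parents := make([]Hash, 0, len(validators))
	for _, w := range validators {
		if w == t.self {
			continue
		}
		parents = append(parents, t.freshest[w].hash)
	}
	return parents
}

func main() {
	a, b, c := NodeID{0xA}, NodeID{0xB}, NodeID{0xC}
	t := NewFreshestTracker(a)
	t.Observe(b, 5, Hash{0x55})

	parents := t.Snapshot([]NodeID{a, b, c})
	for i, p := range parents {
		fmt.Printf("slot %d zero-padded: %v\n", i, p == Hash{})
	}
}
```

In this example V is `a`, so the snapshot has two slots: the first holds the header observed from `b`, and the second is zero-padded because nothing has been seen from `c`.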
Four operational properties, each backed by a specific mechanism.
A traditional mempool has admission rate
min(validator_admit_rate); aggregate is the slowest validator.
The DAG mempool has emission rate sum(validator_emit_rate). Each
validator emits its own headers; the DAG-wide rate is the sum. The
slowest validator contributes its share; it does not gate the
fastest.
Reference numbers (LP-203 GPU verify + LP-202 pipeline depth):
- 1,000 tx hashes per header × 20 headers/sec (K=50ms)
  = 20,000 tx-hashes/sec per validator
Headers contain only hashes; bodies are fetched separately via DHT,
so per-validator bandwidth is tx_count × 32 bytes per emission
period = ~640 kB/sec (625 KiB/sec) per validator at the reference
rate. Trivial network overhead.
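The bandwidth figure follows directly from the header shape. The sketch below works it through for the reference case, taking 1,000 tx hashes per header (the per-header cap mentioned elsewhere in this LP) and K = 50ms; the function name is illustrative, and the calculation counts only the tx_hashes field, not parent hashes or the signature.

```go
package main

import "fmt"

// hashSize is the size of one tx hash in bytes.
const hashSize = 32

// hashBytesPerSecond returns the bytes/sec of tx hashes one validator emits,
// given the number of tx hashes per header and the emission period in ms.
func hashBytesPerSecond(txPerHeader, emissionMs uint64) uint64 {
	if emissionMs == 0 {
		return 0
	}
	return txPerHeader * hashSize * 1000 / emissionMs
}

func main() {
	bps := hashBytesPerSecond(1000, 50)
	fmt.Printf("%d bytes/sec (%.0f KiB/sec)\n", bps, float64(bps)/1024)
}
```

For the reference inputs this gives 640,000 bytes/sec, or 625 KiB/sec.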
In a single-mempool model, a tx admitted to a byzantine leader's
mempool may be withheld from blocks indefinitely; honest validators
do not see it until the next leader proposes.
In the DAG mempool, every validator emits its own admitted txs
independently. A tx admitted at any honest validator appears in at
least one honest validator's header within K milliseconds, and the
body is replicated to the DHT under its content-hash. The leader at
the consumer layer reads from the DAG, not from its own mempool, so
byzantine-leader-withholds-tx is mechanically impossible.
The cert-profile (LP-217 mode) is unaffected: the consumer LP-205 /
LP-209 reads the DAG and orders it; the cert is computed over the
ordered output, not the DAG.
Each header references its body via body_hash. The body is the
tx bytes verbatim. There is no re-encoding step — the bytes that
the validator admitted are the bytes that go in the DHT and that
the consumer reads. This is the LP-200 ZAP stack guarantee
(immutable buffer, content-addressed, no codec re-marshal).
A traditional mempool re-encodes txs on insertion (for storage
layout) and on emission (for wire format). The DAG mempool does
neither; the LP-022 wire format IS the storage format.
In a single-mempool model, tx-to-block latency is
leader_rotation_period × position_in_rotation. The worst case
for a tx that just missed the current leader is one full rotation.
In the DAG mempool, tx-to-DAG latency is bounded by the validator's
emission period K (default 50ms). The tx is in the DAG as soon as
the validator that admitted it emits its next header. The
consumer's ordering (LP-209) picks the tx for execution at the
next ordering pass; for a typical chain with cert wall-clock
~5-15ms, the tx-to-block latency is dominated by K.
Total-ordering is deferred to LP-209 (Mysticeti). This LP
specifies only the DAG construction.
The rationale for the split:
- Several ordering algorithms can consume the same DAG: Mysticeti
  (LP-209; committee-based ordering with sub-second finality);
  Bullshark (round-robin-anchored ordering with FIFO fairness);
  Tusk (asynchronous ordering with worst-case finality).
- Each ordering algorithm makes a different tradeoff. A chain at
  LP-217 PQ-off mode for HFT trading might use Mysticeti for
  sub-100ms; a chain at PQ-heavy mode for archival anchoring might
  use Bullshark for ordering fairness.
- Separating the construction from the ordering keeps the LP-208
  wire format stable while LP-209 (and future ordering LPs) evolve.
LP-208's contract to LP-209:
1. The DAG is a content-addressed DAG; LP-209 can refer to vertices
by hash and rely on byte-identity across validators.
2. Headers from honest validators carry honest tx-hash lists and
honest-best-effort parent pointers; LP-209 specifies how to
recover ordering against adversarial headers.
3. Bodies are eventually consistent in the DHT under
body_hash; LP-209 specifies when to wait for body availability
versus proceeding with header-only ordering.
The DAG grows continuously. Without GC, validator memory and DHT
storage would grow without bound.
A header at height H is eligible for GC when:
header has finalized.
Eligible headers are removed from the active DAG window. Their
content-hashes remain valid forever (anyone who archives a header
can still reference it), but live validators no longer hold them in
memory. Validator local storage of GC'd headers falls to "cold" —
moved to long-term archival storage; no longer indexed for active
parent-pointer resolution.
Default window: 2 × cert_timeout_ms worth of headers, plus the
maximum LP-209 ordering-delay window. Concrete reference: at
K=50ms, LP-217 PQ-strict mode (cert ~15ms), LP-209 Mysticeti
(ordering delay ~3 rounds) — active window is ~5 × 50ms = 250ms of
headers = 5 headers per validator = 320 headers in a 64-validator
chain. Trivial memory.
Bodies in the DHT (LP-201 layer C) age out per the LP-201 TTL rule:
default 24h with refresh-on-access. Bodies for headers that have
been GC'd at all live validators are still served by archival nodes
(LP-201 §"Archival role") until the TTL expires.
Bodies for headers still in the active DAG window are pinned
(LP-201 §"Content pinning") so they cannot be evicted from the DHT.
A validator that restarts re-fetches the active DAG window from
peers. Each peer serves its own freshest headers via the
unidirectional DAGHeader stream (LP-201 stream, schema 0xE0); the restarted
validator stitches the parent-pointers to reconstruct the DAG.
Bodies are re-fetched from the DHT as needed.
Catch-up latency: bounded by active_window_size × network_RTT.
For the reference 320-header window over a 100ms RTT, catch-up is
~32 seconds — fast enough for routine validator restart.
The DAG mempool's per-validator emission rate sets the per-chain
throughput ceiling at the admission layer. Reference targets:
The per-validator emit rate is bounded by:
- Header signature verification: ~500 μs aggregate per N=64; ~2,000
  sig verifies/sec at full precision. For tx sig verify (per-tx, not
  per-header), GPU bench is ~1M verifies/sec on Blackwell, so the
  per-validator emit rate at this hardware class is ~1M
  tx-hashes/sec.
- DHT throughput: replication factor k=20 and network bandwidth. At
  ~1 KiB per tx and 1M tx/sec emit, body throughput is ~1 GiB/sec
  into the DHT per-validator, within Blackwell-class NIC bandwidth
  (200 Gbps ≈ 25 GB/sec).
- The verify rate on the local GPU and Block-STM speculator
  throughput. Both scale with GPU class.
Projected throughput at billion-user scale: 25M+ tx/sec on a
256-validator chain with PQ-fast mode (LP-217) at depth-3 pipelining
(LP-205). Numbers above are projected from the LP-203 measured
bench rows extrapolated to the DAG emission model; measured
DAG-mempool throughput will land in the LP-203 addendum once the
implementation tag ships.
DAG header production runs in parallel with the current block's
Sign stage at the leader. The Select stage of LP-205's pipelined
Quasar reads from the DAG, not from a local mempool — but the
Select work is unchanged in shape, only its input source changes.
LP-202 unwind primitives apply to the Select stage exactly as they
do in the single-mempool case.
Tx sig verify runs on Blackwell as DAG headers arrive. Each header
carries up to 1000 tx hashes; the bodies for those hashes are
batched into a single CUDA dispatch on the verify path. Verify
throughput tracks the LP-203 bench numbers directly — at ~1M
verifies/sec on Blackwell, the GPU is not the bottleneck up to a
per-validator emit rate of ~1M tx/sec.
The DAG ordering (from LP-209 Mysticeti consuming this LP's DAG) is
committed at the LP-217 mode the consumer chain has configured —
PQ-fast for typical workloads, PQ-strict for custody/treasury,
PQ-heavy for archival anchoring. The DAG itself is mode-agnostic;
only the cert that finalizes the ordering of DAG vertices into
blocks carries the mode posture.
The two LPs compose orthogonally:
A pipelined-Quasar chain using a DAG mempool runs:
1. Every validator emits DAGHeader every K=50ms (this LP).
2. LP-209 Mysticeti totally orders the DAG; the ordered output is the
stream of txs.
3. The current leader's Select stage (LP-205) consumes from the
Mysticeti output instead of a local mempool.
4. Build (LP-205) constructs a block frame referencing the consumed
tx-set.
5. Sign (LP-205) produces a QuasarCert at the LP-217 mode.
The leader's Select stage is the only LP-205 stage that changes; the
other two stages are unchanged. The aggregate throughput becomes
sum(validator_emit_rate) rather than min(validator_mempool_rate).
activates: 2025-12-25T16:20:00-08:00
activates-unix: 1766708400
DAG mempool is opt-in at activation. A chain enables it via genesis
config:
    dag_mempool:
      enabled: true
      emission_ms: 50
      active_window_headers: 320
      schema_ids:
        header: 0xE0
        body: 0xE1
Chains that do not enable DAG mempool use the single-mempool-per-
validator model unchanged. LP-205 (pipelined Quasar) works against
either input source.
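On the Go side, a chain might load this block into a small config struct and sanity-check it before starting the emitter. The sketch below is an assumption-heavy illustration: the struct, its field names, and every validation rule are invented for the example, and YAML parsing is left out because it needs a third-party library.

```go
package main

import (
	"errors"
	"fmt"
)

// SchemaIDs holds the ZAP schema IDs for headers and bodies.
type SchemaIDs struct {
	Header uint8
	Body   uint8
}

// DAGMempoolConfig mirrors the dag_mempool genesis keys shown above.
type DAGMempoolConfig struct {
	Enabled             bool
	EmissionMs          uint32
	ActiveWindowHeaders uint32
	SchemaIDs           SchemaIDs
}

// DefaultConfig returns the values from the example genesis block.
func DefaultConfig() DAGMempoolConfig {
	return DAGMempoolConfig{
		Enabled:             true,
		EmissionMs:          50,
		ActiveWindowHeaders: 320,
		SchemaIDs:           SchemaIDs{Header: 0xE0, Body: 0xE1},
	}
}

// Validate applies illustrative checks; these rules are assumptions, not spec text.
func (c DAGMempoolConfig) Validate() error {
	if !c.Enabled {
		return nil // single-mempool model: the remaining fields are unused
	}
	if c.EmissionMs == 0 {
		return errors.New("emission_ms must be positive")
	}
	if c.ActiveWindowHeaders == 0 {
		return errors.New("active_window_headers must be positive")
	}
	if c.SchemaIDs.Header != 0xE0 || c.SchemaIDs.Body != 0xE1 {
		return errors.New("schema_ids must be header=0xE0, body=0xE1")
	}
	return nil
}

func main() {
	cfg := DefaultConfig()
	fmt.Println("default config valid:", cfg.Validate() == nil)

	cfg.EmissionMs = 0
	fmt.Println("zero emission_ms error:", cfg.Validate())
}
```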
The DAG mempool lives under
~/work/lux/consensus/mempool/dag/:
- header.go: DAGHeader struct + ZAP encode/decode against schema
  0xE0; sig-verify against the validator's BLS pubkey.
- body.go: DAGBody struct + ZAP encode/decode against schema 0xE1;
  body-hash computation.
- emitter.go: Per-validator emission loop; signs a new header every
  emission_ms; broadcasts via LP-201 unidirectional QUIC stream;
  stores body in DHT via LP-201 layer C.
- receiver.go: Per-peer header receive loop; validates sig,
  parent-pointers, body-hash; inserts into local DAG view.
- dag.go: In-memory DAG view: map[hash]Header + freshest-pointer
  per validator; parent-pointer resolution.
- gc.go: Active-window GC; cold-archive move; DHT-pinning release.
- dht.go: Body fetch via LP-201 DHTFindValue (0xDE); body store via
  LP-201 DHTStore (0xDF) with content-pinning hint.
Integration points:
~/work/lux/consensus/protocol/quasar/select.go (LP-205 Select stage) — new DAGSource implementing the MempoolSource
interface; chain config picks LocalMempoolSource or DAGSource.
The DAG mempool composes with the existing LP-201 DHT and LP-205
Select stage; no per-stage code rewrites required.
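The source-swapping idea can be sketched with a small interface. The names MempoolSource, LocalMempoolSource and DAGSource come from the integration note above, but the method set, the constructor, and the in-memory queues below are hypothetical and only show how the Select stage could stay unchanged while its input source varies.

```go
package main

import "fmt"

// MempoolSource is what the Select stage pulls txs from.
// ASSUMPTION: the single NextBatch method is invented for this sketch.
type MempoolSource interface {
	NextBatch(max int) [][]byte
}

// LocalMempoolSource serves txs from the validator's own mempool.
type LocalMempoolSource struct{ pending [][]byte }

func (s *LocalMempoolSource) NextBatch(max int) [][]byte { return take(&s.pending, max) }

// DAGSource serves txs from the totally ordered DAG output.
type DAGSource struct{ ordered [][]byte }

func (s *DAGSource) NextBatch(max int) [][]byte { return take(&s.ordered, max) }

// take removes and returns up to max items from the front of the queue.
func take(q *[][]byte, max int) [][]byte {
	n := len(*q)
	if max < n {
		n = max
	}
	if n < 0 {
		n = 0
	}
	out := (*q)[:n]
	*q = (*q)[n:]
	return out
}

// NewSource picks the input source from chain config.
func NewSource(dagEnabled bool) MempoolSource {
	if dagEnabled {
		return &DAGSource{}
	}
	return &LocalMempoolSource{}
}

// selectStage is identical for both sources; only its input differs.
func selectStage(src MempoolSource, max int) int {
	return len(src.NextBatch(max))
}

func main() {
	src := &DAGSource{ordered: [][]byte{[]byte("tx1"), []byte("tx2"), []byte("tx3")}}
	fmt.Println("selected:", selectStage(src, 2))
	fmt.Println("selected:", selectStage(src, 2))
}
```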
A conformant implementation MUST:
1. Emit headers at the configured emission_ms rate; gaps in seq
are allowed (e.g., transient partition) but seq is strictly
monotonic per validator.
2. Sign each header with the validator's BLS key over the canonical
pre-image (`version || validator || epoch || seq || timestamp ||
parent_hashes || body_hash`).
3. Reject incoming headers with invalid sig, mismatched body-hash,
non-monotonic seq, or out-of-epoch sender.
4. Resolve parent-pointers against the local DAG view; treat
unreachable parents as no-op (do not block on resolution).
5. Reproduce the GC contract: active-window headers in memory; aged
headers moved to cold archive; bodies pinned in DHT while header
is active.
6. Catch up from peers on restart by re-fetching the active window
and stitching the DAG.
7. Emit per-stage Prometheus metrics: dag_header_emit_total,
dag_header_receive_total, dag_body_fetch_total,
dag_active_window_size, dag_gc_cycles_total.
Conformance test vectors at
~/work/lux/consensus/test/vectors/dag/dag-mempool.jsonl.
0xE0/0xE1.
construction but used by the consumer LP-209 ordering.
the LP-209 ordering of this DAG.
this LP relies on for content-addressing.
LP-201's 0xD0..0xDF reservation; Kademlia DHT is used for body
storage.
registered there. This LP's §"Wire format" is informative; LP-300
is normative for ID assignment.
apply to the Select stage that consumes from this DAG.
path.
via the Select stage.
totally orders this DAG.
committed at the chosen mode.
sub-second finality; this LP's DAG is its input.
the DAG's tx stream ahead of LP-209 ordering; commit on ordering
arrival.
shards exchange parent-pointers via cross-shard schema IDs
(LP-211 owns 0xE2..0xE7).
Copyright and related rights waived via CC0.