Lux Proposals
LP-0205 (Draft)

Status

Draft. No backwards compatibility. No flag day.

Activated at the genesis of the new final Lux network:
**2025-12-25 16:20 Pacific (unix 1766708400)**. The pre-Quasar Edition
Lux network (2020–2025) is round-blocking by construction and is a
separate network, out of scope for this LP.

Abstract

Quasar today is round-blocking: round N must finalize before round N+1

starts. The proposer cannot begin selecting candidate transactions for

N+1 until N's QuasarCert has aggregated. For a PQ-heavy-mode chain

(LP-217; was "Polaris" in LP-017) with cert wall-clock ~400 ms, this

caps single-leader throughput at

2.5 blocks/sec regardless of how fast the underlying stages are.

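Pipelined Quasar overlaps the stages of consecutive blocks, so the cap
is set by the slowest stage rather than by the sum of all stages. Three
blocks are in flight simultaneously: block N is in the Sign stage while
N+1 is in the Build stage and N+2 is in the Select stage. Each stage
runs against an orthogonal subsystem (Select touches the mempool and the
Block-STM speculator, Build touches the ZAP encoder and the round-digest
binder, Sign touches the cert legs of the chain's LP-217 mode), so they
execute in parallel without cross-stage locks. The steady-state interval
between blocks collapses from sum(stages) to max(stages); the latency of
an individual block is still sum(stages).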

The depth bound is set by network RTT and consensus-round
independence: depth D requires D consecutive distinct leaders so that
a single byzantine proposer cannot stall the pipeline by holding all
D in-flight blocks. The reference implementation pipelines at depth 3,
which matches the LP-202 block-production stage list (Select / Build /
Sign) and saturates against a single-leader-per-block round-robin
rotation.

The wire format is unchanged. The QuasarCert struct (LP-182) is the

same shape per block; the cert-mode (LP-217) is the same per chain;

the round-digest binding (LP-077) is the same per height. Pipelined

Quasar is a protocol-orchestration change: it specifies WHEN a stage

runs relative to other in-flight blocks, not what it produces.

This LP composes with the DAG mempool (LP-208 + LP-209) without

conflict: pipelined Quasar specifies the leader's pipeline; DAG

mempool specifies where the txs come from. A single-leader pipelined

Quasar consuming from a leaderless DAG mempool is the operational

shape of the billion-user sub-millisecond path together with LP-202

(atomic unwind), LP-203 (GPU verify), and LP-217 (PQ-off mode for the

HFT critical path).

Motivation

Three concrete bottlenecks of round-blocking Quasar at billion-user

scale:

1. Per-block latency dominates throughput. A PQ-heavy-mode chain
   (LP-217; was "Polaris" in LP-017) at ~400 ms cert wall-clock is
   throughput-capped at 2.5 blocks/sec. If each block holds 1000 txs,
   that is 2,500 tps single-leader regardless of how fast the matching
   engine, block-STM executor, or GPU verify is. The bottleneck is not
   capacity; it is the serial per-block wait on the cert.

2. Stage utilization is poor. During the Sign stage of block N, the
   mempool, the Block-STM speculator, and the ZAP encoder all sit idle.
   The cert-leg threshold signing is not bottlenecked on those
   subsystems. Round-blocking wastes ~80% of single-validator wall
   clock.

3. Leader rotation cost is paid per block. A new leader takes the ZAP
   buffer's first byte for its block at round-start. Round-blocking
   means the leader switch happens N times for N blocks; pipelining
   means the leader-rotation happens at N consecutive heights in
   parallel and the switch cost is amortized over the in-flight depth.

Pipelining is the standard HotStuff/PBFT throughput pattern (Buterin,

Castro-Liskov, the HotStuff-1B paper) applied to Quasar's specific

4-leg cert composition. The novelty here is not the pipeline

technique — it is the composition with LP-202's atomic-unwind

primitives, LP-203's GPU sign pipeline, and LP-217's mode-tier

degradation, so that pipeline failures unwind cleanly without manual

coordination.

Pipeline stages

Three stages, one stage per in-flight block. Each stage is defined by

its trigger, its work, the subsystem it touches, and its independence

properties relative to the other two stages.

Stage 1: Select

| Property | Value |
|---|---|
| Trigger | Round N+2 leader-rotation slot opens |
| Work | Mempool sample → tx-ordering → Block-STM speculative execute |
| Subsystem | Mempool (ranked heap) + Block-STM speculator |
| Output | Ordered candidate tx-set pinned to ZAP buffer offsets |
| Independence | Touches mempool's ranked heap; Build stage of block N+1 has already drained its tx-set, so heap pointer-shuffles are non-overlapping |
| Failure | Mempool empty → propose empty block; speculator timeout → drop speculative branch, propose with non-speculated set |

Stage 2: Build

| Property | Value |
|---|---|
| Trigger | Select stage of the same block produces ordered tx-set |
| Work | Construct block header ZAP frame referencing selected tx buffers |
| Subsystem | ZAP encoder (LP-022) + round-digest binder (LP-077) |
| Output | Block frame: parent-hash + tx-set offset list + leader sig over round-digest |
| Independence | Touches the ZAP encoder, which is per-block; Sign stage of N is operating on a different block's transcript |
| Failure | ZAP encode error → invalid block; fail back to Select with a smaller candidate set |

Stage 3: Sign

| Property | Value |
|---|---|
| Trigger | Build stage of the same block produces block frame |
| Work | Leader broadcasts block-hash; cert-leg threshold ceremony begins per LP-217 mode |
| Subsystem | Consensus engine (BLS + Pulsar + Corona + optionally Magnetar) |
| Output | QuasarCert at the operator's configured mode |
| Independence | Touches the cert legs, which are per-block aggregations against the block's round-digest; the digest is a pure function of the block bytes (LP-077), so two different blocks' aggregations are content-independent |
| Failure | Insufficient leg sigs → cert at LP-202 lower tier; surviving legs commit the block at the next-lower mode (LP-217) |

The stage list maps 1:1 onto LP-202's "Block production pipeline"

(3-stage). LP-202 specifies the per-block pipeline; this LP specifies

the multi-block pipeline composed of multiple per-block pipelines

running concurrently.

Pipeline depth

Pipeline depth is the number of consecutive blocks in flight at any

instant. Depth = 3 is the reference.

| Depth | Stage occupancy | Bound |
|---|---|---|
| 1 | Sign(N) only | Round-blocking baseline; throughput = 1 / cert_wallclock |
| 2 | Sign(N), Build(N+1) | Throughput = 1 / max(build, cert_wallclock) |
| 3 | Sign(N), Build(N+1), Select(N+2) | Throughput = 1 / max(select, build, cert_wallclock) |
| 4+ | Sign(N), Build(N+1), Select(N+2), Select(N+3) | Bounded by leader-rotation independence; see "Leader rotation" below |

Reference cap = 3. Depth 3 saturates the per-block stage list. A

fourth in-flight block would either repeat a stage (two Select stages

operating on different blocks) or operate ahead of leader rotation.

The first case is bounded by stage independence; the second is

bounded by the leader-rotation contract.
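
The depth table can be sanity-checked with a small timing model. The sketch below is illustrative only: the function names, the one-block-per-stage rule, and the stage times are assumptions, not part of the reference implementation. It reports the steady-state interval between finished blocks for a given depth.

```python
def block_finish_times(stage_times, depth, n_blocks):
    """Return the time at which each block leaves the last stage.

    Assumptions of this sketch (not taken from the LP): every stage handles
    one block at a time, and block b may enter the first stage only after
    block b - depth has left the last stage (the in-flight cap).
    """
    n_stages = len(stage_times)
    done = [[0.0] * n_stages for _ in range(n_blocks)]
    for b in range(n_blocks):
        for s in range(n_stages):
            # A block is ready for stage s once it has left stage s - 1.
            ready = done[b][s - 1] if s > 0 else 0.0
            if s == 0 and b >= depth:
                # In-flight cap: wait for block b - depth to finish Sign.
                ready = done[b - depth][n_stages - 1]
            # The stage itself is busy until the previous block leaves it.
            stage_free = done[b - 1][s] if b > 0 else 0.0
            done[b][s] = max(ready, stage_free) + stage_times[s]
    return [row[-1] for row in done]


def steady_interval(stage_times, depth, n_blocks=50):
    """Gap between the last two finished blocks, i.e. 1 / throughput."""
    finish = block_finish_times(stage_times, depth, n_blocks)
    return finish[-1] - finish[-2]


stages_ms = [1.0, 2.0, 5.0]  # illustrative Select, Build, Sign times in ms
for depth in (1, 2, 3):
    print(depth, steady_interval(stages_ms, depth))
```

With these illustrative stage times, depth 1 spaces blocks by the sum of the stages (8 ms) and depth 3 by the slowest stage (5 ms), which is the sum-to-max collapse the table describes.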

Depth limit. The hard depth bound is:

    depth_max = min(
      cert_round_independence,       // how many heights have non-overlapping cert deps
      leader_rotation_independence,  // how many heights have different leaders
      mempool_speculation_width      // how many speculative branches the mempool can pin
    )

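For Lux validator sets with round-robin leader rotation, each of these
is at least the validator count N. For VRF-weighted random rotation,
the minimum is min(N, expected_consecutive_distinct_leaders). Depth 3 is
safe under both rotation schemes for any N ≥ 4: round-robin yields three
distinct consecutive leaders whenever N ≥ 3, and VRF rotation shortens
the pipeline when a leader repeats (see "Leader rotation" below).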
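
As a rough illustration of the bound, the sketch below takes the three independence figures as plain integers and clamps a configured depth against them. The function names and the sample inputs are hypothetical; how each figure is actually measured is left to the referenced LPs.

```python
def depth_max(cert_round_independence,
              leader_rotation_independence,
              mempool_speculation_width):
    """Hard depth bound: the tightest of the three independence limits."""
    return min(cert_round_independence,
               leader_rotation_independence,
               mempool_speculation_width)


def effective_pipeline_depth(configured_depth, bound):
    """Never run deeper than the bound, and never shallower than depth 1."""
    return max(1, min(configured_depth, bound))


# Illustrative inputs: a 64-validator round-robin chain where each limit
# is at least the validator count.
bound = depth_max(64, 64, 64)
print(effective_pipeline_depth(3, bound))
```

In this reading the reference depth of 3 is unaffected by the bound on any realistic validator set; the clamp only matters when one of the three limits drops below the configured depth.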

Leader rotation

Pipelined Quasar requires that every in-flight block has a different

leader. Otherwise a single byzantine leader can stall the pipeline by

withholding its block at multiple in-flight heights.

Round-robin rotation. Leader at height H is validators[H mod N].
Pipeline depth 3 requires that validators[H mod N],
validators[(H+1) mod N], and validators[(H+2) mod N] are distinct,
which holds for any N ≥ 3.
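
The round-robin rule is simple enough to check directly. This is a minimal sketch with hypothetical helper names; it assumes the validator list itself contains no duplicates.

```python
def round_robin_leader(validators, height):
    """Leader at a height under round-robin: validators[height mod N]."""
    return validators[height % len(validators)]


def in_flight_leaders(validators, height, depth=3):
    """Leaders of the `depth` consecutive heights starting at `height`."""
    return [round_robin_leader(validators, height + i) for i in range(depth)]


def leaders_distinct(leaders):
    """True when no leader appears twice among the in-flight heights."""
    return len(set(leaders)) == len(leaders)


three = ["v0", "v1", "v2"]
two = ["v0", "v1"]
print(all(leaders_distinct(in_flight_leaders(three, h)) for h in range(10)))
print(leaders_distinct(in_flight_leaders(two, 0)))
```

With three validators every window of three heights has distinct leaders; with two validators the third in-flight height wraps back to the first leader, so depth 3 is not available.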

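VRF-weighted random rotation. Leader at height H is sampled from
the stake-weighted validator set via VRF over the previous block's
randomness beacon. Pipeline depth 3 requires three consecutive distinct
samples. Treating the draws as independent with stake shares p_i, all
three are distinct with probability 1 − 3·Σp_i² + 2·Σp_i³. The top
validator alone repeats at two given consecutive heights with
probability (stake_top / total_stake)^2, so it does not repeat with
probability 1 − (stake_top / total_stake)^2; for a sane stake
distribution (top validator ≤ 1/3 of total) this is ≥ 8/9 ≈ 88.9% per
step. The three-distinct probability depends on the whole distribution
and is lower for small or concentrated sets: 2/9 for three equal
validators, about 95% for 64 equal validators.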
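
The per-step figure above and the stricter all-three-distinct probability can both be computed exactly. This sketch assumes the three draws are independent and stake-weighted, which the LP does not state explicitly; the function names and integer stake units are hypothetical.

```python
from fractions import Fraction


def p_top_not_repeated(stakes):
    """1 - (stake_top / total_stake)^2: the top validator is not drawn
    at two given consecutive heights."""
    top_share = Fraction(max(stakes), sum(stakes))
    return 1 - top_share ** 2


def p_three_distinct(stakes):
    """Probability that three independent stake-weighted draws are all
    different, by inclusion-exclusion: 1 - 3*sum(p^2) + 2*sum(p^3)."""
    total = sum(stakes)
    shares = [Fraction(s, total) for s in stakes]
    s2 = sum(p ** 2 for p in shares)
    s3 = sum(p ** 3 for p in shares)
    return 1 - 3 * s2 + 2 * s3


print(p_top_not_repeated([1, 1, 1]))       # top share exactly 1/3
print(p_three_distinct([1, 1, 1]))         # three equal validators
print(float(p_three_distinct([1] * 64)))   # 64 equal validators
```

The gap between the two numbers on small validator sets is the reason the shortened-depth fallback matters: a top-validator bound alone does not guarantee that a full depth-3 window is usually available.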

When the rotation produces a non-distinct leader at depth, the

pipeline shortens: the second instance of that leader's block waits

for its predecessor to complete Sign before entering its own Build

stage. Shortened-depth pipelining is throughput-degraded but

liveness-preserving.
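
One way to picture the shortening rule is to count how many consecutive in-flight heights can run before a leader repeats. This is a simplified sketch with a hypothetical function name: it treats the first repeated leader as the point where the pipeline stops deepening, and does not model the repeated block's Select stage, which the text allows to proceed.

```python
def effective_depth(upcoming_leaders, configured_depth=3):
    """Number of leading in-flight heights whose leaders are all distinct.

    `upcoming_leaders` lists the leaders of heights H, H+1, H+2, ...
    The count stops at the first repeated leader, whose block must wait
    for its predecessor to complete Sign before entering Build.
    """
    seen = set()
    for leader in upcoming_leaders[:configured_depth]:
        if leader in seen:
            break
        seen.add(leader)
    return len(seen)


print(effective_depth(["a", "b", "c"]))  # full depth
print(effective_depth(["a", "b", "a"]))  # shortened to 2
print(effective_depth(["a", "a", "b"]))  # shortened to 1 (round-blocking)
```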

Leader failure. If the in-flight leader fails to propose

(crash, partition, byzantine withhold), the next height's leader takes

over via the standard Quasar timeout path (LP-017 §Round-timeout —

historical; LP-217 inherits this timeout contract unchanged). The

pipeline shortens by 1 for that span. The atomic-unwind primitive

(LP-202 round.Abandon()) discards the in-flight Select/Build state

for the failed height; the cert-leg aggregators GC their partial

state. No cross-height coordination needed.

Failure modes

| Failure | Trigger | Pipeline effect | Recovery |
|---|---|---|---|
| Leader fails to propose at height H | Round-timeout fires | Pipeline shortens by 1; next leader at H+1 unaffected; in-flight blocks at H+2/H+3 still progress in their stages | Standard Quasar timeout; next leader takes over at H |
| Leader byzantine (proposes invalid block) | Verify fails on at least one validator's parallel-by-tx check | Block at H rejected; cert-leg aggregation never starts; LP-202 round.Abandon() discards build state | Quorum override at LP-202 lower tier; next leader at H+1 proposes |
| Cert-leg threshold not met | Insufficient sigs by cert_timeout_ms | Cert published at LP-217 lower tier; block finalizes at the highest mode whose required-leg set is satisfied | LP-202 tier-degradation contract — no pipeline action needed |
| Network partition splits validators | NACK from > 1/3 stake | All in-flight blocks freeze at their stage; cert legs cannot aggregate to threshold | Partition heals; in-flight blocks resume from their stage's last commit point |
| Speculator mispredict (Block-STM conflict) | Conflict-detector returns abort | Speculative branch discarded; Select stage produces non-speculated tx-set | Stage retry within the same height; no cross-height effect |
| Mempool empty | Select samples 0 candidates | Build produces empty block (header only) | No pipeline effect; empty block finalizes normally; chain liveness preserved |
| QUIC stream reset on cert broadcast | Peer disconnects mid-broadcast | Affected leg's partial aggregation re-issued against alternate peer | LP-202 §"QUIC stream isolation invariants" — other in-flight blocks unaffected |
| Slow follower | Verify CPU-saturated | Follower falls behind by 1+ block; catches up via bootstrap stream (LP-202 §"Bootstrap pipeline") | Follower-local; pipeline at the leader is unaffected |

The failure-mode contract: any single-block failure unwinds via LP-202

primitives and does not propagate to other in-flight blocks. A

partition-class failure freezes the whole pipeline at its current

stage configuration; healing restarts from the frozen position

without re-doing the completed stages.
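
The cert-leg row of the table ("finalizes at the highest mode whose required-leg set is satisfied") can be sketched as a lookup. The mode-to-leg mapping below is a placeholder invented for illustration; the real required-leg sets are defined in LP-217 and may differ. Only the mode and leg names come from this document.

```python
# HYPOTHETICAL mapping, strongest mode first. The real required-leg sets
# live in LP-217; these are placeholders for the sketch.
MODES = [
    ("PQ-heavy", {"BLS", "Pulsar", "Corona", "Magnetar"}),
    ("PQ-strict", {"BLS", "Pulsar", "Corona"}),
    ("PQ-fast", {"BLS", "Pulsar"}),
    ("PQ-off", {"BLS"}),
]


def highest_satisfied_mode(configured_mode, legs_met):
    """Return the strongest mode at or below `configured_mode` whose
    required legs all reached threshold, or None if no mode is satisfied."""
    names = [name for name, _ in MODES]
    start = names.index(configured_mode)  # never upgrade past the config
    for name, required in MODES[start:]:
        if required <= legs_met:          # subset test
            return name
    return None


# Under the placeholder mapping, a PQ-heavy chain that misses the Magnetar
# leg degrades one tier.
print(highest_satisfied_mode("PQ-heavy", {"BLS", "Pulsar", "Corona"}))
```

The point of the sketch is the shape of the rule: degradation is a per-block decision over that block's own legs, so it needs no information from the other in-flight heights.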

Wall-clock model

The throughput model for pipelined Quasar at depth D:


    throughput(D = 1) = 1 / sum(stage_times)    (round-blocking baseline)
    throughput(D = 3) = 1 / max(stage_times)    (steady state at D = number_of_stages = 3)

Reference numbers from the LP-217 mode table (Blackwell, N=64

validators) crossed with the LP-202 pipeline-depth saturation:

| Mode | Sign wall-clock (final) | Select+Build est. | Throughput @ depth 3 |
|---|---|---|---|
| PQ-off | ~1 ms | ~3 ms | ~330 blocks/sec |
| PQ-fast | ~5 ms | ~3 ms | ~200 blocks/sec |
| PQ-strict | ~15 ms | ~3 ms | ~65 blocks/sec |
| PQ-heavy | ~80 ms | ~3 ms | ~12 blocks/sec |

Numbers above the PQ-strict row are GPU-saturated (LP-203 measured);

PQ-strict and PQ-heavy are projected based on LP-203's Pulsar +

Corona + Magnetar bench rows extrapolated to depth 3. The Select+Build

~3 ms estimate is the Block-STM ordering pass (LP-010 measured at

~1.2 ms for a 1000-tx block on M4 Max) plus the ZAP encoder

(measured at ~150 µs per block of 1000 txs) plus the mempool sample

pass (~1 ms heap traversal).
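
The throughput column follows from the other two columns. A minimal sketch, assuming (as the table does) that Select+Build is treated as a single ~3 ms prep cost and that depth-3 throughput is 1 / max(sign, prep); the constant and function names are hypothetical.

```python
PREP_MS = 3.0  # Select+Build estimate from the table

# Sign wall-clock per mode, copied from the table (milliseconds).
SIGN_MS = {
    "PQ-off": 1.0,
    "PQ-fast": 5.0,
    "PQ-strict": 15.0,
    "PQ-heavy": 80.0,
}


def pipelined_blocks_per_sec(sign_ms, prep_ms=PREP_MS):
    """Depth-3 steady state: one block per slowest-stage interval."""
    return 1000.0 / max(sign_ms, prep_ms)


for mode, sign_ms in SIGN_MS.items():
    print(f"{mode}: {pipelined_blocks_per_sec(sign_ms):.1f} blocks/sec")
```

The computed values (about 333, 200, 67, and 12.5 blocks/sec) match the rounded figures in the table.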

At PQ-off mode, with the projected ~1 ms cert wall-clock and ~3 ms
prep, round-blocking yields 1 / (1 ms + 3 ms) = 250 blocks/sec, while
depth 3 yields 1 / max(stages) = 1 / 3 ms ≈ 333 blocks/sec. Pipelining
wins by ~33% in this mode. The win is sum(stages) / max(stages), so it
shrinks as one stage comes to dominate. At PQ-heavy mode with 80 ms
cert and 3 ms prep, depth-3 pipelining yields 1 / 80 ms = 12.5
blocks/sec versus round-blocking 1 / 83 ms ≈ 12.0 blocks/sec,
essentially the same, because cert dominates.

Sweet spot. Pipelining wins are largest when stage times are
balanced. Among the LP-217 modes, PQ-fast is the sweet spot: cert
wall-clock ~5 ms versus prep ~3 ms gives depth-3 throughput
1 / 5 ms = 200 blocks/sec versus round-blocking 1 / 8 ms = 125
blocks/sec, a 60% win.
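
The sweet-spot claim can be checked by computing the pipelining gain for each mode. This sketch reuses the table's figures and the same two-stage simplification (cert versus a single ~3 ms prep cost); the names are hypothetical.

```python
def pipelining_speedup(sign_ms, prep_ms=3.0):
    """Ratio of pipelined to round-blocking throughput:
    sum(stages) / max(stages). Equals 2.0 when the two costs are equal
    and approaches 1.0 as one of them dominates."""
    return (sign_ms + prep_ms) / max(sign_ms, prep_ms)


sign_ms_by_mode = {"PQ-off": 1.0, "PQ-fast": 5.0, "PQ-strict": 15.0, "PQ-heavy": 80.0}

gains = {mode: pipelining_speedup(ms) for mode, ms in sign_ms_by_mode.items()}
best = max(gains, key=gains.get)
for mode, gain in gains.items():
    print(f"{mode}: {100 * (gain - 1):.0f}% win")
print("largest win:", best)
```

Under these figures the gains are roughly 33%, 60%, 20%, and 4%, so PQ-fast comes out ahead because its cert and prep times are the closest to balanced.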

Beyond depth 3. Depth 4+ does not gain throughput because there

are only 3 stages. Additional depth would require either pipelining

across heights at the cert level (out of scope; would be a HotStuff-2B

4-phase pipeline) or restructuring the per-block stage list. Neither

is in scope for this LP.

Projected. All numbers in this section are projected from the LP-202

+ LP-203 + LP-217 measured rows. Measured pipelined-Quasar throughput

will land in the LP-203 bench addendum once the pipelined-Quasar

implementation tag ships.

Composition with LP-202

LP-202 specifies the per-block atomic-unwind primitives. Pipelined

Quasar uses every one of them:

| In-flight failure | LP-202 primitive used |
|---|---|
| Select-stage speculation abort | txn.Discard() on Block-STM transaction |
| Build-stage encode error | stream.CancelWrite(STREAM_PROTOCOL_VIOLATION) on the partial block frame |
| Sign-stage cert-leg miss | Cert tier stays at previous profile per LP-217 mode → cert still valid at lower tier |
| Cross-stage interaction | None — stages touch orthogonal state; no cross-stage unwind needed |

The composition is the load-bearing claim. Pipelined Quasar is
not a new consensus protocol: it is LP-202's per-block pipeline
contract instantiated at depth 3, with leader rotation interleaving the
heights. The unwind primitives at each stage are unchanged. The
QuasarCert struct is unchanged. The relying-party policy (LP-217 mode)
is unchanged. Only the orchestration (which stage runs against which
block at which wall-clock) is new.

Wire/cert format unchanged

This LP introduces no new wire types. The block frame is the LP-186

block-VM frame; the cert is the LP-182 QuasarCert struct; the

round-digest is LP-077. Schema IDs 0xD0..0xDF (LP-201) and the

schema IDs in LP-200 are unaffected.

The pipeline-depth knob is a runtime config setting:


    quasar:
      pipeline_depth: 3
      # default 3; minimum 1 (round-blocking); maximum bounded by
      # leader-rotation-independence (see "Leader rotation" above)

The knob is set per chain at genesis and is upgradable via the chain's
governance path (analogous to the LP-217 mode upgrade). Validators in a
chain must agree on the pipeline depth; mixed-depth validators within
a single chain would race on stage entry. The depth field is
broadcast in the LP-022 Handshake message in a new optional field
(no schema bump: it is a backwards-tolerant field at the end of the
handshake payload).
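
A validator might apply the two rules above (range check and peer agreement) roughly as follows. This is a sketch under assumptions: the function names are hypothetical, the rotation-independence figure is passed in as a plain integer, and how the depth is actually read from the handshake payload is not modeled.

```python
def validate_pipeline_depth(depth, rotation_independence):
    """Accept depths from 1 (round-blocking) up to the chain's
    leader-rotation independence; raise ValueError otherwise."""
    if not isinstance(depth, int) or depth < 1:
        raise ValueError(f"pipeline_depth must be an integer >= 1, got {depth!r}")
    if depth > rotation_independence:
        raise ValueError(
            f"pipeline_depth {depth} exceeds leader-rotation independence "
            f"{rotation_independence}"
        )
    return depth


def depths_agree(local_depth, peer_depths):
    """True when every peer advertised the same depth as the local node."""
    return all(d == local_depth for d in peer_depths)


local = validate_pipeline_depth(3, rotation_independence=64)
print(depths_agree(local, [3, 3, 3]))   # consistent chain
print(depths_agree(local, [3, 2, 3]))   # a mixed-depth peer is detected
```

What a node does on disagreement (refuse the peer, fall back to depth 1, or halt) is a policy choice this LP does not specify, so the sketch only reports the mismatch.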

Activation marker


    activates: 2025-12-25T16:20:00-08:00
    activates-unix: 1766708400
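
The two marker fields encode the same instant, which is easy to confirm. A minimal sketch using only the Python standard library; the helper name `is_active` is hypothetical.

```python
from datetime import datetime

ACTIVATES = datetime.fromisoformat("2025-12-25T16:20:00-08:00")
ACTIVATES_UNIX = 1766708400

# The ISO timestamp carries its own UTC offset, so timestamp() is exact.
assert int(ACTIVATES.timestamp()) == ACTIVATES_UNIX


def is_active(now_unix):
    """True once the given unix time has reached the activation marker."""
    return now_unix >= ACTIVATES_UNIX


print(is_active(ACTIVATES_UNIX - 1), is_active(ACTIVATES_UNIX))
```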

Pipelined Quasar is the default at activation. Round-blocking

(pipeline_depth: 1) remains a valid config for chains that

explicitly disable pipelining — typically PQ-heavy chains where the

cert dominates the wall-clock and pipelining gains are negligible.

Reference Implementation

The pipeline orchestrator lives under
~/work/lux/consensus/protocol/quasar/pipeline/.

Stage implementations are the existing Quasar code.

The orchestrator composes the existing implementations; no per-stage

code is rewritten for this LP.

Test Cases

A conformant implementation MUST:

1. Accept quasar.pipeline_depth ∈ {1, 2, 3} in config; reject higher
   values that exceed the chain's leader-rotation independence.

2. At depth 3, produce one valid block per max(stage_times) of
   wall-clock in steady state, with three blocks in flight.

3. On any in-flight stage failure, unwind via the LP-202 primitive
   listed in §"Composition with LP-202"; other in-flight blocks
   continue.

4. On leader-rotation collision (non-distinct leader at depth),
   shorten the pipeline transparently without violating liveness.

5. Produce a QuasarCert at every height whose mode satisfies
   cert_mode configured in LP-217.

Conformance test vectors land at

~/work/lux/consensus/test/vectors/pipeline/quasar-pipelined.jsonl.

Cross-references

Future Work

Copyright

Copyright and related rights waived via CC0.