Engineering · system

Reproducibility by snapshot: replaying any solve, any day

Every Run row carries the strategy params, the covariance matrix, the constraint set, and the lot state at solve-time. The shape of the snapshot, the storage cost we accept, and the audit story that falls out for free.

May 2026 · 9 min read

Every Run record on the platform stores enough state to replay the solve a year later — same Σ matrix, same constraint set, same lot history, same wash-sale lock vector, same solver, same seed. The replay reproduces the same trades exactly, modulo floating-point determinism. This isn't a nice-to-have; it's the property that makes audit, regression-testing, and cross-solver comparison cheap. This post is about what gets stored, what we deliberately don't store, and why the storage cost is a bargain.

Storage / Run: ≈ 80–800 KB
Replay cost: 1 solve
Format: JSONB + binary blob
Retention: Indefinite (small fleet)

The shape of a snapshot

Five fields, two of them binary blobs

A Run row carries the standard metadata (account_id, run_seq, as_of_date, solver, status, timing). The reproducibility-relevant fields are these:

Run · reproducibility fields
Field · Type · What it stores
snapshot · JSONB · Universe, lot history, wash-sale lock vector, marginal rates, today's prices, borrow curve
sigma_blob · Binary (LZ4-compressed) · Covariance matrix Σ as a flat float32 array; size N × N
loadings_blob · Binary (LZ4-compressed) · Factor loadings matrix B; size N × 6
params_snapshot · JSONB · Strategy params after override merge; the full set the optimizer actually used
result · JSONB · Target weights, realised P&L, tax accruals, solver status, iter count

The two binary blobs are the size-driving fields. For an N=100 universe the Σ blob is ≈ 40 KB compressed; for N=500 it's ≈ 1 MB; for N=1000 it's ≈ 4 MB. The fleet stays comfortably under 100 GB of Run data per year per 100 active accounts.
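
The exact serialisation isn't shown in the post; a minimal round-trip sketch, assuming numpy plus the lz4 package and the flat-float32 layout described in the table (pack_sigma and unpack_sigma are illustrative names, not the Run model's API):

import lz4.frame
import numpy as np

def pack_sigma(sigma: np.ndarray) -> bytes:
    # Flatten Σ to float32 and LZ4-compress it for storage in a binary column.
    return lz4.frame.compress(np.ascontiguousarray(sigma, dtype=np.float32).tobytes())

def unpack_sigma(blob: bytes, n: int) -> np.ndarray:
    # Invert pack_sigma: decompress, reinterpret as float32, reshape to N × N.
    flat = np.frombuffer(lz4.frame.decompress(blob), dtype=np.float32)
    return flat.reshape(n, n)

# Round-trip check on a synthetic 100-name covariance matrix (~40 KB raw).
rng = np.random.default_rng(0)
panel = rng.standard_normal((504, 100))
sigma = np.cov(panel, rowvar=False).astype(np.float32)
assert np.array_equal(unpack_sigma(pack_sigma(sigma), 100), sigma)
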
The replay function

One method, deterministic if the inputs are

Run.replay()
def replay(run_id: int, *, verify: bool = True) -> ReplayResult:
    run = repo.get_run(run_id)

    snap = Snapshot.deserialise(run.snapshot, sigma_blob=run.sigma_blob,
                                 loadings_blob=run.loadings_blob)
    params = run.params_snapshot

    # Same solver back-end the original used; same tolerance.
    solver = solver_factory(run.solver, tolerance=run.solver_tolerance)
    result = solver.solve(snap, params)

    if verify:
        assert_weights_close(result.target, run.result["target"], atol=1e-8)
        assert_trades_close(result.trades, run.result["trades"])

    return ReplayResult(snap=snap, params=params, result=result)
Source: packages/portfolios/taxview_portfolios/runs/replay.py — annotated. The function reconstructs the snapshot, re-runs the solver, and verifies the result matches what was originally persisted.
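
The verification helpers aren't shown in the excerpt. A minimal version of assert_weights_close, assuming target weights are dicts keyed by security (the real types may differ), could look like this:

import numpy as np

def assert_weights_close(new: dict[str, float], old: dict[str, float], atol: float = 1e-8) -> None:
    # Same universe on both sides, then an element-wise tolerance check on the weights.
    assert set(new) == set(old), "replay produced a different universe"
    keys = sorted(new)
    np.testing.assert_allclose([new[k] for k in keys], [old[k] for k in keys], atol=atol)

With something like that in place, a regression sweep reduces to calling replay(run_id, verify=True) over recent runs and letting the assertion fail loudly on any drift.
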
What we deliberately don't store

And why the omissions are correct

  • The full price history. The snapshot stores today's prices for the universe. Σ is built off the trailing 504-day return panel, which we recompute from the Securities table on demand (a minimal sketch of that recompute follows this list). Storing the panel on every Run would be wasteful and (worse) duplicate the source-of-truth.
  • The benchmark weights as numbers. The snapshot stores the benchmark_id and rebalance date; the weights are recomputed from the Benchmark table on replay. The Benchmark methodology archive is the source-of-truth for point-in-time membership.
  • The intermediate solver state. We store the solver name, tolerance, status, and iteration count — but not the dual variables or the intermediate iterates. They're recoverable from a replay if needed and would dominate the storage budget if persisted.
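
As a rough sketch of the on-demand recompute from the first bullet: given a trailing price panel pulled from the Securities table, the covariance can be rebuilt when needed. The function name and the raw sample-covariance construction are illustrative; the production risk model presumably combines this with the factor loadings B rather than using the sample matrix directly.

import numpy as np
import pandas as pd

def rebuild_sigma(prices: pd.DataFrame, window: int = 504) -> np.ndarray:
    # prices: trailing adjusted closes, one row per date, one column per security.
    returns = prices.tail(window + 1).pct_change().dropna()  # 504 daily returns
    return np.cov(returns.to_numpy(), rowvar=False)          # sample N × N covariance
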
The audit trail story

Three questions the snapshot answers, every time

The audit-grade properties of the system fall out of the snapshot for free:

  • Why was this trade made? Re-solve from the snapshot; the trade ticket and the lot identification are reproduced exactly. The audit doesn't require re-deriving anything.
  • What constraints applied? params_snapshot is the full set the optimizer used — including any overrides the account had on that date.
  • What was the risk model that day? sigma_blob and loadings_blob are the actual matrices, not a reference to a model that may have been retrained.

The audit story is "click Replay; the system reproduces the decision." Not "open a notebook and try to remember what the risk model looked like in 2024."

Cross-solver comparison

How the cuOPT vs CVXPY benchmark uses replay

The cuOPT vs CVXPY benchmark is built on the replay machinery. We pulled 252 trading days of Runs from the synthetic benchmark fleet, replayed each one through both solvers, and compared the results trade-by-trade. Because the snapshot already pins the inputs, the comparison is apples-to-apples by construction. No re-solving with stale Σ; no off-by-one corporate-action mismatch; no surprise "the cuOPT trades look different because the universe was slightly different on that day."
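
In sketch form, the per-day comparison looks roughly like this, reusing the names from the replay() excerpt above. The solver back-end strings and the dict-like target weights are assumptions, and the real harness records timings and per-trade diffs as well:

def compare_solvers(run_id: int) -> float:
    # One pinned snapshot, two solver back-ends: identical inputs by construction.
    run = repo.get_run(run_id)
    snap = Snapshot.deserialise(run.snapshot, sigma_blob=run.sigma_blob,
                                loadings_blob=run.loadings_blob)
    params = run.params_snapshot

    targets = {}
    for name in ("cuopt", "cvxpy"):  # back-end names are illustrative
        solver = solver_factory(name, tolerance=run.solver_tolerance)
        targets[name] = solver.solve(snap, params).target

    # Worst per-name weight disagreement between the two solutions.
    return max(abs(targets["cuopt"][k] - targets["cvxpy"][k]) for k in targets["cuopt"])
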

Limitations

What the snapshot doesn't pin

  • Floating-point determinism across hardware. cuOPT on an L4 GPU and cuOPT on a different GPU model can diverge in the last few decimal places of the dual. The result.target differences are typically < 1e-10 and don't change trades, but a strict bit-equivalent replay across hardware is not guaranteed.
  • External feed corrections. If a corporate action correction comes in for a date in the past, the prices in the Securities table change. A replay then catches the drift — the trades replay differently, by design. The original Run's persisted trades remain authoritative for audit.
Notes & references
  1. Source: packages/portfolios/taxview_portfolios/db/models/run.py — Run model with sigma_blob and loadings_blob columns. Replay logic in replay.py.

Engineering note · reproducibility falls out of pinning the inputs, not of pinning the outputs. Save the questions, not the answers.

Related
  • Engineering · system

    The DailyRunner: orchestration, idempotency, and the kill switch

    How twenty accounts get rebalanced every morning before market open, why every solve is account-scoped and idempotent, what the per-account run_seq is for, and how a single boolean flag freezes the whole fleet.

  • Methodology · backtest

    Building a backtest you can defend

    Lookahead, survivorship, point-in-time benchmark constituents, holiday calendars, dividend timing, corporate actions. The hygiene checklist behind every backtest on this site, and why each item is on it.

  • Engine · benchmark

    cuOPT vs CVXPY: a per-strategy bake-off for tax-aware portfolio optimization

    Six strategies, two solvers, one question: where does GPU-accelerated mathematical programming earn its keep against a mature CPU convex framework — in solve-time, in solution quality, and in the after-tax dollar that lands in the account.