Engineering · system

Reproducibility by snapshot: replaying any solve, any day

Every Run row carries the strategy params, the covariance matrix, the constraint set, and the lot state at solve-time. The shape of the snapshot, the storage cost we accept, and the audit story that falls out for free.

May 2026 · 9 min read

Every Run record on the platform stores enough state to replay the solve a year later — same Σ matrix, same constraint set, same lot history, same wash-sale lock vector, same solver, same seed. The replay reproduces the same trades exactly, modulo floating-point determinism. This isn't a nice-to-have; it's the property that makes audit, regression-testing, and cross-solver comparison cheap. This post is about what gets stored, what we deliberately don't store, and why the storage cost is a bargain.

Storage / Run: ≈ 80–800 KB
Replay cost: 1 solve
Format: JSONB + binary blob
Retention: Indefinite (small fleet)

The shape of a snapshot

Five fields, two of them binary blobs

A Run row carries the standard metadata (account_id, run_seq, as_of_date, solver, status, timing). The reproducibility-relevant fields are these:

Run · reproducibility fields
Field · Type · What it stores
snapshot · JSONB · Universe, lot history, wash-sale lock vector, marginal rates, today's prices, borrow curve
sigma_blob · Binary (LZ4-compressed) · Covariance matrix Σ as a flat float32 array; size N × N
loadings_blob · Binary (LZ4-compressed) · Factor loadings matrix B; size N × 6
params_snapshot · JSONB · Strategy params after override merge; the full set the optimizer actually used
result · JSONB · Target weights, realised P&L, tax accruals, solver status, iter count

The two binary blobs are the size-driving fields. For an N=100 universe the Σ blob is ≈ 40 KB compressed; for N=500 it's ≈ 1 MB; for N=1000 it's ≈ 4 MB. The fleet stays comfortably under 100 GB of Run data per year per 100 active accounts.
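
The exact serialisation isn't shown in the post; a minimal round-trip sketch, assuming numpy plus the lz4 package and the flat-float32 layout described in the table (pack_sigma and unpack_sigma are illustrative names, not the Run model's API):

import lz4.frame
import numpy as np

def pack_sigma(sigma: np.ndarray) -> bytes:
    # Flatten Σ to float32 and LZ4-compress it for storage in a binary column.
    return lz4.frame.compress(np.ascontiguousarray(sigma, dtype=np.float32).tobytes())

def unpack_sigma(blob: bytes, n: int) -> np.ndarray:
    # Invert pack_sigma: decompress, reinterpret as float32, reshape to N × N.
    flat = np.frombuffer(lz4.frame.decompress(blob), dtype=np.float32)
    return flat.reshape(n, n)

# Round-trip check on a synthetic 100-name covariance matrix (~40 KB raw).
rng = np.random.default_rng(0)
panel = rng.standard_normal((504, 100))
sigma = np.cov(panel, rowvar=False).astype(np.float32)
assert np.array_equal(unpack_sigma(pack_sigma(sigma), 100), sigma)
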
The replay function

One method, deterministic if the inputs are

Run.replay()
def replay(run_id: int, *, verify: bool = True) -> ReplayResult:
    run = repo.get_run(run_id)

    snap = Snapshot.deserialise(run.snapshot, sigma_blob=run.sigma_blob,
                                 loadings_blob=run.loadings_blob)
    params = run.params_snapshot

    # Same solver back-end the original used; same tolerance.
    solver = solver_factory(run.solver, tolerance=run.solver_tolerance)
    result = solver.solve(snap, params)

    if verify:
        assert_weights_close(result.target, run.result["target"], atol=1e-8)
        assert_trades_close(result.trades, run.result["trades"])

    return ReplayResult(snap=snap, params=params, result=result)
Source: packages/portfolios/taxview_portfolios/runs/replay.py — annotated. The function reconstructs the snapshot, re-runs the solver, and verifies the result matches what was originally persisted.
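
The verification helpers aren't shown in the excerpt. A minimal version of assert_weights_close, assuming target weights are dicts keyed by security (the real types may differ), could look like this:

import numpy as np

def assert_weights_close(new: dict[str, float], old: dict[str, float], atol: float = 1e-8) -> None:
    # Same universe on both sides, then an element-wise tolerance check on the weights.
    assert set(new) == set(old), "replay produced a different universe"
    keys = sorted(new)
    np.testing.assert_allclose([new[k] for k in keys], [old[k] for k in keys], atol=atol)

With something like that in place, a regression sweep reduces to calling replay(run_id, verify=True) over recent runs and letting the assertion fail loudly on any drift.
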
What we deliberately don't store

And why the omissions are correct

  • The full price history. The snapshot stores today's prices for the universe. Σ is built off the trailing 504-day return panel, which we recompute from the Securities table on demand (a minimal sketch of that recompute follows this list). Storing the panel on every Run would be wasteful and (worse) duplicate the source-of-truth.
  • The benchmark weights as numbers. The snapshot stores the benchmark_id and rebalance date; the weights are recomputed from the Benchmark table on replay. The Benchmark methodology archive is the source-of-truth for point-in-time membership.
  • The intermediate solver state. We store the solver name, tolerance, status, and iteration count — but not the dual variables or the intermediate iterates. They're recoverable from a replay if needed and would dominate the storage budget if persisted.
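
As a rough sketch of the on-demand recompute from the first bullet: given a trailing price panel pulled from the Securities table, the covariance can be rebuilt when needed. The function name and the raw sample-covariance construction are illustrative; the production risk model presumably combines this with the factor loadings B rather than using the sample matrix directly.

import numpy as np
import pandas as pd

def rebuild_sigma(prices: pd.DataFrame, window: int = 504) -> np.ndarray:
    # prices: trailing adjusted closes, one row per date, one column per security.
    returns = prices.tail(window + 1).pct_change().dropna()  # 504 daily returns
    return np.cov(returns.to_numpy(), rowvar=False)          # sample N × N covariance
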
The audit trail story

Three questions the snapshot answers, every time

The audit-grade properties of the system fall out of the snapshot for free:

  • Why was this trade made? Re-solve from the snapshot; the trade ticket and the lot identification are reproduced exactly. The audit doesn't require re-deriving anything.
  • What constraints applied? params_snapshot is the full set the optimizer used — including any overrides the account had on that date.
  • What was the risk model that day? sigma_blob and loadings_blob are the actual matrices, not a reference to a model that may have been retrained.

The audit story is "click Replay; the system reproduces the decision." Not "open a notebook and try to remember what the risk model looked like in 2024."

Cross-solver comparison

How the cuOPT vs CVXPY benchmark uses replay

The cuOPT vs CVXPY benchmark is built on the replay machinery. We pulled 252 trading days of Runs from the synthetic benchmark fleet, replayed each one through both solvers, and compared the results trade-by-trade. Because the snapshot already pins the inputs, the comparison is apples-to-apples by construction. No re-solving with stale Σ; no off-by-one corporate-action mismatch; no surprise "the cuOPT trades look different because the universe was slightly different on that day."
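
In sketch form, the per-day comparison looks roughly like this, reusing the names from the replay() excerpt above. The solver back-end strings and the dict-like target weights are assumptions, and the real harness records timings and per-trade diffs as well:

def compare_solvers(run_id: int) -> float:
    # One pinned snapshot, two solver back-ends: identical inputs by construction.
    run = repo.get_run(run_id)
    snap = Snapshot.deserialise(run.snapshot, sigma_blob=run.sigma_blob,
                                loadings_blob=run.loadings_blob)
    params = run.params_snapshot

    targets = {}
    for name in ("cuopt", "cvxpy"):  # back-end names are illustrative
        solver = solver_factory(name, tolerance=run.solver_tolerance)
        targets[name] = solver.solve(snap, params).target

    # Worst per-name weight disagreement between the two solutions.
    return max(abs(targets["cuopt"][k] - targets["cvxpy"][k]) for k in targets["cuopt"])
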

Limitations

What the snapshot doesn't pin

  • Floating-point determinism across hardware. cuOPT on an L4 GPU and cuOPT on a different GPU model can diverge in the last few decimal places of the dual. The result.target differences are typically < 1e-10 and don't change trades, but a strict bit-equivalent replay across hardware is not guaranteed.
  • External feed corrections. If a corporate action correction comes in for a date in the past, the prices in the Securities table change. A replay then catches the drift — the trades replay differently, by design. The original Run's persisted trades remain authoritative for audit.
Notes & references
  1. Source: packages/portfolios/taxview_portfolios/db/models/run.py — Run model with sigma_blob and loadings_blob columns. Replay logic in replay.py.

Engineering note · reproducibility falls out of pinning the inputs, not of pinning the outputs. Save the questions, not the answers.

Related
  • Engineering · system

    The DailyRunner: orchestration, idempotency, and the kill switch

    How twenty accounts get rebalanced every morning before market open, why every solve is account-scoped and idempotent, what the per-account run_seq is for, and how a single boolean flag freezes the whole fleet.

  • Methodology · backtest

    Building a backtest you can defend

    Lookahead, survivorship, point-in-time benchmark constituents, holiday calendars, dividend timing, corporate actions. The hygiene checklist behind every backtest on this site, and why each item is on it.

  • Engine · benchmark

    cuOPT vs CVXPY: a per-strategy bake-off for tax-aware portfolio optimization

    Six strategies, two solvers, one question: where does GPU-accelerated mathematical programming earn its keep against a mature CPU convex framework — in solve-time, in solution quality, and in the after-tax dollar that lands in the account.