Engineering · system

The DailyRunner: orchestration, idempotency, and the kill switch

How twenty accounts get rebalanced every morning before market open, why every solve is account-scoped and idempotent, what the per-account run_seq is for, and how a single boolean flag freezes the whole fleet.

May 2026 · 11 min read

Twenty accounts get rebalanced every morning before market open. Each one runs a constrained convex optimization with its own constraint set, its own lot history, and its own wash-sale lock vector. The DailyRunner is the orchestration layer between the API surface and the optimizer: the thing that takes "rebalance the fleet" as a verb and turns it into a sequence of idempotent, account-scoped solves with an audit trail. This post is about how it's wired.

Module: packages/runner
Entry point: taxview-runner run-daily
Solves per account per day: 1
Kill switch: 1 boolean
What it does in one sentence

For each account in the fleet, build a snapshot, solve, persist

The pseudocode is roughly twenty lines. The interesting parts are what it doesn't have to do — the runner is deliberately thin because the strategy module already speaks snapshot-in/trades-out.

DailyRunner.run_account
from datetime import date

def run_account(self, account: Account, as_of: date) -> Run:
    if self.flag_frozen():
        return self._noop_run(account, as_of, reason="kill-switch")

    snap = self._build_snapshot(account, as_of)        # cov, lots, prices,
                                                        # wash-sale lock vec,
                                                        # marginal rates
    params = self._merge_overrides(snap, account.tags)

    result = self.solver.solve(snap, params)            # cuOPT or CLARABEL
    trades = self._lot_identify(result.target, snap)    # HIFO/ACB/FIFO

    run_seq = self._next_account_seq(account.id)        # per-account; same tx as the insert
    return self.repo.persist_run(
        account_id=account.id,
        run_seq=run_seq,
        snapshot=snap,
        params_snapshot=params,
        result=result,
        trades=trades,
        as_of=as_of,
    )
Source: packages/runner/taxview_runner/daily.py — annotated. Real source has more error handling and structured logging; the shape is what's shown.
Four properties the runner enforces

Idempotency, account-scoped sequencing, frozen inputs, kill switch

Runner invariants
| Property | How it's enforced | Why it matters |
|---|---|---|
| Idempotent per (account, as_of) | Unique constraint on (account_id, as_of_date) in Postgres | Re-running the morning solve from a cron retry doesn't double-trade |
| Account-scoped run_seq | next_seq = max(run_seq for account_id) + 1, inside the same tx | URLs and UI use RUN-001, RUN-002 per account, not the global PK |
| Inputs frozen on the Run record | snapshot, params_snapshot, solver, solver_status persisted | Replays are deterministic; audit doesn't require re-deriving inputs |
| Single kill switch | RUNNER_FLAG_FROZEN env var, checked before each account; every account not yet started becomes a no-op | Emergency freeze if a data feed is bad: minutes, not deploys |
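The idempotency row can be sketched with SQLite standing in for Postgres, an assumption made only so the example is self-contained; the table layout and persist_run_once are hypothetical names. The unique constraint on (account_id, as_of_date) turns a second insert for the same morning into an IntegrityError, and the retry returns the run that already exists instead of writing new trades.

```python
import sqlite3
from datetime import date

SCHEMA = """
CREATE TABLE runs (
    run_id     INTEGER PRIMARY KEY AUTOINCREMENT,  -- global PK
    account_id TEXT NOT NULL,
    as_of_date TEXT NOT NULL,
    trades     TEXT NOT NULL,
    UNIQUE (account_id, as_of_date)                -- one run per account per day
)
"""

def persist_run_once(conn: sqlite3.Connection, account_id: str, as_of: date, trades: str) -> int:
    """Insert the day's run, or return the existing run_id if an earlier attempt wrote it."""
    try:
        with conn:  # commits on success, rolls back if the INSERT raises
            cur = conn.execute(
                "INSERT INTO runs (account_id, as_of_date, trades) VALUES (?, ?, ?)",
                (account_id, as_of.isoformat(), trades),
            )
            return cur.lastrowid
    except sqlite3.IntegrityError:
        # A previous attempt (e.g. a cron retry) already persisted this (account, day).
        row = conn.execute(
            "SELECT run_id FROM runs WHERE account_id = ? AND as_of_date = ?",
            (account_id, as_of.isoformat()),
        ).fetchone()
        return row[0]

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
first = persist_run_once(conn, "acct-A", date(2026, 5, 4), "BUY 10 XYZ")
retry = persist_run_once(conn, "acct-A", date(2026, 5, 4), "BUY 10 XYZ")
print(first, retry)  # 1 1
```

In Postgres the same effect is often written as INSERT ... ON CONFLICT (account_id, as_of_date) DO NOTHING followed by a SELECT. Either way, the database constraint, not application logic, is what prevents a double trade.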
Per-account run_seq vs global run_id

Why we keep two

The autoincrement primary key on the Run table advances globally: with twenty accounts all running at 5am, the run_id values for Account 1 might be 10001, 10021, 10041 across three days. That's surprising in URLs and confusing in the UI. So every Run also carries a per-account run_seq starting at 1 for each account. URLs reference the seq; database joins use the PK.

The seq is computed as max(run_seq) + 1 inside the same transaction that inserts the Run, and a unique constraint on (account_id, run_seq) backs it up. If two cron retries race and both read the same max, one commit wins and the other hits a constraint violation and bails, which is exactly what idempotency requires.

The user-visible run number is per-account because that's how users see the world: "show me Run 47 of Account A," not "show me global Run 102,419."
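A minimal sketch of the allocation, again using SQLite in place of Postgres and hypothetical names (runs, insert_run). It reads MAX(run_seq) and inserts inside one explicit transaction. With two accounts interleaving over three mornings, the global run_id alternates between them while each account's run_seq simply counts 1, 2, 3.

```python
import sqlite3
from typing import Dict, List, Tuple

SCHEMA = """
CREATE TABLE runs (
    run_id     INTEGER PRIMARY KEY AUTOINCREMENT,  -- global PK, used for joins
    account_id TEXT NOT NULL,
    run_seq    INTEGER NOT NULL,                   -- per-account, used in URLs and UI
    UNIQUE (account_id, run_seq)                   -- backstop if two writers race
)
"""

def insert_run(conn: sqlite3.Connection, account_id: str) -> Tuple[int, int]:
    """Allocate the next run_seq and insert the Run in one transaction; return (run_id, run_seq)."""
    conn.execute("BEGIN IMMEDIATE")  # take SQLite's write lock before reading MAX
    try:
        (max_seq,) = conn.execute(
            "SELECT COALESCE(MAX(run_seq), 0) FROM runs WHERE account_id = ?",
            (account_id,),
        ).fetchone()
        run_seq = max_seq + 1
        cur = conn.execute(
            "INSERT INTO runs (account_id, run_seq) VALUES (?, ?)",
            (account_id, run_seq),
        )
        conn.execute("COMMIT")
        return cur.lastrowid, run_seq
    except Exception:
        conn.execute("ROLLBACK")
        raise

# isolation_level=None disables implicit transactions, so BEGIN/COMMIT above are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(SCHEMA)

runs: Dict[str, List[Tuple[int, int]]] = {"A": [], "B": []}
for _morning in range(3):
    for account_id in ("A", "B"):  # accounts interleave within each morning
        runs[account_id].append(insert_run(conn, account_id))

print(runs["A"])  # [(1, 1), (3, 2), (5, 3)]: run_id jumps, run_seq doesn't
```

SQLite's BEGIN IMMEDIATE serializes writers, so the race can't happen in this sketch. Under Postgres's default READ COMMITTED isolation, two transactions can read the same MAX, and the unique constraint on (account_id, run_seq) is what makes the loser fail instead of writing a duplicate seq.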

The kill switch

Why it's a boolean, not a queue drain

When something is wrong with the data feed — yfinance returning stale prices, the risk model failing to reload, a borrow curve that didn't update — the right answer is usually "freeze everything, page someone, fix the underlying data, re-run." The flag is an environment variable read at the start of every account solve. Setting it doesn't roll back in-flight work, but no new accounts will solve until it's flipped back. The existing day's trades stay in the persisted state; the next day picks up from there.
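The check itself is small. The sketch below assumes the flag counts as set for values such as 1 or true (the accepted spellings and the helper names are assumptions). It re-reads the flag before each account, so an account already being solved finishes while every later account no-ops, matching the behavior described above.

```python
import os
from typing import Callable, Dict, Iterable

FLAG = "RUNNER_FLAG_FROZEN"
TRUTHY = {"1", "true", "yes", "on"}  # assumption: accepted spellings of "frozen"

def flag_frozen() -> bool:
    """Read the kill switch fresh on every call; unset or any other value means not frozen."""
    return os.environ.get(FLAG, "").strip().lower() in TRUTHY

def run_fleet(accounts: Iterable[str], solve: Callable[[str], str]) -> Dict[str, str]:
    """Solve accounts in order, checking the flag at the start of each one."""
    results: Dict[str, str] = {}
    for account_id in accounts:
        if flag_frozen():
            results[account_id] = "noop:kill-switch"  # nothing solved, nothing traded
            continue
        results[account_id] = solve(account_id)      # completes even if the flag flips mid-solve
    return results

os.environ.pop(FLAG, None)  # start unfrozen

def solve_and_freeze_during_b(account_id: str) -> str:
    if account_id == "B":
        os.environ[FLAG] = "1"  # simulate the flag being set while B is in flight
    return f"solved:{account_id}"

results = run_fleet(["A", "B", "C", "D"], solve_and_freeze_during_b)
print(results)
# {'A': 'solved:A', 'B': 'solved:B', 'C': 'noop:kill-switch', 'D': 'noop:kill-switch'}
```

One caveat: a process's environment is copied at launch, so in this sketch the flag only changes mid-run because the demo writes os.environ itself. Changing the variable in deployment config affects processes started afterwards, such as the next cron invocation or a restarted API worker.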

Snapshot construction

What goes into the per-account input bundle

The snapshot is the deterministic input to the solve — every non-strategy variable the optimizer sees. It's persisted with the Run so a replay reproduces the same trades.

Snapshot contents
| Field | Source | Typical size |
|---|---|---|
| Universe | Benchmark.constituents (point-in-time) | 100 to 1,000 names |
| Prices (today) | Securities.daily_prices | N rows |
| Σ matrix | RiskModel cache, refreshed nightly | N × N floats |
| Factor loadings B | Same risk model | N × 6 floats |
| Lot history | Account.lots | ≈ 4 × N rows |
| Wash-sale lock vector | Account.recent_trades, last 30 days | Boolean N-vector |
| Marginal rates | Account.tax_settings | {st, lt, niit} |
| Borrow curve (L/S only) | Broker feed or default | N rates |
For more on what gets stored alongside each Run for replayability, see Reproducibility by snapshot.
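Most snapshot fields are lookups, but the wash-sale lock vector is derived. A sketch under a simplifying assumption: a name is locked if the account sold it at a realized loss within the 30 days up to as_of. The actual rule also reaches substantially identical securities and purchases made in the 30 days before a loss sale, which this sketch ignores; Trade and wash_sale_lock_vector are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List

@dataclass(frozen=True)
class Trade:
    """Hypothetical shape of one row of Account.recent_trades."""
    ticker: str
    trade_date: date
    side: str            # "BUY" or "SELL"
    realized_pnl: float  # realized gain (+) or loss (-); 0.0 for buys

def wash_sale_lock_vector(universe: List[str], trades: List[Trade], as_of: date,
                          window_days: int = 30) -> List[bool]:
    """Boolean N-vector aligned with `universe`: True means "don't buy this name today"."""
    window_start = as_of - timedelta(days=window_days)
    locked = {
        t.ticker
        for t in trades
        if t.side == "SELL" and t.realized_pnl < 0 and window_start <= t.trade_date <= as_of
    }
    return [ticker in locked for ticker in universe]

universe = ["AAA", "BBB", "CCC", "DDD"]
trades = [
    Trade("AAA", date(2026, 5, 1), "SELL", -120.0),  # loss inside the window -> locked
    Trade("BBB", date(2026, 5, 1), "SELL", 300.0),   # gain -> not locked
    Trade("CCC", date(2026, 3, 1), "SELL", -50.0),   # loss, but older than 30 days
    Trade("DDD", date(2026, 4, 20), "BUY", 0.0),     # buys never lock in this sketch
]
lock = wash_sale_lock_vector(universe, trades, as_of=date(2026, 5, 4))
print(lock)  # [True, False, False, False]
```

Because the vector is aligned with the universe order, it can be stored in the snapshot as-is and fed to the optimizer as a per-name buy bound.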
Where the runner lives in the dependency graph

Above the optimizer, below the API

The runner depends on packages/portfolios (for the snapshot builders, account services, and run repository) and packages/optimizer (for the solver). It's depended on by services/api (the optimize router calls DailyRunner.run_account for one-off rebalances) and by the standalone CLI (`taxview-runner run-daily`) that the scheduled cron job triggers.

The intentional shape: every API request that triggers a rebalance goes through the same runner code path that the scheduled cron job uses. There's no second copy of the orchestration logic for "interactive" vs "scheduled" runs; both are the same function call. That's the property that makes the audit trail trivial.
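To make the single code path concrete, here is a sketch in which both the CLI loop and an API handler receive the same run_account callable. The function names, the string return values, and the per-account exception isolation are assumptions for illustration, not the runner's actual signatures.

```python
from datetime import date
from typing import Callable, Dict, Iterable, List, Tuple

RunAccount = Callable[[str, date], str]  # stand-in for DailyRunner.run_account

def run_daily(accounts: Iterable[str], as_of: date, run_account: RunAccount) -> Dict[str, str]:
    """Hypothetical body of `taxview-runner run-daily`: one call per account."""
    outcomes: Dict[str, str] = {}
    for account_id in accounts:
        try:
            outcomes[account_id] = run_account(account_id, as_of)
        except Exception as exc:  # assumption: one failing account doesn't abort the fleet
            outcomes[account_id] = f"error:{type(exc).__name__}"
    return outcomes

def api_rebalance(account_id: str, as_of: date, run_account: RunAccount) -> str:
    """Hypothetical optimize-router handler: the same function, for one account."""
    return run_account(account_id, as_of)

calls: List[Tuple[str, date]] = []

def fake_run_account(account_id: str, as_of: date) -> str:
    calls.append((account_id, as_of))  # records every entry into the shared code path
    if account_id == "acct-bad":
        raise ValueError("missing snapshot")
    return f"run:{account_id}:{as_of.isoformat()}"

day = date(2026, 5, 4)
fleet = run_daily(["acct-A", "acct-bad", "acct-C"], day, fake_run_account)
one_off = api_rebalance("acct-D", day, fake_run_account)
```

Since both entry points funnel into one callable, anything that function persists (snapshot, params, solver status) is recorded identically whether a person or the scheduler started the run.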

Failure modes the runner handles explicitly

Solver non-optimal, missing prices, stale risk model

  • Solver non-optimal. Both cuOPT and CLARABEL occasionally return a non-OPTIMAL status (numerical trouble, near-degenerate problems). The runner retries once with CLARABEL at a tighter tolerance, then falls through to "persist previous-day weights with status = NON_OPTIMAL" and emits an alert. The alert is scoped to the account, so on-call isn't paged as if one account's awkward solve were a fleet-wide incident.
  • Missing prices. A name with no price for as_of (vendor downtime) is treated as held-at-prior-price for the solve, with a flag on the Run record. The next day's price refresh resolves the gap.
  • Stale risk model. The Σ matrix is loaded once at API startup (and refreshed nightly via a separate process). If the loaded matrix is older than 36 hours, the runner refuses to solve and emits a STALE_RISK alert. This is a deliberate choice: a stale Σ can produce silently wrong tracking-error bounds.
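The three failure modes above can be sketched as small, separately testable helpers. Everything here is hypothetical scaffolding: the solvers are passed in as callables (no cuOPT or CLARABEL API is assumed), the alert is a plain callback, and the stale check raises rather than alerting so the caller can decide how to emit STALE_RISK. Only the 36-hour threshold and the retry-then-fallback order come from the text.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Callable

MAX_RISK_MODEL_AGE = timedelta(hours=36)  # threshold from the text

class StaleRiskModel(RuntimeError):
    """Raised instead of solving when Σ is older than the threshold (the STALE_RISK case)."""

@dataclass
class SolveResult:
    status: str                                   # e.g. "OPTIMAL" or "NON_OPTIMAL"
    weights: dict[str, float] = field(default_factory=dict)

def check_risk_model(loaded_at: datetime, now: datetime) -> None:
    """Refuse to solve on a stale covariance matrix rather than risk wrong TE bounds."""
    age = now - loaded_at
    if age > MAX_RISK_MODEL_AGE:
        raise StaleRiskModel(f"risk model is {age} old")

def fill_missing_prices(today: dict[str, float | None],
                        prior: dict[str, float]) -> tuple[dict[str, float], list[str]]:
    """Carry the prior price for names with no price today; return the names carried."""
    filled: dict[str, float] = {}
    carried: list[str] = []
    for name, px in today.items():
        if px is None:
            filled[name] = prior[name]
            carried.append(name)  # flagged on the Run record
        else:
            filled[name] = px
    return filled, carried

def solve_with_fallback(primary: Callable[[], SolveResult],
                        tight_retry: Callable[[], SolveResult],
                        previous_weights: dict[str, float],
                        alert: Callable[[str], None]) -> SolveResult:
    """Primary solve, one tighter-tolerance retry, then fall back to yesterday's weights."""
    result = primary()
    if result.status == "OPTIMAL":
        return result
    result = tight_retry()
    if result.status == "OPTIMAL":
        return result
    alert("NON_OPTIMAL")  # account-scoped alert
    return SolveResult(status="NON_OPTIMAL", weights=dict(previous_weights))

alerts: list[str] = []
res = solve_with_fallback(
    primary=lambda: SolveResult("NON_OPTIMAL"),
    tight_retry=lambda: SolveResult("NON_OPTIMAL"),
    previous_weights={"AAA": 0.6, "BBB": 0.4},
    alert=alerts.append,
)
prices, carried = fill_missing_prices({"AAA": 101.5, "BBB": None}, {"AAA": 100.0, "BBB": 55.0})
now = datetime(2026, 5, 4, 9, 0, tzinfo=timezone.utc)
check_risk_model(now - timedelta(hours=12), now)  # fresh enough: no exception
```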
Notes & references
  1. Source code: packages/runner/taxview_runner/. The CLI entry point is taxview-runner; see the runner's pyproject.toml for the script reference.
  2. Related: Reproducibility by snapshot for how the Run record stores everything needed to replay a solve, and Daily, weekly, or monthly for the cadence-cost analysis that motivated the daily default.

Engineering note · the runner is intentionally thin. If something can be expressed as a snapshot field or a strategy parameter rather than runner logic, it should be.
