Methodology · backtest

Building a backtest you can defend

Lookahead, survivorship, point-in-time benchmark constituents, holiday calendars, dividend timing, corporate actions. The hygiene checklist behind every backtest on this site, and why each item is on it.

May 2026 · 12 min read

Backtest results are easy to generate and easy to fool yourself with. The same strategy can show 200 bp of after-tax alpha or 20 bp depending on whether the panel uses today's index constituents instead of the constituents on the rebalance date, whether dividends are added back in the price series or re-invested in the basis, and whether the optimizer reads the return that hasn't happened yet. This post is the hygiene checklist behind every backtest the platform publishes, and why each item is on it.

Items on the list: 9
Most common error: Survivorship
Most expensive error: Lookahead in Σ
Hardest to catch: Dividend timing
The nine items

What we check before publishing any backtest number

The hygiene checklist
| # | Check | Why |
| --- | --- | --- |
| 1 | Point-in-time index membership | Today's US large-cap (100) constituent set didn't exist in 2016. Use the index methodology archive. |
| 2 | No dropped delisted names | Names that delisted between rebalance and today belong in the universe on the rebalance date. |
| 3 | Σ excludes the rebalance-day return | Building Σ from prices through today, then solving with today, leaks tomorrow's distribution. |
| 4 | Trades execute at next-bar prices | Solver runs on close-of-day data; trades execute the next morning. Don't fill at solve-day close. |
| 5 | Holiday calendar from the exchange | Half-day closes and country-specific holidays affect when a rebalance can actually clear. |
| 6 | Corporate actions applied at effective date | A 3:1 split on day t means the basis splits on day t; the optimizer sees the split-adjusted price. |
| 7 | Dividends re-invested in basis | Cash dividends adjust the lot's basis (or get re-invested per a stated policy), not the price series. |
| 8 | Wash-sale lock vector built from real history | A backtest that ignores the wash-sale lookback overstates harvestable loss. |
| 9 | Marginal rates applied per-period correctly | Long-term boundary calculation must use the lot's actual acquisition date, not a calendar-quarter approximation. |
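Items 8 and 9 both come down to reading real lot dates rather than approximating them. The sketch below is illustrative only and is not the platform's code: the function names `is_long_term` and `harvest_locked` are hypothetical, and the 30-day lookback and more-than-one-year holding period are assumed US-style rules that would need adjusting for other tax regimes. The handling of a 29 February acquisition is likewise an assumption.

```python
from datetime import date

WASH_SALE_DAYS = 30  # assumed US-style lookback; not stated in the checklist


def is_long_term(acquired: date, sold: date) -> bool:
    """True when the lot was held for more than one year.

    Uses the lot's actual acquisition date, never a quarter bucket.
    """
    try:
        anniversary = acquired.replace(year=acquired.year + 1)
    except ValueError:
        # 29 Feb acquisition: assume 28 Feb of the next year as the anniversary.
        anniversary = date(acquired.year + 1, 2, 28)
    return sold > anniversary


def harvest_locked(ticker: str, as_of: date, buys) -> bool:
    """True when `ticker` was bought within the lookback window before `as_of`.

    `buys` is an iterable of (ticker, trade_date) pairs taken from the
    backtest's own trade history, so only past purchases are consulted.
    """
    return any(
        t == ticker and 0 <= (as_of - d).days <= WASH_SALE_DAYS
        for t, d in buys
    )


# Hypothetical lot and trade history, made up for illustration.
print(is_long_term(date(2023, 2, 5), date(2024, 2, 5)))  # False
print(is_long_term(date(2023, 2, 5), date(2024, 2, 6)))  # True

buys = [("AAA", date(2024, 3, 1))]
print(harvest_locked("AAA", date(2024, 3, 20), buys))  # True
print(harvest_locked("AAA", date(2024, 5, 1), buys))   # False
```

A calendar-quarter approximation would put both sale dates in the first example in the same bucket, even though they fall on opposite sides of the boundary.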
The deep ones

Three errors that look small and aren't

Survivorship. The most cited example: a 2010-start backtest of "the US large-cap (100) universe" using its 2026 constituents will overstate returns by 100–200 bp/yr because the names that fell out of the index in the intervening years are systematically the underperformers. Easy to spot when you look for it; easy to miss when you don't.
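One way to make the point-in-time lookup concrete is to store membership as dated intervals and query them by rebalance date. This is a minimal sketch with made-up tickers and dates: the `Membership` record and the `universe_on` helper are hypothetical names, and a real history would come from the index provider's methodology archive.

```python
from datetime import date
from typing import NamedTuple, Optional


class Membership(NamedTuple):
    ticker: str
    added: date
    removed: Optional[date]  # None means still in the index


def universe_on(history, as_of):
    """Tickers that were index members on `as_of`."""
    return sorted(
        m.ticker
        for m in history
        if m.added <= as_of and (m.removed is None or as_of < m.removed)
    )


# Hypothetical membership history, made up for illustration.
HISTORY = [
    Membership("AAA", date(2005, 1, 3), None),
    Membership("BBB", date(2008, 6, 2), date(2019, 3, 18)),  # dropped later
    Membership("CCC", date(2021, 9, 20), None),              # joined later
]

print(universe_on(HISTORY, date(2016, 6, 30)))  # ['AAA', 'BBB']
print(universe_on(HISTORY, date(2026, 5, 1)))   # ['AAA', 'CCC']
```

A 2016 rebalance built from the 2026 list would silently drop BBB and include CCC, which is the bias described above.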

Lookahead in the risk model. Σ is an estimate of forward covariance from a backward window. If the window includes the day you're solving on, the optimizer sees a Σ informed by returns it shouldn't yet know. The usual symptom is suspiciously low realised tracking error (TE) in backtest: the optimizer is "hugging" the index in part because it can read the future. The fix is a one-day shift; we use the trailing 504-day panel ending the previous business day.
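The one-day shift can be sketched as a window selection that stops strictly before the solve date. This is an illustration under assumptions, not the production risk model: `sigma_window` and its parameters are hypothetical names, and only the 504-day length comes from the text above.

```python
from bisect import bisect_left
from datetime import date


def sigma_window(dates, returns, solve_date, length=504):
    """Return the trailing `length` return rows ending before `solve_date`.

    dates   : sorted list of business dates, one per return row
    returns : list of per-day return rows aligned with `dates`
    """
    # bisect_left finds the first date >= solve_date, so the slice ends at
    # the previous business day and never includes the solve-day return.
    end = bisect_left(dates, solve_date)
    if end < length:
        raise ValueError("not enough history before the solve date")
    return dates[end - length:end], returns[end - length:end]


# Tiny made-up panel: ten consecutive dates, one asset.
dates = [date(2024, 1, d) for d in range(1, 11)]
returns = [[0.001 * d] for d in range(1, 11)]

window_dates, window_returns = sigma_window(
    dates, returns, date(2024, 1, 8), length=3
)
print(window_dates[-1])  # 2024-01-07
```

Whatever covariance estimator is used would then be fed `window_returns` only, so the solve-day row cannot reach it.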

Dividend timing. An adjusted-price series rolls dividends into the price retroactively, so the holder always looks like they re-invested at the moment of ex-div. Real holders receive cash on the pay date, which is usually 2–4 weeks after ex-div. The optimizer's TE numbers change by a few basis points if you model the lag correctly. We do.
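The difference between the two conventions shows up as a different re-investment price. The sketch below uses made-up prices and dates; the `Dividend` record and `shares_after_reinvest` helper are hypothetical, and it assumes the cash is re-invested in the same name at that day's close, which is only one possible stated policy.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Dividend:
    ex_date: date
    pay_date: date
    per_share: float


def shares_after_reinvest(shares, div, prices, at_pay_date=True):
    """Share count after re-investing one cash dividend.

    `prices` maps date -> close. With at_pay_date=False the cash is
    re-invested on the ex-date, which is what an adjusted-price series
    implicitly assumes.
    """
    cash = shares * div.per_share
    reinvest_date = div.pay_date if at_pay_date else div.ex_date
    return shares + cash / prices[reinvest_date]


# Made-up example: pay date three weeks after ex-date, price drifts up.
div = Dividend(date(2024, 3, 1), date(2024, 3, 22), per_share=0.50)
prices = {date(2024, 3, 1): 50.0, date(2024, 3, 22): 52.0}

ex_shares = shares_after_reinvest(100, div, prices, at_pay_date=False)
pay_shares = shares_after_reinvest(100, div, prices, at_pay_date=True)
print(ex_shares)               # 101.0
print(pay_shares < ex_shares)  # True
```

Between the two dates the holder has a cash receivable rather than extra shares, so the portfolio's weights, and hence its measured TE, differ slightly from the adjusted-price version.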

A backtest that uses today's index constituents on a 2016 rebalance date is the canonical survivorship-bias mistake — and it's still the most common one in published material.

What 'illustrative' actually means on this site

The honest framing

Charts and tables on this site are flagged "illustrative" in oxblood mono when they use synthetic numbers — usually because the real benchmark suite hasn't landed yet, or because the production risk model isn't open. The hygiene checklist applies to both: synthetic numbers have to be plausibly consistent with what the real run would produce, not optimised for the post.

Where this checklist runs

Pre-publication, every figure

Internally, every chart that lands on a marketing page is produced by either (a) the production runner against pinned snapshots, or (b) a synthetic data script that's checked in and version-controlled. There's no notebook-on-someone's-laptop path. Numbers that change on republish change because the underlying snapshot or generator changed, and the diff is visible.

For the snapshot mechanism that pins production runs, see Reproducibility by snapshot. For the risk-model construction that feeds Σ, see Risk-model construction.

Notes & references
  1. Lo (2002). The Statistics of Sharpe Ratios. Financial Analysts Journal 58(4). The point-in-time discipline applies as much to performance attribution as to factor construction.
  2. Index provider methodology documents — the published index-methodology archives are the canonical source for point-in-time membership.

Methodology note · the test of an honest backtest is whether the result holds when you fix one bias. Most don't.
