When the optimizer harvests a loss on a name, that name is force-zero for the next 30 days under §1091. Selling the position without buying anything else would blow through the TE budget: half the harvest's value is lost to the resulting drift. The replacement-security selection routine picks a substitute that is correlated enough to keep tracking error within budget but is not "substantially identical" under the wash-sale rule. This post covers how that selection is made, what cluster heuristics drive it, and what audit trail it produces.
What 'substantially identical' rules out
§1091 disallows a loss on a sale if a substantially identical security is acquired within 30 days before or after the sale. The IRS has been deliberately fuzzy on what "substantially identical" means for ETFs and SMAs, but practitioner consensus is:
| Pair | Treatment |
|---|---|
| Same ticker | Identical |
| Different share class, same issuer (e.g. GOOG/GOOGL) | Identical |
| ADR vs underlying common | Identical |
| Two ETFs tracking the same index, different sponsors | Not identical (industry consensus) |
| ETF vs sector SMA replicating the index | Not identical |
| Two stocks in the same sector with high correlation | Not identical |
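The rows above can be encoded as a small pair classifier plus a window check. This is a minimal sketch, not the platform's data model: the `Security` record, its field names, and both helper functions are hypothetical, and the rules mirror the practitioner consensus in the table rather than any IRS bright line.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Security:
    # Hypothetical record; field names are illustrative assumptions.
    ticker: str
    kind: str                                # "common", "adr", or "etf"
    issuer_id: str                           # operating company or fund sponsor
    underlying_ticker: Optional[str] = None  # set for ADRs only


def treated_as_identical(a: Security, b: Security) -> bool:
    """Apply the consensus rows of the table (not an IRS rule)."""
    if a.ticker == b.ticker:
        return True
    # Different share classes of the same operating company.
    if a.kind == "common" and b.kind == "common" and a.issuer_id == b.issuer_id:
        return True
    # ADR paired with its underlying common stock, in either order.
    for adr, other in ((a, b), (b, a)):
        if adr.kind == "adr" and adr.underlying_ticker == other.ticker:
            return True
    # Different-sponsor ETFs and correlated same-sector stocks fall through.
    return False


def in_wash_window(sale_date: date, buy_date: date) -> bool:
    """True if the purchase is within 30 days before or after the sale."""
    return abs((buy_date - sale_date).days) <= 30
```

The issuer rule is deliberately limited to common stock, so two ETFs from one sponsor are not flagged automatically; that case is left to an explicit exclusion list.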
Σ-distance ranked, sector-filtered, exposure-checked
The replacement-security routine runs as a side-effect of a loss-realising sell. The algorithm:

    def pick_replacement(sold_ticker, sold_amount, snapshot, cap_per_name=0.04):
        # 1. Candidate pool: same sector, not in wash-lock, weight headroom.
        candidates = [
            t for t in snapshot.universe
            if t != sold_ticker
            and snapshot.sector[t] == snapshot.sector[sold_ticker]
            and t not in snapshot.wash_lock
            and snapshot.current_weight[t] + delta_weight(sold_amount) <= cap_per_name
        ]
        # 2. Rank by Σ-distance to the sold name, closest first.
        scored = sorted(
            candidates,
            key=lambda t: sigma_distance(snapshot.sigma, sold_ticker, t),
        )
        # 3. Take the first of the five closest candidates whose factor-loading
        #    delta vs the sold name is under 0.10.
        for cand in scored[:5]:
            if max_factor_delta(snapshot.B, sold_ticker, cand) < 0.10:
                return cand
        # 4. Fall back to no replacement; let the next-day solve redistribute.
        return None

Three observations on the algorithm:
- Sector first, Σ second. Sector membership is a coarse but reliable starting filter. A finance name is rarely a good substitute for a tech name even if Σ disagrees, because the next macro shock will reveal the mismatch. We use GICS sectors at the 2-digit level.
- Σ-distance is forward-relevant. The metric is √((eᵢ − eⱼ)ᵀ Σ (eᵢ − eⱼ)) — the risk-model distance between the two names' "single-name baskets." Lower means the substitution costs less in TE.
- Factor delta is the sanity check. Σ-distance can be small for two names that happen to have opposite factor exposures. The factor-loading delta check catches this: if the candidate's largest factor-loading gap to the sold name is 0.10 or more, we keep searching, up to the five closest candidates.
Σ-distance is necessary but not sufficient. Two names can have small covariance distance and opposite factor exposures — substituting one for the other looks fine in tracking-error and surprises you in the next factor regime.
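To make the two scores concrete, here is a minimal pure-Python sketch. It assumes Σ is stored as a ticker-keyed dict of dicts and B as a ticker-keyed list of factor loadings; both layouts, the toy numbers, and the tickers AAA/BBB/CCC are made up for illustration, and the per-name reading of `max_factor_delta` is one plausible interpretation of the routine above. Expanding the quadratic form gives Σᵢᵢ + Σⱼⱼ − 2Σᵢⱼ, so no matrix library is needed.

```python
import math


def sigma_distance(sigma, i, j):
    """sqrt((e_i - e_j)^T Sigma (e_i - e_j)) = sqrt(S_ii + S_jj - 2*S_ij)."""
    var = sigma[i][i] + sigma[j][j] - 2.0 * sigma[i][j]
    return math.sqrt(max(var, 0.0))  # clamp tiny negative rounding error


def max_factor_delta(B, i, j):
    """Largest absolute per-factor loading gap between names i and j."""
    return max(abs(bi - bj) for bi, bj in zip(B[i], B[j]))


# Toy covariance matrix and two-factor loadings (illustrative values only).
sigma = {
    "AAA": {"AAA": 0.09, "BBB": 0.07, "CCC": 0.02},
    "BBB": {"AAA": 0.07, "BBB": 0.08, "CCC": 0.02},
    "CCC": {"AAA": 0.02, "BBB": 0.02, "CCC": 0.10},
}
B = {"AAA": [1.00, 0.30], "BBB": [1.05, 0.25], "CCC": [0.60, -0.20]}

# Rank substitutes for AAA by distance, then apply the 0.10 factor check.
ranked = sorted(["BBB", "CCC"], key=lambda t: sigma_distance(sigma, "AAA", t))
passing = [t for t in ranked if max_factor_delta(B, "AAA", t) < 0.10]
```

In this toy data BBB is both the closest name to AAA and the only one that passes the factor check; CCC fails on both counts.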
Per-trade replacement reason
Every replacement trade is logged with the substitution relationship: sold ticker, replacement ticker, Σ-distance score, factor-delta score, sector match. The Run record carries the full list. A compliance review can re-derive the optimizer's choice from the snapshot (same Σ, same B, same sector membership) and verify that the replacement was the closest candidate that passed the factor check, not a discretionary pick.
| Field | Example value |
|---|---|
| trade.harvest_sell_ticker | AAPL |
| trade.replacement_buy_ticker | MSFT |
| replacement.sector_match | GICS 45 · IT |
| replacement.sigma_distance | 0.041 |
| replacement.factor_delta_max | 0.062 |
| replacement.lock_until | 2026-06-12 |
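One way to carry these fields is an immutable record that flattens to the dotted names in the table. The sketch below is an assumption about shape only: the class name `ReplacementRecord`, the method `to_log_fields`, and the `matches_rederivation` helper are hypothetical, and the values are the example row from the table.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ReplacementRecord:
    # Hypothetical container for one logged substitution.
    harvest_sell_ticker: str
    replacement_buy_ticker: str
    sector_match: str
    sigma_distance: float
    factor_delta_max: float
    lock_until: date

    def to_log_fields(self) -> dict:
        """Flatten to the dotted field names used in the table."""
        return {
            "trade.harvest_sell_ticker": self.harvest_sell_ticker,
            "trade.replacement_buy_ticker": self.replacement_buy_ticker,
            "replacement.sector_match": self.sector_match,
            "replacement.sigma_distance": self.sigma_distance,
            "replacement.factor_delta_max": self.factor_delta_max,
            "replacement.lock_until": self.lock_until.isoformat(),
        }


def matches_rederivation(record, ticker, sigma_dist, factor_delta, tol=1e-9):
    """Compare a logged record with values recomputed from the snapshot."""
    return (
        record.replacement_buy_ticker == ticker
        and abs(record.sigma_distance - sigma_dist) <= tol
        and abs(record.factor_delta_max - factor_delta) <= tol
    )


record = ReplacementRecord(
    harvest_sell_ticker="AAPL",
    replacement_buy_ticker="MSFT",
    sector_match="GICS 45 · IT",
    sigma_distance=0.041,
    factor_delta_max=0.062,
    lock_until=date(2026, 6, 12),
)
```

A reviewer would rerun the selection on the stored snapshot and pass the recomputed ticker and scores to `matches_rederivation`; any mismatch means the logged trade cannot be reproduced from its inputs.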
The honest fallback
If the candidate pool is empty, or none of the five closest candidates by Σ-distance passes the factor-delta check, the routine returns None and the optimizer's next-day solve redistributes the freed capital across the rest of the universe within the TE budget. This produces a small amount of TE drift (typically < 5 bp per harvest) but is the right behaviour: a bad substitution is worse than no substitution.
Edge cases the algorithm doesn't fully resolve
- Sector concentration in small universes. On a US large-cap (100) universe with a small-cap-only ESG screen, some sectors have only 2–3 names. If the sold ticker's only viable substitute is itself in the wash-lock list, no replacement is possible and the routine falls back to no-replacement.
- Two same-sector ETFs from one sponsor. Industry consensus says different-sponsor ETFs aren't identical; same-sponsor ETFs of the same index probably are. The platform doesn't auto-resolve this — accounts subscribed to ETF-substitution explicitly maintain a sponsor-aware exclusion list.
- Cascading harvests. If the substitute itself drops below basis on day t+5, the optimizer wants to harvest it too. This is fine, except the next substitution must skip both names. We've capped the chain at three hops to prevent pathological loops.
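The cascading-harvest cap can be sketched as a chain-aware picker. This is an illustration under stated assumptions: the function name `next_substitute` is hypothetical, the chain is assumed to be the list of names already sold in one cascade (oldest first), and a "hop" is counted as one substitution, so the fourth consecutive substitution is refused.

```python
from typing import Optional, Sequence


def next_substitute(
    chain: Sequence[str],
    ranked: Sequence[str],
    max_hops: int = 3,
) -> Optional[str]:
    """Pick the next substitute in a harvest chain, or None.

    chain:  tickers already sold in this cascade, oldest first.
    ranked: candidate substitutes for chain[-1], best first.
    """
    # Selling chain[-1] would start hop number len(chain); refuse past the cap.
    if len(chain) > max_hops:
        return None
    locked = set(chain)  # every name sold earlier in the chain is skipped
    for cand in ranked:
        if cand not in locked:
            return cand
    return None
```

For example, `next_substitute(["A", "B"], ["A", "C"])` skips A because it was sold earlier in the chain and returns C, while a chain that already holds four sold names returns None regardless of the candidates.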
For the wash-sale statute itself, see the United States jurisdiction post. For the risk-model construction that feeds Σ, see risk-model construction.
- IRC §1091 — Loss from wash sales of stock or securities. The 'substantially identical' test is a question of fact, not law; practitioners follow industry consensus in the absence of bright-line guidance.
- GICS sectors — Global Industry Classification Standard. Two-digit GICS provides 11 sectors that map cleanly onto the platform's universe.
Methodology note · the substitute is the load-bearing trade in a harvest. Don't let it spoil what you saved.