Replacement-security selection under wash-sale: how the optimizer keeps the factor exposure

When the optimizer harvests a loss on a name, that name's weight is forced to zero for the next 30 days under §1091. Selling the position without buying anything else would blow through the tracking-error (TE) budget, and half the harvest's value is lost to the resulting drift. The replacement-security selection routine picks a substitute that is correlated enough to keep tracking error in check but is not "substantially identical" under the wash-sale rule. This post covers how that selection is made, which cluster heuristics drive it, and what audit trail it produces.

Item	Value
Trigger	Loss-realising sell
Lock duration	30 days (§1091)
Selection metric	Σ-distance + sector match
Audit trail	Per-trade replacement reason

The constraint

What 'substantially identical' rules out

§1091 disallows a loss on a sale if a substantially identical security is acquired in the 30 days before or after. The IRS has been deliberately fuzzy on what "substantially identical" means for ETFs and SMAs, but practitioner consensus is:

What the optimizer treats as substantially identical

Pair	Treatment
Same ticker	Identical
Different share class, same issuer (e.g. GOOG/GOOGL)	Identical
ADR vs underlying common	Identical
Two ETFs tracking the same index, different sponsors	Not identical (industry consensus)
ETF vs sector SMA replicating the index	Not identical
Two stocks in the same sector with high correlation	Not identical

The optimizer treats rows 1–3 as locked together for 30 days post-harvest. Rows 4–6 are eligible substitutes.
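The "locked together" behaviour for rows 1–3 can be pictured as a lookup from each ticker to a group, so that harvesting one member locks every member. The sketch below is illustrative only: the `ISSUER_GROUP` mapping, the `locked_tickers` function, and the choice of sale date plus 30 calendar days as the lock end are assumptions, not the platform's data model.

```python
from datetime import date, timedelta

# Hypothetical mapping: tickers sharing a group are treated as
# substantially identical (share classes, ADR vs underlying common).
ISSUER_GROUP = {
    "GOOG": "alphabet",
    "GOOGL": "alphabet",
    "MSFT": "microsoft",
}

LOCK_DAYS = 30  # assumed: lock ends 30 calendar days after the loss sale


def locked_tickers(sold_ticker, sale_date, issuer_group=ISSUER_GROUP):
    """Return {ticker: lock_until} for every ticker locked by this sale."""
    group = issuer_group.get(sold_ticker, sold_ticker)
    lock_until = sale_date + timedelta(days=LOCK_DAYS)
    locked = {t: lock_until for t, g in issuer_group.items() if g == group}
    locked[sold_ticker] = lock_until  # the sold ticker itself is always locked
    return locked


locks = locked_tickers("GOOG", date(2026, 5, 13))
print(sorted(locks))     # ['GOOG', 'GOOGL']
print(locks["GOOGL"])    # 2026-06-12
```

A same-sector name such as MSFT stays outside the lock and remains an eligible substitute.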

The selection algorithm

Σ-distance ranked, sector-filtered, exposure-checked

The replacement-security routine runs as a side-effect of a loss-realising sell. The algorithm:

Replacement-security selection

def pick_replacement(sold_ticker, sold_amount, snapshot, cap_per_name=0.04):
    # 1. Candidate pool: same sector, not in wash-lock, weight headroom.
    candidates = [
        t for t in snapshot.universe
        if t != sold_ticker
        and snapshot.sector[t] == snapshot.sector[sold_ticker]
        and t not in snapshot.wash_lock
        and snapshot.current_weight[t] + delta_weight(sold_amount) <= cap_per_name
    ]

    # 2. Score by Σ-distance to the sold name.
    scored = sorted(
        candidates,
        key=lambda t: sigma_distance(snapshot.sigma, sold_ticker, t),
    )

    # 3. Walk the five closest candidates; return the first whose
    #    factor-loading delta vs the sold name is under 0.10.
    for cand in scored[:5]:
        if max_factor_delta(snapshot.B, sold_ticker, cand) < 0.10:
            return cand

    # 4. Fall back to no replacement; let the next-day solve redistribute.
    return None

Source: packages/optimizer/taxview_optimizer/replacement.py — annotated. The selection is layered: a fast sector + Σ-distance prefilter, then an explicit factor-exposure check on the top candidates.

Three observations on the algorithm:

Sector first, Σ second. Sector membership is a coarse but reliable starting filter. A finance name is rarely a good substitute for a tech name even if Σ disagrees, because the next macro shock will reveal the mismatch. We use GICS sectors at the 2-digit level.
Σ-distance is forward-relevant. The metric is √((eᵢ − eⱼ)ᵀ Σ (eᵢ − eⱼ)) — the risk-model distance between the two names' "single-name baskets." Lower means the substitution costs less in TE.
Factor delta is the sanity check. Σ-distance can be small for two names that happen to have opposite factor exposures. The factor-loading delta check catches this: if the candidate's factor-loading delta against the sold name is 0.10 or more, we move on to the next of the five closest candidates.
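The two helpers named in the routine are not shown in the excerpt, so here is a minimal sketch of what they might compute. It assumes Σ is a plain covariance matrix indexed by integer position (the real routine passes tickers) and that the factor delta is the largest absolute per-factor gap between two rows of B. The toy numbers are made up. Expanding the quadratic form gives the shortcut used below:

√((eᵢ − eⱼ)ᵀ Σ (eᵢ − eⱼ)) = √(Σᵢᵢ + Σⱼⱼ − 2Σᵢⱼ)

```python
import math


def sigma_distance(sigma, i, j):
    """Risk-model distance between the single-name baskets of i and j."""
    var = sigma[i][i] + sigma[j][j] - 2.0 * sigma[i][j]
    return math.sqrt(max(var, 0.0))  # clamp tiny negative rounding error


def max_factor_delta(B, i, j):
    """Largest absolute per-factor loading gap between names i and j."""
    return max(abs(a - b) for a, b in zip(B[i], B[j]))


# Toy inputs (made up): three names, a covariance matrix, two factors.
sigma = [
    [0.0400, 0.0380, 0.0100],
    [0.0380, 0.0400, 0.0120],
    [0.0100, 0.0120, 0.0900],
]
B = [
    [1.00, 0.20],
    [1.05, 0.15],
    [0.60, -0.40],
]

d01 = sigma_distance(sigma, 0, 1)  # names 0 and 1 move together
d02 = sigma_distance(sigma, 0, 2)  # name 2 is a poor match for name 0
print(d01 < d02)                         # True
print(max_factor_delta(B, 0, 1) < 0.10)  # True: passes the factor check
print(max_factor_delta(B, 0, 2) < 0.10)  # False: rejected
```

Because only three entries of Σ are read, the distance can be evaluated for a whole candidate pool without forming the difference vector.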

Σ-distance is necessary but not sufficient. Two names can have small covariance distance and opposite factor exposures — substituting one for the other looks fine in tracking-error and surprises you in the next factor regime.

What the audit trail records

Per-trade replacement reason

Every replacement trade is logged with the substitution relationship: sold ticker, replacement ticker, Σ-distance score, factor-delta score, sector match. The Run record carries the full list. A compliance review can re-derive the optimizer's choice from the snapshot (same Σ, same B, same sector membership) and verify that the replacement was the closest candidate by Σ-distance that passed the factor-delta check.

Per-replacement audit fields

Field	Example value
trade.harvest_sell_ticker	AAPL
trade.replacement_buy_ticker	MSFT
replacement.sector_match	GICS 45 · IT
replacement.sigma_distance	0.041
replacement.factor_delta_max	0.062
replacement.lock_until	2026-06-12
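To make the re-derivation step concrete, the sketch below holds the fields from the table in a small record and compares the logged scores with values recomputed from the snapshot. The `ReplacementAudit` class, the `passes_review` function, and the tolerance are hypothetical names and choices for illustration, not the platform's schema.

```python
from dataclasses import dataclass, asdict
from datetime import date


@dataclass(frozen=True)
class ReplacementAudit:
    harvest_sell_ticker: str
    replacement_buy_ticker: str
    sector_match: str
    sigma_distance: float
    factor_delta_max: float
    lock_until: date


def passes_review(record, recomputed_sigma_distance, recomputed_factor_delta,
                  factor_delta_limit=0.10, tol=1e-6):
    """Check a logged record against values re-derived from the snapshot."""
    return (
        abs(record.sigma_distance - recomputed_sigma_distance) <= tol
        and abs(record.factor_delta_max - recomputed_factor_delta) <= tol
        and record.factor_delta_max < factor_delta_limit
    )


# Example values copied from the table above.
record = ReplacementAudit(
    "AAPL", "MSFT", "GICS 45 · IT", 0.041, 0.062, date(2026, 6, 12)
)
print(passes_review(record, 0.041, 0.062))  # True: log matches the snapshot
print(passes_review(record, 0.050, 0.062))  # False: Σ-distance does not match
```

A frozen dataclass is used so a logged record cannot be mutated after the fact, and `asdict(record)` gives a flat dictionary for serialisation.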

When no replacement is picked

The honest fallback

If the candidate pool is empty, or none of the five closest candidates by Σ-distance passes the factor-delta check, the routine returns None and the optimizer's next-day solve redistributes the freed capital across the rest of the universe within the TE budget. This produces a small amount of TE drift (typically < 5 bp per harvest) but is the right behaviour: a bad substitution is worse than no substitution.

Limitations

Edge cases the algorithm doesn't fully resolve

Sector concentration in small universes. On a US large-cap (100) universe with a small-cap-only ESG screen, some sectors have only 2–3 names. If both the sold ticker and its only viable substitute are in the wash-lock list, no replacement is possible — the routine falls back to no-replacement.
Two same-sector ETFs from one sponsor. Industry consensus says different-sponsor ETFs aren't identical; same-sponsor ETFs of the same index probably are. The platform doesn't auto-resolve this — accounts subscribed to ETF-substitution explicitly maintain a sponsor-aware exclusion list.
Cascading harvests. If the substitute itself drops below basis on day t+5, the optimizer wants to harvest it too. This is fine, except the next substitution must skip both names. We've capped the chain at three hops to prevent pathological loops.
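The cascading-harvest rule can be sketched as a small bookkeeping function. It assumes that one replacement counts as one hop, so a chain A → B → C → D uses three hops. The function name `plan_next_hop` and the list-of-tickers representation are hypothetical.

```python
MAX_HOPS = 3  # chain cap quoted in the text


def plan_next_hop(chain, max_hops=MAX_HOPS):
    """Decide whether a harvest chain may take another replacement.

    `chain` lists the tickers harvested so far, oldest first. A chain of
    n harvested names has already used n - 1 hops. Returns the set of
    names the next replacement must skip, or None once the cap is reached
    (meaning: harvest without a replacement).
    """
    hops_used = len(chain) - 1
    if hops_used >= max_hops:
        return None
    return set(chain)


print(plan_next_hop(["AAPL"]))                          # {'AAPL'}
print(plan_next_hop(["AAPL", "MSFT", "ORCL", "ADBE"]))  # None
```

Returning the full chain as the exclusion set covers the requirement that a second substitution skip both earlier names, independently of whatever the wash-lock list already contains.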

For the wash-sale statute itself, see the United States jurisdiction post. For the risk-model construction that feeds Σ, see risk-model construction.

Notes & references

IRC §1091 — Loss from wash sales of stock or securities. The 'substantially identical' test is a question of fact, not law; practitioners follow industry consensus in the absence of bright-line guidance.
GICS sectors — Global Industry Classification Standard. Two-digit GICS provides 11 sectors that map cleanly onto the platform's universe.

Methodology note · the substitute is the load-bearing trade in a harvest. Don't let it spoil what you saved.
