Synthetic Controls for Experimental Design

Abadie & Zhao (2025/2026), MIT working paper / Cambridge UP 2025.

1. The flipped workflow

Classical synthetic control answers: "I already have treated unit A — build a reweighted average of donors that approximates A in the pre-period, then impute A's post-period counterfactual."
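
The classical step described above (non-negative donor weights that sum to one and track the treated unit in the pre-period) can be sketched with plain NumPy. This is a minimal illustration using projected gradient descent on the simplex, not statspai's solver; the helper names `project_to_simplex` and `fit_sc_weights` and the toy data are assumptions made for the example.

```python
import numpy as np

def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]                      # sort descending
    css = np.cumsum(u) - 1.0
    idx = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - css / idx > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)

def fit_sc_weights(X, y, n_iter=2000):
    """Minimise 0.5 * ||X w - y||^2 over the simplex.

    X: (T0, J) donor outcomes in the pre-period, y: (T0,) treated outcomes.
    """
    w = np.full(X.shape[1], 1.0 / X.shape[1])  # start from uniform weights
    step = 1.0 / np.linalg.norm(X, 2) ** 2     # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y)
        w = project_to_simplex(w - step * grad)
    return w

# Toy data (hypothetical): the treated unit is an exact convex mix of donors.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))
true_w = np.array([0.5, 0.3, 0.2, 0.0, 0.0])
y = X @ true_w

w = fit_sc_weights(X, y)
pre_mspe = float(np.mean((X @ w - y) ** 2))   # pre-period fit quality
print(np.round(w, 3), pre_mspe)
```

The pre-period MSPE computed on the last line is the quantity the design step later uses to rank candidates.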

Experimental design flips this: you have a pool of candidates and a budget k. You want to decide which k units to treat so that the post-period synthetic-control ATT has the tightest confidence interval.

Under the Abadie-Zhao framework,

\[ \operatorname{Var}\bigl[\widehat{\mathrm{ATT}} \mid D\bigr] \;\approx\; \sum_{i \in D} \sigma_i^2 \]

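where \(D\) is the set of units selected for treatment and \(\sigma_i^2\) is the feasible pre-period MSPE of the synthetic control fit for unit \(i\). Because the approximation is a sum of per-unit terms, picking the k candidates with the smallest pre-period MSPE minimises this variance.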
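
As a small worked illustration of that selection rule, the sketch below takes a vector of per-candidate pre-period MSPEs, picks the k smallest, and sums them. The function name `select_design` and the numbers are made up for the example; it follows the sum-of-MSPE approximation stated above and is not the library's code.

```python
import numpy as np

def select_design(sigma2, k):
    """Return the indices of the k smallest MSPEs and their sum."""
    sigma2 = np.asarray(sigma2, dtype=float)
    order = np.argsort(sigma2, kind="stable")  # ascending by pre-period MSPE
    selected = order[:k]
    return selected, float(sigma2[selected].sum())

# Hypothetical pre-period MSPEs for six candidate units.
sigma2 = np.array([0.40, 0.05, 0.90, 0.10, 0.02, 0.30])
selected, expected_variance = select_design(sigma2, k=3)
print(sorted(selected.tolist()), round(expected_variance, 2))
```

Here units 4, 1 and 3 are chosen, with an approximate variance of 0.02 + 0.05 + 0.10 = 0.17.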

2. API

import statspai as sp

res = sp.synth_experimental_design(
    data=df,              # long-format panel
    unit='unit',
    time='time',
    outcome='y',
    k=5,                  # budget: treat 5 units
    pre_period=(0, 19),   # closed interval, pre-treatment periods
    candidates=None,      # default: all units are candidates
    donors=None,          # default: non-candidates; leave-one-out fallback
    risk='mspe',          # 'mspe' or 'rmse'
    concentration_weight=0.0,  # penalise Herfindahl weight concentration
    penalization=0.0,     # simplex-solver ridge penalty
    n_random=500,         # Monte-Carlo sample for the random-assignment baseline
    random_state=0,
)

Returns a SynthExperimentalDesignResult with

  • selected — the k recommended units
  • ranking — DataFrame with per-candidate risk scores
  • weights — per-candidate donor weight vectors (for audit)
  • expected_variance, baseline_variance — sum-MSPE under the chosen vs random assignment
  • summary() — human-readable report
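
To make the expected_variance versus baseline_variance comparison concrete, here is one plausible way a random-assignment baseline could be estimated by Monte Carlo. How statspai computes it internally is not documented on this page, so the function `random_baseline` and the example MSPEs below are assumptions for illustration only.

```python
import numpy as np

def random_baseline(sigma2, k, n_random=500, random_state=0):
    """Average sum-MSPE over n_random uniformly drawn k-subsets."""
    rng = np.random.default_rng(random_state)
    sigma2 = np.asarray(sigma2, dtype=float)
    draws = [
        sigma2[rng.choice(len(sigma2), size=k, replace=False)].sum()
        for _ in range(n_random)
    ]
    return float(np.mean(draws))

# Hypothetical per-candidate pre-period MSPEs.
sigma2 = np.array([0.40, 0.05, 0.90, 0.10, 0.02, 0.30])
k = 3

chosen = float(np.sort(sigma2)[:k].sum())   # sum-MSPE of the k best units
baseline = random_baseline(sigma2, k)       # sum-MSPE under random assignment
reduction = 1.0 - chosen / baseline         # relative variance reduction
print(chosen, baseline, reduction)
```

The baseline should land close to k times the mean MSPE, and the chosen design can never exceed it because it is the minimum over all k-subsets.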

3. Recipe on a synthetic panel

import numpy as np, pandas as pd, statspai as sp

rng = np.random.default_rng(0)
n_units, n_periods = 30, 20
F = rng.normal(size=(n_periods, 3))
L = rng.normal(size=(n_units, 3))
Y = L @ F.T + 0.1 * rng.normal(size=(n_units, n_periods))

df = pd.DataFrame([
    {'unit': i, 'time': t, 'y': Y[i, t]}
    for i in range(n_units) for t in range(n_periods)
])

res = sp.synth_experimental_design(
    df, unit='unit', time='time', outcome='y',
    k=5, pre_period=(0, 19), random_state=0, n_random=200,
)
print(res.summary())

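On this panel the expected variance falls 96% below random assignment. Because CI width scales with the square root of the variance, that corresponds to a roughly five-fold tighter post-period CI (√0.04 = 0.2), for free, just by choosing well-behaved units before you run the experiment.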

4. When to use this vs sp.synth

| Step | Function |
| --- | --- |
| Deciding which units to treat | sp.synth_experimental_design |
| Post-treatment counterfactual | sp.synth(method='classic') |
| Inference on the ATT | sp.scpi / sp.sdid |
| Cross-estimator robustness | sp.synth_compare |

Design and estimation are sequential: run synth_experimental_design at the planning stage, then let the experiment run and hand the result to the regular sp.synth pipeline.

5. Common pitfalls

  • Panel must be balanced inside pre_period. The function raises if any (unit, time) cell is NaN.
  • Setting candidates equal to all units triggers the leave-one-out fallback — each candidate's donor pool becomes the other n - 1 units. This is fine but inflates computation.
  • concentration_weight > 0 adds a Herfindahl penalty to avoid selecting units whose SC fit depends on a single donor. The Abadie-Zhao (2025/2026) paper recommends concentration_weight ≈ 0.5 when the donor pool is small.
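
Two of the pitfalls above can be checked by hand before calling the function. The sketch below lists missing (unit, time) cells inside a pre-period window and computes the Herfindahl index of a weight vector. The helpers `unbalanced_cells` and `herfindahl` are hypothetical names for this example, not part of statspai, and the check only covers time values that appear for at least one unit.

```python
import numpy as np
import pandas as pd

def unbalanced_cells(df, unit, time, outcome, pre_period):
    """Return (unit, time) pairs that are absent or NaN inside pre_period."""
    lo, hi = pre_period
    pre = df[df[time].between(lo, hi)]            # closed interval [lo, hi]
    wide = pre.pivot(index=unit, columns=time, values=outcome)
    rows, cols = np.where(wide.isna().to_numpy())
    return [(wide.index[r], wide.columns[c]) for r, c in zip(rows, cols)]

def herfindahl(weights):
    """Sum of squared weights: 1/J for uniform weights, 1 for a single donor."""
    w = np.asarray(weights, dtype=float)
    return float(np.sum(w ** 2))

# Toy panel (hypothetical): 3 units x 4 periods with two damaged cells.
panel = pd.DataFrame(
    [{"unit": i, "time": t, "y": float(i + t)} for i in range(3) for t in range(4)]
)
panel = panel[~((panel["unit"] == 1) & (panel["time"] == 2))]        # row missing
panel.loc[(panel["unit"] == 2) & (panel["time"] == 0), "y"] = np.nan  # NaN outcome

bad = unbalanced_cells(panel, "unit", "time", "y", pre_period=(0, 3))
print(bad)
print(herfindahl([1.0, 0.0, 0.0]), herfindahl([0.25] * 4))
```

A Herfindahl value near 1 signals the single-donor dependence that concentration_weight is meant to discourage.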

6. References

  • Abadie, A. & Zhao, J. (2025/2026). Synthetic Controls for Experimental Design. MIT / Cambridge UP.
  • Abadie, A. (2021). "Using synthetic controls: feasibility, data requirements, and methodological aspects." JEL 59(2).

For Agents

Pre-conditions

  • panel data in long form (unit × time × outcome)
  • single treated unit (classic) or a treatment-timing column (staggered)
  • ≥ 10 donor (untreated) units with similar pre-treatment trajectories
  • ≥ 10 pre-treatment periods (fewer → large weight on any one year)

Identifying assumptions

  • Treatment effect on the treated is identified by the counterfactual implicit in the donor weights
  • No spillover from treated unit to donors (SUTVA)
  • Donor pool contains units whose outcomes plausibly track the treated counterfactual
  • Pre-treatment fit (RMSPE) is small relative to post-treatment effect for placebo inference

Failure modes → recovery

| Symptom | Exception | Remedy | Try next |
| --- | --- | --- | --- |
| Pre-treatment RMSPE > post-treatment effect | AssumptionWarning | Poor pre-fit: switch to method='demeaned'/'augmented' or enlarge donor pool. | sp.synth |
| Placebo p-value ≥ 0.1 despite visible gap | AssumptionWarning | Use inference='conformal' (valid under weak assumptions) or report ranked placebo statistic. | sp.synth |
| All weight concentrated on one donor | AssumptionWarning | Interpolation bias risk: check method='elastic_net' or augmented SCM. | sp.synth |
| Treated unit outside donor convex hull | IdentificationFailure | Extrapolation needed: use method='unconstrained' or 'augmented'. | sp.synth |
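
The "ranked placebo statistic" mentioned in the table is commonly taken to be the ratio of post- to pre-treatment RMSPE, ranked across the treated unit and the placebo units. The sketch below is a generic illustration of that idea under made-up gaps; `rmspe_ratio` and `placebo_p_value` are hypothetical helpers, not statspai functions.

```python
import numpy as np

def rmspe_ratio(gaps, n_pre):
    """Post/pre RMSPE ratio per unit; gaps is (n_units, n_periods)."""
    gaps = np.asarray(gaps, dtype=float)
    pre = np.sqrt(np.mean(gaps[:, :n_pre] ** 2, axis=1))
    post = np.sqrt(np.mean(gaps[:, n_pre:] ** 2, axis=1))
    return post / pre

def placebo_p_value(ratios, treated_index=0):
    """Share of units whose ratio is at least as large as the treated unit's."""
    ratios = np.asarray(ratios, dtype=float)
    return float(np.mean(ratios >= ratios[treated_index]))

# Hypothetical gaps (actual minus synthetic): row 0 is the treated unit,
# rows 1..9 are placebo runs; 5 pre-periods followed by 3 post-periods.
gaps = np.full((10, 8), 0.1)
gaps[0, 5:] = 1.0                     # treated unit diverges after treatment

ratios = rmspe_ratio(gaps, n_pre=5)
p = placebo_p_value(ratios)
print(ratios[0], p)
```

With N units in the ranking the smallest attainable p-value is 1/N, so a pool of 10 cannot go below 0.1 however large the gap.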

Alternatives (ranked)

  • sp.sdid
  • sp.did
  • sp.matrix_completion
  • sp.causal_impact

Typical minimum N: 10