Synthetic Controls for Experimental Design
Abadie & Zhao (2025/2026), MIT working paper / Cambridge UP 2025.
1. The flipped workflow
Classical synthetic control answers: "I already have treated unit A — build a reweighted average of donors that approximates A in the pre-period, then impute A's post-period counterfactual."
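The classical step can be sketched in a few lines of NumPy. This is a minimal illustration, not the solver statspai uses: it fits simplex-constrained donor weights with a plain Frank-Wolfe loop, and the helper name `simplex_weights` and the simulated data are assumptions made for the example.

```python
import numpy as np

def simplex_weights(X, y, n_iter=2000):
    """Frank-Wolfe for min_w ||X w - y||^2 subject to w >= 0, sum(w) = 1.

    X: (T, J) donor outcomes, y: (T,) treated-unit outcomes.
    Hypothetical helper for illustration only.
    """
    J = X.shape[1]
    w = np.full(J, 1.0 / J)              # start from uniform weights
    for _ in range(n_iter):
        resid = X @ w - y
        grad = X.T @ resid               # gradient up to a factor of 2
        j = int(np.argmin(grad))         # best vertex of the simplex
        direction = -w.copy()
        direction[j] += 1.0              # e_j - w
        Xd = X @ direction
        denom = Xd @ Xd
        if denom <= 1e-15:
            break
        # Exact line search, clipped so w stays a convex combination.
        step = float(np.clip(-(resid @ Xd) / denom, 0.0, 1.0))
        if step == 0.0:
            break
        w = w + step * direction
    return w

# Simulated example: 8 donors, 20 pre-periods, 5 post-periods, no true effect.
rng = np.random.default_rng(0)
T_pre, T_post, J = 20, 5, 8
F = rng.normal(size=(T_pre + T_post, 2))
donors = F @ rng.normal(size=(J, 2)).T + 0.1 * rng.normal(size=(T_pre + T_post, J))
true_w = np.array([0.5, 0.3, 0.2] + [0.0] * (J - 3))
treated = donors @ true_w + 0.05 * rng.normal(size=T_pre + T_post)

w = simplex_weights(donors[:T_pre], treated[:T_pre])
pre_mspe = np.mean((donors[:T_pre] @ w - treated[:T_pre]) ** 2)
uniform_mspe = np.mean((donors[:T_pre].mean(axis=1) - treated[:T_pre]) ** 2)
counterfactual_post = donors[T_pre:] @ w   # imputed post-period path for A
```

The pre-period MSPE computed here is the same kind of quantity the design step ranks candidates on.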
Experimental design flips this: you have a pool of candidates and a
budget k. You want to decide which k units to treat so that the
post-period synthetic-control ATT has the tightest confidence interval.
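Under the Abadie-Zhao framework, the variance of the ATT estimator
scales with the summed risk of the treated set,
\[\operatorname{Var}(\widehat{\mathrm{ATT}}) \;\propto\; \sum_{i \in S} \sigma_i^2,\]
where \(S\) is the set of k treated units and \(\sigma_i^2\) is the
feasible pre-period MSPE of the synthetic control fit for unit i.
Picking the k candidates with the smallest pre-period MSPE minimises
this variance.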
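As a rough sketch of the selection rule, assume the per-candidate MSPE scores are already in hand. The scores below are hypothetical random draws, and the names `expected_variance` and `baseline_variance` only mirror the result fields described later; this is not the library's internal computation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_candidates, k, n_random = 30, 5, 500

# Hypothetical pre-period MSPE per candidate (illustrative values only).
mspe = rng.gamma(shape=2.0, scale=0.05, size=n_candidates)

# Design rule: treat the k candidates with the smallest pre-period MSPE.
selected = np.argsort(mspe)[:k]
expected_variance = float(mspe[selected].sum())

# Monte-Carlo baseline: sum-MSPE when the k treated units are drawn at random.
draws = [
    float(mspe[rng.choice(n_candidates, size=k, replace=False)].sum())
    for _ in range(n_random)
]
baseline_variance = float(np.mean(draws))

print(f"chosen: {expected_variance:.4f}  random: {baseline_variance:.4f}")
```

Because the chosen set contains the k smallest scores, its sum can never exceed that of any random draw; the gap to the baseline measures how much the design buys.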
2. API

```python
import statspai as sp

res = sp.synth_experimental_design(
    data=df,                   # long-format panel
    unit='unit',
    time='time',
    outcome='y',
    k=5,                       # budget: treat 5 units
    pre_period=(0, 19),        # closed interval, pre-treatment periods
    candidates=None,           # default: all units are candidates
    donors=None,               # default: non-candidates; leave-one-out fallback
    risk='mspe',               # 'mspe' or 'rmse'
    concentration_weight=0.0,  # penalise Herfindahl weight concentration
    penalization=0.0,          # simplex-solver ridge penalty
    n_random=500,              # Monte-Carlo sample for the random-assignment baseline
    random_state=0,
)
```
Returns a SynthExperimentalDesignResult with:
- selected: the k recommended units
- ranking: DataFrame with per-candidate risk scores
- weights: per-candidate donor weight vectors (for audit)
- expected_variance, baseline_variance: sum-MSPE under the chosen vs random assignment
- summary(): human-readable report
3. Recipe on a synthetic panel

```python
import numpy as np, pandas as pd, statspai as sp

# Simulate a 30-unit, 20-period panel from a 3-factor model plus noise.
rng = np.random.default_rng(0)
n_units, n_periods = 30, 20
F = rng.normal(size=(n_periods, 3))
L = rng.normal(size=(n_units, 3))
Y = L @ F.T + 0.1 * rng.normal(size=(n_units, n_periods))

# Reshape to the long format the function expects.
df = pd.DataFrame([
    {'unit': i, 'time': t, 'y': Y[i, t]}
    for i in range(n_units) for t in range(n_periods)
])

res = sp.synth_experimental_design(
    df, unit='unit', time='time', outcome='y',
    k=5, pre_period=(0, 19), random_state=0, n_random=200,
)
print(res.summary())
```
On this panel the expected variance falls 96% below random assignment. Because CI width scales with the square root of the variance, that is roughly a five-fold tighter post-period CI, for free, just by choosing well-behaved units before you run the experiment.
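The link between a variance reduction and CI width is simple arithmetic, assuming the interval half-width is proportional to the standard error (the usual normal-approximation case):

```python
import math

variance_reduction = 0.96                    # reported drop versus random assignment
variance_ratio = 1.0 - variance_reduction    # chosen variance / baseline variance
ci_width_ratio = math.sqrt(variance_ratio)   # CI width scales with sqrt(variance)
tightening = 1.0 / ci_width_ratio

print(round(tightening, 2))  # 5.0
```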
4. When to use this vs sp.synth
| Step | Function |
|---|---|
| Deciding which units to treat | sp.synth_experimental_design |
| Post-treatment counterfactual | sp.synth(method='classic') |
| Inference on the ATT | sp.scpi / sp.sdid |
| Cross-estimator robustness | sp.synth_compare |
The two are sequential: run synth_experimental_design at the
planning stage, then let the experiment run and hand the result to the
regular sp.synth pipeline.
5. Common pitfalls
- Panel must be balanced inside pre_period. The function raises if any (unit, time) cell is NaN.
- Setting candidates equal to all units triggers the leave-one-out fallback: each candidate's donor pool becomes the other n - 1 units. This is fine but inflates computation.
- concentration_weight > 0 adds a Herfindahl penalty to avoid selecting units whose SC fit depends on a single donor. The Abadie-Zhao (2025/2026) paper recommends concentration_weight ≈ 0.5 when the donor pool is small.
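Two of these pitfalls are easy to check before calling the function. The sketch below uses pandas and NumPy only; the helper names `missing_pre_period_cells` and `herfindahl` are hypothetical and not part of statspai, and the Herfindahl index is taken in its standard form as the sum of squared weights.

```python
import numpy as np
import pandas as pd

def missing_pre_period_cells(df, unit, time, outcome, pre_period):
    """List (unit, time) cells that are NaN or absent inside the closed pre_period.

    Note: a period missing for every unit produces no column and is not detected.
    """
    lo, hi = pre_period
    pre = df[(df[time] >= lo) & (df[time] <= hi)]
    wide = pre.pivot(index=unit, columns=time, values=outcome)
    rows, cols = np.where(wide.isna().to_numpy())
    return [(wide.index[r], wide.columns[c]) for r, c in zip(rows, cols)]

def herfindahl(weights):
    """Sum of squared weights: 1/J for uniform weights, 1.0 for a single donor."""
    w = np.asarray(weights, dtype=float)
    return float(np.sum(w ** 2))

# Toy panel with one deliberately missing cell.
panel = pd.DataFrame(
    [{"unit": i, "time": t, "y": float(i + t)} for i in range(3) for t in range(4)]
)
panel.loc[(panel["unit"] == 1) & (panel["time"] == 2), "y"] = np.nan

gaps = missing_pre_period_cells(panel, "unit", "time", "y", pre_period=(0, 3))
print(gaps)                                  # cells to fix before running the design
print(herfindahl([0.25, 0.25, 0.25, 0.25]))  # spread across donors
print(herfindahl([1.0, 0.0, 0.0, 0.0]))      # all weight on one donor
```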
6. References
- Abadie, A. & Zhao, J. (2025/2026). Synthetic Controls for Experimental Design. MIT / Cambridge UP.
- Abadie, A. (2021). "Using synthetic controls: feasibility, data requirements, and methodological aspects." JEL 59(2).
For Agents

Pre-conditions
- panel data in long form (unit × time × outcome)
- single treated unit (classic) or a treatment-timing column (staggered)
- ≥ 10 donor (untreated) units with similar pre-treatment trajectories
- ≥ 10 pre-treatment periods (fewer → large weight on any one year)

Identifying assumptions
- Treatment effect on the treated is identified by the counterfactual implicit in the donor weights
- No spillover from treated unit to donors (SUTVA)
- Donor pool contains units whose outcomes plausibly track the treated counterfactual
- Pre-treatment fit (RMSPE) is small relative to post-treatment effect for placebo inference
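The two countable pre-conditions (donor count and pre-period count) can be screened mechanically. A minimal sketch, assuming a long-format pandas panel; the function name `check_sc_preconditions` and its arguments are hypothetical, and the thresholds of 10 are the ones listed above.

```python
import pandas as pd

def check_sc_preconditions(df, unit, time, treated_units, treatment_start,
                           min_donors=10, min_pre_periods=10):
    """Return a list of human-readable problems; an empty list means both checks pass."""
    donors = set(df[unit].unique()) - set(treated_units)
    n_pre = df.loc[df[time] < treatment_start, time].nunique()
    problems = []
    if len(donors) < min_donors:
        problems.append(f"only {len(donors)} donor units (want >= {min_donors})")
    if n_pre < min_pre_periods:
        problems.append(f"only {n_pre} pre-treatment periods (want >= {min_pre_periods})")
    return problems

# Toy panel: 12 units, 8 periods, unit 0 treated from period 6 onward.
panel = pd.DataFrame(
    [{"unit": i, "time": t, "y": 0.0} for i in range(12) for t in range(8)]
)
issues = check_sc_preconditions(panel, "unit", "time",
                                treated_units=[0], treatment_start=6)
print(issues)
```

The similarity of pre-treatment trajectories and the identifying assumptions cannot be checked this way and still need judgment.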
Failure modes → recovery
| Symptom | Exception | Remedy | Try next |
|---|---|---|---|
| Pre-treatment RMSPE > post-treatment effect | AssumptionWarning | Poor pre-fit: switch to method='demeaned'/'augmented' or enlarge donor pool. | sp.synth |
| Placebo p-value ≥ 0.1 despite visible gap | AssumptionWarning | Use inference='conformal' (valid under weak assumptions) or report ranked placebo statistic. | sp.synth |
| All weight concentrated on one donor | AssumptionWarning | Interpolation bias risk: check method='elastic_net' or augmented SCM. | sp.synth |
| Treated unit outside donor convex hull | IdentificationFailure | Extrapolation needed: use method='unconstrained' or 'augmented'. | sp.synth |
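The ranked placebo statistic referred to in the table is commonly computed as a post/pre RMSPE ratio per unit, with the treated unit's rank among all units giving a permutation-style p-value. The sketch below assumes the synthetic-control gaps (actual minus synthetic) are already computed for the treated unit and each placebo unit; the gap matrix is simulated and the function names are hypothetical.

```python
import numpy as np

def rmspe_ratio(gaps, n_pre):
    """Post-period RMSPE divided by pre-period RMSPE, one value per row (unit)."""
    pre = np.sqrt(np.mean(gaps[:, :n_pre] ** 2, axis=1))
    post = np.sqrt(np.mean(gaps[:, n_pre:] ** 2, axis=1))
    return post / pre

def placebo_p_value(ratios, treated_index=0):
    """Share of units whose ratio is at least as large as the treated unit's."""
    return float(np.mean(ratios >= ratios[treated_index]))

# Simulated gaps: row 0 is the treated unit, rows 1..19 are placebo runs.
rng = np.random.default_rng(0)
n_units, n_pre, n_post = 20, 15, 5
gaps = 0.1 * rng.normal(size=(n_units, n_pre + n_post))
gaps[0, n_pre:] += 1.0          # hypothetical post-period effect on the treated unit

ratios = rmspe_ratio(gaps, n_pre)
p = placebo_p_value(ratios)
print(f"treated ratio: {ratios[0]:.2f}, placebo p-value: {p:.3f}")
```

With 20 units the smallest attainable p-value is 1/20, which is why small donor pools limit what placebo inference can show.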
Alternatives (ranked)
- sp.sdid
- sp.did
- sp.matrix_completion
- sp.causal_impact
Typical minimum N: 10