statspai.synth¶
synth ¶
Synthetic Control module for StatsPAI.
Unified entry point: synth(method=...) dispatches to all variants.
Variants (20 methods)
- classic — Abadie, Diamond & Hainmueller (2010)
- penalized / ridge — Ridge-penalised SCM
- demeaned / detrended — Ferman & Pinto (2021)
- unconstrained / elastic_net — Doudchenko & Imbens (2016)
- augmented / ascm — Ben-Michael, Feller & Rothstein (2021)
- sdid — Arkhangelsky, Athey, Hirshberg, Imbens & Wager (2021)
- factor / gsynth — Xu (2017)
- staggered — Ben-Michael, Feller & Rothstein (2022)
- mc / matrix_completion — Athey, Bayati et al. (2021)
- discos / distributional — Gunsilius (2023)
- multi_outcome — Sun (2023)
- scpi / prediction_interval — Cattaneo, Feng & Titiunik (2021)
- bayesian — Bayesian SCM with MCMC posterior (Vives & Martinez 2024)
- bsts / causal_impact — Bayesian Structural Time Series (Brodersen et al. 2015)
- penscm / abadie_lhour — Penalized SCM with pairwise discrepancy (Abadie & L'Hour 2021)
- fdid / forward_did — Forward DID with optimal donor selection (Li 2024)
- cluster — Cluster SCM with donor grouping (Rho et al. 2025, arXiv:2503.21629) [@rho2025clustersc]
- sparse / lasso — Sparse SCM with L1 penalties (Amjad, Shah & Shen 2018)
- kernel / kernel_ridge — Kernel-based nonlinear SCM
Inference
- placebo — in-space permutation (default)
- conformal — Chernozhukov, Wüthrich & Zhu (2021)
- bootstrap / jackknife — for SDID
- prediction intervals — Cattaneo et al. (2021)
- bayesian posterior — full posterior credible intervals (Bayesian SCM)
- bsts posterior — Bayesian structural time series uncertainty
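In-space placebo inference is commonly summarised by a permutation p-value: the treated unit's post/pre RMSPE ratio is ranked among the ratios obtained when each donor is treated as a placebo (Abadie et al. 2010). The sketch below shows that ranking in plain Python. The function names and toy numbers are illustrative assumptions, not part of the StatsPAI API, and the statistic StatsPAI actually ranks may differ.

```python
import math

def rmspe(gaps):
    """Root mean squared prediction error of a list of gaps (observed - synthetic)."""
    return math.sqrt(sum(g * g for g in gaps) / len(gaps))

def placebo_p_value(treated_ratio, placebo_ratios):
    """Share of units (treated included) whose ratio is at least the treated one."""
    all_ratios = [treated_ratio] + list(placebo_ratios)
    extreme = sum(1 for r in all_ratios if r >= treated_ratio)
    return extreme / len(all_ratios)

# Toy gaps: tight pre-period fit, large post-period gap for the treated unit.
treated_ratio = rmspe([4.0, 5.0, 6.0]) / rmspe([0.5, -0.5, 0.5])
placebo_ratios = [1.2, 0.8, 2.5, 1.1]  # hypothetical donor ratios
p = placebo_p_value(treated_ratio, placebo_ratios)
```

With J donors the smallest attainable p-value is 1 / (J + 1), so small donor pools limit how strong the evidence can look.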
Diagnostics
- synth_sensitivity() — comprehensive robustness suite
- synth_loo() — leave-one-out donor analysis
- synth_time_placebo() — backdating tests
- synth_donor_sensitivity() — donor pool variation
- synth_rmspe_filter() — pre-RMSPE robustness
SyntheticControl ¶
Canonical Synthetic Control estimator (Abadie, Diamond & Hainmueller 2010) with nested V-W optimization.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Long-format panel. | required |
| outcome | str | Name of the outcome column. | required |
| unit | str | Name of the unit identifier column. | required |
| time | str | Name of the time period column. | required |
| treated_unit | any | Identifier of the treated unit. | required |
| treatment_time | any | First treatment period (inclusive). | required |
| covariates | list of str | Column names whose pre-treatment means are used as predictors for the V-weighted matching problem. | None |
| special_predictors | list of tuple | R/Stata | None |
| v_method | (auto, nested, equal) | | 'auto' |
| standardize_predictors | bool | Rescale predictors to unit range before the V optimization. | True |
| n_random_starts | int | Additional random Dirichlet starts for the outer V optimiser. | 4 |
| penalization | float | Ridge penalty on donor weights. | 0.0 |
| alpha | float | Significance level for confidence intervals. | 0.05 |
fit ¶
Fit the Synthetic Control model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| placebo | bool | Run in-space placebo tests across all donor units. | True |
Returns:
| Type | Description |
|---|---|
| CausalResult | |
SynthComparison ¶
Structured container for multi-method SCM comparison results.
Attributes:
| Name | Type | Description |
|---|---|---|
| results | dict | Mapping of method name to |
| comparison_table | DataFrame | Side-by-side metrics for every successful method, sorted by |
| recommended | str | Name of the recommended method. |
| recommendation_reason | str | Human-readable justification. |
summary ¶
Return a formatted multi-line summary string.
Returns:
| Type | Description |
|---|---|
| str | |
plot ¶
Overlay all results using synthplot(type='compare').
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| **kwargs | | Forwarded to | {} |
Returns:
| Type | Description |
|---|---|
| matplotlib Figure or Axes | |
to_latex ¶
Render the comparison as a LaTeX table.
Forwards to statspai.synth.exports.synth_to_latex with the side-by-side multi-method layout.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| **kwargs | | See :func: | {} |
Returns:
| Type | Description |
|---|---|
| str | |
to_markdown ¶
Render the comparison as a Markdown table.
Forwards to statspai.synth.exports.synth_to_markdown.
to_excel ¶
Write a multi-sheet Excel workbook covering all methods.
Forwards to statspai.synth.exports.synth_to_excel.
Returns the absolute path of the file written.
SequentialSDIDResult dataclass ¶
Per-cohort and aggregated output of sequential_sdid.
SyntheticSurvivalResult dataclass ¶
Output of synth_survival.
SynthExperimentalDesignResult dataclass ¶
Structured output of synth_experimental_design.
Attributes:
| Name | Type | Description |
|---|---|---|
| selected | list of unit ids | The |
| ranking | DataFrame | All candidates with columns |
| weights | dict[unit_id, ndarray] | Leave-one-out SC weight vectors (aligned to |
| donor_units | list | The donor pool that each candidate was matched against (candidates excluded from each other's donor pool by default). |
| expected_variance | float | Sum of pre-period MSPEs over |
| baseline_variance | float | Same quantity for a random- |
| method | str | Always |
| diagnostics | dict | Extra metadata (n_units, pre_periods, solver, etc.). |
synth ¶
synth(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any = None, treatment_time: Any = None, method: str = 'classic', covariates: Optional[List[str]] = None, penalization: float = 0.0, placebo: bool = True, alpha: float = 0.05, inference: Optional[str] = None, treatment: Optional[str] = None, **kwargs) -> CausalResult
Public sp.synth entry point; see _dispatch_synth_impl for the full docstring on methods and parameters.
Thin wrapper around the multi-branch dispatcher that attaches a Provenance record to the returned result, so that downstream consumers (replication_pack, Quarto appendix, table footers) can pick up the call (function name, args, data hash) without each individual SCM backend having to opt in. The 20-method dispatcher itself lives in _dispatch_synth_impl.
References
Abadie, A., Diamond, A. and Hainmueller, J. (2010). Synthetic control methods for comparative case studies. Journal of the American Statistical Association. [@abadie2010synthetic]
synthplot ¶
synthplot(result: Union[CausalResult, List[CausalResult]], type: str = 'trajectory', ax=None, figsize: Optional[tuple] = None, title: Optional[str] = None, top_n: int = 15, labels: Optional[List[str]] = None, **kwargs)
Unified plot function for all Synthetic Control variants.
Automatically detects the SCM variant and renders the appropriate
visualisation. Works with results from synth(method=...),
sdid(), augsynth(), gsynth(), staggered_synth(),
conformal_synth(), and all other variants.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| result | CausalResult or list of CausalResult | Output of any | required |
| type | str | Plot type: | 'trajectory' |
| ax | matplotlib Axes | Pre-existing axes for single-panel plots. | None |
| figsize | tuple | Figure size. Auto-selected if None. | None |
| title | str | Override the auto-generated title. | None |
| top_n | int | Number of donors to show in weight plots. | 15 |
| labels | list of str | Labels for | None |
| **kwargs | | Additional arguments passed to individual plotters. Notable: | {} |
Returns:
| Type | Description |
|---|---|
| (fig, ax) or (fig, axes) | |
Examples:
>>> result = sp.synth(df, ..., method='demeaned')
>>> sp.synthplot(result) # trajectory
>>> sp.synthplot(result, type='gap') # gap plot
>>> sp.synthplot(result, type='both') # two-panel
>>> sp.synthplot(result, type='weights') # donor weights
>>> sp.synthplot(result, type='placebo') # placebo distribution
Compare methods:
demeaned_synth ¶
demeaned_synth(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any, treatment_time: Any, covariates: Optional[List[str]] = None, variant: Literal['demeaned', 'detrended'] = 'demeaned', penalization: float = 0.0, placebo: bool = True, alpha: float = 0.05) -> CausalResult
De-meaned / De-trended Synthetic Control Method.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Long-format panel data. | required |
| outcome | str | Outcome variable name. | required |
| unit | str | Unit identifier column. | required |
| time | str | Time period column. | required |
| treated_unit | any | Identifier of the treated unit. | required |
| treatment_time | any | First treatment period (inclusive). | required |
| covariates | list of str | Additional covariates to match on. | None |
| variant | ('demeaned', 'detrended') | | 'demeaned' |
| penalization | float | Ridge penalty on weights. | 0.0 |
| placebo | bool | Run in-space placebo inference. | True |
| alpha | float | Significance level. | 0.05 |
Returns:
| Type | Description |
|---|---|
| CausalResult | |
Examples:
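The block below is not a call into StatsPAI. It is a minimal sketch of the de-meaning step behind variant='demeaned': each unit's pre-treatment mean is subtracted from its whole series, so donors are matched on shape rather than level. The helper name and toy series are assumptions made for illustration.

```python
def demean_by_pre_period(series, n_pre):
    """Subtract the mean of the first n_pre observations from every observation."""
    pre_mean = sum(series[:n_pre]) / n_pre
    return [y - pre_mean for y in series]

# Three pre-treatment periods followed by one post-treatment period.
treated = [10.0, 12.0, 14.0, 20.0]
donor = [110.0, 112.0, 114.0, 116.0]  # same pre-period shape, very different level

treated_dm = demean_by_pre_period(treated, n_pre=3)
donor_dm = demean_by_pre_period(donor, n_pre=3)
```

After de-meaning, the two pre-period paths coincide even though the raw levels differ by 100, which is the situation where the classic simplex-constrained fit would struggle.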
robust_synth ¶
robust_synth(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any, treatment_time: Any, covariates: Optional[List[str]] = None, variant: Literal['unconstrained', 'elastic_net', 'penalized'] = 'unconstrained', l1_penalty: float = 0.0, l2_penalty: float = 0.01, intercept: bool = True, placebo: bool = True, alpha: float = 0.05) -> CausalResult
Robust / unconstrained Synthetic Control.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Long-format panel data. | required |
| outcome | str | Outcome variable name. | required |
| unit | str | Unit identifier column. | required |
| time | str | Time period column. | required |
| treated_unit | any | Identifier of the treated unit. | required |
| treatment_time | any | First treatment period. | required |
| covariates | list of str | Additional covariates to match on. | None |
| variant | ('unconstrained', 'elastic_net', 'penalized') | | 'unconstrained' |
| l1_penalty | float | Lasso (L1) penalty strength. | 0.0 |
| l2_penalty | float | Ridge (L2) penalty strength. | 0.01 |
| intercept | bool | Fit an intercept (level shift). Only for unconstrained / elastic_net. | True |
| placebo | bool | Run in-space placebo inference. | True |
| alpha | float | Significance level. | 0.05 |
Returns:
| Type | Description |
|---|---|
| CausalResult | |
Examples:
staggered_synth ¶
staggered_synth(data: DataFrame, outcome: str, unit: str, time: str, treatment: str, method: Literal['separate', 'pooled'] = 'separate', penalization: float = 0.0, placebo: bool = True, alpha: float = 0.05) -> CausalResult
Staggered Adoption Synthetic Control.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Long-format panel data. | required |
| outcome | str | Outcome variable name. | required |
| unit | str | Unit identifier column. | required |
| time | str | Time period column. | required |
| treatment | str | Binary treatment indicator (0/1). Units transition from 0 to 1 at their respective adoption times. | required |
| method | ('separate', 'pooled') | | 'separate' |
| penalization | float | Ridge penalty on donor weights. | 0.0 |
| placebo | bool | Run placebo inference. | True |
| alpha | float | Significance level. | 0.05 |
Returns:
| Type | Description |
|---|---|
| CausalResult | With |
Examples:
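Because staggered_synth takes a binary treatment column rather than a single treatment_time, each unit's adoption time has to be read off the first period in which the indicator equals 1. The sketch below shows that step on plain tuples; the helper name and the toy panel are assumptions, and StatsPAI's internal handling may differ.

```python
def adoption_times(rows):
    """Map each ever-treated unit to its first period with treatment == 1.

    rows: iterable of (unit, time, treatment) tuples in long format.
    """
    first = {}
    for unit, time, treated in rows:
        if treated == 1 and (unit not in first or time < first[unit]):
            first[unit] = time
    return first

panel = [
    ("A", 2000, 0), ("A", 2001, 1), ("A", 2002, 1),
    ("B", 2000, 0), ("B", 2001, 0), ("B", 2002, 1),
    ("C", 2000, 0), ("C", 2001, 0), ("C", 2002, 0),  # never treated
]
cohorts = adoption_times(panel)
never_treated = sorted({u for u, _, _ in panel} - set(cohorts))
```

Units missing from the result are never treated and can serve as donors for every cohort.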
conformal_synth ¶
conformal_synth(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any, treatment_time: Any, scm_method: str = 'classic', grid_size: int = 101, grid_range: Optional[Tuple[float, float]] = None, alpha: float = 0.05, penalization: float = 0.0) -> CausalResult
Conformal inference for synthetic control.
Constructs valid confidence intervals by inverting a sequence of conformal tests, one for each hypothesised treatment effect.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Long-format panel data. | required |
| outcome | str | Outcome variable name. | required |
| unit | str | Unit identifier column. | required |
| time | str | Time period column. | required |
| treated_unit | any | Identifier of the treated unit. | required |
| treatment_time | any | First treatment period. | required |
| scm_method | str | Which SCM variant to use for weight estimation. Currently supports 'classic' (constrained) and 'ridge'. | 'classic' |
| grid_size | int | Number of points in the hypothesis grid for CI inversion. | 101 |
| grid_range | tuple of (float, float) | (min, max) of the hypothesis grid. If None, auto-determined from pre-treatment residual scale. | None |
| alpha | float | Significance level. | 0.05 |
| penalization | float | Ridge penalty (used when scm_method='ridge'). | 0.0 |
Returns:
| Type | Description |
|---|---|
| CausalResult | With |
Examples:
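Test inversion can be illustrated without the library. The sketch below is a deliberate simplification: it assumes a single post-treatment period and holds the fitted counterfactual fixed, whereas the procedure of Chernozhukov, Wüthrich & Zhu (2021) re-estimates the model under each null. All names and numbers are hypothetical.

```python
def conformal_p_value(pre_residuals, post_residual):
    """Share of residuals (post one included) at least as large in absolute value."""
    pool = [abs(r) for r in pre_residuals] + [abs(post_residual)]
    return sum(1 for r in pool if r >= abs(post_residual)) / len(pool)

def conformal_interval(pre_residuals, post_gap, grid, alpha):
    """Keep every hypothesised effect theta whose conformal test is not rejected."""
    kept = [t for t in grid
            if conformal_p_value(pre_residuals, post_gap - t) > alpha]
    return (min(kept), max(kept)) if kept else None

pre_residuals = [0.3, -0.2, 0.1, -0.4, 0.25, -0.15, 0.05, 0.35, -0.3]
post_gap = 2.0                          # observed minus synthetic, post period
grid = [i / 10 for i in range(0, 41)]   # hypothesised effects 0.0, 0.1, ..., 4.0
# With 10 residuals the smallest possible p-value is 0.1, so alpha must exceed it
# for any hypothesis to be rejected; 0.15 is used purely for illustration.
ci = conformal_interval(pre_residuals, post_gap, grid, alpha=0.15)
```

The interval collects the grid points that survive the test, which is why grid_size and grid_range bound the resolution and extent of the reported interval.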
scest ¶
scest(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any, treatment_time: Any, w_constr: str = 'simplex', lasso_lambda: float = 1.0, ridge_lambda: float = 1.0) -> Dict[str, Any]
Estimate synthetic control weights.
Solves the constrained optimisation problem to find donor weights
that best reproduce the treated unit's pre-treatment outcomes.
Mirrors the R package's scest() function.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Long-format panel data. | required |
| outcome | str | Outcome variable column name. | required |
| unit | str | Unit identifier column name. | required |
| time | str | Time period column name. | required |
| treated_unit | scalar | Identifier of the treated unit. | required |
| treatment_time | scalar | First treatment period. | required |
| w_constr | str | Weight constraint: | 'simplex' |
| lasso_lambda | float | L1 penalty (used when | 1.0 |
| ridge_lambda | float | L2 penalty (used when | 1.0 |
Returns:
| Type | Description |
|---|---|
| dict | Keys: |
Examples:
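For the default w_constr='simplex', the weights are non-negative and sum to one. The sketch below solves that least-squares problem by projected gradient descent in plain Python. It illustrates the constraint and is not StatsPAI's solver; the function names, step size, and toy data are assumptions.

```python
def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1} (sort-based)."""
    u = sorted(v, reverse=True)
    cumulative, theta = 0.0, 0.0
    for k, u_k in enumerate(u, start=1):
        cumulative += u_k
        t = (cumulative - 1.0) / k
        if u_k - t > 0:
            theta = t  # the last k satisfying the condition fixes the threshold
    return [max(x - theta, 0.0) for x in v]

def simplex_weights(Y0, y1, steps=5000, lr=0.01):
    """Minimise ||y1 - Y0 w||^2 over the simplex.

    Y0: list of T rows with J donor outcomes each; y1: list of T treated outcomes.
    """
    J = len(Y0[0])
    w = [1.0 / J] * J
    for _ in range(steps):
        resid = [sum(row[j] * w[j] for j in range(J)) - y for row, y in zip(Y0, y1)]
        grad = [2.0 * sum(r * row[j] for r, row in zip(resid, Y0)) for j in range(J)]
        w = project_to_simplex([w[j] - lr * grad[j] for j in range(J)])
    return w

# Toy pre-period data: the treated series is exactly 0.7 * donor0 + 0.3 * donor1.
Y0 = [[1.0, 2.0, 0.5], [2.0, 1.0, 1.5], [3.0, 3.0, 0.0], [4.0, 2.0, 2.5]]
y1 = [0.7 * r[0] + 0.3 * r[1] for r in Y0]
w = simplex_weights(Y0, y1)
```

The fixed step size is tuned to this toy problem; a production solver would use a quadratic-programming routine or a line search.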
scdata ¶
scdata(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any, treatment_time: Any) -> Dict[str, Any]
Prepare data matrices for synthetic control estimation.
Reshapes a long-format panel into the matrices needed by scest
and scpi. Mirrors the R package's scdata() function.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Long-format panel data. | required |
| outcome | str | Outcome variable column name. | required |
| unit | str | Unit identifier column name. | required |
| time | str | Time period column name. | required |
| treated_unit | scalar | Identifier of the treated unit. | required |
| treatment_time | scalar | First treatment period. | required |
Returns:
| Type | Description |
|---|---|
| dict | Keys: |
Examples:
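The reshaping that scdata performs can be pictured as a split of the long panel into treated and donor blocks, before and after treatment_time. The sketch below does this on plain tuples. The dictionary keys are hypothetical names chosen for this illustration and are not the keys returned by scdata.

```python
def long_to_matrices(rows, treated_unit, treatment_time):
    """Split long-format (unit, time, y) rows into pre/post treated and donor blocks."""
    times = sorted({t for _, t, _ in rows})
    donors = sorted({u for u, _, _ in rows if u != treated_unit})
    y = {(u, t): v for u, t, v in rows}
    pre = [t for t in times if t < treatment_time]
    post = [t for t in times if t >= treatment_time]  # treatment_time is inclusive
    return {
        "donors": donors,
        "Y1_pre": [y[(treated_unit, t)] for t in pre],
        "Y0_pre": [[y[(d, t)] for d in donors] for t in pre],
        "Y1_post": [y[(treated_unit, t)] for t in post],
        "Y0_post": [[y[(d, t)] for d in donors] for t in post],
    }

rows = [
    ("CA", 1, 10.0), ("CA", 2, 11.0), ("CA", 3, 15.0),
    ("NV", 1, 8.0), ("NV", 2, 9.0), ("NV", 3, 10.0),
    ("OR", 1, 12.0), ("OR", 2, 13.0), ("OR", 3, 14.0),
]
m = long_to_matrices(rows, treated_unit="CA", treatment_time=3)
```

The sketch assumes a balanced panel; a missing (unit, time) pair would raise a KeyError here.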
mc_synth ¶
mc_synth(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any, treatment_time: Any, covariates: Optional[List[str]] = None, lambda_reg: Optional[float] = None, max_iter: int = 500, tol: float = 1e-06, cv_folds: int = 5, alpha: float = 0.05, placebo: bool = True, seed: Optional[int] = None) -> CausalResult
Matrix Completion Synthetic Control Method.
Imputes the treated unit's post-treatment counterfactual by solving a nuclear-norm-penalised matrix completion problem on the full panel, following Athey et al. (2021).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Long-format panel data. | required |
| outcome | str | Outcome variable name. | required |
| unit | str | Unit identifier column. | required |
| time | str | Time period column. | required |
| treated_unit | any | Identifier of the treated unit. | required |
| treatment_time | any | First treatment period (inclusive). | required |
| covariates | list of str | Time-varying covariates to partial out before matrix completion. | None |
| lambda_reg | float | Nuclear norm penalty. If | None |
| max_iter | int | Maximum Soft-Impute iterations. | 500 |
| tol | float | Convergence tolerance (relative change in Frobenius norm). | 1e-6 |
| cv_folds | int | Number of CV folds for automatic lambda selection. | 5 |
| alpha | float | Significance level for confidence intervals. | 0.05 |
| placebo | bool | Run placebo (permutation) inference by treating each control unit as if it were treated. | True |
| seed | int | Random seed for reproducibility. | None |
Returns:
| Type | Description |
|---|---|
| CausalResult | With |
Notes
The algorithm uses the Soft-Impute / Singular Value Thresholding (SVT) procedure. At each iteration the current completion is projected onto observed entries, combined with the previous imputation at missing entries, then rank-reduced by soft-thresholding the singular values.
Examples:
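The Soft-Impute iteration described in the Notes can be sketched in a few lines of NumPy. This is an illustration of the technique under stated assumptions (fixed penalty, no covariates, no cross-validation), not the mc_synth implementation; the function name and toy matrix are hypothetical.

```python
import numpy as np

def soft_impute(Y, observed, lam, max_iter=500, tol=1e-6):
    """Nuclear-norm-regularised completion of Y using entries where observed is True."""
    L = np.zeros_like(Y, dtype=float)
    for _ in range(max_iter):
        # Observed entries come from the data, missing ones from the current estimate.
        filled = np.where(observed, Y, L)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        # Soft-threshold the singular values, which lowers the rank.
        L_new = (U * np.maximum(s - lam, 0.0)) @ Vt
        change = np.linalg.norm(L_new - L) / max(np.linalg.norm(L), 1e-12)
        L = L_new
        if change < tol:
            break
    return L

rng = np.random.default_rng(0)
Y = np.outer(rng.normal(size=8), rng.normal(size=6))  # rank-one toy panel
observed = np.ones_like(Y, dtype=bool)
observed[0, 4:] = False   # treated unit (row 0) is unobserved in the last 2 periods
L_hat = soft_impute(Y, observed, lam=0.01)
counterfactual = L_hat[0, 4:]
```

In the synthetic-control setting the masked entries are the treated unit's post-treatment outcomes, and the completed values play the role of the counterfactual.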
multi_outcome_synth ¶
multi_outcome_synth(data: DataFrame, outcomes: List[str], unit: str, time: str, treated_unit: Any, treatment_time: Any, method: str = 'concatenated', standardize: bool = True, penalization: float = 0.0, placebo: bool = True, alpha: float = 0.05) -> CausalResult
Multiple Outcomes Synthetic Control Method (Sun 2023).
Finds a single set of donor weights that simultaneously matches the treated unit across all K outcomes in the pre-treatment period.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Long-format panel data containing all outcome columns. | required |
| outcomes | list of str | Column names for the K outcome variables. | required |
| unit | str | Unit identifier column. | required |
| time | str | Time period column. | required |
| treated_unit | any | Value identifying the treated unit. | required |
| treatment_time | any | First treatment period (inclusive). | required |
| method | ('concatenated', 'averaged') | Weight-estimation strategy. | 'concatenated' |
| standardize | bool | Standardise each outcome to zero mean / unit variance before stacking or averaging (strongly recommended when outcome scales differ). | True |
| penalization | float | Ridge-type penalty added to the diagonal of the donor cross-product matrix ( | 0.0 |
| placebo | bool | Run in-space placebo permutations for inference (each donor is pretended to be treated in turn). | True |
| alpha | float | Significance level for confidence intervals and joint test. | 0.05 |
Returns:
| Type | Description |
|---|---|
| CausalResult | Unified result object with: |
Examples:
>>> result = sp.multi_outcome_synth(
... df,
... outcomes=['gdp', 'employment', 'wages'],
... unit='state', time='year',
... treated_unit='California', treatment_time=1989,
... )
>>> print(result.summary())
>>> result.model_info['per_outcome_effects']
Notes
Sun (2023) shows that under a low-rank factor model the bias of the concatenated estimator shrinks as O(1/sqrt(K)), where K is the number of outcomes. The key requirement is that the outcomes share a common latent-factor structure.
qqsynth ¶
qqsynth(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any, treatment_time: Any, n_quantiles: int = 100, placebo: bool = True, alpha: float = 0.05, seed: Optional[int] = None) -> CausalResult
Quantile Synthetic Control (alias for DiSCo with method='quantile').
Applies quantile-on-quantile regression to match quantile functions without the convexity constraints of the mixture approach.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Panel data in long format. | required |
| outcome | str | Outcome variable column. | required |
| unit | str | Unit identifier column. | required |
| time | str | Time period column. | required |
| treated_unit | any | Identifier of the treated unit. | required |
| treatment_time | any | First treatment period (inclusive). | required |
| n_quantiles | int | Number of quantile grid points. | 100 |
| placebo | bool | Run placebo permutation inference. | True |
| alpha | float | Significance level. | 0.05 |
| seed | int | Random seed. | None |
Returns:
| Type | Description |
|---|---|
| CausalResult | |
Examples:
>>> result = sp.qqsynth(df, outcome='gdp', unit='state', time='year',
... treated_unit='California', treatment_time=1989)
>>> print(result.summary())
See Also
discos : Full distributional synthetic controls with method selection.
discos_test ¶
Test for distributional treatment effects.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| result | CausalResult | Output from | required |
| test | ('ks', 'cvm', 'stochastic_dominance') | | 'ks' |
Returns:
| Type | Description |
|---|---|
| dict | Keys: |
Examples:
discos_plot ¶
discos_plot(result: CausalResult, type: str = 'quantile_effect', ax=None, figsize: Tuple[int, int] = (10, 6), color: str = '#2C3E50', ci_alpha: float = 0.2, title: Optional[str] = None)
Visualise distributional synthetic control results.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| result | CausalResult | Output from | required |
| type | ('quantile_effect', 'quantile_comparison', 'gap', 'weights') | | 'quantile_effect' |
| ax | Axes | Pre-existing axes for the plot. | None |
| figsize | tuple | Figure size. | (10, 6) |
| color | str | Primary plot colour. | '#2C3E50' |
| ci_alpha | float | Transparency for CI band. | 0.2 |
| title | str | Plot title override. | None |
Returns:
| Type | Description |
|---|---|
| (fig, ax) | |
Examples:
stochastic_dominance ¶
Test for stochastic dominance of the treated distribution over the counterfactual distribution.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| result | CausalResult | Output from | required |
| order | (1, 2) | Order of stochastic dominance. 1 = first-order (CDF dominance). 2 = second-order (integrated CDF dominance). | 1 |
Returns:
| Type | Description |
|---|---|
| dict | Keys: |
Examples:
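First-order dominance has a simple descriptive counterpart on quantile functions: the treated distribution dominates when each of its quantiles is at least the matching counterfactual quantile. The sketch below checks only that pointwise condition on raw samples. It is not the test implemented by stochastic_dominance, which must also account for sampling uncertainty; the helper names and data are hypothetical.

```python
def empirical_quantiles(sample, n_quantiles):
    """Quantile function on an evenly spaced grid (nearest rank, no interpolation)."""
    s = sorted(sample)
    n = len(s)
    return [s[min(int(q * n / n_quantiles), n - 1)] for q in range(n_quantiles)]

def first_order_dominates(treated, counterfactual, n_quantiles=10):
    """True when every treated quantile is at least the counterfactual quantile."""
    qt = empirical_quantiles(treated, n_quantiles)
    qc = empirical_quantiles(counterfactual, n_quantiles)
    return all(a >= b for a, b in zip(qt, qc))

counterfactual = [float(x) for x in range(1, 11)]
treated = [x + 0.5 for x in counterfactual]   # whole distribution shifted up

dominates = first_order_dominates(treated, counterfactual)
reverse = first_order_dominates(counterfactual, treated)
```

Comparing quantile functions is equivalent to comparing CDFs with the inequality reversed, which matches the "CDF dominance" wording for order 1.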
bayesian_synth ¶
bayesian_synth(data: DataFrame, outcome: str, unit: str, time: str, treated_unit, treatment_time, covariates: Optional[List[str]] = None, n_iter: int = 2000, n_warmup: int = 1000, n_chains: int = 2, dirichlet_alpha: float = 1.0, seed: Optional[int] = None, alpha: float = 0.05) -> CausalResult
Bayesian Synthetic Control Method.
Estimates the ATT by placing a Dirichlet prior on donor weights and sampling from the posterior via Metropolis-Hastings MCMC. Returns full posterior credible intervals for the treatment effect.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Panel data in long format with columns for unit, time, and outcome. | required |
| outcome | str | Name of the outcome variable column. | required |
| unit | str | Name of the unit identifier column. | required |
| time | str | Name of the time period column. | required |
| treated_unit | scalar | Value in unit that identifies the treated unit. | required |
| treatment_time | scalar | First period of treatment (inclusive). | required |
| covariates | list of str | Additional pre-treatment predictors to include in the matching objective. Covariates are appended to the pre-treatment outcome series for each unit before fitting. | None |
| n_iter | int | Total MCMC iterations per chain (including warmup). | 2000 |
| n_warmup | int | Number of warmup (burn-in) iterations for adaptation. Must be strictly less than n_iter. | 1000 |
| n_chains | int | Number of independent MCMC chains. Multiple chains enable the R-hat convergence diagnostic. | 2 |
| dirichlet_alpha | float | Concentration parameter for the symmetric Dirichlet prior on donor weights. | 1.0 |
| seed | int | Random seed for reproducibility. | None |
| alpha | float | Significance level for credible intervals. | 0.05 |
Returns:
| Type | Description |
|---|---|
| CausalResult | With |
Raises:
| Type | Description |
|---|---|
| ValueError | If the panel has fewer than 2 pre-treatment periods, no post-treatment periods, or no valid donor units. |
Examples:
>>> result = sp.bayesian_synth(
... df, outcome='gdp', unit='state', time='year',
... treated_unit='California', treatment_time=1989,
... n_iter=4000, n_warmup=2000, n_chains=4, seed=42,
... )
>>> print(result.summary())
Notes
The sampler uses a Dirichlet proposal on the simplex (re-normalised perturbation) with adaptive step-size tuning during warmup targeting an acceptance rate of ~0.35. Samples are thinned by a factor of 2 to reduce autocorrelation.
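The R-hat diagnostic enabled by running several chains compares between-chain and within-chain variance. The sketch below computes the classic Gelman-Rubin form for a scalar quantity such as the ATT draws. Whether StatsPAI uses this form or a split or rank-normalised variant is not stated here, so treat the formula choice and the function name as assumptions.

```python
from statistics import mean, variance

def r_hat(chains):
    """Classic Gelman-Rubin potential scale reduction factor for equal-length chains."""
    n = len(chains[0])
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])    # W: average within-chain variance
    between = n * variance(chain_means)             # B: scaled variance of chain means
    pooled = (n - 1) / n * within + between / n     # pooled estimate of the variance
    return (pooled / within) ** 0.5

mixed = [[1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 4.0]]      # chains agree
stuck = [[1.0, 2.0, 3.0, 4.0], [11.0, 12.0, 13.0, 14.0]]  # chains disagree

r_mixed = r_hat(mixed)
r_stuck = r_hat(stuck)
```

Values far above 1 indicate that the chains have not converged to the same distribution, in which case raising n_iter and n_warmup is the usual first response.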
References
Vives, J. and Martinez, A. (2024). "Bayesian Synthetic Control Methods." Journal of Computational and Graphical Statistics.
causal_impact ¶
causal_impact(data: DataFrame, pre_period: Tuple[Any, Any], post_period: Tuple[Any, Any], outcome: Optional[str] = None, covariates: Optional[List[str]] = None, model: str = 'local_level', n_simulations: int = 1000, alpha: float = 0.05, seed: Optional[int] = None) -> CausalResult
Google CausalImpact-style causal inference for time series.
Fits a Bayesian structural time series model on the pre-intervention period and produces a counterfactual prediction for the post-period. The treatment effect is the difference between observed and counterfactual, with full posterior uncertainty.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Time-indexed DataFrame. If outcome is | required |
| pre_period | tuple of (start, end) | Pre-intervention period boundaries (inclusive). Values are matched against the DataFrame index. | required |
| post_period | tuple of (start, end) | Post-intervention period boundaries (inclusive). | required |
| outcome | str | Column name of the outcome variable. If | None |
| covariates | list of str | Column names to use as controls. If | None |
| model | 'local_level' or 'local_linear_trend' | State-space model type. | 'local_level' |
| n_simulations | int | Number of posterior draws for uncertainty quantification. | 1000 |
| alpha | float | Significance level for confidence/credible intervals. | 0.05 |
| seed | int | Random seed for reproducibility. | None |
Returns:
| Type | Description |
|---|---|
| CausalResult | Unified result object with: |
Notes
The implementation follows Brodersen et al. (2015). Regression coefficients are estimated via ridge regression with GCV-selected penalty (an empirical-Bayes analogue of the spike-and-slab prior). Hyperparameters (observation/level/slope noise) are estimated by maximum likelihood through the Kalman filter.
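To make the state-space part concrete, the sketch below runs a Kalman filter for the local-level model on a short pre-period and forms the post-period forecast. It fixes the noise variances by hand and omits the regression component, whereas the Notes state that both are estimated; the function name and all numbers are assumptions for illustration.

```python
def local_level_filter(y, sigma2_obs, sigma2_level, m0=0.0, p0=1e6):
    """Kalman filter for y_t = mu_t + eps_t, mu_t = mu_{t-1} + eta_t.

    Returns the one-step-ahead predictions and the final filtered (mean, variance).
    """
    m, p = m0, p0   # diffuse-like prior on the initial level
    preds = []
    for obs in y:
        p_pred = p + sigma2_level              # predict: the level is a random walk
        preds.append(m)
        gain = p_pred / (p_pred + sigma2_obs)  # Kalman gain
        m = m + gain * (obs - m)               # update with the new observation
        p = (1.0 - gain) * p_pred
    return preds, (m, p)

pre = [10.0, 10.2, 9.9, 10.1, 10.0]
_, (level, var) = local_level_filter(pre, sigma2_obs=0.04, sigma2_level=0.01)

# Counterfactual forecast h steps ahead: the mean stays at the last level, while
# the variance grows by sigma2_level per step, plus observation noise.
forecast = [(level, var + h * 0.01 + 0.04) for h in range(1, 4)]
```

The widening forecast variance is what makes the credible band around the counterfactual fan out over a long post-period.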
Examples:
>>> import pandas as pd
>>> import statspai as sp
>>> # Wide-format: columns = [outcome, control1, control2, ...]
>>> result = sp.synth.causal_impact(
... data, pre_period=(1, 70), post_period=(71, 100)
... )
>>> print(result.summary())
References
Brodersen, K.H., Gallusser, F., Koehler, J., Remy, N. and Scott, S.L. (2015). "Inferring causal impact using Bayesian structural time series models." Annals of Applied Statistics, 9(1), 247-274. [@brodersen2015inferring]
bsts_synth ¶
bsts_synth(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any, treatment_time: Any, covariates: Optional[List[str]] = None, model: str = 'local_level', n_simulations: int = 1000, alpha: float = 0.05, seed: Optional[int] = None) -> CausalResult
BSTS synthetic control with a panel-data interface.
Converts long-format panel data into the wide format expected by causal_impact, using control-unit outcome series as covariates/regressors. This provides a CausalImpact-style analysis through the same panel interface as StatsPAI's other synthetic-control methods.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Long-format panel data with columns for unit, time, and outcome. | required |
| outcome | str | Outcome variable column name. | required |
| unit | str | Unit identifier column name. | required |
| time | str | Time period column name. | required |
| treated_unit | any | Identifier of the treated unit. | required |
| treatment_time | any | First treatment period (inclusive). | required |
| covariates | list of str | Additional time-varying covariates to include alongside control unit series. Each covariate is averaged across control units per time period and appended as an extra regressor. | None |
| model | 'local_level' or 'local_linear_trend' | State-space model type. | 'local_level' |
| n_simulations | int | Number of posterior draws. | 1000 |
| alpha | float | Significance level. | 0.05 |
| seed | int | Random seed. | None |
Returns:
| Type | Description |
|---|---|
| CausalResult | Unified result object. |
Examples:
>>> import statspai as sp
>>> result = sp.synth.bsts_synth(
... data, outcome='gdp', unit='country', time='year',
... treated_unit='West Germany', treatment_time=1990,
... )
>>> print(result.summary())
See Also
causal_impact : Wide-format CausalImpact interface.
penalized_synth ¶
penalized_synth(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any, treatment_time: Any, covariates: Optional[List[str]] = None, lambda_pen: Optional[float] = None, penalty_type: str = 'pairwise', predictors: Optional[List[str]] = None, placebo: bool = True, alpha: float = 0.05) -> CausalResult
Penalized Synthetic Control estimator (Abadie & L'Hour 2021).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Long-format panel data with columns for unit, time, outcome, and optionally covariates / predictors. | required |
| outcome | str | Name of the outcome column. | required |
| unit | str | Name of the unit identifier column. | required |
| time | str | Name of the time period column. | required |
| treated_unit | Any | Identifier of the treated unit. | required |
| treatment_time | Any | First treatment period (inclusive). | required |
| covariates | list of str | Covariate columns used only for the pairwise distance penalty. When | None |
| lambda_pen | float | Penalty parameter. | None |
| penalty_type | ('pairwise', 'max_dev', 'l1_pairwise') | Penalty functional form. | 'pairwise' |
| predictors | list of str | Columns whose pre-treatment averages are appended to the covariate vector for distance computation. | None |
| placebo | bool | Run in-space placebo permutation tests. | True |
| alpha | float | Significance level for confidence intervals. | 0.05 |
Returns:
| Type | Description |
|---|---|
| CausalResult | With |
References
Abadie, A. and L'Hour, J. (2021). "A Penalized Synthetic Control Estimator for Disaggregated Data." Journal of the American Statistical Association, 116(536), 1817-1834. [@abadie2021penalized]
cluster_synth ¶
cluster_synth(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any, treatment_time: Any, n_clusters: Optional[int] = None, cluster_method: str = 'kmeans', augment: bool = False, max_augment: int = 3, covariates: Optional[List[str]] = None, placebo: bool = True, alpha: float = 0.05, seed: Optional[int] = None) -> CausalResult
Cluster Synthetic Control estimator.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Long-format panel data. | required |
| outcome | str | Name of the outcome column. | required |
| unit | str | Name of the unit-identifier column. | required |
| time | str | Name of the time-period column. | required |
| treated_unit | any | Identifier of the single treated unit. | required |
| treatment_time | any | First treatment period (inclusive). | required |
| n_clusters | int or None | Number of clusters. | None |
| cluster_method | ('kmeans', 'spectral', 'hierarchical') | Clustering algorithm. | 'kmeans' |
| augment | bool | If | False |
| max_augment | int | Maximum number of additional donors when augment is | 3 |
| covariates | list of str or None | Additional columns to include in the clustering feature matrix. | None |
| placebo | bool | Run in-space placebo permutation inference. | True |
| alpha | float | Significance level for confidence intervals. | 0.05 |
| seed | int or None | Random seed for reproducibility. | None |
Returns:
| Type | Description |
|---|---|
| CausalResult | |
sparse_synth ¶
sparse_synth(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any, treatment_time: Any, mode: str = 'lasso', lambda_w: Optional[float] = None, lambda_v: Optional[float] = None, covariates: Optional[List[str]] = None, placebo: bool = True, alpha: float = 0.05) -> CausalResult
Sparse Synthetic Control estimator.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Long-format panel data. | required |
| outcome | str | Outcome variable name. | required |
| unit | str | Unit identifier column. | required |
| time | str | Time period column. | required |
| treated_unit | any | Identifier of the treated unit. | required |
| treatment_time | any | First treatment period (inclusive). | required |
| mode | ('lasso', 'constrained_lasso', 'joint') | | 'lasso' |
| lambda_w | float or None | L1 penalty on donor weights. | None |
| lambda_v | float or None | L1 penalty on feature weights ( | None |
| covariates | list of str | Additional covariates to append to the pre-treatment outcome matrix before weight estimation. | None |
| placebo | bool | Run in-space placebo permutation for inference. | True |
| alpha | float | Significance level for confidence interval. | 0.05 |
Returns:
| Type | Description |
|---|---|
| CausalResult | |
Examples:
kernel_synth ¶
kernel_synth(data: DataFrame, outcome: str, unit: str, time: str, treated_unit, treatment_time, kernel: str = 'rbf', sigma: Optional[float] = None, degree: int = 2, covariates: Optional[List[str]] = None, placebo: bool = True, alpha: float = 0.05) -> CausalResult
Kernel-based Nonlinear Synthetic Control Method.
Standard SCM assumes the counterfactual is a linear combination of donors. This estimator lifts the donor panel into a reproducing kernel Hilbert space (RKHS) and solves for synthetic control weights in that feature space, capturing nonlinear donor relationships.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
Panel data in long format with columns for unit, time, and outcome. |
required |
outcome
|
str
|
Name of the outcome variable. |
required |
unit
|
str
|
Column identifying panel units. |
required |
time
|
str
|
Column identifying time periods. |
required |
treated_unit
|
Identifier of the treated unit. |
required | |
treatment_time
|
First treatment period (inclusive). |
required | |
kernel
|
{'rbf', 'polynomial', 'laplacian'}
|
Kernel function to use. |
'rbf'
|
sigma
|
float or None
|
Bandwidth for RBF / Laplacian kernels. If None, the median heuristic is used (recommended). |
None
|
degree
|
int
|
Degree for the polynomial kernel (ignored otherwise). |
2
|
covariates
|
list of str or None
|
Additional pre-treatment covariates to include in the feature
vector. If provided, each donor row is |
None
|
placebo
|
bool
|
Whether to run in-space placebo permutation for inference. |
True
|
alpha
|
float
|
Significance level for the confidence interval. |
0.05
|
Returns:
| Type | Description |
|---|---|
CausalResult
|
Unified result with ATT estimate, SE, p-value, CI, and
period-level effects in |
Notes
The optimisation solved is:
.. math::
    \min_{w \ge 0,\; \sum_j w_j = 1}
    \bigl[K(Y_1, Y_1) - 2\,w^\top k(Y_1) + w^\top K\,w\bigr]
where :math:`K_{ij} = k(Y_{0,i},\, Y_{0,j})` is the donor kernel matrix and :math:`k(Y_1)_j = k(Y_1,\, Y_{0,j})`.
References
Scholkopf, B. and Smola, A.J. (2002). "Learning with Kernels."
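The objective in the Notes above can be illustrated with a minimal pure-Python sketch. This is not the library's implementation: the RBF parametrisation exp(-||x - y||^2 / (2 sigma^2)) is an assumption, a Frank-Wolfe loop with exact line search stands in for whatever simplex solver the package uses, and the names `rbf`, `kernel_objective`, and `kernel_sc_weights` are hypothetical.

```python
import math

def rbf(x, y, sigma):
    """RBF kernel, assumed here to be exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def kernel_objective(w, K, k1, k11):
    """Evaluate K(Y1, Y1) - 2 w'k(Y1) + w'Kw for donor weights w."""
    n = len(w)
    quad = sum(w[i] * K[i][j] * w[j] for i in range(n) for j in range(n))
    return k11 - 2.0 * sum(w[i] * k1[i] for i in range(n)) + quad

def kernel_sc_weights(y1, donors, sigma=1.0, n_iter=500, tol=1e-12):
    """Frank-Wolfe with exact line search over the simplex (illustrative solver)."""
    n = len(donors)
    K = [[rbf(donors[i], donors[j], sigma) for j in range(n)] for i in range(n)]
    k1 = [rbf(y1, d, sigma) for d in donors]
    w = [1.0 / n] * n  # start from uniform weights
    for _ in range(n_iter):
        Kw = [sum(K[i][j] * w[j] for j in range(n)) for i in range(n)]
        grad = [2.0 * (Kw[i] - k1[i]) for i in range(n)]
        s = min(range(n), key=lambda i: grad[i])  # vertex with the smallest gradient entry
        d = [(1.0 if i == s else 0.0) - w[i] for i in range(n)]
        slope = sum(grad[i] * d[i] for i in range(n))  # directional derivative along d
        curv = sum(d[i] * K[i][j] * d[j] for i in range(n) for j in range(n))
        if slope >= -tol or curv <= 0.0:
            break  # no remaining descent direction
        step = min(1.0, -slope / (2.0 * curv))  # exact minimiser of the quadratic, clipped
        w = [w[i] + step * d[i] for i in range(n)]
    return w, kernel_objective(w, K, k1, rbf(y1, y1, sigma))
```

Here each row of `donors` is one donor's pre-treatment outcome vector and `y1` is the treated unit's. The objective is the squared RKHS distance between the treated unit and the weighted donor combination, so it reaches zero when the treated unit coincides with a donor that receives all the weight.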
kernel_ridge_synth ¶
kernel_ridge_synth(data: DataFrame, outcome: str, unit: str, time: str, treated_unit, treatment_time, kernel: str = 'rbf', sigma: Optional[float] = None, degree: int = 2, ridge_lambda: float = 0.01, covariates: Optional[List[str]] = None, placebo: bool = True, alpha: float = 0.05) -> CausalResult
Kernel Ridge Regression Synthetic Control.
Instead of constrained simplex weights, this estimator uses kernel ridge regression to learn the mapping from donors to the treated unit. The ridge penalty lambda guards against overfitting when the number of donors is large relative to the number of pre-treatment periods.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
Panel data in long format. |
required |
outcome
|
str
|
Outcome variable name. |
required |
unit
|
str
|
Unit identifier column. |
required |
time
|
str
|
Time period column. |
required |
treated_unit
|
Identifier of the treated unit. |
required | |
treatment_time
|
First treatment period (inclusive). |
required | |
kernel
|
{'rbf', 'polynomial', 'laplacian'}
|
Kernel function. |
'rbf'
|
sigma
|
float or None
|
Bandwidth (None = median heuristic). |
None
|
degree
|
int
|
Polynomial kernel degree. |
2
|
ridge_lambda
|
float
|
Regularisation parameter. Larger values shrink the coefficient vector toward zero. |
0.01
|
covariates
|
list of str or None
|
Additional pre-treatment covariates. |
None
|
placebo
|
bool
|
Run placebo permutation inference. |
True
|
alpha
|
float
|
Significance level. |
0.05
|
Returns:
| Type | Description |
|---|---|
CausalResult
|
|
Notes
The solution is:
.. math::
    \beta = (K + \lambda I)^{-1}\, k(Y_1)
and the counterfactual is :math:`\hat{Y}_{1,\text{post}} = Y_{0,\text{post}}^\top \beta`.
No non-negativity or sum-to-one constraints are imposed, which gives the estimator more flexibility but may produce extrapolation.
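The closed-form solution in the Notes can be sketched in plain Python. This is a hedged illustration, not the package's code: the RBF parametrisation, the small Gaussian-elimination helper `solve`, and the names `kernel_ridge_beta` and `counterfactual` are assumptions introduced for the example.

```python
import math

def rbf(x, y, sigma=1.0):
    """RBF kernel, assumed here to be exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back-substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def kernel_ridge_beta(y1_pre, donors_pre, ridge_lambda=0.01, sigma=1.0):
    """beta = (K + lambda I)^{-1} k(Y_1), with K the donor kernel matrix."""
    n = len(donors_pre)
    K_reg = [[rbf(donors_pre[i], donors_pre[j], sigma) + (ridge_lambda if i == j else 0.0)
              for j in range(n)] for i in range(n)]
    k1 = [rbf(y1_pre, d, sigma) for d in donors_pre]
    return solve(K_reg, k1)

def counterfactual(donors_post, beta):
    """Y_hat[t] = sum_j beta[j] * donors_post[j][t]."""
    n_periods = len(donors_post[0])
    return [sum(beta[j] * donors_post[j][t] for j in range(len(beta)))
            for t in range(n_periods)]
```

Because `beta` is unconstrained, its entries can be negative or sum to something other than one, which is the extrapolation risk mentioned above.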
synth_compare ¶
synth_compare(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any = None, treatment_time: Any = None, methods: Optional[List[str]] = None, placebo: bool = True, alpha: float = 0.05, **kwargs) -> SynthComparison
Run multiple SCM variants and compare them side by side.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
Long-format panel data. |
required |
outcome
|
str
|
Outcome variable column name. |
required |
unit
|
str
|
Unit identifier column. |
required |
time
|
str
|
Time period column. |
required |
treated_unit
|
any
|
Identifier of the treated unit. |
None
|
treatment_time
|
any
|
First treatment period (inclusive). |
None
|
methods
|
list of str
|
SCM variants to compare. If |
None
|
placebo
|
bool
|
Whether to run placebo inference for each method. |
True
|
alpha
|
float
|
Significance level for confidence intervals. |
0.05
|
**kwargs
|
Additional keyword arguments forwarded to |
{}
|
Returns:
| Type | Description |
|---|---|
SynthComparison
|
Structured comparison object with |
Examples:
>>> comp = sp.synth_compare(
... df, outcome='gdp', unit='state', time='year',
... treated_unit='California', treatment_time=1989,
... )
>>> print(comp.summary())
>>> comp.recommended
'demeaned'
>>> comp.plot()
Compare a subset of methods:
>>> comp = sp.synth_compare(
... df, outcome='gdp', unit='state', time='year',
... treated_unit='California', treatment_time=1989,
... methods=['classic', 'augmented', 'sdid', 'mc'],
... )
See Also
synth_recommend : Quick one-liner returning only the method name.
synth : Unified SCM dispatcher.
synth_recommend ¶
synth_recommend(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any = None, treatment_time: Any = None, **kwargs) -> str
Quickly recommend the best SCM method for the given data.
Runs synth_compare internally with placebo=False for speed,
then returns just the recommended method name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
Long-format panel data. |
required |
outcome
|
str
|
Outcome variable column name. |
required |
unit
|
str
|
Unit identifier column. |
required |
time
|
str
|
Time period column. |
required |
treated_unit
|
any
|
Identifier of the treated unit. |
None
|
treatment_time
|
any
|
First treatment period (inclusive). |
None
|
**kwargs
|
Additional keyword arguments forwarded to |
{}
|
Returns:
| Type | Description |
|---|---|
str
|
Name of the recommended SCM method (e.g., |
Examples:
>>> best = sp.synth_recommend(
... df, outcome='gdp', unit='state', time='year',
... treated_unit='California', treatment_time=1989,
... )
>>> best
'demeaned'
Then use it:
>>> result = sp.synth(
... df, outcome='gdp', unit='state', time='year',
... treated_unit='California', treatment_time=1989,
... method=best,
... )
See Also
synth_compare : Full comparison with all metrics and plots.
synth_power ¶
synth_power(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any, treatment_time: Any, effect_sizes: Optional[Sequence[float]] = None, n_simulations: int = 200, alpha: float = 0.05, seed: Optional[int] = None) -> DataFrame
Power analysis for Synthetic Control designs.
Estimates statistical power across a grid of hypothetical effect sizes using placebo-based inference. Identifies the Minimum Detectable Effect (MDE) — the smallest effect where power >= 0.80.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
Long-format panel data. |
required |
outcome
|
str
|
Outcome variable name. |
required |
unit
|
str
|
Unit identifier column. |
required |
time
|
str
|
Time period column. |
required |
treated_unit
|
any
|
Identifier of the treated unit. |
required |
treatment_time
|
any
|
First treatment period (inclusive). |
required |
effect_sizes
|
array-like of float
|
Grid of hypothetical additive effect sizes to evaluate.
If |
None
|
n_simulations
|
int
|
Number of Monte-Carlo simulations per effect size. |
200
|
alpha
|
float
|
Significance level for the placebo test. |
0.05
|
seed
|
int
|
Random seed for reproducibility. |
None
|
Returns:
| Type | Description |
|---|---|
DataFrame
|
Columns: The |
Notes
The null distribution is the set of RMSPE ratios from in-space placebos (computed once on the original data). For each effect size, the simulation adds delta to the treated unit's post-treatment outcomes and re-computes the RMSPE ratio. A small noise perturbation (10 % of pre-treatment residual SD) is added so that each simulation draw is unique.
This is a novel diagnostic — no existing SCM package provides an equivalent power-planning tool.
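The simulation logic described in these Notes can be sketched in plain Python. This is an illustration under stated assumptions rather than the library's code: the rank-based p-value convention (counting the tested unit in its own reference set) is assumed, the helper names `placebo_p_value`, `power_at`, and `minimum_detectable_effect` are hypothetical, and returning `None` when no effect size reaches the target is a placeholder choice.

```python
def placebo_p_value(observed_ratio, null_ratios):
    """Share of the reference set with an RMSPE ratio at least as large as observed.

    Assumption: the tested unit counts as one member of its own reference set.
    """
    n_ge = sum(1 for r in null_ratios if r >= observed_ratio)
    return (n_ge + 1) / (len(null_ratios) + 1)

def power_at(simulated_ratios, null_ratios, alpha=0.05):
    """Fraction of simulated draws whose placebo p-value is at or below alpha."""
    rejections = sum(
        1 for r in simulated_ratios if placebo_p_value(r, null_ratios) <= alpha
    )
    return rejections / len(simulated_ratios)

def minimum_detectable_effect(effect_sizes, powers, power_target=0.80):
    """Smallest effect size whose estimated power reaches the target, else None."""
    reached = [e for e, p in zip(effect_sizes, powers) if p >= power_target]
    return min(reached) if reached else None
```

In this sketch `null_ratios` would be computed once from the in-space placebos, and `simulated_ratios` would hold the treated unit's ratio re-computed after adding a given effect size to each simulated draw.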
Examples:
>>> import statspai as sp
>>> power_df = sp.synth_power(
... df, outcome='gdp', unit='state', time='year',
... treated_unit='California', treatment_time=1989,
... n_simulations=500, seed=42,
... )
>>> power_df
effect_size power n_rejections n_simulations mde_flag
0 0.000000 0.04 20 500 False
1 1.234567 0.23 115 500 False
...
8 9.876543 0.82 410 500 True
9 11.111111 0.95 475 500 False
>>> mde_row = power_df[power_df['mde_flag']]
>>> print(f"MDE = {mde_row['effect_size'].values[0]:.2f}")
See Also
synth_mde : Quick MDE extraction.
synth_power_plot : Visualise the power curve.
synth_mde ¶
synth_mde(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any, treatment_time: Any, power_target: float = 0.8, alpha: float = 0.05, n_simulations: int = 200, seed: Optional[int] = None) -> float
Minimum Detectable Effect for a Synthetic Control design.
Convenience wrapper around synth_power() that returns only the MDE (the smallest effect size achieving the target power).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
Long-format panel data. |
required |
outcome
|
str
|
Outcome variable name. |
required |
unit
|
str
|
Unit identifier column. |
required |
time
|
str
|
Time period column. |
required |
treated_unit
|
any
|
Identifier of the treated unit. |
required |
treatment_time
|
any
|
First treatment period. |
required |
power_target
|
float
|
Desired power level. |
0.80
|
alpha
|
float
|
Significance level for the placebo test. |
0.05
|
n_simulations
|
int
|
Number of simulations per effect size. |
200
|
seed
|
int
|
Random seed for reproducibility. |
None
|
Returns:
| Type | Description |
|---|---|
float
|
Minimum detectable effect size. Returns |
Examples:
>>> import statspai as sp
>>> mde = sp.synth_mde(
... df, outcome='gdp', unit='state', time='year',
... treated_unit='California', treatment_time=1989,
... seed=42,
... )
>>> print(f"MDE at 80% power: {mde:.2f}")
See Also
synth_power : Full power curve with details.
synth_power_plot ¶
synth_power_plot(power_result: DataFrame, ax: Any = None, figsize: tuple = (9, 6), title: Optional[str] = None) -> Any
Plot the power curve from synth_power().
Displays power (y-axis) against effect size (x-axis) with reference lines at power = 0.80 and the MDE.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
power_result
|
DataFrame
|
Output of :func: |
required |
ax
|
Axes
|
Axes to plot on. If |
None
|
figsize
|
tuple
|
Figure size (width, height) in inches. |
(9, 6)
|
title
|
str
|
Custom plot title. Defaults to
|
None
|
Returns:
| Type | Description |
|---|---|
Axes
|
|
Examples:
>>> import statspai as sp
>>> power_df = sp.synth_power(
... df, outcome='gdp', unit='state', time='year',
... treated_unit='California', treatment_time=1989, seed=42,
... )
>>> sp.synth_power_plot(power_df)
See Also
synth_power : Compute the power curve.
synth_report ¶
synth_report(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any = None, treatment_time: Any = None, method: str = 'classic', output: str = 'text', sensitivity: bool = True, alpha: float = 0.05, **kwargs) -> str
Generate a comprehensive Synthetic Control analysis report.
Runs synth() for the main estimation and optionally
synth_sensitivity() for robustness diagnostics, then formats
everything into a structured report.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
Long-format panel data. |
required |
outcome
|
str
|
Outcome variable name. |
required |
unit
|
str
|
Unit identifier column. |
required |
time
|
str
|
Time period column. |
required |
treated_unit
|
any
|
Identifier of the treated unit. |
None
|
treatment_time
|
any
|
First treatment period (inclusive). |
None
|
method
|
str
|
SCM variant passed to |
'classic'
|
output
|
str
|
Output format: |
'text'
|
sensitivity
|
bool
|
Whether to include the sensitivity analysis section. |
True
|
alpha
|
float
|
Significance level for CIs and hypothesis tests. |
0.05
|
**kwargs
|
Additional keyword arguments forwarded to |
{}
|
Returns:
| Type | Description |
|---|---|
str
|
Formatted analysis report. |
Examples:
synth_report_to_file ¶
synth_report_to_file(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any = None, treatment_time: Any = None, method: str = 'classic', output: str = 'markdown', sensitivity: bool = True, alpha: float = 0.05, filename: str = 'report.md', **kwargs) -> str
Generate an SCM report and write it directly to a file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
Long-format panel data. |
required |
outcome
|
str
|
Outcome variable name. |
required |
unit
|
str
|
Unit identifier column. |
required |
time
|
str
|
Time period column. |
required |
treated_unit
|
any
|
Identifier of the treated unit. |
None
|
treatment_time
|
any
|
First treatment period (inclusive). |
None
|
method
|
str
|
SCM variant passed to |
'classic'
|
output
|
str
|
Output format: |
'markdown'
|
sensitivity
|
bool
|
Whether to include the sensitivity analysis section. |
True
|
alpha
|
float
|
Significance level. |
0.05
|
filename
|
str
|
Output file path. |
'report.md'
|
**kwargs
|
Additional keyword arguments forwarded to |
{}
|
Returns:
| Type | Description |
|---|---|
str
|
The generated report string (also written to filename). |
Examples:
synth_to_latex ¶
synth_to_latex(obj: Union[CausalResult, 'SynthComparison', List[CausalResult]], *, caption: Optional[str] = None, label: Optional[str] = None, booktabs: bool = True, show_ci: bool = True, show_weights: bool = False, top_n_weights: int = 5, digits: int = 4, method_names: Optional[Sequence[str]] = None) -> str
Formatted LaTeX table for synthetic-control results.
Single-result mode produces a vertical table with ATT, SE,
confidence interval, pre-RMSPE, fit quality, and (optionally) the
top-N donor weights. Comparison mode (SynthComparison or list
of results) produces a wide table with one column per method, the
standard textbook layout for empirical applied work.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
obj
|
CausalResult, SynthComparison, or list of CausalResult
|
Object to render. |
required |
caption
|
str
|
Table caption. Defaults to a sensible auto-generated string. |
None
|
label
|
str
|
LaTeX label for cross-referencing. Defaults to
|
None
|
booktabs
|
bool
|
If True, use |
True
|
show_ci
|
bool
|
Include the confidence-interval row. |
True
|
show_weights
|
bool
|
Append a panel listing the top-N donor weights. |
False
|
top_n_weights
|
int
|
How many donors to show per method when |
5
|
digits
|
int
|
Number of decimal places. |
4
|
method_names
|
list of str
|
Override column labels in comparison mode. |
None
|
Returns:
| Type | Description |
|---|---|
str
|
LaTeX source ready to drop into a paper. Stars use the
standard |
Examples:
>>> result = sp.synth(df, ..., method='augmented')
>>> print(sp.synth_to_latex(result, show_weights=True))
Multi-method comparison:
synth_to_markdown ¶
synth_to_markdown(obj: Union[CausalResult, 'SynthComparison', List[CausalResult]], *, title: Optional[str] = None, show_ci: bool = True, show_weights: bool = False, top_n_weights: int = 5, digits: int = 4, method_names: Optional[Sequence[str]] = None) -> str
GitHub-flavoured Markdown table for synthetic-control results.
Mirrors synth_to_latex() in scope but emits a pipe-delimited Markdown table that renders cleanly on GitHub, in pandoc, and in most static-site generators.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
obj
|
Union[CausalResult, 'SynthComparison', List[CausalResult]]
|
See :func: |
required |
title
|
str
|
See :func: |
None
|
show_ci
|
bool
|
See :func: |
True
|
show_weights
|
bool
|
See :func: |
False
|
top_n_weights
|
int
|
See :func: |
5
|
digits
|
int
|
See :func: |
4
|
method_names
|
list of str
|
See :func: |
None
|
Returns:
| Type | Description |
|---|---|
str
|
Markdown source. |
synth_to_excel ¶
synth_to_excel(obj: Union[CausalResult, 'SynthComparison', List[CausalResult]], path: str, *, method_names: Optional[Sequence[str]] = None, digits: int = 6) -> str
Multi-sheet Excel workbook for synthetic-control results.
Sheets
- "Summary": one row per method (ATT, SE, CI, pre-RMSPE, fit quality, donor counts).
- "Weights": donor weights per method (one column per method; missing donors are NaN).
- "Gap_<method>": per-period treated / synthetic / gap for each method.
- "Diagnostics": scalar diagnostics (pre-RMSPE, post/pre RMSPE ratio, fit quality, n_donors, etc.).
Requires openpyxl (already a soft dependency of pandas
Excel I/O). Will raise ModuleNotFoundError with an actionable
hint if it is not installed.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
obj
|
CausalResult, SynthComparison, or list of CausalResult
|
Object to export. |
required |
path
|
str
|
Destination |
required |
method_names
|
list of str
|
Override sheet / column labels. |
None
|
digits
|
int
|
Rounding for floating-point values. |
6
|
Returns:
| Type | Description |
|---|---|
str
|
Absolute path of the file that was written. |
synth_loo ¶
synth_loo(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any, treatment_time: Any, penalization: float = 0.0, alpha: float = 0.05) -> DataFrame
Leave-one-out donor sensitivity for Synthetic Control.
Re-fits SCM dropping each donor in turn. Identifies influential donors whose removal shifts the ATT substantially.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
Long-format panel. |
required |
outcome
|
str
|
Outcome variable. |
required |
unit
|
str
|
Unit identifier column. |
required |
time
|
str
|
Time column. |
required |
treated_unit
|
any
|
Identifier of the treated unit. |
required |
treatment_time
|
any
|
First treatment period. |
required |
penalization
|
float
|
Ridge penalty forwarded to SCM. |
0.0
|
alpha
|
float
|
Significance level for z-based p-values. |
0.05
|
Returns:
| Type | Description |
|---|---|
DataFrame
|
Columns: |
Examples:
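A minimal sketch of the leave-one-out loop described above, not the library's implementation. The estimator is passed in as a callable so the loop stays generic; `leave_one_out`, the output keys `dropped`, `att`, and `shift`, and the toy equal-weight estimator are all hypothetical stand-ins for illustration.

```python
def leave_one_out(donors, fit_att):
    """Refit with each donor dropped; `fit_att` maps a donor list to an ATT."""
    donors = list(donors)
    full_att = fit_att(donors)
    rows = []
    for dropped in donors:
        att = fit_att([d for d in donors if d != dropped])
        rows.append({"dropped": dropped, "att": att, "shift": att - full_att})
    # Most influential donors (largest absolute shift) first.
    rows.sort(key=lambda r: abs(r["shift"]), reverse=True)
    return full_att, rows

# Toy stand-in estimator: treated post-period mean minus the
# equal-weight average of the donors' post-period means.
post_means = {"A": 10.0, "B": 12.0, "C": 20.0}
treated_post_mean = 15.0

def toy_att(donor_ids):
    return treated_post_mean - sum(post_means[d] for d in donor_ids) / len(donor_ids)

full_att, loo = leave_one_out(post_means, toy_att)
```

In a real analysis `fit_att` would refit the synthetic control on the reduced donor pool; the toy estimator only keeps the example self-contained.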
synth_time_placebo ¶
synth_time_placebo(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any, treatment_time: Any, penalization: float = 0.0, n_placebo_times: Optional[int] = None, alpha: float = 0.05) -> DataFrame
Time-placebo ("backdating") test for Synthetic Control.
Re-fits SCM using fake treatment times drawn from the pre-treatment period. If the method finds large "effects" where none should exist, the original estimate is suspect.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
Long-format panel. |
required |
outcome
|
str
|
Outcome variable. |
required |
unit
|
str
|
Unit identifier column. |
required |
time
|
str
|
Time column. |
required |
treated_unit
|
any
|
Identifier of the treated unit. |
required |
treatment_time
|
any
|
Real first treatment period. |
required |
penalization
|
float
|
Ridge penalty forwarded to SCM. |
0.0
|
n_placebo_times
|
int
|
Max number of placebo treatment times to try. Default is all feasible pre-treatment times (leaving >= 2 pre-periods for each placebo fit). |
None
|
alpha
|
float
|
Significance level. |
0.05
|
Returns:
| Type | Description |
|---|---|
DataFrame
|
Columns: |
Examples:
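A small sketch of how the feasible placebo treatment times could be enumerated, assuming only what the parameter table states (each placebo fit keeps at least two pre-periods). The function name `feasible_placebo_times` is hypothetical, and keeping the latest candidates when `n_placebo_times` caps the list is an assumption; the library may select them differently.

```python
def feasible_placebo_times(times, treatment_time, min_pre=2, n_placebo_times=None):
    """Fake treatment times drawn from the pre-treatment period.

    A candidate is feasible when at least `min_pre` periods precede it.
    """
    pre = sorted(t for t in set(times) if t < treatment_time)
    feasible = pre[min_pre:]
    if n_placebo_times is not None:
        # Assumption: keep the latest candidates when the list is capped.
        feasible = feasible[max(len(feasible) - n_placebo_times, 0):]
    return feasible
```

Each returned time would then be used as a fake `treatment_time` on data truncated before the real treatment date.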
synth_donor_sensitivity ¶
synth_donor_sensitivity(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any, treatment_time: Any, k: Optional[int] = None, n_samples: int = 100, penalization: float = 0.0, seed: Optional[int] = None) -> DataFrame
Donor-pool bootstrap sensitivity for Synthetic Control.
Draws n_samples random subsets of size k from the donor
pool and re-fits SCM for each, producing a distribution of ATT
estimates.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
Long-format panel. |
required |
outcome
|
str
|
Outcome variable. |
required |
unit
|
str
|
Unit identifier column. |
required |
time
|
str
|
Time column. |
required |
treated_unit
|
any
|
Identifier of the treated unit. |
required |
treatment_time
|
any
|
First treatment period. |
required |
k
|
int
|
Donor subset size. Default is |
None
|
n_samples
|
int
|
Number of random donor subsets to draw. |
100
|
penalization
|
float
|
Ridge penalty forwarded to SCM. |
0.0
|
seed
|
int
|
Random seed for reproducibility. |
None
|
Returns:
| Type | Description |
|---|---|
DataFrame
|
Columns: |
Examples:
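The subset-drawing step described above can be sketched with the standard library. This is illustrative only: `donor_subsets` is a hypothetical helper, and sampling without replacement within each subset is an assumption about how the draws are made.

```python
import random

def donor_subsets(donors, k, n_samples=100, seed=None):
    """Draw `n_samples` random donor subsets of size `k` (no repeats within a subset)."""
    rng = random.Random(seed)  # private generator so the global state is untouched
    pool = list(donors)
    return [sorted(rng.sample(pool, k)) for _ in range(n_samples)]
```

Re-fitting the synthetic control on each subset and collecting the ATT estimates would give the distribution that this diagnostic reports.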
synth_rmspe_filter ¶
synth_rmspe_filter(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any, treatment_time: Any, thresholds: Optional[List[float]] = None, penalization: float = 0.0) -> DataFrame
Pre-RMSPE-filtered p-value robustness (Abadie et al. 2010).
Runs placebo SCM on every donor unit, computes each unit's pre-treatment RMSPE, then re-calculates the rank-based p-value after dropping placebos whose pre-RMSPE exceeds a multiple of the treated unit's pre-RMSPE.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
Long-format panel. |
required |
outcome
|
str
|
Outcome variable. |
required |
unit
|
str
|
Unit identifier column. |
required |
time
|
str
|
Time column. |
required |
treated_unit
|
any
|
Identifier of the treated unit. |
required |
treatment_time
|
any
|
First treatment period. |
required |
thresholds
|
list of float
|
Multiples of treated-unit pre-RMSPE used as cut-offs.
Default |
None
|
penalization
|
float
|
Ridge penalty. |
0.0
|
Returns:
| Type | Description |
|---|---|
DataFrame
|
Columns: |
Examples:
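A minimal sketch of the filtering step described above, assuming each unit's pre- and post-treatment RMSPE have already been computed. It is not the library's code: the function name, the output keys, the illustrative threshold values, and the p-value convention (the treated unit counted in its own reference set) are assumptions.

```python
def filtered_p_values(treated, placebos, thresholds=(2.0, 5.0, 20.0)):
    """Rank-based p-values after dropping poorly fitting placebos.

    `treated` and each entry of `placebos` are dicts with keys
    'pre_rmspe' and 'post_rmspe'.
    """
    treated_ratio = treated["post_rmspe"] / treated["pre_rmspe"]
    rows = []
    for c in thresholds:
        cutoff = c * treated["pre_rmspe"]
        kept = [p for p in placebos if p["pre_rmspe"] <= cutoff]
        n_ge = sum(1 for p in kept if p["post_rmspe"] / p["pre_rmspe"] >= treated_ratio)
        rows.append({
            "threshold": c,
            "n_placebos": len(kept),
            "p_value": (n_ge + 1) / (len(kept) + 1),
        })
    return rows
```

Tighter thresholds keep only placebos that fit about as well as the treated unit before treatment, so comparing p-values across rows shows how sensitive the inference is to poorly fitting placebos.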
synth_sensitivity ¶
synth_sensitivity(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any, treatment_time: Any, penalization: float = 0.0, n_donor_samples: int = 100, seed: Optional[int] = None, alpha: float = 0.05) -> Dict[str, Any]
Run all SCM sensitivity diagnostics in a single call.
Combines leave-one-out, time placebos, donor pool bootstrap, and pre-RMSPE filtering into one bundled report.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
Long-format panel. |
required |
outcome
|
str
|
Outcome variable. |
required |
unit
|
str
|
Unit identifier column. |
required |
time
|
str
|
Time column. |
required |
treated_unit
|
any
|
Identifier of the treated unit. |
required |
treatment_time
|
any
|
First treatment period. |
required |
penalization
|
float
|
Ridge penalty. |
0.0
|
n_donor_samples
|
int
|
Number of random donor subsets for donor sensitivity. |
100
|
seed
|
int
|
Random seed. |
None
|
alpha
|
float
|
Significance level. |
0.05
|
Returns:
| Type | Description |
|---|---|
dict
|
Keys:
|
Examples:
synth_sensitivity_plot ¶
synth_sensitivity_plot(sensitivity_result: Dict[str, Any], figsize: Tuple[float, float] = (14, 10), title: Optional[str] = None) -> Any
Multi-panel sensitivity diagnostic plot.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
sensitivity_result
|
dict
|
Output from :func: |
required |
figsize
|
tuple
|
Figure size in inches. |
(14, 10)
|
title
|
str
|
Super-title for the figure. |
None
|
Returns:
| Type | Description |
|---|---|
Figure
|
|
Examples:
synth_survival ¶
synth_survival(data: DataFrame, unit: str, time: str, survival: str, treated: str, treat_time: float, alpha: float = 0.05, n_placebos: int = 100, seed: int = 0) -> SyntheticSurvivalResult
Synthetic Survival Control estimator.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
Long panel: one row per (unit, time) with a precomputed Kaplan-Meier
survival probability in column |
required |
unit
|
str
|
Unit (panel-id) column. |
required |
time
|
str
|
Time grid column. |
required |
survival
|
str
|
Column containing the survival probability :math: |
required |
treated
|
str
|
Column containing the name of the single treated unit. Accepts either a boolean column or a dedicated string/int identifier. |
required |
treat_time
|
float
|
Time at which treatment starts (times >= |
required |
alpha
|
float
|
Uniform placebo CI level. |
0.05
|
n_placebos
|
int
|
Number of placebo permutations used to bootstrap the uniform band. |
100
|
seed
|
int
|
|
0
|
Returns:
| Type | Description |
|---|---|
SyntheticSurvivalResult
|
Fitted counterfactual survival curve, gap trajectory, donor weights, and a placebo-based uniform confidence band. |
Examples:
synth_experimental_design ¶
synth_experimental_design(data: DataFrame, *, unit: str, time: str, outcome: str, k: int, candidates: Optional[Sequence[Any]] = None, donors: Optional[Sequence[Any]] = None, pre_period: Optional[Tuple[Any, Any]] = None, risk: str = 'mspe', concentration_weight: float = 0.0, penalization: float = 0.0, n_random: int = 500, random_state: Optional[int] = None) -> SynthExperimentalDesignResult
Pick k treated units to minimize the expected SC post-ATT variance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame (long format)
|
Must contain columns |
required |
unit
|
str
|
Column names for the panel. |
required |
time
|
str
|
Column names for the panel. |
required |
outcome
|
str
|
Column names for the panel. |
required |
k
|
int
|
Number of units to select for treatment. Must satisfy
|
required |
candidates
|
sequence
|
Units eligible for treatment. Defaults to all units. |
None
|
donors
|
sequence
|
Units available as donors. Defaults to "all units NOT in
|
None
|
pre_period
|
(start, end)
|
Closed interval of pre-treatment periods. Defaults to all
timestamps in |
None
|
risk
|
('mspe', 'rmse')
|
Loss functional for ranking candidates. |
'mspe'
|
concentration_weight
|
float
|
Penalty on donor-weight concentration (Herfindahl):
|
0.0
|
penalization
|
float
|
Ridge penalty passed to the simplex solver (Doudchenko & Imbens 2016 style). |
0.0
|
n_random
|
int
|
Monte-Carlo draws used to estimate |
500
|
random_state
|
int
|
|
None
|
Returns:
| Type | Description |
|---|---|
SynthExperimentalDesignResult
|
|
Notes
The practical recipe (Abadie-Zhao 2025/2026, Section 4) is:
- For each candidate unit i, solve the simplex SC problem against the donor pool restricted to non-candidates (to avoid coupling risk scores across candidates).
- Record the pre-period MSPE as the plug-in estimate of sigma^2_i.
- Pick the k candidates with the smallest risk_score: loss_i + lambda * H(w_i).
The implementation degrades gracefully when candidates covers all
units: we then use per-candidate leave-one-out donor pools.
Examples:
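The ranking step of the recipe in the Notes can be sketched as follows, assuming the per-candidate pre-period losses and donor weights have already been obtained from the simplex fits. This is a hedged illustration: `herfindahl` and `select_treated` are hypothetical names, H(w) is taken to be the sum of squared donor weights, and ties are broken by unit label purely for determinism.

```python
def herfindahl(weights):
    """Concentration of donor weights: sum of squared weights."""
    return sum(w * w for w in weights)

def select_treated(pre_loss, donor_weights, k, concentration_weight=0.0):
    """Pick the k candidates with the smallest loss_i + lambda * H(w_i)."""
    scores = {
        unit: pre_loss[unit] + concentration_weight * herfindahl(donor_weights[unit])
        for unit in pre_loss
    }
    ranked = sorted(scores, key=lambda u: (scores[u], str(u)))
    return ranked[:k], scores
```

With `concentration_weight=0.0` the ranking depends on pre-period fit alone; a positive value penalises candidates whose synthetic control leans on only a few donors.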
synthdid_estimate ¶
R-style alias: synthdid::synthdid_estimate.
sc_estimate ¶
R-style alias: synthdid::sc_estimate.
did_estimate ¶
R-style alias: synthdid::did_estimate.
synthdid_placebo ¶
synthdid_placebo(data: DataFrame, y: str, unit: str, time: str, treat_unit: Any, treat_time: Any, method: Literal['sdid', 'sc', 'did'] = 'sdid', **kw) -> DataFrame
Run placebo estimates assigning treatment to each control unit.
Replicates synthdid::synthdid_placebo.
Accepts the same arguments as sdid(), plus any extra keyword arguments.
Returns:
| Type | Description |
|---|---|
DataFrame
|
One row per control unit with columns:
|
synthdid_plot ¶
synthdid_plot(result: CausalResult, ax=None, figsize: tuple = (10, 6), treated_color: str = '#2C3E50', synth_color: str = '#E74C3C', ci_alpha: float = 0.15, title: Optional[str] = None)
Plot observed vs synthetic trajectory.
Replicates synthdid::plot.synthdid_estimate.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
result
|
CausalResult
|
Output of :func: |
required |
ax
|
matplotlib Axes
|
|
None
|
figsize
|
tuple
|
|
(10, 6)
|
treated_color
|
str
|
|
'#2C3E50'
|
synth_color
|
str
|
|
'#E74C3C'
|
ci_alpha
|
float
|
|
0.15
|
title
|
str
|
|
None
|
Returns:
| Type | Description |
|---|---|
(fig, ax)
|
|
synthdid_units_plot ¶
Horizontal bar chart of unit weight contributions.
Replicates synthdid::synthdid_units_plot.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
result
|
CausalResult
|
|
required |
top_n
|
int
|
Show the top-N donors by weight. |
10
|
ax
|
matplotlib Axes
|
|
None
|
figsize
|
tuple
|
|
(8, 5)
|
Returns:
| Type | Description |
|---|---|
(fig, ax)
|
|
synthdid_rmse_plot ¶
Pre-treatment RMSE of treated vs synthetic trajectory.
Returns:
| Type | Description |
|---|---|
(fig, ax)
|
|
california_prop99 ¶
California Proposition 99 tobacco control dataset.
Returns a balanced panel of per-capita cigarette sales for 39 US states, 1970-2000. California implemented Proposition 99 in 1989.
This is the canonical synthdid example dataset.
Returns:
| Type | Description |
|---|---|
DataFrame
|
Columns: |
Examples:
german_reunification ¶
German reunification dataset (simulated).
Returns a balanced panel of GDP per capita for 17 OECD countries, 1960–2003. West Germany is the treated unit; treatment begins in 1990 (reunification).
The simulated trajectories reproduce the key stylised facts: Luxembourg has the highest GDP per capita (~40 000), Portugal the lowest (~10 000), and all countries share a common upward growth trend. Post-1990, West Germany exhibits an approximately 1 500 GDP-per-capita decline relative to its synthetic counterfactual.
References
Abadie, A., Diamond, A. & Hainmueller, J. (2015). "Comparative Politics and the Synthetic Control Method." American Journal of Political Science, 59(2), 495–510. [@abadie2015comparative]
Returns:
| Type | Description |
|---|---|
DataFrame
|
Columns: |
Examples:
basque_terrorism ¶
Basque Country terrorism dataset (simulated).
Returns a balanced panel of GDP per capita (thousands of 1986 USD) for 17 Spanish regions, 1955–1997. The Basque Country is the treated unit; treatment begins in 1970 (onset of ETA terrorism).
The simulated data reproduce the gradual widening of an approximately 10 % GDP gap between the Basque Country and its synthetic counterfactual after 1970.
References
Abadie, A. & Gardeazabal, J. (2003). "The Economic Costs of Conflict: A Case Study of the Basque Country." American Economic Review, 93(1), 113–132. [@abadie2003economic]
Returns:
| Type | Description |
|---|---|
DataFrame
|
Columns: |
Examples:
california_tobacco ¶
California Proposition 99 tobacco dataset (simulated, extended).
Returns a balanced panel of per-capita cigarette sales and covariates for 39 US states, 1970–2000. California is the treated unit; treatment begins in 1989 (Proposition 99).
This dataset extends the simpler california_prop99() panel with
additional covariates (retail price, log income, youth population
share, beer consumption), enabling covariate-matching SCM analyses.
References
Abadie, A., Diamond, A. & Hainmueller, J. (2010). "Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program." Journal of the American Statistical Association, 105(490), 493–505. [@abadie2010synthetic]
Returns:
| Type | Description |
|---|---|
DataFrame
|
Columns: |
Notes
- cigsale: per-capita cigarette sales (packs).
- retprice: average retail price per pack (cents, real).
- lnincome: log of real per-capita personal income.
- age15to24: share of population aged 15–24 (percent).
- beer: per-capita beer consumption (gallons).
Examples: