`statspai.synth`¶

synth ¶

Synthetic Control module for StatsPAI.

Unified entry point: synth(method=...) dispatches to all variants.

Variants (20 methods)

classic — Abadie, Diamond & Hainmueller (2010)
penalized / ridge — Ridge-penalised SCM
demeaned / detrended — Ferman & Pinto (2021)
unconstrained / elastic_net — Doudchenko & Imbens (2016)
augmented / ascm — Ben-Michael, Feller & Rothstein (2021)
sdid — Arkhangelsky, Athey, Hirshberg, Imbens & Wager (2021)
factor / gsynth — Xu (2017)
staggered — Ben-Michael, Feller & Rothstein (2022)
mc / matrix_completion — Athey, Bayati et al. (2021)
discos / distributional — Gunsilius (2023)
multi_outcome — Sun (2023)
scpi / prediction_interval — Cattaneo, Feng & Titiunik (2021)
bayesian — Bayesian SCM with MCMC posterior (Vives & Martinez 2024)
bsts / causal_impact — Bayesian Structural Time Series (Brodersen et al. 2015)
penscm / abadie_lhour — Penalized SCM with pairwise discrepancy (Abadie & L'Hour 2021)
fdid / forward_did — Forward DID with optimal donor selection (Li 2024)
cluster — Cluster SCM with donor grouping (Rho et al. 2025, arXiv:2503.21629) [@rho2025clustersc]
sparse / lasso — Sparse SCM with L1 penalties (Amjad, Shah & Shen 2018)
kernel / kernel_ridge — Kernel-based nonlinear SCM

Inference

placebo — in-space permutation (default)
conformal — Chernozhukov, Wüthrich & Zhu (2021)
bootstrap / jackknife — for SDID
prediction intervals — Cattaneo et al. (2021)
bayesian posterior — full posterior credible intervals (Bayesian SCM)
bsts posterior — Bayesian structural time series uncertainty
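
As a rough illustration of the default in-space placebo idea, the sketch below ranks post/pre RMSPE ratios in the spirit of Abadie et al. (2010). It is a NumPy-only toy, not StatsPAI's internal code: the helper names `rmspe` and `placebo_pvalue`, the gap matrix, and the choice of test statistic are all assumptions made for illustration.

```python
import numpy as np

def rmspe(gap):
    """Root mean squared prediction error of a gap series."""
    return float(np.sqrt(np.mean(np.square(gap))))

def placebo_pvalue(gaps, treated_idx, n_pre):
    """Rank-based in-space placebo p-value (hypothetical helper).

    gaps: (n_units, T) array of outcome-minus-synthetic gaps, one row per
    unit, obtained by treating every unit as pseudo-treated in turn.
    """
    ratios = np.array([rmspe(g[n_pre:]) / rmspe(g[:n_pre]) for g in gaps])
    # Share of units (treated included) whose ratio is at least as extreme.
    return float(np.mean(ratios >= ratios[treated_idx])), ratios

# Toy gaps: 1 treated unit + 9 placebo donors, 8 pre and 4 post periods.
rng = np.random.default_rng(0)
gaps = rng.normal(0.0, 1.0, size=(10, 12))
gaps[0, 8:] += 25.0          # large post-treatment gap for the treated unit
p_value, ratios = placebo_pvalue(gaps, treated_idx=0, n_pre=8)
```

With J donors the smallest attainable p-value under this ranking is 1 / (J + 1), which is worth keeping in mind when the donor pool is small.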

Diagnostics

synth_sensitivity() — comprehensive robustness suite
synth_loo() — leave-one-out donor analysis
synth_time_placebo() — backdating tests
synth_donor_sensitivity() — donor pool variation
synth_rmspe_filter() — pre-RMSPE robustness

SyntheticControl ¶

Canonical Synthetic Control estimator (Abadie, Diamond & Hainmueller 2010) with nested V-W optimization.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Long-format panel `(unit, time, outcome, ...)`.	required
`outcome`	`str`	Name of the outcome column.	required
`unit`	`str`	Name of the unit identifier column.	required
`time`	`str`	Name of the time period column.	required
`treated_unit`	`any`	Identifier of the treated unit.	required
`treatment_time`	`any`	First treatment period (inclusive).	required
`covariates`	`list of str`	Column names whose pre-treatment means are used as predictors for the V-weighted matching problem.	`None`
`special_predictors`	`list of tuple`	R/Stata `Synth`-style predictor specifications. Each entry is `(column, period_spec, op)` where `period_spec` is a scalar year, a list of years, or a `slice(start, stop)` (inclusive), and `op` is `'mean'` or `'sum'`. When omitted together with `covariates`, the pre-treatment outcome vector itself is used as the predictor (V has no identifying power and is fixed to the identity, following Kaul et al. 2015).	`None`
`v_method`	`(auto, nested, equal)`	`'auto'` → nested V-W when covariates / special predictors are supplied, equal V otherwise. `'nested'` forces the outer V optimisation even when only Y lags are used (note: the outer problem is then under-identified, per Kaul et al. 2015). Equal V reduces to the outcome-only simplex LS estimator.	`'auto'`
`standardize_predictors`	`bool`	Rescale predictors to unit range before the V optimization.	`True`
`n_random_starts`	`int`	Additional random Dirichlet starts for the outer V optimiser.	`4`
`penalization`	`float`	Ridge penalty on donor weights.	`0.0`
`alpha`	`float`	Significance level for confidence intervals.	`0.05`
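
To make the `special_predictors` format concrete, here is a minimal pandas sketch of how one `(column, period_spec, op)` entry could be collapsed to a single value per unit. The helper `resolve_predictor`, the toy frame, and its column names are hypothetical; this is one plausible reading of the description above, not the estimator's actual code.

```python
import pandas as pd

def resolve_predictor(df, unit, time, column, period_spec, op):
    """Collapse `column` over the periods in `period_spec`, one value per unit."""
    if isinstance(period_spec, slice):
        # Inclusive on both ends, as described for slice(start, stop).
        mask = df[time].between(period_spec.start, period_spec.stop)
    elif isinstance(period_spec, (list, tuple)):
        mask = df[time].isin(period_spec)
    else:
        mask = df[time] == period_spec
    if op not in ("mean", "sum"):
        raise ValueError("op must be 'mean' or 'sum'")
    return df.loc[mask].groupby(unit)[column].agg(op)

# Toy panel (hypothetical columns): two units, four years.
toy = pd.DataFrame({
    "state": ["A"] * 4 + ["B"] * 4,
    "year": [1985, 1986, 1987, 1988] * 2,
    "sales": [10.0, 12.0, 14.0, 16.0, 20.0, 20.0, 20.0, 20.0],
})
specs = [
    ("sales", 1988, "mean"),               # scalar year
    ("sales", [1985, 1986], "sum"),        # list of years
    ("sales", slice(1986, 1988), "mean"),  # inclusive slice
]
predictors = pd.DataFrame({
    f"{col}_{i}": resolve_predictor(toy, "state", "year", col, spec, op)
    for i, (col, spec, op) in enumerate(specs)
})
```

Each column of `predictors` is then one row of the predictor matrix that the V-weighted matching problem would work with.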

Examples:

>>> import statspai as sp
>>> df = sp.california_prop99()
>>> sc = sp.SyntheticControl(
...     df, outcome='packspercapita', unit='state', time='year',
...     treated_unit='California', treatment_time=1989,
... )
>>> res = sc.fit(placebo=False)
>>> res.method
'Synthetic Control Method'
>>> bool(res.estimate < 0)  # cigarette sales fell after Prop 99
True

fit ¶

fit(placebo: bool = True) -> CausalResult

Fit the Synthetic Control model.

Parameters:

Name	Type	Description	Default
`placebo`	`bool`	Run in-space placebo tests across all donor units.	`True`

Returns:

Type	Description
`CausalResult`

SynthComparison ¶

Structured container for multi-method SCM comparison results.

Attributes:

Name	Type	Description
`results`	`dict`	Mapping of method name to `CausalResult`.
`comparison_table`	`DataFrame`	Side-by-side metrics for every successful method, sorted by `pre_rmspe` ascending.
`recommended`	`str`	Name of the recommended method.
`recommendation_reason`	`str`	Human-readable justification.

Examples:

>>> import statspai as sp
>>> df = sp.california_prop99()
>>> comp = sp.synth_compare(
...     df, outcome='packspercapita', unit='state', time='year',
...     treated_unit='California', treatment_time=1989,
...     methods=['classic', 'demeaned'], placebo=False,
... )
>>> isinstance(comp, sp.SynthComparison)
True
>>> bool(comp.recommended in comp.results)
True
>>> sorted(comp.comparison_table['method'])
['classic', 'demeaned']

summary ¶

summary() -> str

Return a formatted multi-line summary string.

Returns:

Type	Description
`str`

plot ¶

plot(**kwargs: Any) -> Any

Overlay all results using synthplot(type='compare').

Parameters:

Name	Type	Description	Default
`**kwargs`	`Any`	Forwarded to `synthplot`.	`{}`

Returns:

Type	Description
`matplotlib Figure or Axes`

to_latex ¶

to_latex(**kwargs: Any) -> str

Render the comparison as a LaTeX table.

Forwards to statspai.synth.exports.synth_to_latex with the side-by-side multi-method layout.

Parameters:

Name	Type	Description	Default
`**kwargs`	`Any`	See `synth_to_latex` (e.g. `caption`, `label`, `show_weights`, `digits`).	`{}`

Returns:

Type	Description
`str`

to_markdown ¶

to_markdown(**kwargs: Any) -> str

Render the comparison as a Markdown table.

Forwards to statspai.synth.exports.synth_to_markdown.

to_excel ¶

to_excel(path: str, **kwargs: Any) -> str

Write a multi-sheet Excel workbook covering all methods.

Forwards to statspai.synth.exports.synth_to_excel. Returns the absolute path of the file written.

SequentialSDIDResult `dataclass` ¶

Bases: ResultProtocolMixin

Per-cohort and aggregated output of sequential_sdid.

Dataclass container bundling the aggregate ATT / SE / CI with a per_cohort table of cohort-specific ATT(g). Exposes a formatted summary() method.

Examples:

>>> import pandas as pd
>>> import statspai as sp
>>> per_cohort = pd.DataFrame(
...     {"cohort": [5, 8], "att": [2.10, 1.90], "se": [0.30, 0.40]}
... )
>>> res = sp.SequentialSDIDResult(
...     aggregate_att=2.0, aggregate_se=0.25,
...     aggregate_ci=(1.51, 2.49), per_cohort=per_cohort,
... )
>>> len(res.per_cohort)
2
>>> summary_text = res.summary()

SyntheticSurvivalResult `dataclass` ¶

Bases: ResultProtocolMixin

Output of synth_survival.

Exposes the fitted counterfactual survival curve (s_synth), the observed treated curve (s_treated), their gap trajectory (gap), the donor weights, and a placebo-based uniform band.

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> months = np.arange(0, 13)
>>> rows = []
>>> for arm in ['treated_arm', 'ctrl_1', 'ctrl_2', 'ctrl_3']:
...     bump = 0.03 if arm == 'treated_arm' else 0.0
...     for m in months:
...         haz = 0.06 - bump * (m >= 6)
...         rows.append({'arm': arm, 'month': m,
...                      'km': float(np.exp(-haz * m)),
...                      'is_treated': arm == 'treated_arm'})
>>> panel = pd.DataFrame(rows)
>>> r = sp.synth_survival(
...     panel, unit='arm', time='month', survival='km',
...     treated='is_treated', treat_time=6, n_placebos=20, seed=0,
... )
>>> isinstance(r, sp.SyntheticSurvivalResult)
True
>>> r.treated_unit
'treated_arm'
>>> int(len(r.time_grid))
13

SynthExperimentalDesignResult `dataclass` ¶

Bases: ResultProtocolMixin

Structured output of synth_experimental_design.

Attributes:

Name	Type	Description
`selected`	`list of unit ids`	The `k` units recommended for treatment.
`ranking`	`DataFrame`	All candidates with columns `[unit, pre_mspe, pre_rmse, effective_donors, risk_score, selected]` sorted by `risk_score` ascending (best first).
`weights`	`dict[unit_id, ndarray]`	Leave-one-out SC weight vectors (aligned to `donor_units`) — useful for the post-experiment analysis and for diagnostics.
`donor_units`	`list`	The donor pool that each candidate was matched against (candidates excluded from each other's donor pool by default).
`expected_variance`	`float`	Sum of pre-period MSPEs over `selected` — proxy for the post-experiment ATT-variance under Abadie-Zhao 2025/2026 Eq. (3).
`baseline_variance`	`float`	Same quantity for a random-`k` assignment (average over `n_random` draws); the gain is `baseline_variance - expected_variance`.
`method`	`str`	Always `'abadie_zhao_2025'`.
`diagnostics`	`dict`	Extra metadata (n_units, pre_periods, solver, etc.).
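
The relationship between `expected_variance`, `baseline_variance`, and the gain can be sketched with plain NumPy. The per-candidate MSPE values below are simulated, and selecting the `k` smallest pre-period MSPEs is a simplifying stand-in for the `risk_score` ranking, whose exact definition is not given here.

```python
import numpy as np

rng = np.random.default_rng(0)
pre_mspe = rng.uniform(0.1, 2.0, size=40)   # hypothetical pre-period MSPE per candidate
k, n_random = 5, 500

# Stand-in selection rule: the k candidates with the smallest pre-period MSPE.
selected = np.argsort(pre_mspe)[:k]
expected_variance = float(pre_mspe[selected].sum())

# Baseline: the same sum, averaged over random size-k assignments.
draws = [
    pre_mspe[rng.choice(pre_mspe.size, size=k, replace=False)].sum()
    for _ in range(n_random)
]
baseline_variance = float(np.mean(draws))
gain = baseline_variance - expected_variance
```

A positive `gain` means the designed assignment has a smaller variance proxy than a random draw of the same size.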

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> df = sp.utils.dgp_synth(n_units=40, n_periods=20, seed=0)
>>> res = sp.synth_experimental_design(
...     df, unit="unit", time="time", outcome="y",
...     k=5, pre_period=(0, 19), random_state=0,
... )
>>> len(res.selected)
5

synth ¶

synth(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any = None, treatment_time: Any = None, method: str = 'classic', covariates: Optional[List[str]] = None, penalization: float = 0.0, placebo: bool = True, alpha: float = 0.05, inference: Optional[str] = None, treatment: Optional[str] = None, **kwargs: Any) -> CausalResult

Public sp.synth entry point — see _dispatch_synth_impl for the full docstring on methods and parameters.

Thin wrapper around the multi-branch dispatcher that attaches a Provenance record to the returned result so downstream replication_pack / Quarto appendix / table footers can pick up the call (function name, args, data hash) without each individual SCM backend having to opt in. The 20-method dispatcher itself lives in _dispatch_synth_impl.

References

Abadie, A., Diamond, A. and Hainmueller, J. (2010). Synthetic control methods for comparative case studies. Journal of the American Statistical Association. [@abadie2010synthetic]

Examples:

Classic Abadie-Diamond-Hainmueller SCM on the bundled Proposition 99 panel (39 states x 31 years):

>>> import statspai as sp
>>> df = sp.california_prop99()
>>> result = sp.synth(df, outcome='packspercapita', unit='state',
...                   time='year', treated_unit='California',
...                   treatment_time=1989, method='classic')
>>> round(result.estimate, 1)  # post-1989 ATT, packs per capita
-18.2

Switch estimator via method= (same call signature):

>>> result = sp.synth(df, outcome='packspercapita', unit='state',
...                   time='year', treated_unit='California',
...                   treatment_time=1989, method='sdid')

To fit several SCM variants at once and get a recommendation, use sp.synth_compare with the same arguments.

synthplot ¶

synthplot(result: Union[CausalResult, List[CausalResult]], type: str = 'trajectory', ax: Any = None, figsize: Optional[tuple[float, float]] = None, title: Optional[str] = None, top_n: int = 15, labels: Optional[List[str]] = None, **kwargs: Any) -> tuple[Any, Any]

Unified plot function for all Synthetic Control variants.

Automatically detects the SCM variant and renders the appropriate visualisation. Works with results from synth(method=...), sdid(), augsynth(), gsynth(), staggered_synth(), conformal_synth(), and all other variants.

Parameters:

Name	Type	Description	Default
`result`	`CausalResult or list of CausalResult`	Output of any `synth()` variant. Pass a list for `type='compare'`.	required
`type`	`str`	Plot type: `'trajectory'` — treated vs synthetic over time. `'gap'` — effect (gap) over time. `'both'` — two-panel: trajectory + gap. `'weights'` — donor weight bar chart. `'placebo'` — placebo ATT distribution. `'placebo_gap'` — placebo gap spaghetti plot (Abadie et al. 2010). `'rmspe'` — post/pre RMSPE ratio histogram (Abadie et al. 2010). `'conformal'` — period-level effects + conformal CIs. `'staggered'` — cohort-level ATT comparison. `'factors'` — latent factor loadings (gsynth only). `'compare'` — overlay multiple results.	`'trajectory'`
`ax`	`matplotlib Axes`	Pre-existing axes for single-panel plots.	`None`
`figsize`	`tuple`	Figure size. Auto-selected if None.	`None`
`title`	`str`	Override the auto-generated title.	`None`
`top_n`	`int`	Number of donors to show in weight plots.	`15`
`labels`	`list of str`	Labels for `type='compare'`.	`None`
`**kwargs`	`Any`	Additional arguments passed to individual plotters. Notable: `pre_band=True` — for `type='trajectory'` / `'gap'` / `'both'`: overlay a ±1.96 × pre-RMSPE noise envelope. `pi_band=True` — for `type='trajectory'`: overlay the prediction-interval / conformal CI ribbon around the synthetic counterfactual when the result carries one (`sp.scpi` / `sp.conformal_synth`).	`{}`
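
The `pre_band=True` envelope can be reproduced by hand when a custom plot is needed. The sketch below computes a symmetric ±1.96 × pre-RMSPE band for a gap series; the helper name `pre_rmspe_band` is hypothetical, and centring the band on zero (as one would for a gap plot) is an assumption about how the plotter draws it.

```python
import numpy as np

def pre_rmspe_band(gap, n_pre, z=1.96):
    """Return (lower, upper) of a symmetric +/- z * pre-RMSPE envelope around zero."""
    pre_rmspe = float(np.sqrt(np.mean(np.square(gap[:n_pre]))))
    return -z * pre_rmspe, z * pre_rmspe

gap = np.array([0.3, -0.4, 0.0, 0.5, 3.1, 2.8])   # 4 pre periods, 2 post periods
lower, upper = pre_rmspe_band(gap, n_pre=4)
outside = np.abs(gap[4:]) > upper                  # post gaps beyond the noise envelope
```

Post-treatment gaps that stay inside the envelope are hard to distinguish from pre-treatment fitting noise.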

Returns:

Type	Description
`(fig, ax) or (fig, axes)`

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(42)
>>> units, periods = 12, 10
>>> unit = np.repeat(np.arange(units), periods)
>>> time = np.tile(np.arange(1, periods + 1), units)
>>> d = (unit == 0) & (time >= 7)
>>> y = (unit * 0.4 + time * 0.2 + 3.0 * d
...      + rng.normal(0, 0.3, unit.size))
>>> df = pd.DataFrame({"y": y, "unit": unit, "time": time})
>>> result = sp.synth(
...     df, outcome="y", unit="unit", time="time",
...     treated_unit=0, treatment_time=7, method="classic",
...     placebo=True,
... )
>>> fig, ax = sp.synthplot(result)                  # trajectory
>>> fig, ax = sp.synthplot(result, type='gap')      # gap plot
>>> fig, axes = sp.synthplot(result, type='both')   # two-panel
>>> fig, ax = sp.synthplot(result, type='weights')  # donor weights
>>> fig, ax = sp.synthplot(result, type='placebo')  # placebo dist

Compare methods:

>>> r2 = sp.synth(
...     df, outcome="y", unit="unit", time="time",
...     treated_unit=0, treatment_time=7, method="demeaned",
...     placebo=False,
... )
>>> fig, ax = sp.synthplot(
...     [result, r2], type='compare',
...     labels=['Classic', 'De-meaned'],
... )

demeaned_synth ¶

demeaned_synth(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any, treatment_time: Any, covariates: Optional[List[str]] = None, variant: Literal['demeaned', 'detrended'] = 'demeaned', penalization: float = 0.0, placebo: bool = True, alpha: float = 0.05) -> CausalResult

De-meaned / De-trended Synthetic Control Method.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Long-format panel data.	required
`outcome`	`str`	Outcome variable name.	required
`unit`	`str`	Unit identifier column.	required
`time`	`str`	Time period column.	required
`treated_unit`	`any`	Identifier of the treated unit.	required
`treatment_time`	`any`	First treatment period (inclusive).	required
`covariates`	`list of str`	Additional covariates to match on.	`None`
`variant`	`('demeaned', 'detrended')`	`'demeaned'` — subtract unit-level pre-treatment means. `'detrended'` — subtract unit-level linear time trends.	`'demeaned'`
`penalization`	`float`	Ridge penalty on weights.	`0.0`
`placebo`	`bool`	Run in-space placebo inference.	`True`
`alpha`	`float`	Significance level.	`0.05`
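
The two `variant` transformations can be sketched in a few lines of NumPy. This is an illustration only: the helper names are hypothetical, and fitting the linear trend on pre-treatment periods alone is an assumption rather than a documented detail of `demeaned_synth`.

```python
import numpy as np

def demean(Y, n_pre):
    """Subtract each unit's pre-treatment mean. Y has shape (T, n_units)."""
    return Y - Y[:n_pre].mean(axis=0, keepdims=True)

def detrend(Y, n_pre):
    """Subtract each unit's linear time trend, fitted on pre-treatment periods only."""
    t = np.arange(Y.shape[0], dtype=float)
    X = np.column_stack([np.ones_like(t), t])             # intercept + slope
    coef, *_ = np.linalg.lstsq(X[:n_pre], Y[:n_pre], rcond=None)
    return Y - X @ coef

t = np.arange(10.0)
Y = np.column_stack([2.0 + 0.5 * t, -1.0 + 1.5 * t])       # two noiseless linear units
Y_dm = demean(Y, n_pre=6)
Y_dt = detrend(Y, n_pre=6)
```

After either transformation the donor weights are fitted on the transformed series, so level (or trend) differences between the treated unit and the donors no longer have to be absorbed by the weights.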

Returns:

Type	Description
`CausalResult`

Examples:

De-meaned synthetic control on the Proposition 99 tobacco panel, with California treated from 1989:

>>> import statspai as sp
>>> import numpy as np
>>> df = sp.california_prop99()
>>> result = sp.demeaned_synth(
...     df, outcome='packspercapita', unit='state', time='year',
...     treated_unit='California', treatment_time=1989,
... )
>>> bool(np.isfinite(result.estimate))
True

References

ferman2021synthetic, doudchenko2016balancing

robust_synth ¶

robust_synth(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any, treatment_time: Any, covariates: Optional[List[str]] = None, variant: Literal['unconstrained', 'elastic_net', 'penalized'] = 'unconstrained', l1_penalty: float = 0.0, l2_penalty: float = 0.01, intercept: bool = True, placebo: bool = True, alpha: float = 0.05) -> CausalResult

Robust / unconstrained Synthetic Control.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Long-format panel data.	required
`outcome`	`str`	Outcome variable name.	required
`unit`	`str`	Unit identifier column.	required
`time`	`str`	Time period column.	required
`treated_unit`	`any`	Identifier of the treated unit.	required
`treatment_time`	`any`	First treatment period.	required
`covariates`	`list of str`	Additional covariates to match on.	`None`
`variant`	`('unconstrained', 'elastic_net', 'penalized')`	`'unconstrained'` — no sign / sum constraints; optional intercept. `'elastic_net'` — L1 + L2 penalty, no sign constraints. `'penalized'` — classic SCM constraints + elastic-net penalty.	`'unconstrained'`
`l1_penalty`	`float`	Lasso (L1) penalty strength.	`0.0`
`l2_penalty`	`float`	Ridge (L2) penalty strength.	`0.01`
`intercept`	`bool`	Fit an intercept (level shift). Only for unconstrained / elastic_net.	`True`
`placebo`	`bool`	Run in-space placebo inference.	`True`
`alpha`	`float`	Significance level.	`0.05`
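
For intuition about the `'unconstrained'` variant with `intercept=True`, the following NumPy sketch solves a ridge problem with an unpenalised intercept in closed form. The helper `ridge_weights` is hypothetical, and how StatsPAI scales `l2_penalty` internally is not specified here, so treat the penalty scale as an assumption.

```python
import numpy as np

def ridge_weights(Y0_pre, y1_pre, l2_penalty=0.01, intercept=True):
    """Unconstrained donor weights with an L2 penalty and an unpenalised intercept."""
    J = Y0_pre.shape[1]
    if intercept:
        x_mean, y_mean = Y0_pre.mean(axis=0), y1_pre.mean()
    else:
        x_mean, y_mean = np.zeros(J), 0.0
    Xc, yc = Y0_pre - x_mean, y1_pre - y_mean          # centring handles the intercept
    w = np.linalg.solve(Xc.T @ Xc + l2_penalty * np.eye(J), Xc.T @ yc)
    mu = y_mean - x_mean @ w                           # level shift, treated vs synthetic
    return mu, w

rng = np.random.default_rng(1)
Y0_pre = rng.normal(size=(30, 4))                          # (T0, J) donor outcomes
y1_pre = 2.0 + Y0_pre @ np.array([0.7, -0.2, 0.5, 0.0])    # treated unit, noiseless
mu, w = ridge_weights(Y0_pre, y1_pre, l2_penalty=1e-8)
```

Note that the weights may be negative and need not sum to one, which is exactly what distinguishes this variant from classic SCM.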

Returns:

Type	Description
`CausalResult`

Examples:

>>> import statspai as sp
>>> df = sp.california_prop99()
>>> result = sp.robust_synth(df, outcome='packspercapita', unit='state',
...     time='year', treated_unit='California', treatment_time=1989,
...     variant='unconstrained')
>>> bool(result.estimate < 0)  # Prop 99 reduced cigarette sales
True

staggered_synth ¶

staggered_synth(data: DataFrame, outcome: str, unit: str, time: str, treatment: str, method: Literal['separate', 'pooled'] = 'separate', penalization: float = 0.0, placebo: bool = True, alpha: float = 0.05) -> CausalResult

Staggered Adoption Synthetic Control.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Long-format panel data.	required
`outcome`	`str`	Outcome variable name.	required
`unit`	`str`	Unit identifier column.	required
`time`	`str`	Time period column.	required
`treatment`	`str`	Binary treatment indicator (0/1). Units transition from 0 to 1 at their respective adoption times.	required
`method`	`('separate', 'pooled')`	`'separate'` — fit a separate SCM for each treated unit. `'pooled'` — partially pool weights across cohorts with the same adoption time.	`'separate'`
`penalization`	`float`	Ridge penalty on donor weights.	`0.0`
`placebo`	`bool`	Run placebo inference.	`True`
`alpha`	`float`	Significance level.	`0.05`

Returns:

Type	Description
`CausalResult`	With `model_info` containing per-unit and per-cohort effects.

Examples:

>>> import statspai as sp
>>> df = sp.dgp_did(n_units=30, n_periods=10, staggered=True, seed=0)
>>> result = sp.staggered_synth(
...     df, outcome='y', unit='unit', time='time',
...     treatment='treated', placebo=False,
... )
>>> bool(result.estimate is not None)
True
>>> _ = result.summary()

conformal_synth ¶

conformal_synth(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any, treatment_time: Any, scm_method: str = 'classic', grid_size: int = 101, grid_range: Optional[Tuple[float, float]] = None, alpha: float = 0.05, penalization: float = 0.0) -> CausalResult

Conformal inference for synthetic control.

Constructs valid confidence intervals by inverting a sequence of conformal tests, one for each hypothesised treatment effect.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Long-format panel data.	required
`outcome`	`str`	Outcome variable name.	required
`unit`	`str`	Unit identifier column.	required
`time`	`str`	Time period column.	required
`treated_unit`	`any`	Identifier of the treated unit.	required
`treatment_time`	`any`	First treatment period.	required
`scm_method`	`str`	Which SCM variant to use for weight estimation. Currently supports 'classic' (constrained) and 'ridge'.	`'classic'`
`grid_size`	`int`	Number of points in the hypothesis grid for CI inversion.	`101`
`grid_range`	`tuple of (float, float)`	(min, max) of the hypothesis grid. If None, auto-determined from pre-treatment residual scale.	`None`
`alpha`	`float`	Significance level.	`0.05`
`penalization`	`float`	Ridge penalty (used when scm_method='ridge').	`0.0`

Returns:

Type	Description
`CausalResult`	With `model_info` containing per-period p-values, conformal confidence sets, and the full test inversion grid.
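
The test-inversion logic can be sketched as follows, loosely following Chernozhukov, Wüthrich & Zhu (2021): impose a hypothesised effect, refit a counterfactual proxy on all periods, and compare the post-period residual statistic with its distribution over cyclic shifts. Everything here is a simplified assumption: the helper `conformal_pvalue` is hypothetical, the proxy is plain least squares rather than a constrained SCM fit, and the effect is taken to be constant over the post period.

```python
import numpy as np

def conformal_pvalue(y1, Y0, n_pre, theta):
    """Permutation p-value for H0: constant post-treatment effect equals theta."""
    y_adj = y1.copy()
    y_adj[n_pre:] -= theta                        # impose the null on post periods
    # Proxy counterfactual: least squares on all periods (stand-in for an SCM fit).
    w, *_ = np.linalg.lstsq(Y0, y_adj, rcond=None)
    resid = y_adj - Y0 @ w
    n_post = len(y1) - n_pre

    def stat(u):
        return np.abs(u[-n_post:]).mean()         # mean absolute post-period residual

    # Moving-block permutations: all cyclic shifts of the residual series.
    shifted = np.array([stat(np.roll(resid, s)) for s in range(len(resid))])
    return float(np.mean(shifted >= stat(resid)))

rng = np.random.default_rng(0)
T, n_pre, J = 40, 36, 3
Y0 = rng.normal(size=(T, J)) + np.linspace(0, 3, T)[:, None]
y1 = Y0 @ np.array([0.5, 0.3, 0.2]) + rng.normal(0, 0.1, T)
y1[n_pre:] += 2.0                                 # simulated treatment effect

grid = np.linspace(0.0, 4.0, 101)                 # hypothesised effects
pvals = np.array([conformal_pvalue(y1, Y0, n_pre, th) for th in grid])
accepted = grid[pvals > 0.05]                     # values not rejected at alpha = 0.05
ci = (accepted.min(), accepted.max()) if accepted.size else None
```

Because the identity shift is always included, the p-value can never fall below 1 / T, so short panels limit how small `alpha` can usefully be.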

Examples:

>>> import statspai as sp
>>> df = sp.california_prop99()  # state, year, packspercapita, treated
>>> result = sp.conformal_synth(df, outcome='packspercapita',
...     unit='state', time='year', treated_unit='California',
...     treatment_time=1989)
>>> bool(result.ci[0] <= result.estimate <= result.ci[1])
True
>>> print(result.summary())

scest ¶

scest(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any, treatment_time: Any, w_constr: str = 'simplex', lasso_lambda: float = 1.0, ridge_lambda: float = 1.0) -> Dict[str, Any]

Estimate synthetic control weights.

Solves the constrained optimisation problem to find donor weights that best reproduce the treated unit's pre-treatment outcomes. Mirrors the R package's scest() function.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Long-format panel data.	required
`outcome`	`str`	Outcome variable column name.	required
`unit`	`str`	Unit identifier column name.	required
`time`	`str`	Time period column name.	required
`treated_unit`	`scalar`	Identifier of the treated unit.	required
`treatment_time`	`scalar`	First treatment period.	required
`w_constr`	`str`	Weight constraint. `'simplex'`: w >= 0, sum(w) = 1. `'lasso'`: L1-penalised (allows negative, non-summing weights). `'ridge'`: L2-penalised. `'ols'`: ordinary least squares (unconstrained). `'ls'`: least squares (same as `'ols'`).	`'simplex'`
`lasso_lambda`	`float`	L1 penalty (used when `w_constr='lasso'`).	`1.0`
`ridge_lambda`	`float`	L2 penalty (used when `w_constr='ridge'`).	`1.0`

Returns:

Type	Description
`dict`	Keys:

weights : np.ndarray (J,) of estimated donor weights
w_constr : echo of constraint type
Y_synth_pre : synthetic unit pre-treatment outcomes
Y_synth_post : synthetic unit post-treatment outcomes
residuals_pre : pre-treatment fit residuals
effects : post-treatment gaps (treated - synthetic)
pre_rmspe : root mean squared prediction error (pre)
donor_names : donor labels
sc_data : the prepared data dict from scdata
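
For the default `w_constr='simplex'` case, the optimisation problem can be sketched with projected gradient descent in NumPy. This is not the solver `scest` uses (which is not documented here); `project_simplex` and `simplex_weights` are hypothetical helpers, and the data are simulated.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1), 0.0)

def simplex_weights(Y0_pre, y1_pre, n_iter=5000):
    """Minimise 0.5 * ||y1_pre - Y0_pre @ w||^2 over the simplex."""
    J = Y0_pre.shape[1]
    w = np.full(J, 1.0 / J)
    step = 1.0 / np.linalg.norm(Y0_pre, 2) ** 2      # 1 / Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = Y0_pre.T @ (Y0_pre @ w - y1_pre)
        w = project_simplex(w - step * grad)
    return w

rng = np.random.default_rng(0)
true_w = np.array([0.6, 0.4, 0.0, 0.0, 0.0])
Y0_pre, Y0_post = rng.normal(size=(25, 5)), rng.normal(size=(6, 5))
y1_pre = Y0_pre @ true_w
y1_post = Y0_post @ true_w + 3.0                     # simulated effect of 3

w = simplex_weights(Y0_pre, y1_pre)
pre_rmspe = float(np.sqrt(np.mean((y1_pre - Y0_pre @ w) ** 2)))
effects = y1_post - Y0_post @ w                      # post-treatment gaps
```

The quantities `w`, `pre_rmspe`, and `effects` correspond in spirit to the `weights`, `pre_rmspe`, and `effects` keys listed above.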

Examples:

>>> import statspai as sp
>>> df = sp.california_prop99()  # cols: state, year, packspercapita
>>> est = sp.scest(df, outcome='packspercapita', unit='state',
...     time='year', treated_unit='California', treatment_time=1989)
>>> bool(est['pre_rmspe'] >= 0)  # pre-treatment fit RMSPE
True
>>> est['Y_synth_post'].shape  # 12 post-treatment years (1989-2000)
(12,)

scdata ¶

scdata(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any, treatment_time: Any) -> Dict[str, Any]

Prepare data matrices for synthetic control estimation.

Reshapes a long-format panel into the matrices needed by scest and scpi. Mirrors the R package's scdata() function.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Long-format panel data.	required
`outcome`	`str`	Outcome variable column name.	required
`unit`	`str`	Unit identifier column name.	required
`time`	`str`	Time period column name.	required
`treated_unit`	`scalar`	Identifier of the treated unit.	required
`treatment_time`	`scalar`	First treatment period.	required

Returns:

Type	Description
`dict`	Keys:

Y_pre : treated unit pre-treatment outcomes (T0,)
Y_post : treated unit post-treatment outcomes (T1,)
Y_donors_pre : donor pre-treatment matrix (T0, J)
Y_donors_post : donor post-treatment matrix (T1, J)
donor_names : list of donor unit labels
pre_times : array of pre-treatment time values
post_times : array of post-treatment time values
times : full array of time values
treated_unit : echo of the treated unit label
treatment_time : echo of the first treatment period
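
The reshaping step can be approximated with a single pandas pivot. The sketch below builds a subset of the keys listed above from a toy panel; `prepare_matrices` is a hypothetical stand-in written for illustration, not the library's `scdata`.

```python
import numpy as np
import pandas as pd

def prepare_matrices(df, outcome, unit, time, treated_unit, treatment_time):
    """Pivot a long panel to (time x unit) and split into pre/post blocks."""
    wide = df.pivot(index=time, columns=unit, values=outcome).sort_index()
    pre = wide.index < treatment_time                # treatment_time is the first treated period
    donors = wide.drop(columns=treated_unit)
    return {
        "Y_pre": wide.loc[pre, treated_unit].to_numpy(),      # (T0,)
        "Y_post": wide.loc[~pre, treated_unit].to_numpy(),    # (T1,)
        "Y_donors_pre": donors.loc[pre].to_numpy(),           # (T0, J)
        "Y_donors_post": donors.loc[~pre].to_numpy(),         # (T1, J)
        "donor_names": list(donors.columns),
        "pre_times": wide.index[pre].to_numpy(),
        "post_times": wide.index[~pre].to_numpy(),
    }

# Toy panel: 3 units observed over periods 1..5, unit 'A' treated from period 4.
toy = pd.DataFrame({
    "unit": np.repeat(["A", "B", "C"], 5),
    "time": np.tile(np.arange(1, 6), 3),
    "y": np.arange(15, dtype=float),
})
mats = prepare_matrices(toy, "y", "unit", "time", treated_unit="A", treatment_time=4)
```

A balanced panel with one row per (unit, time) pair is assumed; `DataFrame.pivot` raises on duplicate pairs.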

Examples:

>>> import statspai as sp
>>> df = sp.california_prop99()  # cols: state, year, packspercapita
>>> prepared = sp.scdata(df, outcome='packspercapita', unit='state',
...     time='year', treated_unit='California', treatment_time=1989)
>>> prepared['Y_pre'].shape  # 19 pre-treatment years (1970-1988)
(19,)

mc_synth ¶

mc_synth(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any, treatment_time: Any, covariates: Optional[List[str]] = None, lambda_reg: Optional[float] = None, max_iter: int = 500, tol: float = 1e-06, cv_folds: int = 5, alpha: float = 0.05, placebo: bool = True, seed: Optional[int] = None) -> CausalResult

Matrix Completion Synthetic Control Method.

Imputes the treated unit's post-treatment counterfactual by solving a nuclear-norm-penalised matrix completion problem on the full panel, following Athey et al. (2021).

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Long-format panel data.	required
`outcome`	`str`	Outcome variable name.	required
`unit`	`str`	Unit identifier column.	required
`time`	`str`	Time period column.	required
`treated_unit`	`any`	Identifier of the treated unit.	required
`treatment_time`	`any`	First treatment period (inclusive).	required
`covariates`	`list of str`	Time-varying covariates to partial out before matrix completion.	`None`
`lambda_reg`	`float`	Nuclear norm penalty. If `None` (default), selected automatically via cross-validation on observed entries.	`None`
`max_iter`	`int`	Maximum Soft-Impute iterations.	`500`
`tol`	`float`	Convergence tolerance (relative change in Frobenius norm).	`1e-6`
`cv_folds`	`int`	Number of CV folds for automatic lambda selection.	`5`
`alpha`	`float`	Significance level for confidence intervals.	`0.05`
`placebo`	`bool`	Run placebo (permutation) inference by treating each control unit as if it were treated.	`True`
`seed`	`int`	Random seed for reproducibility.	`None`

Returns:

Type	Description
`CausalResult`	With `.estimate` equal to the average post-treatment effect (ATT), period-level effects in `detail`, and full diagnostics in `model_info`.

Notes

The algorithm uses the Soft-Impute / Singular Value Thresholding (SVT) procedure. At each iteration the observed entries keep their data values, the missing entries are filled with the previous imputation, and the filled matrix is then rank-reduced by soft-thresholding its singular values.
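
A minimal NumPy sketch of the iteration described in the note above is given below. It is illustrative only: the function `soft_impute`, the fixed penalty, and the rank-one toy panel are assumptions, and the library's cross-validated choice of `lambda_reg` and covariate adjustment are left out.

```python
import numpy as np

def soft_impute(Y, observed, lam, max_iter=500, tol=1e-6):
    """Nuclear-norm-penalised completion of Y; `observed` is a boolean mask."""
    L = np.zeros_like(Y, dtype=float)
    for _ in range(max_iter):
        # Keep the data where observed, use the current estimate where missing.
        filled = np.where(observed, Y, L)
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        L_new = (U * np.maximum(s - lam, 0.0)) @ Vt   # soft-threshold singular values
        change = np.linalg.norm(L_new - L) / max(np.linalg.norm(L), 1e-12)
        L = L_new
        if change < tol:                              # relative Frobenius-norm change
            break
    return L

# Rank-one toy panel: 8 units x 10 periods, unit 0 treated from period index 7.
Y_true = np.outer(np.linspace(1, 2, 8), np.linspace(1, 3, 10))
observed = np.ones_like(Y_true, dtype=bool)
observed[0, 7:] = False                               # treated post-period cells are "missing"

L_hat = soft_impute(Y_true, observed, lam=0.5)
counterfactual = L_hat[0, 7:]                         # imputed untreated outcomes
rel_err = np.abs(counterfactual - Y_true[0, 7:]) / Y_true[0, 7:]
```

The penalty shrinks the imputed values slightly towards zero, so a larger `lam` trades more bias for a lower-rank, more stable completion.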

Examples:

>>> import statspai as sp
>>> df = sp.california_prop99()
>>> result = sp.mc_synth(df, outcome='packspercapita', unit='state',
...                      time='year', treated_unit='California',
...                      treatment_time=1989, placebo=False, seed=0)
>>> print(result.summary())

multi_outcome_synth ¶

multi_outcome_synth(data: DataFrame, outcomes: List[str], unit: str, time: str, treated_unit: Any, treatment_time: Any, method: str = 'concatenated', standardize: bool = True, penalization: float = 0.0, placebo: bool = True, alpha: float = 0.05) -> CausalResult

Multiple Outcomes Synthetic Control Method (Sun 2023).

Finds a single set of donor weights that simultaneously matches the treated unit across all K outcomes in the pre-treatment period.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Long-format panel data containing all outcome columns.	required
`outcomes`	`list of str`	Column names for the K outcome variables.	required
`unit`	`str`	Unit identifier column.	required
`time`	`str`	Time period column.	required
`treated_unit`	`any`	Value identifying the treated unit.	required
`treatment_time`	`any`	First treatment period (inclusive).	required
`method`	`('concatenated', 'averaged')`	Weight-estimation strategy. `'concatenated'` -- stack all K standardised outcome panels vertically and solve one quadratic programme. `'averaged'` -- standardise each outcome, average across K, then solve SCM on the mean series.	`'concatenated'`
`standardize`	`bool`	Standardise each outcome to zero mean / unit variance before stacking or averaging (strongly recommended when outcome scales differ).	`True`
`penalization`	`float`	Ridge-type penalty added to the diagonal of the donor cross-product matrix (`penalization * I`). Helps when donors are collinear.	`0.0`
`placebo`	`bool`	Run in-space placebo permutations for inference (each donor is pretended to be treated in turn).	`True`
`alpha`	`float`	Significance level for confidence intervals and joint test.	`0.05`

Returns:

Type	Description
`CausalResult`	Unified result object with:

estimate : average treatment effect across outcomes (mean of per-outcome ATTs).
model_info['per_outcome_effects'] : DataFrame with columns outcome, att, se, pvalue.
model_info['weights'] : dict mapping donor names to shared SCM weights.
model_info['gap_tables'] : dict of DataFrames (one per outcome) with time-level gaps.
model_info['joint_pvalue'] : joint p-value across all K outcomes (Fisher combination of placebo p-values).
model_info['Y_synth'] : dict mapping outcome name to full synthetic series.
model_info['Y_treated'] : dict mapping outcome name to observed treated series.
model_info['times'] : sorted list of all time periods.
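
As an illustration of the Fisher combination behind `joint_pvalue`, the sketch below combines K per-outcome p-values using only the standard library. The helper name is hypothetical, the inputs are made-up numbers, and the textbook version shown here assumes independent p-values, which outcomes sharing one set of weights may not satisfy; how StatsPAI handles that is not documented here.

```python
import math

def fisher_combined_pvalue(pvalues):
    """Fisher's method: -2 * sum(log p) is chi-square with 2K degrees of freedom."""
    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)     # test statistic divided by 2
    # Closed-form chi-square survival function for even degrees of freedom 2K.
    return math.exp(-half_stat) * sum(
        half_stat ** i / math.factorial(i) for i in range(k)
    )

joint_p = fisher_combined_pvalue([0.04, 0.10, 0.50])   # hypothetical placebo p-values
single = fisher_combined_pvalue([0.30])                # K = 1 returns the input p-value
```

Placebo p-values are bounded below by 1 / (J + 1) for J donors, so the logarithms are always finite.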

Examples:

>>> import numpy as np, pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(123)
>>> n_units, n_periods, t0 = 11, 20, 11
>>> alphas = rng.normal(10, 2, (n_units, 3))
>>> betas = rng.normal(0.5, 0.1, (n_units, 3))
>>> records = []
>>> for i in range(n_units):
...     for t in range(1, n_periods + 1):
...         row = {'unit': f'unit_{i}', 'time': t}
...         for k, name in enumerate(['gdp', 'employment', 'investment']):
...             y = alphas[i, k] + betas[i, k] * t + rng.normal(0, 0.3)
...             if i == 0 and t >= t0:        # unit_0 treated from t0
...                 y += [5.0, 3.0, 0.0][k]
...             row[name] = y
...         records.append(row)
>>> df = pd.DataFrame(records)
>>> result = sp.multi_outcome_synth(
...     df,
...     outcomes=['gdp', 'employment', 'investment'],
...     unit='unit', time='time',
...     treated_unit='unit_0', treatment_time=11,
...     placebo=False,
... )
>>> summary_text = result.summary()
>>> effects = result.model_info['per_outcome_effects']

Notes

Sun (2023) shows that under a low-rank factor model the bias of the concatenated estimator shrinks as O(1/sqrt(K)), where K is the number of outcomes. The key requirement is that the outcomes share a common latent-factor structure.
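
The `'concatenated'` strategy amounts to stacking the K pre-treatment panels before solving for one weight vector. The NumPy sketch below shows only that stacking step; `stack_outcomes` is a hypothetical helper, standardising each outcome over all units and pre-periods jointly is an assumption, and the final unconstrained least-squares fit is a stand-in for the quadratic programme.

```python
import numpy as np

def stack_outcomes(panels, n_pre, standardize=True):
    """Stack K pre-treatment panels (each of shape (T, n_units)) vertically."""
    blocks = []
    for Y in panels:
        pre = Y[:n_pre]
        if standardize:
            pre = (pre - pre.mean()) / pre.std()     # zero mean, unit variance per outcome
        blocks.append(pre)
    return np.vstack(blocks)                         # shape (K * n_pre, n_units)

rng = np.random.default_rng(0)
T, n_units, n_pre = 20, 6, 10
panels = [rng.normal(loc=m, scale=s, size=(T, n_units))
          for m, s in [(100.0, 15.0), (5.0, 0.5), (0.0, 1.0)]]   # K = 3 outcome scales

stacked = stack_outcomes(panels, n_pre)
treated, donors = stacked[:, 0], stacked[:, 1:]      # column 0 plays the treated unit
w, *_ = np.linalg.lstsq(donors, treated, rcond=None) # unconstrained stand-in for the QP
```

Without standardisation the first outcome, whose scale is largest, would dominate the stacked objective.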

qqsynth ¶

qqsynth(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any, treatment_time: Any, n_quantiles: int = 100, placebo: bool = True, alpha: float = 0.05, seed: Optional[int] = None) -> CausalResult

Quantile Synthetic Control (alias for DiSCo with method='quantile').

Applies quantile-on-quantile regression to match quantile functions without the convexity constraints of the mixture approach.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Panel data in long format.	required
`outcome`	`str`	Outcome variable column.	required
`unit`	`str`	Unit identifier column.	required
`time`	`str`	Time period column.	required
`treated_unit`	`any`	Identifier of the treated unit.	required
`treatment_time`	`any`	First treatment period (inclusive).	required
`n_quantiles`	`int`	Number of quantile grid points.	`100`
`placebo`	`bool`	Run placebo permutation inference.	`True`
`alpha`	`float`	Significance level.	`0.05`
`seed`	`int`	Random seed.	`None`

Returns:

Type	Description
`CausalResult`

Examples:

>>> import statspai as sp
>>> df = sp.california_prop99()
>>> result = sp.qqsynth(df, outcome='packspercapita', unit='state',
...                     time='year', treated_unit='California',
...                     treatment_time=1989)
>>> bool(result.estimate is not None)
True

See Also

discos : Full distributional synthetic controls with method selection.

References

gunsilius2023distributional

discos_test ¶

discos_test(result: CausalResult, test: str = 'ks') -> Dict[str, Any]

Test for distributional treatment effects.

Parameters:

Name	Type	Description	Default
`result`	`CausalResult`	Output from `discos()` or `qqsynth()`.	required
`test`	`('ks', 'cvm', 'stochastic_dominance')`	`'ks'`: two-sample Kolmogorov-Smirnov test comparing treated and counterfactual quantile functions. `'cvm'`: Cramér-von Mises test statistic (permutation-based). `'stochastic_dominance'`: first-order stochastic dominance test.	`'ks'`

Returns:

Type	Description
`dict`	Keys: `'test'`, `'statistic'`, `'pvalue'`, `'reject'`, `'alpha'`, and test-specific fields.

Examples:

>>> import statspai as sp
>>> df = sp.california_prop99()
>>> result = sp.discos(df, outcome='packspercapita', unit='state',
...                    time='year', treated_unit='California',
...                    treatment_time=1989)
>>> out = sp.discos_test(result, test='ks')
>>> out['test']
'Kolmogorov-Smirnov'
>>> bool('pvalue' in out)
True
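
To make the `'ks'` option concrete, the sketch below computes a two-sample Kolmogorov-Smirnov statistic and a permutation p-value for two samples standing in for the treated and counterfactual distributions. It is an illustration under stated assumptions: the helper names `ks_statistic` and `ks_permutation_test`, the `n_perm` argument, and the permutation scheme are hypothetical and are not taken from the internals of `discos_test()`.

```python
import numpy as np


def ks_statistic(x, y):
    """Largest absolute gap between the empirical CDFs of x and y."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    grid = np.concatenate([x, y])
    # Empirical CDFs evaluated at every observed point.
    cdf_x = np.searchsorted(x, grid, side="right") / x.size
    cdf_y = np.searchsorted(y, grid, side="right") / y.size
    return float(np.max(np.abs(cdf_x - cdf_y)))


def ks_permutation_test(x, y, n_perm=500, alpha=0.05, seed=None):
    """Permutation p-value for the KS statistic (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = ks_statistic(x, y)
    pooled = np.concatenate([x, y])
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(pooled)
        if ks_statistic(perm[:x.size], perm[x.size:]) >= observed:
            exceed += 1
    pvalue = (exceed + 1) / (n_perm + 1)
    return {
        "test": "Kolmogorov-Smirnov",
        "statistic": observed,
        "pvalue": pvalue,
        "reject": bool(pvalue < alpha),
        "alpha": alpha,
    }
```

The returned dictionary mirrors the documented keys so the two can be compared side by side; any test-specific fields the library adds are not reproduced here.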

discos_plot ¶

discos_plot(result: CausalResult, type: str = 'quantile_effect', ax: Any = None, figsize: Tuple[int, int] = (10, 6), color: str = '#2C3E50', ci_alpha: float = 0.2, title: Optional[str] = None) -> Any

Visualise distributional synthetic control results.

Parameters:

Name	Type	Description	Default
`result`	`CausalResult`	Output from `discos()` or `qqsynth()`.	required
`type`	`('quantile_effect', 'quantile_comparison', 'gap', 'weights')`	`'quantile_effect'`: treatment effect Δ(τ) across quantiles with CIs. `'quantile_comparison'`: overlay treated vs. counterfactual quantile functions. `'gap'`: gap plot (treated − synthetic) over time. `'weights'`: horizontal bar chart of donor weights.	`'quantile_effect'`
`ax`	`Axes`	Pre-existing axes for the plot.	`None`
`figsize`	`tuple`	Figure size.	`(10, 6)`
`color`	`str`	Primary plot colour.	`'#2C3E50'`
`ci_alpha`	`float`	Transparency for CI band.	`0.2`
`title`	`str`	Plot title override.	`None`

Returns:

Type	Description
`(fig, ax)`

Examples:

>>> import statspai as sp
>>> df = sp.california_prop99()
>>> result = sp.discos(df, outcome='packspercapita', unit='state',
...                    time='year', treated_unit='California',
...                    treatment_time=1989)
>>> fig, ax = sp.discos_plot(result, type='quantile_effect')
>>> fig2, ax2 = sp.discos_plot(result, type='quantile_comparison')

stochastic_dominance ¶

stochastic_dominance(result: CausalResult, order: int = 1) -> Dict[str, Any]

Test for stochastic dominance of the treated distribution over the counterfactual distribution.

Parameters:

Name	Type	Description	Default
`result`	`CausalResult`	Output from `discos()` or `qqsynth()`.	required
`order`	`(1, 2)`	Order of stochastic dominance. 1 = first-order (CDF dominance). 2 = second-order (integrated CDF dominance).	`1`

Returns:

Type	Description
`dict`	Keys: `'order'`, `'dominates'` (bool), `'min_gap'`, `'max_gap'`, `'fraction_positive'`, `'statistic'`, `'pvalue'`.

Examples:

>>> import statspai as sp
>>> df = sp.california_prop99()
>>> result = sp.discos(df, outcome='packspercapita', unit='state',
...                    time='year', treated_unit='California',
...                    treatment_time=1989)
>>> out = sp.stochastic_dominance(result, order=1)
>>> bool('dominates' in out)
True
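
As a rough illustration of what the `order` argument distinguishes, the sketch below compares two samples on a quantile grid. First-order dominance is checked through the quantile gap, and second-order dominance through the integrated quantile gap, which is an equivalent formulation of the integrated-CDF criterion. The function name `dominance_summary`, the grid, and the sign convention (treated minus counterfactual) are assumptions; the sketch omits the `statistic` and `pvalue` fields.

```python
import numpy as np


def dominance_summary(treated, counterfactual, order=1, n_quantiles=100):
    """Quantile-based dominance check (hypothetical helper)."""
    if order not in (1, 2):
        raise ValueError("order must be 1 or 2")
    probs = (np.arange(n_quantiles) + 0.5) / n_quantiles
    q_t = np.quantile(np.asarray(treated, dtype=float), probs)
    q_c = np.quantile(np.asarray(counterfactual, dtype=float), probs)
    if order == 2:
        # Integrated quantile functions (Riemann sums over the grid).
        q_t = np.cumsum(q_t) / n_quantiles
        q_c = np.cumsum(q_c) / n_quantiles
    gap = q_t - q_c
    return {
        "order": order,
        "dominates": bool(np.all(gap >= 0)),
        "min_gap": float(gap.min()),
        "max_gap": float(gap.max()),
        "fraction_positive": float(np.mean(gap > 0)),
    }
```

A `fraction_positive` close to but below 1 signals that dominance fails only on a small part of the distribution, which `dominates` alone would hide.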

bayesian_synth ¶

bayesian_synth(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any, treatment_time: Any, covariates: Optional[List[str]] = None, n_iter: int = 2000, n_warmup: int = 1000, n_chains: int = 2, dirichlet_alpha: float = 1.0, seed: Optional[int] = None, alpha: float = 0.05) -> CausalResult

Bayesian Synthetic Control Method.

Estimates the ATT by placing a Dirichlet prior on donor weights and sampling from the posterior via Metropolis-Hastings MCMC. Returns full posterior credible intervals for the treatment effect.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Panel data in long format with columns for unit, time, and outcome.	required
`outcome`	`str`	Name of the outcome variable column.	required
`unit`	`str`	Name of the unit identifier column.	required
`time`	`str`	Name of the time period column.	required
`treated_unit`	`scalar`	Value in unit that identifies the treated unit.	required
`treatment_time`	`scalar`	First period of treatment (inclusive).	required
`covariates`	`list of str`	Additional pre-treatment predictors to include in the matching objective. Covariates are appended to the pre-treatment outcome series for each unit before fitting.	`None`
`n_iter`	`int`	Total MCMC iterations per chain (including warmup).	`2000`
`n_warmup`	`int`	Number of warmup (burn-in) iterations for adaptation. Must be strictly less than n_iter.	`1000`
`n_chains`	`int`	Number of independent MCMC chains. Multiple chains enable the R-hat convergence diagnostic.	`2`
`dirichlet_alpha`	`float`	Concentration parameter for the symmetric Dirichlet prior on donor weights. `dirichlet_alpha = 1` gives a uniform prior on the simplex; values < 1 encourage sparsity; values > 1 encourage more uniform weights.	`1.0`
`seed`	`int`	Random seed for reproducibility.	`None`
`alpha`	`float`	Significance level for credible intervals.	`0.05`

Returns:

Type	Description
`CausalResult`	With `.estimate` equal to the posterior mean ATT averaged over all post-treatment periods, `.ci` giving the equal-tailed credible interval, and rich diagnostics in `model_info`.

Raises:

Type	Description
`ValueError`	If the panel has fewer than 2 pre-treatment periods, no post-treatment periods, or no valid donor units.

Examples:

>>> import statspai as sp
>>> # df: long-format panel with 'state', 'year', and 'gdp' columns
>>> result = sp.bayesian_synth(
...     df, outcome='gdp', unit='state', time='year',
...     treated_unit='California', treatment_time=1989,
...     n_iter=4000, n_warmup=2000, n_chains=4, seed=42,
... )
>>> print(result.summary())

Notes

The sampler uses a Dirichlet proposal on the simplex (re-normalised perturbation) with adaptive step-size tuning during warmup targeting an acceptance rate of ~0.35. Samples are thinned by a factor of 2 to reduce autocorrelation.
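
The sketch below shows one way a Metropolis-Hastings sampler over simplex weights with a Dirichlet prior can be written. It is a simplified stand-in, not the library's sampler: it assumes a Gaussian likelihood with a known noise scale `sigma`, uses a Dirichlet proposal centred on the current weights with a fixed `concentration`, runs a single chain, and omits the adaptive tuning and thinning described above. All function and argument names are hypothetical.

```python
import math

import numpy as np


def dirichlet_logpdf(x, a):
    """Log density of Dirichlet(a) at a point x inside the simplex."""
    log_norm = math.lgamma(float(a.sum())) - sum(math.lgamma(float(v)) for v in a)
    return log_norm + float(np.sum((a - 1.0) * np.log(x)))


def mh_simplex_weights(y_pre, Y0_pre, sigma=0.1, dirichlet_alpha=1.0,
                       n_iter=4000, n_warmup=2000, concentration=2000.0,
                       seed=None):
    """Posterior draws of donor weights; Y0_pre has shape (T0, J)."""
    rng = np.random.default_rng(seed)
    n_donors = Y0_pre.shape[1]

    def log_post(w):
        resid = y_pre - Y0_pre @ w
        log_lik = -0.5 * float(resid @ resid) / sigma ** 2
        log_prior = (dirichlet_alpha - 1.0) * float(np.sum(np.log(w)))
        return log_lik + log_prior

    w = np.full(n_donors, 1.0 / n_donors)
    lp = log_post(w)
    draws, accepted = [], 0
    for it in range(n_iter):
        a_fwd = concentration * w + 1.0
        prop = rng.dirichlet(a_fwd)
        # Guard against exact zeros before taking logs.
        prop = np.clip(prop, 1e-12, None)
        prop = prop / prop.sum()
        a_rev = concentration * prop + 1.0
        lp_prop = log_post(prop)
        # Hastings correction for the asymmetric Dirichlet proposal.
        log_ratio = (lp_prop - lp
                     + dirichlet_logpdf(w, a_rev)
                     - dirichlet_logpdf(prop, a_fwd))
        if math.log(1.0 - rng.random()) < log_ratio:
            w, lp = prop, lp_prop
            accepted += 1
        if it >= n_warmup:
            draws.append(w)
    return np.array(draws), accepted / n_iter
```

Given the draws, an equal-tailed credible interval for any functional of the weights (for example a post-period gap) can be read off with `np.quantile` at `alpha / 2` and `1 - alpha / 2`.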

References

Vives, J. and Martinez, A. (2024). "Bayesian Synthetic Control Methods." Journal of Computational and Graphical Statistics.

causal_impact ¶

causal_impact(data: DataFrame, pre_period: Tuple[Any, Any], post_period: Tuple[Any, Any], outcome: Optional[str] = None, covariates: Optional[List[str]] = None, model: str = 'local_level', n_simulations: int = 1000, alpha: float = 0.05, seed: Optional[int] = None) -> CausalResult

Google CausalImpact-style causal inference for time series.

Fits a Bayesian structural time series model on the pre-intervention period and produces a counterfactual prediction for the post-period. The treatment effect is the difference between observed and counterfactual, with full posterior uncertainty.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Time-indexed DataFrame. If outcome is `None`, the first column is the outcome and all remaining columns are controls.	required
`pre_period`	`tuple of (start, end)`	Pre-intervention period boundaries (inclusive). Values are matched against the DataFrame index.	required
`post_period`	`tuple of (start, end)`	Post-intervention period boundaries (inclusive).	required
`outcome`	`str`	Column name of the outcome variable. If `None`, uses the first column.	`None`
`covariates`	`list of str`	Column names to use as controls. If `None`, all columns except the outcome are used.	`None`
`model`	``'local_level'`` or ``'local_linear_trend'``	State-space model type. `'local_level'` uses a random-walk latent state; `'local_linear_trend'` adds a stochastic slope.	`'local_level'`
`n_simulations`	`int`	Number of posterior draws for uncertainty quantification.	`1000`
`alpha`	`float`	Significance level for confidence/credible intervals.	`0.05`
`seed`	`int`	Random seed for reproducibility.	`None`

Returns:

Type	Description
`CausalResult`	Unified result object with: - `estimate` — average treatment effect on the treated (ATT) - `detail` — per-period effects DataFrame - `model_info` — counterfactual trajectories, cumulative effects, regression coefficients, posterior draws, and diagnostics

Notes

The implementation follows Brodersen et al. (2015). Regression coefficients are estimated via ridge regression with GCV-selected penalty (an empirical-Bayes analogue of the spike-and-slab prior). Hyperparameters (observation/level/slope noise) are estimated by maximum likelihood through the Kalman filter.
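
For intuition about the `'local_level'` option, the following sketch runs a scalar Kalman filter over the pre-period and projects the filtered level forward as the counterfactual. It assumes fixed, user-supplied variances (`obs_var`, `level_var`) rather than maximum-likelihood estimates, and it leaves out the regression on controls, so it illustrates only the state-space component. The function name and arguments are hypothetical.

```python
import numpy as np


def local_level_forecast(y_pre, n_post, obs_var=1.0, level_var=0.1):
    """Filter a local-level model on y_pre and forecast n_post steps.

    Model: y_t = mu_t + eps_t,  mu_{t+1} = mu_t + eta_t.
    Returns the forecast mean and forecast variance of y per horizon.
    """
    y_pre = np.asarray(y_pre, dtype=float)
    # Diffuse start: after the first observation the level equals y_1
    # with variance obs_var.
    level, p = y_pre[0], obs_var
    for y_t in y_pre[1:]:
        p_pred = p + level_var                 # predict step
        gain = p_pred / (p_pred + obs_var)     # Kalman gain
        level = level + gain * (y_t - level)   # update step
        p = (1.0 - gain) * p_pred
    horizons = np.arange(1, n_post + 1)
    mean = np.full(n_post, level)
    # State uncertainty grows by level_var at every step ahead.
    var = p + horizons * level_var + obs_var
    return mean, var
```

Per-period effects are then `y_post - mean`, and a normal-approximation band follows from `np.sqrt(var)`; the widening variance is why pointwise intervals grow with the forecast horizon.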

Examples:

>>> import pandas as pd
>>> import statspai as sp
>>> # Wide-format: columns = [outcome, control1, control2, ...]
>>> result = sp.synth.causal_impact(
...     data, pre_period=(1, 70), post_period=(71, 100)
... )
>>> print(result.summary())

References

Brodersen, K.H., Gallusser, F., Koehler, J., Remy, N. and Scott, S.L. (2015). "Inferring causal impact using Bayesian structural time series models." Annals of Applied Statistics, 9(1), 247-274. [@brodersen2015inferring]

bsts_synth ¶

bsts_synth(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any, treatment_time: Any, covariates: Optional[List[str]] = None, model: str = 'local_level', n_simulations: int = 1000, alpha: float = 0.05, seed: Optional[int] = None) -> CausalResult

BSTS synthetic control with a panel-data interface.

Converts long-format panel data into the wide format expected by `causal_impact()`, using control-unit outcome series as covariates/regressors. This provides a CausalImpact-style analysis through the same panel interface as StatsPAI's other synthetic-control methods.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Long-format panel data with columns for unit, time, and outcome.	required
`outcome`	`str`	Outcome variable column name.	required
`unit`	`str`	Unit identifier column name.	required
`time`	`str`	Time period column name.	required
`treated_unit`	`any`	Identifier of the treated unit.	required
`treatment_time`	`any`	First treatment period (inclusive).	required
`covariates`	`list of str`	Additional time-varying covariates to include alongside control unit series. Each covariate is averaged across control units per time period and appended as an extra regressor.	`None`
`model`	``'local_level'`` or ``'local_linear_trend'``	State-space model type.	`'local_level'`
`n_simulations`	`int`	Number of posterior draws.	`1000`
`alpha`	`float`	Significance level.	`0.05`
`seed`	`int`	Random seed.	`None`

Returns:

Type	Description
`CausalResult`	Unified result object. `model_info` contains additional keys `treated_unit`, `treatment_time`, `donor_units`.

Examples:

>>> import statspai as sp
>>> result = sp.synth.bsts_synth(
...     data, outcome='gdp', unit='country', time='year',
...     treated_unit='West Germany', treatment_time=1990,
... )
>>> print(result.summary())

See Also

causal_impact : Wide-format CausalImpact interface.
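
The long-to-wide conversion described above can be pictured with plain pandas. This sketch assumes a balanced panel and places the treated unit in the first column, matching the convention that `causal_impact()` treats the first column as the outcome when `outcome` is `None`. The helper name `panel_to_wide` and its return layout are hypothetical.

```python
import pandas as pd


def panel_to_wide(data, outcome, unit, time, treated_unit, treatment_time):
    """Pivot a long panel to wide form with the treated unit first."""
    wide = data.pivot(index=time, columns=unit, values=outcome).sort_index()
    donors = [u for u in wide.columns if u != treated_unit]
    wide = wide[[treated_unit] + donors]
    # treatment_time is the first treated period (inclusive).
    pre = wide.index[wide.index < treatment_time]
    post = wide.index[wide.index >= treatment_time]
    pre_period = (pre.min(), pre.max())
    post_period = (post.min(), post.max())
    return wide, pre_period, post_period
```

The returned `pre_period` and `post_period` tuples have the inclusive `(start, end)` shape that `causal_impact()` documents for its period arguments.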

penalized_synth ¶

penalized_synth(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any, treatment_time: Any, covariates: Optional[List[str]] = None, lambda_pen: Optional[float] = None, penalty_type: str = 'pairwise', predictors: Optional[List[str]] = None, placebo: bool = True, alpha: float = 0.05) -> CausalResult

Penalized Synthetic Control estimator (Abadie & L'Hour 2021).

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Long-format panel data with columns for unit, time, outcome, and optionally covariates / predictors.	required
`outcome`	`str`	Name of the outcome column.	required
`unit`	`str`	Name of the unit identifier column.	required
`time`	`str`	Name of the time period column.	required
`treated_unit`	`Any`	Identifier of the treated unit.	required
`treatment_time`	`Any`	First treatment period (inclusive).	required
`covariates`	`list of str`	Covariate columns used only for the pairwise distance penalty. When `None` the pre-treatment outcome values are used as the covariate vector for distance computation.	`None`
`lambda_pen`	`float`	Penalty parameter. `None` (default) triggers automatic selection via rolling-origin cross-validation on pre-treatment periods.	`None`
`penalty_type`	`('pairwise', 'max_dev', 'l1_pairwise')`	Penalty functional form. `'pairwise'` — `sum_j w_j * \|\|X1 - Xj\|\|^2` (Abadie & L'Hour original). `'max_dev'` — `max_j { w_j * \|\|X1 - Xj\|\|^2 }`. `'l1_pairwise'` — `sum_j w_j * \|\|X1 - Xj\|\|_1`.	`'pairwise'`
`predictors`	`list of str`	Columns whose pre-treatment averages are appended to the covariate vector for distance computation.	`None`
`placebo`	`bool`	Run in-space placebo permutation tests.	`True`
`alpha`	`float`	Significance level for confidence intervals.	`0.05`

Returns:

Type	Description
`CausalResult`	With `detail` set to the effects-by-period DataFrame.

References

Abadie, A. and L'Hour, J. (2021). "A Penalized Synthetic Control Estimator for Disaggregated Data." Journal of the American Statistical Association, 116(536), 1817-1834. [@abadie2021penalized]
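
A minimal sketch of the `'pairwise'` penalty, assuming the objective `||X1 - X0 w||^2 + lambda_pen * sum_j w_j ||X1 - Xj||^2` over the simplex and solving it by projected gradient descent. The solver choice, the function names, and the fixed iteration count are assumptions made for illustration; the library may use a different optimiser.

```python
import numpy as np


def project_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u - css / k > 0)[0][-1]
    theta = css[rho] / (rho + 1)
    return np.maximum(v - theta, 0.0)


def penalized_weights(x1, X0, lam, n_iter=2000):
    """x1: (p,) treated features; X0: (p, J) donor features."""
    # Pairwise discrepancy between the treated unit and each donor.
    d = np.sum((X0 - x1[:, None]) ** 2, axis=0)
    # Step size 1/L, with L the Lipschitz constant of the gradient.
    step = 1.0 / (2.0 * np.linalg.norm(X0, 2) ** 2)
    w = np.full(X0.shape[1], 1.0 / X0.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * X0.T @ (X0 @ w - x1) + lam * d
        w = project_simplex(w - step * grad)
    return w
```

With `lam = 0` this reduces to an ordinary simplex-constrained fit, while a very large `lam` pushes all weight onto the single closest donor, which is the nearest-neighbour limit of the penalised estimator.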

cluster_synth ¶

cluster_synth(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any, treatment_time: Any, n_clusters: Optional[int] = None, cluster_method: str = 'kmeans', augment: bool = False, max_augment: int = 3, covariates: Optional[List[str]] = None, placebo: bool = True, alpha: float = 0.05, seed: Optional[int] = None) -> CausalResult

Cluster Synthetic Control estimator.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Long-format panel data.	required
`outcome`	`str`	Name of the outcome column.	required
`unit`	`str`	Name of the unit-identifier column.	required
`time`	`str`	Name of the time-period column.	required
`treated_unit`	`any`	Identifier of the single treated unit.	required
`treatment_time`	`any`	First treatment period (inclusive).	required
`n_clusters`	`int or None`	Number of clusters. `None` selects automatically via silhouette score (k from 2 to min(J-1, 10)).	`None`
`cluster_method`	`('kmeans', 'spectral', 'hierarchical')`	Clustering algorithm.	`'kmeans'`
`augment`	`bool`	If `True`, augment the selected cluster with the closest donors from other clusters.	`False`
`max_augment`	`int`	Maximum number of additional donors when augment is `True`.	`3`
`covariates`	`list of str or None`	Additional columns to include in the clustering feature matrix.	`None`
`placebo`	`bool`	Run in-space placebo permutation inference.	`True`
`alpha`	`float`	Significance level for confidence intervals.	`0.05`
`seed`	`int or None`	Random seed for reproducibility.	`None`

Returns:

Type	Description
`CausalResult`

sparse_synth ¶

sparse_synth(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any, treatment_time: Any, mode: str = 'lasso', lambda_w: Optional[float] = None, lambda_v: Optional[float] = None, covariates: Optional[List[str]] = None, placebo: bool = True, alpha: float = 0.05) -> CausalResult

Sparse Synthetic Control estimator.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Long-format panel data.	required
`outcome`	`str`	Outcome variable name.	required
`unit`	`str`	Unit identifier column.	required
`time`	`str`	Time period column.	required
`treated_unit`	`any`	Identifier of the treated unit.	required
`treatment_time`	`any`	First treatment period (inclusive).	required
`mode`	`('lasso', 'constrained_lasso', 'joint')`	`'lasso'` — L1-penalised weights, no sum-to-one constraint. `'constrained_lasso'` — L1 + non-negativity + sum-to-one. `'joint'` — Joint V and W optimisation (full SparseSC).	`'lasso'`
`lambda_w`	`float or None`	L1 penalty on donor weights. `None` selects via cross-validation.	`None`
`lambda_v`	`float or None`	L1 penalty on feature weights (`'joint'` mode only). `None` selects via cross-validation.	`None`
`covariates`	`list of str`	Additional covariates to append to the pre-treatment outcome matrix before weight estimation.	`None`
`placebo`	`bool`	Run in-space placebo permutation for inference.	`True`
`alpha`	`float`	Significance level for confidence interval.	`0.05`

Returns:

Type	Description
`CausalResult`

Examples:

>>> import statspai as sp
>>> result = sp.sparse_synth(
...     df, outcome='gdp', unit='state', time='year',
...     treated_unit='California', treatment_time=1989,
...     mode='lasso',
... )
>>> result.summary()
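
To illustrate what the `'lasso'` mode implies (an L1 penalty on donor weights with no sum-to-one constraint), the sketch below solves `0.5 * ||y - Y0 w||^2 + lambda_w * ||w||_1` by iterative soft-thresholding (ISTA). The solver and the helper names are assumptions chosen for brevity, not a description of the library's optimiser or of how `lambda_w` is cross-validated.

```python
import numpy as np


def soft_threshold(v, t):
    """Shrink each entry of v toward zero by t."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def lasso_weights(y_pre, Y0_pre, lambda_w, n_iter=5000):
    """L1-penalised donor weights; Y0_pre has shape (T0, J)."""
    # Step size 1/L, with L the largest eigenvalue of Y0'Y0.
    step = 1.0 / np.linalg.norm(Y0_pre, 2) ** 2
    w = np.zeros(Y0_pre.shape[1])
    for _ in range(n_iter):
        grad = Y0_pre.T @ (Y0_pre @ w - y_pre)
        w = soft_threshold(w - step * grad, step * lambda_w)
    return w
```

Because soft-thresholding sets small coefficients to exactly zero, the number of active donors falls as `lambda_w` grows, and every weight is zero once `lambda_w` reaches `max(abs(Y0_pre.T @ y_pre))`.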

kernel_synth ¶

kernel_synth(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any, treatment_time: Any, kernel: str = 'rbf', sigma: Optional[float] = None, degree: int = 2, covariates: Optional[List[str]] = None, placebo: bool = True, alpha: float = 0.05) -> CausalResult

Kernel-based Nonlinear Synthetic Control Method.

Standard SCM assumes the counterfactual is a linear combination of donors. This estimator lifts the donor panel into a reproducing kernel Hilbert space (RKHS) and solves for synthetic control weights in that feature space, capturing nonlinear donor relationships.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Panel data in long format with columns for unit, time, and outcome.	required
`outcome`	`str`	Name of the outcome variable.	required
`unit`	`str`	Column identifying panel units.	required
`time`	`str`	Column identifying time periods.	required
`treated_unit`	`Any`	Identifier of the treated unit.	required
`treatment_time`	`Any`	First treatment period (inclusive).	required
`kernel`	``{'rbf', 'polynomial', 'laplacian'}``	Kernel function to use.	``'rbf'``
`sigma`	`float or None`	Bandwidth for RBF / Laplacian kernels. If None, the median heuristic is used (recommended).	`None`
`degree`	`int`	Degree for the polynomial kernel (ignored otherwise).	`2`
`covariates`	`list of str or None`	Additional pre-treatment covariates to include in the feature vector. If provided, each donor row is `[outcomes \| covariates]`.	`None`
`placebo`	`bool`	Whether to run in-space placebo permutation for inference.	`True`
`alpha`	`float`	Significance level for the confidence interval.	`0.05`

Returns:

Type	Description
`CausalResult`	Unified result with ATT estimate, SE, p-value, CI, and period-level effects in `detail`.

Notes

The optimisation solved is:

$$
\min_{w \ge 0,\ \sum_j w_j = 1}
\bigl[ k(Y_1, Y_1) - 2\, w^\top k(Y_1) + w^\top K\, w \bigr]
$$

where $K_{ij} = k(Y_{0,i},\, Y_{0,j})$ is the donor kernel matrix, $k(Y_1)_j = k(Y_1,\, Y_{0,j})$, and $k(Y_1, Y_1)$ is the kernel of the treated unit with itself.
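
The objective in these Notes is the squared RKHS distance between the treated unit and the weighted donor combination, which the sketch below evaluates for an RBF kernel with a median-heuristic bandwidth. It assumes the parameterisation `exp(-||a - b||^2 / (2 * sigma^2))` and takes the median over pairwise donor distances; both are common conventions rather than confirmed details of `kernel_synth()`, and the function names are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, sigma):
    """RBF kernel matrix between rows of A and rows of B."""
    sq = (np.sum(A ** 2, axis=1)[:, None]
          + np.sum(B ** 2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))


def median_heuristic(A):
    """Median pairwise Euclidean distance between rows of A."""
    diffs = A[:, None, :] - A[None, :, :]
    dists = np.sqrt(np.sum(diffs ** 2, axis=-1))
    return float(np.median(dists[np.triu_indices_from(dists, k=1)]))


def kernel_scm_objective(w, y1, Y0, sigma):
    """k(Y1, Y1) - 2 w'k(Y1) + w'Kw for donors stored as rows of Y0."""
    K = rbf_kernel(Y0, Y0, sigma)                # (J, J) donor kernel matrix
    k1 = rbf_kernel(y1[None, :], Y0, sigma)[0]   # (J,) treated-donor kernel
    k11 = 1.0                                    # RBF kernel of y1 with itself
    return float(k11 - 2.0 * w @ k1 + w @ K @ w)
```

The value is zero when the weights reproduce the treated unit exactly in feature space and positive otherwise, so it can serve as a fit diagnostic for any candidate weight vector on the simplex.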

References

Schölkopf, B. and Smola, A.J. (2002). "Learning with Kernels." MIT Press.

kernel_ridge_synth ¶

kernel_ridge_synth(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any, treatment_time: Any, kernel: str = 'rbf', sigma: Optional[float] = None, degree: int = 2, ridge_lambda: float = 0.01, covariates: Optional[List[str]] = None, placebo: bool = True, alpha: float = 0.05) -> CausalResult

Kernel Ridge Regression Synthetic Control.

Instead of constrained simplex weights, this estimator uses kernel ridge regression to learn the mapping from donors to the treated unit. The ridge penalty `ridge_lambda` guards against overfitting when the number of donors is large relative to the number of pre-treatment periods.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Panel data in long format.	required
`outcome`	`str`	Outcome variable name.	required
`unit`	`str`	Unit identifier column.	required
`time`	`str`	Time period column.	required
`treated_unit`	`Any`	Identifier of the treated unit.	required
`treatment_time`	`Any`	First treatment period (inclusive).	required
`kernel`	``{'rbf', 'polynomial', 'laplacian'}``	Kernel function.	``'rbf'``
`sigma`	`float or None`	Bandwidth (None = median heuristic).	`None`
`degree`	`int`	Polynomial kernel degree.	`2`
`ridge_lambda`	`float`	Regularisation parameter. Larger values shrink the coefficient vector toward zero.	`0.01`
`covariates`	`list of str or None`	Additional pre-treatment covariates.	`None`
`placebo`	`bool`	Run placebo permutation inference.	`True`
`alpha`	`float`	Significance level.	`0.05`

Returns:

Type	Description
`CausalResult`

Notes

The solution is:

$$
\beta = (K + \lambda I)^{-1}\, k(Y_1)
$$

and the counterfactual is $\hat{Y}_{1,\text{post}} = Y_{0,\text{post}}^\top \beta$.

No non-negativity or sum-to-one constraints are imposed, which gives the estimator more flexibility but may produce extrapolation.
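
The closed form above takes only a few lines of NumPy. This sketch uses a polynomial kernel `(a·b + 1)^degree` and stores donors as rows; the kernel offset, the array layout, and the function names are assumptions for illustration.

```python
import numpy as np


def polynomial_kernel(A, B, degree=2):
    """Polynomial kernel between rows of A and rows of B."""
    return (A @ B.T + 1.0) ** degree


def kernel_ridge_counterfactual(y1_pre, Y0_pre, Y0_post,
                                ridge_lambda=0.01, degree=2):
    """Return (counterfactual, beta).

    y1_pre: (T0,) treated pre-period outcomes.
    Y0_pre: (J, T0) donor pre-period outcomes.
    Y0_post: (J, T1) donor post-period outcomes.
    """
    K = polynomial_kernel(Y0_pre, Y0_pre, degree)                    # (J, J)
    k1 = polynomial_kernel(Y0_pre, y1_pre[None, :], degree)[:, 0]    # (J,)
    # beta = (K + lambda I)^{-1} k(Y1), solved without forming the inverse.
    beta = np.linalg.solve(K + ridge_lambda * np.eye(K.shape[0]), k1)
    return Y0_post.T @ beta, beta
```

Since `beta` is unconstrained, checking `beta.min()` and `beta.sum()` after fitting is a quick way to see how far the solution departs from a convex combination of donors, which is where extrapolation comes from.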

synth_compare ¶

synth_compare(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any = None, treatment_time: Any = None, methods: Optional[List[str]] = None, placebo: bool = True, alpha: float = 0.05, **kwargs: Any) -> SynthComparison

Run multiple SCM variants and compare them side by side.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Long-format panel data.	required
`outcome`	`str`	Outcome variable column name.	required
`unit`	`str`	Unit identifier column.	required
`time`	`str`	Time period column.	required
`treated_unit`	`any`	Identifier of the treated unit.	`None`
`treatment_time`	`any`	First treatment period (inclusive).	`None`
`methods`	`list of str`	SCM variants to compare. If `None` (default), all 20 registered methods are attempted, in ascending complexity order: `classic, penalized, demeaned, detrended, unconstrained, elastic_net, augmented, sdid, gsynth, mc, discos, scpi, penscm, fdid, sparse, cluster, kernel, kernel_ridge, bayesian, bsts`. Pass an explicit subset to reduce runtime.	`None`
`placebo`	`bool`	Whether to run placebo inference for each method.	`True`
`alpha`	`float`	Significance level for confidence intervals.	`0.05`
`**kwargs`	`Any`	Additional keyword arguments forwarded to `synth()`.	`{}`

Returns:

Type	Description
`SynthComparison`	Structured comparison object with `.results`, `.comparison_table`, `.recommended`, and `.plot()`.

Examples:

>>> import statspai as sp
>>> df = sp.california_prop99()
>>> comp = sp.synth_compare(
...     df, outcome="packspercapita", unit="state", time="year",
...     treated_unit="California", treatment_time=1989,
...     methods=["classic", "demeaned"],
... )
>>> print(comp.summary())
>>> comp.recommended            # method with the best pre-period fit

See Also

synth_recommend : Quick one-liner returning only the method name.

synth : Unified SCM dispatcher.

synth_recommend ¶

synth_recommend(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any = None, treatment_time: Any = None, **kwargs: Any) -> str

Quickly recommend the best SCM method for the given data.

Runs synth_compare internally with placebo=False for speed, then returns just the recommended method name.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Long-format panel data.	required
`outcome`	`str`	Outcome variable column name.	required
`unit`	`str`	Unit identifier column.	required
`time`	`str`	Time period column.	required
`treated_unit`	`any`	Identifier of the treated unit.	`None`
`treatment_time`	`any`	First treatment period (inclusive).	`None`
`**kwargs`	`Any`	Additional keyword arguments forwarded to `synth_compare()`.	`{}`

Returns:

Type	Description
`str`	Name of the recommended SCM method (e.g., `'classic'`, `'augmented'`, `'sdid'`).

Examples:

>>> import statspai as sp
>>> df = sp.california_prop99()
>>> best = sp.synth_recommend(
...     df, outcome="packspercapita", unit="state", time="year",
...     treated_unit="California", treatment_time=1989,
... )
>>> result = sp.synth(
...     df, outcome="packspercapita", unit="state", time="year",
...     treated_unit="California", treatment_time=1989,
...     method=best,
... )

See Also

synth_compare : Full comparison with all metrics and plots.

synth_power ¶

synth_power(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any, treatment_time: Any, effect_sizes: Optional[Sequence[float]] = None, n_simulations: int = 200, alpha: float = 0.05, seed: Optional[int] = None) -> DataFrame

Power analysis for Synthetic Control designs.

Estimates statistical power across a grid of hypothetical effect sizes using placebo-based inference. Identifies the Minimum Detectable Effect (MDE) — the smallest effect where power >= 0.80.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Long-format panel data.	required
`outcome`	`str`	Outcome variable name.	required
`unit`	`str`	Unit identifier column.	required
`time`	`str`	Time period column.	required
`treated_unit`	`any`	Identifier of the treated unit.	required
`treatment_time`	`any`	First treatment period (inclusive).	required
`effect_sizes`	`array-like of float`	Grid of hypothetical additive effect sizes to evaluate. If `None`, auto-generates 10 steps from 0 to 3 * pre-treatment SD of the outcome.	`None`
`n_simulations`	`int`	Number of Monte-Carlo simulations per effect size.	`200`
`alpha`	`float`	Significance level for the placebo test.	`0.05`
`seed`	`int`	Random seed for reproducibility.	`None`

Returns:

Type	Description
`DataFrame`	Columns: `effect_size`, `power`, `n_rejections`, `n_simulations`, `mde_flag`. The `mde_flag` column is `True` for the row corresponding to the Minimum Detectable Effect (first row with power >= 0.80).

Notes

The null distribution is the set of RMSPE ratios from in-space placebos (computed once on the original data). For each effect size, the simulation adds delta to the treated unit's post-treatment outcomes and re-computes the RMSPE ratio. A small noise perturbation (10 % of pre-treatment residual SD) is added so that each simulation draw is unique.

This is a novel diagnostic — no existing SCM package provides an equivalent power-planning tool.
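
The simulation loop described in these Notes can be sketched as follows. The code works directly on gap series (treated minus synthetic) and a precomputed array of placebo RMSPE ratios, and it uses the rank-based p-value `(1 + #{null >= observed}) / (1 + n_null)`. That p-value formula, the helper names, and the decision to shift gaps rather than refit the synthetic control are simplifying assumptions.

```python
import numpy as np


def rmspe_ratio(pre_gap, post_gap):
    """Post-period RMSPE divided by pre-period RMSPE."""
    return np.sqrt(np.mean(post_gap ** 2)) / np.sqrt(np.mean(pre_gap ** 2))


def placebo_pvalue(observed_ratio, null_ratios):
    """Share of placebo ratios at least as large as the observed one."""
    null = np.asarray(null_ratios, dtype=float)
    return (1 + np.sum(null >= observed_ratio)) / (1 + null.size)


def power_curve(pre_gap, post_gap, null_ratios, effect_sizes,
                n_simulations=200, alpha=0.05, seed=None):
    """Return a list of (effect_size, power) pairs."""
    rng = np.random.default_rng(seed)
    noise_sd = 0.1 * np.std(pre_gap)  # 10 % of pre-treatment residual SD
    rows = []
    for delta in effect_sizes:
        rejections = 0
        for _ in range(n_simulations):
            noise = rng.normal(0.0, noise_sd, post_gap.size)
            ratio = rmspe_ratio(pre_gap, post_gap + delta + noise)
            if placebo_pvalue(ratio, null_ratios) <= alpha:
                rejections += 1
        rows.append((float(delta), rejections / n_simulations))
    return rows


def minimum_detectable_effect(rows, power_target=0.8):
    """First effect size whose power reaches the target, else infinity."""
    for delta, power in rows:
        if power >= power_target:
            return delta
    return float("inf")
```

Under this p-value definition the smallest attainable value is `1 / (n_null + 1)`, so a test at `alpha = 0.05` cannot reject with fewer than 19 placebo units regardless of the effect size.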

Examples:

>>> import statspai as sp
>>> df = sp.california_prop99()
>>> power_df = sp.synth_power(
...     df, outcome='packspercapita', unit='state', time='year',
...     treated_unit='California', treatment_time=1989,
...     n_simulations=200, seed=42,
... )
>>> list(power_df.columns)
['effect_size', 'power', 'n_rejections', 'n_simulations', 'mde_flag']
>>> bool((power_df['power'] >= 0).all())
True
>>> mde_row = power_df[power_df['mde_flag']]

See Also

synth_mde : Quick MDE extraction.

synth_power_plot : Visualise the power curve.

synth_mde ¶

synth_mde(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any, treatment_time: Any, power_target: float = 0.8, alpha: float = 0.05, n_simulations: int = 200, seed: Optional[int] = None) -> float

Minimum Detectable Effect for a Synthetic Control design.

Convenience wrapper around `synth_power()` that returns only the MDE (the smallest effect size achieving the target power).

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Long-format panel data.	required
`outcome`	`str`	Outcome variable name.	required
`unit`	`str`	Unit identifier column.	required
`time`	`str`	Time period column.	required
`treated_unit`	`any`	Identifier of the treated unit.	required
`treatment_time`	`any`	First treatment period (inclusive).	required
`power_target`	`float`	Desired power level.	`0.80`
`alpha`	`float`	Significance level for the placebo test.	`0.05`
`n_simulations`	`int`	Number of simulations per effect size.	`200`
`seed`	`int`	Random seed for reproducibility.	`None`

Returns:

Type	Description
`float`	Minimum detectable effect size. Returns `np.inf` if no effect size in the default grid achieves the target power.

Examples:

>>> import statspai as sp
>>> df = sp.california_prop99()
>>> mde = sp.synth_mde(
...     df, outcome='packspercapita', unit='state', time='year',
...     treated_unit='California', treatment_time=1989,
...     seed=42,
... )
>>> bool(mde >= 0)
True

See Also

synth_power : Full power curve with details.

synth_power_plot ¶

synth_power_plot(power_result: DataFrame, ax: Any = None, figsize: tuple = (9, 6), title: Optional[str] = None) -> Any

Plot the power curve from `synth_power()`.

Displays power (y-axis) against effect size (x-axis) with reference lines at power = 0.80 and the MDE.

Parameters:

Name	Type	Description	Default
`power_result`	`DataFrame`	Output of `synth_power()`. Must contain columns `effect_size`, `power`, and `mde_flag`.	required
`ax`	`Axes`	Axes to plot on. If `None`, a new figure is created.	`None`
`figsize`	`tuple`	Figure size (width, height) in inches.	`(9, 6)`
`title`	`str`	Custom plot title. Defaults to `"SCM Power Curve — Minimum Detectable Effect"`.	`None`

Returns:

Type	Description
`Axes`

Examples:

>>> import statspai as sp
>>> df = sp.california_prop99()
>>> power_df = sp.synth_power(
...     df, outcome='packspercapita', unit='state', time='year',
...     treated_unit='California', treatment_time=1989,
...     n_simulations=50, seed=42,
... )
>>> ax = sp.synth_power_plot(power_df)
>>> bool(ax is not None)
True

See Also

synth_power : Compute the power curve.

synth_report ¶

synth_report(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any = None, treatment_time: Any = None, method: str = 'classic', output: str = 'text', sensitivity: bool = True, alpha: float = 0.05, **kwargs: Any) -> str

Generate a comprehensive Synthetic Control analysis report.

Runs synth() for the main estimation and optionally synth_sensitivity() for robustness diagnostics, then formats everything into a structured report.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Long-format panel data.	required
`outcome`	`str`	Outcome variable name.	required
`unit`	`str`	Unit identifier column.	required
`time`	`str`	Time period column.	required
`treated_unit`	`any`	Identifier of the treated unit.	`None`
`treatment_time`	`any`	First treatment period (inclusive).	`None`
`method`	`str`	SCM variant passed to `synth()`.	`'classic'`
`output`	`str`	Output format: `'text'`, `'markdown'`, or `'latex'`.	`'text'`
`sensitivity`	`bool`	Whether to include the sensitivity analysis section.	`True`
`alpha`	`float`	Significance level for CIs and hypothesis tests.	`0.05`
`**kwargs`	`Any`	Additional keyword arguments forwarded to `synth()`.	`{}`

Returns:

Type	Description
`str`	Formatted analysis report.

Examples:

>>> import statspai as sp
>>> df = sp.california_prop99()
>>> report = sp.synth_report(
...     df, outcome='packspercapita', unit='state', time='year',
...     treated_unit='California', treatment_time=1989,
...     sensitivity=False,
... )
>>> bool(isinstance(report, str))
True

>>> md = sp.synth_report(
...     df, outcome='packspercapita', unit='state', time='year',
...     treated_unit='California', treatment_time=1989,
...     output='markdown', sensitivity=False,
... )
>>> bool(md.startswith('# Synthetic'))
True

synth_report_to_file ¶

synth_report_to_file(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any = None, treatment_time: Any = None, method: str = 'classic', output: str = 'markdown', sensitivity: bool = True, alpha: float = 0.05, filename: str = 'report.md', **kwargs: Any) -> str

Generate an SCM report and write it directly to a file.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Long-format panel data.	required
`outcome`	`str`	Outcome variable name.	required
`unit`	`str`	Unit identifier column.	required
`time`	`str`	Time period column.	required
`treated_unit`	`any`	Identifier of the treated unit.	`None`
`treatment_time`	`any`	First treatment period (inclusive).	`None`
`method`	`str`	SCM variant passed to `synth()`.	`'classic'`
`output`	`str`	Output format: `'text'`, `'markdown'`, or `'latex'`.	`'markdown'`
`sensitivity`	`bool`	Whether to include the sensitivity analysis section.	`True`
`alpha`	`float`	Significance level.	`0.05`
`filename`	`str`	Output file path.	`'report.md'`
`**kwargs`	`Any`	Additional keyword arguments forwarded to `synth()`.	`{}`

Returns:

Type	Description
`str`	The generated report string (also written to filename).

Examples:

>>> import os, tempfile
>>> import statspai as sp
>>> df = sp.california_prop99()
>>> path = os.path.join(tempfile.gettempdir(), 'california_scm.md')
>>> report = sp.synth_report_to_file(
...     df, outcome='packspercapita', unit='state', time='year',
...     treated_unit='California', treatment_time=1989,
...     filename=path, sensitivity=False,
... )
>>> bool(os.path.exists(path))
True

synth_to_latex ¶

synth_to_latex(obj: Union[CausalResult, 'SynthComparison', List[CausalResult]], *, caption: Optional[str] = None, label: Optional[str] = None, booktabs: bool = True, show_ci: bool = True, show_weights: bool = False, top_n_weights: int = 5, digits: int = 4, method_names: Optional[Sequence[str]] = None) -> str

Formatted LaTeX table for synthetic-control results.

Single-result mode produces a vertical table with ATT, SE, confidence interval, pre-RMSPE, fit quality, and (optionally) the top-N donor weights. Comparison mode (SynthComparison or list of results) produces a wide table with one column per method, the standard textbook layout for empirical applied work.

Parameters:

Name	Type	Description	Default
`obj`	`CausalResult, SynthComparison, or list of CausalResult`	Object to render. `SynthComparison` and lists trigger the side-by-side multi-method layout.	required
`caption`	`str`	Table caption. Defaults to a sensible auto-generated string.	`None`
`label`	`str`	LaTeX label for cross-referencing. Defaults to `"tab:synth"` (single) or `"tab:synth_compare"` (multi).	`None`
`booktabs`	`bool`	If True, use `\toprule` / `\midrule` / `\bottomrule` (requires `\usepackage{booktabs}`). Falls back to `\hline` if False.	`True`
`show_ci`	`bool`	Include the confidence-interval row.	`True`
`show_weights`	`bool`	Append a panel listing the top-N donor weights.	`False`
`top_n_weights`	`int`	How many donors to show per method when `show_weights=True`.	`5`
`digits`	`int`	Number of decimal places.	`4`
`method_names`	`list of str`	Override column labels in comparison mode.	`None`

Returns:

Type	Description
`str`	LaTeX source ready to drop into a paper. Stars use the standard `* p<0.1, ** p<0.05, *** p<0.01` convention.

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(42)
>>> units, periods = 12, 10
>>> unit = np.repeat(np.arange(units), periods)
>>> time = np.tile(np.arange(1, periods + 1), units)
>>> d = (unit == 0) & (time >= 7)
>>> y = (unit * 0.4 + time * 0.2 + 3.0 * d
...      + rng.normal(0, 0.3, unit.size))
>>> df = pd.DataFrame({"y": y, "unit": unit, "time": time})
>>> res = sp.synth(
...     df, outcome="y", unit="unit", time="time",
...     treated_unit=0, treatment_time=7, method="classic",
...     placebo=False,
... )
>>> tex = sp.synth_to_latex(res, show_weights=True)
>>> isinstance(tex, str)
True

Multi-method comparison:

>>> comp = sp.synth_compare(
...     df, outcome="y", unit="unit", time="time",
...     treated_unit=0, treatment_time=7,
...     methods=['classic', 'sdid'], placebo=False,
... )
>>> tex = sp.synth_to_latex(comp, caption='SCM benchmark')
>>> isinstance(tex, str)
True
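
The `booktabs` switch and the star convention described above can be illustrated with a small stand-alone sketch. It is not the library's renderer: `significance_stars` and `render_att_table` are hypothetical helpers, the two-row layout is an assumption, and the numbers in the usage lines are made up.

```python
def significance_stars(pvalue: float) -> str:
    """Map a p-value to stars: * p<0.1, ** p<0.05, *** p<0.01."""
    if pvalue < 0.01:
        return "***"
    if pvalue < 0.05:
        return "**"
    if pvalue < 0.1:
        return "*"
    return ""


def render_att_table(att, se, pvalue, digits=4, booktabs=True):
    """Render a tiny ATT/SE table (hypothetical layout, for illustration)."""
    # booktabs rules need \usepackage{booktabs}; otherwise fall back to \hline.
    if booktabs:
        top, mid, bottom = r"\toprule", r"\midrule", r"\bottomrule"
    else:
        top = mid = bottom = r"\hline"
    lines = [
        r"\begin{tabular}{lc}",
        top,
        r" & Estimate \\",
        mid,
        f"ATT & {att:.{digits}f}{significance_stars(pvalue)} \\\\",
        f"SE & ({se:.{digits}f}) \\\\",
        bottom,
        r"\end{tabular}",
    ]
    return "\n".join(lines)


# Made-up numbers, only to show the output shape.
print(render_att_table(-19.62, 8.1, 0.03))
print(render_att_table(-19.62, 8.1, 0.03, booktabs=False))
```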

synth_to_markdown ¶

synth_to_markdown(obj: Union[CausalResult, 'SynthComparison', List[CausalResult]], *, title: Optional[str] = None, show_ci: bool = True, show_weights: bool = False, top_n_weights: int = 5, digits: int = 4, method_names: Optional[Sequence[str]] = None) -> str

GitHub-flavoured Markdown table for synthetic-control results.

Mirrors `synth_to_latex` in scope but emits a pipe-delimited Markdown table that renders cleanly on GitHub, in pandoc, and in most static-site generators.

Parameters:

Name	Type	Description	Default
`obj`	`CausalResult, SynthComparison, or list of CausalResult`	See :func:`synth_to_latex`.	required
`title`	`str`	Optional table title.	`None`
`show_ci`	`bool`	See :func:`synth_to_latex`.	`True`
`show_weights`	`bool`	See :func:`synth_to_latex`.	`False`
`top_n_weights`	`int`	See :func:`synth_to_latex`.	`5`
`digits`	`int`	See :func:`synth_to_latex`.	`4`
`method_names`	`list of str`	See :func:`synth_to_latex`.	`None`

Returns:

Type	Description
`str`	Markdown source.

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(42)
>>> units, periods = 12, 10
>>> unit = np.repeat(np.arange(units), periods)
>>> time = np.tile(np.arange(1, periods + 1), units)
>>> d = (unit == 0) & (time >= 7)
>>> y = (unit * 0.4 + time * 0.2 + 3.0 * d
...      + rng.normal(0, 0.3, unit.size))
>>> df = pd.DataFrame({"y": y, "unit": unit, "time": time})
>>> res = sp.synth(
...     df, outcome="y", unit="unit", time="time",
...     treated_unit=0, treatment_time=7, method="classic",
...     placebo=False,
... )
>>> md = sp.synth_to_markdown(res)
>>> isinstance(md, str)
True
>>> "**ATT**" in md
True

synth_to_excel ¶

synth_to_excel(obj: Union[CausalResult, 'SynthComparison', List[CausalResult]], path: str, *, method_names: Optional[Sequence[str]] = None, digits: int = 6) -> str

Multi-sheet Excel workbook for synthetic-control results.

Sheets

"Summary" — one row per method (ATT, SE, CI, pre-RMSPE, fit quality, donor counts).
"Weights" — donor weights per method (one column per method; missing donors are NaN).
"Gap_<method>" — per-period treated / synthetic / gap for each method.
"Diagnostics" — scalar diagnostics (pre-RMSPE, post/pre RMSPE ratio, fit quality, n_donors, etc.).

Requires `openpyxl` (already a soft dependency of pandas Excel I/O). Raises `ModuleNotFoundError` with an actionable hint if it is not installed.

Parameters:

Name	Type	Description	Default
`obj`	`CausalResult, SynthComparison, or list of CausalResult`	Object to export.	required
`path`	`str`	Destination `.xlsx` file path.	required
`method_names`	`list of str`	Override sheet / column labels.	`None`
`digits`	`int`	Rounding for floating-point values.	`6`

Returns:

Type	Description
`str`	Absolute path of the file that was written.

Examples:

>>> import os
>>> import tempfile
>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(42)
>>> units, periods = 12, 10
>>> unit = np.repeat(np.arange(units), periods)
>>> time = np.tile(np.arange(1, periods + 1), units)
>>> d = (unit == 0) & (time >= 7)
>>> y = (unit * 0.4 + time * 0.2 + 3.0 * d
...      + rng.normal(0, 0.3, unit.size))
>>> df = pd.DataFrame({"y": y, "unit": unit, "time": time})
>>> res = sp.synth(
...     df, outcome="y", unit="unit", time="time",
...     treated_unit=0, treatment_time=7, method="classic",
...     placebo=False,
... )
>>> path = sp.synth_to_excel(
...     res, os.path.join(tempfile.mkdtemp(), "synth.xlsx")
... )
>>> os.path.exists(path)
True

synth_loo ¶

synth_loo(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any, treatment_time: Any, penalization: float = 0.0, alpha: float = 0.05) -> DataFrame

Leave-one-out donor sensitivity for Synthetic Control.

Re-fits SCM dropping each donor in turn. Identifies influential donors whose removal shifts the ATT substantially.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Long-format panel.	required
`outcome`	`str`	Outcome variable.	required
`unit`	`str`	Unit identifier column.	required
`time`	`str`	Time column.	required
`treated_unit`	`any`	Identifier of the treated unit.	required
`treatment_time`	`any`	First treatment period.	required
`penalization`	`float`	Ridge penalty forwarded to SCM.	`0.0`
`alpha`	`float`	Significance level for z-based p-values.	`0.05`

Returns:

Type	Description
`DataFrame`	Columns: `dropped_unit`, `att`, `se`, `pvalue`, `pre_rmse`.

Examples:

>>> import statspai as sp
>>> df = sp.california_prop99()
>>> loo = sp.synth_loo(df, outcome='packspercapita', unit='state',
...                    time='year', treated_unit='California',
...                    treatment_time=1989)
>>> loo.sort_values('att')
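
Once the leave-one-out table is available, influential donors can be read off by comparing each refit ATT with the full-pool ATT. The sketch below works on plain dictionaries that mimic the `dropped_unit` and `att` columns; `rank_influential_donors` is a hypothetical helper and the numbers are invented for illustration.

```python
def rank_influential_donors(full_att, loo_rows, top=3):
    """Rank donors by how far dropping them moves the ATT."""
    shifted = [
        {
            "dropped_unit": row["dropped_unit"],
            "att": row["att"],
            "shift": row["att"] - full_att,  # signed change vs. the full fit
        }
        for row in loo_rows
    ]
    # Largest absolute shift first.
    shifted.sort(key=lambda r: abs(r["shift"]), reverse=True)
    return shifted[:top]


# Invented values, not results from any real fit.
full_att = -19.0
loo_rows = [
    {"dropped_unit": "Utah", "att": -18.5},
    {"dropped_unit": "Nevada", "att": -14.0},
    {"dropped_unit": "Montana", "att": -19.4},
]
for row in rank_influential_donors(full_att, loo_rows, top=2):
    print(row["dropped_unit"], round(row["shift"], 2))
```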

synth_time_placebo ¶

synth_time_placebo(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any, treatment_time: Any, penalization: float = 0.0, n_placebo_times: Optional[int] = None, alpha: float = 0.05) -> DataFrame

Time-placebo ("backdating") test for Synthetic Control.

Re-fits SCM using fake treatment times drawn from the pre-treatment period. If the method finds large "effects" where none should exist, the original estimate is suspect.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Long-format panel.	required
`outcome`	`str`	Outcome variable.	required
`unit`	`str`	Unit identifier column.	required
`time`	`str`	Time column.	required
`treated_unit`	`any`	Identifier of the treated unit.	required
`treatment_time`	`any`	Real first treatment period.	required
`penalization`	`float`	Ridge penalty forwarded to SCM.	`0.0`
`n_placebo_times`	`int`	Max number of placebo treatment times to try. Default is all feasible pre-treatment times (leaving >= 2 pre-periods for each placebo fit).	`None`
`alpha`	`float`	Significance level.	`0.05`

Returns:

Type	Description
`DataFrame`	Columns: `placebo_time`, `att`, `se`, `pvalue`.

Examples:

>>> import statspai as sp
>>> df = sp.california_prop99()
>>> tp = sp.synth_time_placebo(df, outcome='packspercapita', unit='state',
...     time='year', treated_unit='California', treatment_time=1989)
>>> tp.columns.tolist()
['placebo_time', 'att', 'se', 'pvalue']
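
The default for `n_placebo_times` is described as every feasible pre-treatment date that leaves at least two pre-periods for the placebo fit. A minimal sketch of that enumeration follows; `feasible_placebo_times` and `min_pre` are hypothetical names, and keeping the latest dates when a cap is supplied is an assumption rather than documented behaviour.

```python
def feasible_placebo_times(times, treatment_time, n_placebo_times=None, min_pre=2):
    """List fake treatment dates that still leave `min_pre` earlier periods."""
    pre = sorted(t for t in set(times) if t < treatment_time)
    # The first `min_pre` pre-periods cannot host a placebo date:
    # nothing (or too little) would remain to fit the weights on.
    candidates = pre[min_pre:]
    if n_placebo_times is not None:
        # Assumption: under a cap, keep the dates closest to the real treatment.
        candidates = candidates[-n_placebo_times:] if n_placebo_times > 0 else []
    return candidates


years = list(range(1970, 2001))
print(feasible_placebo_times(years, 1989))
print(feasible_placebo_times(years, 1989, n_placebo_times=5))
```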

synth_donor_sensitivity ¶

synth_donor_sensitivity(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any, treatment_time: Any, k: Optional[int] = None, n_samples: int = 100, penalization: float = 0.0, seed: Optional[int] = None) -> DataFrame

Donor-pool bootstrap sensitivity for Synthetic Control.

Draws n_samples random subsets of size k from the donor pool and re-fits SCM for each, producing a distribution of ATT estimates.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Long-format panel.	required
`outcome`	`str`	Outcome variable.	required
`unit`	`str`	Unit identifier column.	required
`time`	`str`	Time column.	required
`treated_unit`	`any`	Identifier of the treated unit.	required
`treatment_time`	`any`	First treatment period.	required
`k`	`int`	Donor subset size. Default is `floor(J * 0.75)` where J is the total number of donors.	`None`
`n_samples`	`int`	Number of random donor subsets to draw.	`100`
`penalization`	`float`	Ridge penalty forwarded to SCM.	`0.0`
`seed`	`int`	Random seed for reproducibility.	`None`

Returns:

Type	Description
`DataFrame`	Columns: `iteration`, `donors_used`, `att`, `pre_rmse`.

Examples:

>>> import statspai as sp
>>> df = sp.california_prop99()
>>> ds = sp.synth_donor_sensitivity(df, outcome='packspercapita',
...     unit='state', time='year', treated_unit='California',
...     treatment_time=1989, n_samples=50, seed=42)
>>> ds['att'].describe()
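
The subset-drawing step can be sketched with the standard library alone. This assumes subsets are drawn without replacement and uses the documented default `k = floor(J * 0.75)`; `draw_donor_subsets` is a hypothetical helper, and the SCM refit on each subset is left out.

```python
import math
import random


def draw_donor_subsets(donors, k=None, n_samples=100, seed=None):
    """Draw `n_samples` random donor subsets of size `k`."""
    donors = sorted(donors)
    if k is None:
        k = math.floor(len(donors) * 0.75)  # documented default subset size
    rng = random.Random(seed)  # local generator, so the seed is reproducible
    # Assumption: each subset is sampled without replacement.
    return [sorted(rng.sample(donors, k)) for _ in range(n_samples)]


donors = [f"state_{i}" for i in range(38)]
subsets = draw_donor_subsets(donors, n_samples=50, seed=42)
print(len(subsets), len(subsets[0]))
```

Each subset would then be passed to the estimator in place of the full donor pool, and the spread of the resulting ATTs summarised.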

synth_rmspe_filter ¶

synth_rmspe_filter(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any, treatment_time: Any, thresholds: Optional[List[float]] = None, penalization: float = 0.0) -> DataFrame

Pre-RMSPE-filtered p-value robustness (Abadie et al. 2010).

Runs placebo SCM on every donor unit, computes each unit's pre-treatment RMSPE, then re-calculates the rank-based p-value after dropping placebos whose pre-RMSPE exceeds a multiple of the treated unit's pre-RMSPE.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Long-format panel.	required
`outcome`	`str`	Outcome variable.	required
`unit`	`str`	Unit identifier column.	required
`time`	`str`	Time column.	required
`treated_unit`	`any`	Identifier of the treated unit.	required
`treatment_time`	`any`	First treatment period.	required
`thresholds`	`list of float`	Multiples of treated-unit pre-RMSPE used as cut-offs. Default `[1, 2, 5, 10, 20, np.inf]`.	`None`
`penalization`	`float`	Ridge penalty.	`0.0`

Returns:

Type	Description
`DataFrame`	Columns: `threshold`, `n_placebos`, `pvalue`, `treated_pre_rmspe`.

Examples:

>>> import statspai as sp
>>> df = sp.california_prop99()
>>> rp = sp.synth_rmspe_filter(df, outcome='packspercapita', unit='state',
...     time='year', treated_unit='California', treatment_time=1989)
>>> rp.columns.tolist()
['threshold', 'n_placebos', 'pvalue', 'treated_pre_rmspe']
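
The filtering logic can be sketched on precomputed RMSPE values. The test statistic is an assumption here: the sketch ranks units by the post/pre RMSPE ratio and uses `(1 + #placebos at least as extreme) / (1 + #placebos kept)` as the p-value; `filtered_pvalues` is a hypothetical helper and the inputs are invented.

```python
import math


def filtered_pvalues(treated, placebos, thresholds=(1, 2, 5, 10, 20, math.inf)):
    """`treated` and each placebo are (pre_rmspe, post_rmspe) pairs."""
    treated_pre, treated_post = treated
    treated_ratio = treated_post / treated_pre
    rows = []
    for c in thresholds:
        # Keep placebos whose pre-fit is no worse than c times the treated fit.
        kept = [(pre, post) for pre, post in placebos if pre <= c * treated_pre]
        n_extreme = sum(1 for pre, post in kept if post / pre >= treated_ratio)
        rows.append({
            "threshold": c,
            "n_placebos": len(kept),
            "pvalue": (1 + n_extreme) / (1 + len(kept)),
        })
    return rows


# Invented RMSPE pairs: two placebos fit poorly before treatment.
treated = (2.0, 20.0)
placebos = [(1.0, 3.0), (2.5, 5.0), (50.0, 600.0), (3.0, 6.0), (30.0, 450.0)]
rows = filtered_pvalues(treated, placebos)
for row in rows:
    print(row)
```

In this toy input the two badly fitting placebos have large ratios, so the p-value rises once loose thresholds let them back in.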

synth_sensitivity ¶

synth_sensitivity(data: DataFrame, outcome: str, unit: str, time: str, treated_unit: Any, treatment_time: Any, penalization: float = 0.0, n_donor_samples: int = 100, seed: Optional[int] = None, alpha: float = 0.05) -> Dict[str, Any]

Run all SCM sensitivity diagnostics in a single call.

Combines leave-one-out, time placebos, donor pool bootstrap, and pre-RMSPE filtering into one bundled report.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Long-format panel.	required
`outcome`	`str`	Outcome variable.	required
`unit`	`str`	Unit identifier column.	required
`time`	`str`	Time column.	required
`treated_unit`	`any`	Identifier of the treated unit.	required
`treatment_time`	`any`	First treatment period.	required
`penalization`	`float`	Ridge penalty.	`0.0`
`n_donor_samples`	`int`	Number of random donor subsets for donor sensitivity.	`100`
`seed`	`int`	Random seed.	`None`
`alpha`	`float`	Significance level.	`0.05`

Returns:

Type	Description
`dict`	Keys: `'loo'` (leave-one-out DataFrame), `'time_placebo'` (time-placebo DataFrame), `'donor_sensitivity'` (donor-bootstrap DataFrame), `'rmspe_filter'` (RMSPE-filtered p-values DataFrame), `'summary'` (formatted string summary).

Examples:

>>> import statspai as sp
>>> df = sp.california_prop99()
>>> sens = sp.synth_sensitivity(df, outcome='packspercapita',
...     unit='state', time='year', treated_unit='California',
...     treatment_time=1989, n_donor_samples=50, seed=42)
>>> sorted(sens.keys())
['donor_sensitivity', 'loo', 'rmspe_filter', 'summary', 'time_placebo']
>>> print(sens['summary'])

synth_sensitivity_plot ¶

synth_sensitivity_plot(sensitivity_result: Dict[str, Any], figsize: Tuple[float, float] = (14, 10), title: Optional[str] = None) -> Any

Multi-panel sensitivity diagnostic plot.

Parameters:

Name	Type	Description	Default
`sensitivity_result`	`dict`	Output from :func:`synth_sensitivity`.	required
`figsize`	`tuple`	Figure size in inches.	`(14, 10)`
`title`	`str`	Super-title for the figure.	`None`

Returns:

Type	Description
`Figure`

Examples:

>>> import statspai as sp
>>> df = sp.california_prop99()
>>> sens = sp.synth_sensitivity(df, outcome='packspercapita',
...     unit='state', time='year', treated_unit='California',
...     treatment_time=1989, n_donor_samples=50, seed=42)
>>> fig = sp.synth_sensitivity_plot(sens)
>>> fig.savefig('synth_sensitivity.png', dpi=150)

synth_survival ¶

synth_survival(data: DataFrame, unit: str, time: str, survival: str, treated: str, treat_time: float, alpha: float = 0.05, n_placebos: int = 100, seed: int = 0) -> SyntheticSurvivalResult

Synthetic Survival Control estimator.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Long panel: one row per (unit, time) with a precomputed Kaplan-Meier survival probability in column `survival`. Each unit should have the same time grid (or be padded by forward/back-fill before calling — ragged grids are not accepted).	required
`unit`	`str`	Unit (panel-id) column.	required
`time`	`str`	Time grid column.	required
`survival`	`str`	Column containing the survival probability `S_i(t)`, with values in `(0, 1)`.	required
`treated`	`str`	Column containing the name of the single treated unit. Accepts either a boolean column or a dedicated string/int identifier.	required
`treat_time`	`float`	Time at which treatment starts (times >= `treat_time` are the post-treatment window).	required
`alpha`	`float`	Uniform placebo CI level.	`0.05`
`n_placebos`	`int`	Number of placebo permutations used to bootstrap the uniform band.	`100`
`seed`	`int`	Random seed for the placebo permutations.	`0`

Returns:

Type	Description
`SyntheticSurvivalResult`	Fitted counterfactual survival curve, gap trajectory, donor weights, and a placebo-based uniform confidence band.

Examples:

>>> import statspai as sp, numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> rows = [("treated_arm", m, float(np.exp(-0.04 * m)))
...         for m in range(12)]
>>> for c in range(6):                       # six donor arms
...     rows += [(f"control_{c}", m, float(np.exp(-(0.05 + 0.01 * c) * m)))
...              for m in range(12)]
>>> df = pd.DataFrame(rows, columns=["trial_arm", "month", "km_est"])
>>> r = sp.synth_survival(
...     df, unit="trial_arm", time="month",
...     survival="km_est", treated="treated_arm", treat_time=6,
... )
>>> r.summary()
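
Because `data` must already hold Kaplan-Meier survival probabilities on a common time grid, a preprocessing step is usually needed. The sketch below computes the product-limit estimate for one arm and evaluates it on a shared grid; `kaplan_meier_on_grid` is a hypothetical helper and the durations are invented.

```python
def kaplan_meier_on_grid(durations, events, grid):
    """Kaplan-Meier survival estimate evaluated at each time in `grid`."""
    # Distinct times at which at least one event (not censoring) occurred.
    event_times = sorted({t for t, e in zip(durations, events) if e})
    curve = []
    for g in grid:
        surv = 1.0
        for t in event_times:
            if t > g:
                break
            at_risk = sum(1 for d in durations if d >= t)
            died = sum(1 for d, e in zip(durations, events) if d == t and e)
            surv *= 1.0 - died / at_risk  # product-limit update
        curve.append(surv)
    return curve


durations = [2, 3, 3, 5, 8]  # invented follow-up times for one arm
events = [1, 1, 0, 1, 0]     # 1 = event observed, 0 = censored
grid = [0, 2, 4, 6]          # time grid shared by every arm
curve = kaplan_meier_on_grid(durations, events, grid)
rows = [("arm_a", t, s) for t, s in zip(grid, curve)]
print(rows)
```

Repeating this for every arm on the same `grid` yields the long `(unit, time, survival)` rows the function expects.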

synth_experimental_design ¶

synth_experimental_design(data: DataFrame, *, unit: str, time: str, outcome: str, k: int, candidates: Optional[Sequence[Any]] = None, donors: Optional[Sequence[Any]] = None, pre_period: Optional[Tuple[Any, Any]] = None, risk: str = 'mspe', concentration_weight: float = 0.0, penalization: float = 0.0, n_random: int = 500, random_state: Optional[int] = None) -> SynthExperimentalDesignResult

Pick k treated units to minimize the expected SC post-ATT variance.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame (long format)`	Must contain columns `[unit, time, outcome]`.	required
`unit`	`str`	Unit identifier column.	required
`time`	`str`	Time period column.	required
`outcome`	`str`	Outcome variable column.	required
`k`	`int`	Number of units to select for treatment. Must satisfy `1 <= k <= len(candidates) - 1`.	required
`candidates`	`sequence`	Units eligible for treatment. Defaults to all units.	`None`
`donors`	`sequence`	Units available as donors. Defaults to "all units NOT in `candidates`"; if `candidates` covers all units we fall back to a leave-one-out protocol where each candidate's donor pool is every other unit.	`None`
`pre_period`	`(start, end)`	Closed interval of pre-treatment periods. Defaults to all timestamps in `data`.	`None`
`risk`	`('mspe', 'rmse')`	Loss functional for ranking candidates.	`'mspe'`
`concentration_weight`	`float`	Penalty on donor-weight concentration (Herfindahl): `risk_score = loss + lambda * H(w)` where `H(w) = sum(w_j^2)`. Abadie-Zhao show that for a fixed pre-MSPE, less-concentrated donors give tighter post-period confidence intervals.	`0.0`
`penalization`	`float`	Ridge penalty passed to the simplex solver (Doudchenko & Imbens 2016 style).	`0.0`
`n_random`	`int`	Monte-Carlo draws used to estimate `baseline_variance` (the expected sum-MSPE under random-`k` selection).	`500`
`random_state`	`int`	Random seed for the Monte-Carlo draws.	`None`

Returns:

Type	Description
`SynthExperimentalDesignResult`

Notes

The practical recipe (Abadie-Zhao 2025/2026, Section 4) is:

1. For each candidate unit i, solve the simplex SC problem against the donor pool restricted to non-candidates (to avoid coupling risk scores across candidates).
2. Record the pre-period MSPE as the plug-in estimate of sigma^2_i.
3. Pick the k candidates with the smallest risk_score: loss_i + lambda * H(w_i).

The implementation degrades gracefully when candidates covers all units: we then use per-candidate leave-one-out donor pools.
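
The ranking step of the recipe can be sketched once each candidate's pre-period MSPE and donor weights are known. The simplex solve itself is omitted; `herfindahl` and `select_units` are hypothetical helpers and the inputs are invented to show how the concentration penalty can change the pick.

```python
def herfindahl(weights):
    """Concentration of a weight vector: H(w) = sum of squared weights."""
    return sum(w * w for w in weights)


def select_units(candidates, k, concentration_weight=0.0):
    """`candidates` maps unit -> (pre_mspe, donor_weights)."""
    scores = {
        unit: mspe + concentration_weight * herfindahl(weights)
        for unit, (mspe, weights) in candidates.items()
    }
    ranked = sorted(scores, key=scores.get)  # smallest risk score first
    return ranked[:k], scores


# Invented inputs: "a" fits best but leans on a single donor.
candidates = {
    "a": (0.10, [1.0, 0.0, 0.0]),
    "b": (0.12, [0.4, 0.3, 0.3]),
    "c": (0.30, [0.5, 0.5, 0.0]),
}
print(select_units(candidates, k=1)[0])
print(select_units(candidates, k=1, concentration_weight=0.1)[0])
```

With no penalty the best-fitting unit wins; with `concentration_weight=0.1` the unit with more evenly spread donor weights overtakes it.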

Examples:

>>> import statspai as sp
>>> df = sp.utils.dgp_synth(n_units=40, n_periods=20, seed=0)
>>> res = sp.synth_experimental_design(
...     df, unit='unit', time='time', outcome='y',
...     k=5, pre_period=(0, 19), random_state=0,
... )
>>> res.selected
[12, 7, 23, 4, 30]
>>> print(res.summary())

synthdid_estimate ¶

synthdid_estimate(data: DataFrame, y: str, unit: str, time: str, treat_unit: Any, treat_time: Any, **kw: Any) -> CausalResult

R-style alias: synthdid::synthdid_estimate.

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(42)
>>> units, periods = 12, 10
>>> unit = np.repeat(np.arange(units), periods)
>>> time = np.tile(np.arange(1, periods + 1), units)
>>> d = (unit == 0) & (time >= 7)
>>> y = (unit * 0.4 + time * 0.2 + 3.0 * d
...      + rng.normal(0, 0.3, unit.size))
>>> df = pd.DataFrame({"y": y, "unit": unit, "time": time})
>>> res = sp.synthdid_estimate(
...     df, y="y", unit="unit", time="time",
...     treat_unit=0, treat_time=7, seed=42,
... )
>>> round(res.estimate, 2)  # true effect = 3.0
3.09

sc_estimate ¶

sc_estimate(data: DataFrame, y: str, unit: str, time: str, treat_unit: Any, treat_time: Any, **kw: Any) -> CausalResult

R-style alias: synthdid::sc_estimate.

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(42)
>>> units, periods = 12, 10
>>> unit = np.repeat(np.arange(units), periods)
>>> time = np.tile(np.arange(1, periods + 1), units)
>>> d = (unit == 0) & (time >= 7)
>>> y = (unit * 0.4 + time * 0.2 + 3.0 * d
...      + rng.normal(0, 0.3, unit.size))
>>> df = pd.DataFrame({"y": y, "unit": unit, "time": time})
>>> res = sp.sc_estimate(
...     df, y="y", unit="unit", time="time",
...     treat_unit=0, treat_time=7, seed=42,
... )
>>> round(res.estimate, 1)  # synthetic-control variant
2.5

did_estimate ¶

did_estimate(data: DataFrame, y: str, unit: str, time: str, treat_unit: Any, treat_time: Any, **kw: Any) -> CausalResult

R-style alias: synthdid::did_estimate.

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(42)
>>> units, periods = 12, 10
>>> unit = np.repeat(np.arange(units), periods)
>>> time = np.tile(np.arange(1, periods + 1), units)
>>> d = (unit == 0) & (time >= 7)
>>> y = (unit * 0.4 + time * 0.2 + 3.0 * d
...      + rng.normal(0, 0.3, unit.size))
>>> df = pd.DataFrame({"y": y, "unit": unit, "time": time})
>>> res = sp.did_estimate(
...     df, y="y", unit="unit", time="time",
...     treat_unit=0, treat_time=7, seed=42,
... )
>>> round(res.estimate, 2)  # true effect = 3.0
3.01

synthdid_placebo ¶

synthdid_placebo(data: DataFrame, y: str, unit: str, time: str, treat_unit: Any, treat_time: Any, method: Literal['sdid', 'sc', 'did'] = 'sdid', **kw: Any) -> DataFrame

Run placebo estimates assigning treatment to each control unit.

Replicates synthdid::synthdid_placebo.

Accepts the same arguments as `sdid()`, plus any extra keyword arguments.

Returns:

Type	Description
`DataFrame`	One row per control unit with columns: `unit`, `estimate`, `se`, `pvalue`.

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(42)
>>> units, periods = 12, 10
>>> unit = np.repeat(np.arange(units), periods)
>>> time = np.tile(np.arange(1, periods + 1), units)
>>> d = (unit == 0) & (time >= 7)
>>> y = (unit * 0.4 + time * 0.2 + 3.0 * d
...      + rng.normal(0, 0.3, unit.size))
>>> df = pd.DataFrame({"y": y, "unit": unit, "time": time})
>>> pl = sp.synthdid_placebo(
...     df, y="y", unit="unit", time="time",
...     treat_unit=0, treat_time=7, seed=42,
... )
>>> list(pl.columns)
['unit', 'estimate', 'se', 'pvalue']
>>> len(pl)  # one row per control unit
11
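
The placebo loop itself is simple to sketch. The version below uses a plain difference-in-differences contrast as a stand-in estimator (comparable in spirit to `method='did'`, not the library's code), excludes the truly treated unit from every placebo donor pool as an assumption, and omits standard errors; `did_contrast` and `placebo_estimates` are hypothetical helpers.

```python
from statistics import mean


def did_contrast(panel, unit, controls, n_pre):
    """Change in `unit` minus the average change across `controls`."""
    def change(series):
        return mean(series[n_pre:]) - mean(series[:n_pre])
    return change(panel[unit]) - mean(change(panel[c]) for c in controls)


def placebo_estimates(panel, treat_unit, n_pre):
    """Pretend each control unit was treated and re-estimate the effect."""
    controls = [u for u in panel if u != treat_unit]
    rows = []
    for fake in controls:
        # Assumption: the real treated unit never serves as a donor.
        donors = [u for u in controls if u != fake]
        rows.append({"unit": fake,
                     "estimate": did_contrast(panel, fake, donors, n_pre)})
    return rows


# Invented panel: four periods, the first two are pre-treatment.
panel = {
    "A": [1, 1, 4, 4],  # treated unit
    "B": [2, 2, 2, 2],
    "C": [3, 3, 3, 3],
    "D": [0, 0, 1, 1],
}
placebos = placebo_estimates(panel, treat_unit="A", n_pre=2)
for row in placebos:
    print(row)
```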

synthdid_plot ¶

synthdid_plot(result: CausalResult, ax: Any = None, figsize: Tuple[float, float] = (10, 6), treated_color: str = '#2C3E50', synth_color: str = '#E74C3C', ci_alpha: float = 0.15, title: Optional[str] = None) -> Tuple[Any, Any]

Plot observed vs synthetic trajectory.

Replicates synthdid::plot.synthdid_estimate.

Parameters:

Name	Type	Description	Default
`result`	`CausalResult`	Output of :func:`sdid`.	required
`ax`	`matplotlib Axes`		`None`
`figsize`	`tuple`		`(10, 6)`
`treated_color`	`str`		`'#2C3E50'`
`synth_color`	`str`		`'#E74C3C'`
`ci_alpha`	`float`		`0.15`
`title`	`str`		`None`

Returns:

Type	Description
`(fig, ax)`

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(42)
>>> units, periods = 12, 10
>>> unit = np.repeat(np.arange(units), periods)
>>> time = np.tile(np.arange(1, periods + 1), units)
>>> d = (unit == 0) & (time >= 7)
>>> y = (unit * 0.4 + time * 0.2 + 3.0 * d
...      + rng.normal(0, 0.3, unit.size))
>>> df = pd.DataFrame({"y": y, "unit": unit, "time": time})
>>> res = sp.synthdid_estimate(
...     df, y="y", unit="unit", time="time",
...     treat_unit=0, treat_time=7, seed=42,
... )
>>> fig, ax = sp.synthdid_plot(res)
>>> type(ax).__name__
'Axes'

synthdid_units_plot ¶

synthdid_units_plot(result: CausalResult, top_n: int = 10, ax: Any = None, figsize: Tuple[float, float] = (8, 5)) -> Tuple[Any, Any]

Horizontal bar chart of unit weight contributions.

Replicates synthdid::synthdid_units_plot.

Parameters:

Name	Type	Description	Default
`result`	`CausalResult`		required
`top_n`	`int`	Show the top-N donors by weight.	`10`
`ax`	`matplotlib Axes`		`None`
`figsize`	`tuple`		`(8, 5)`

Returns:

Type	Description
`(fig, ax)`

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(42)
>>> units, periods = 12, 10
>>> unit = np.repeat(np.arange(units), periods)
>>> time = np.tile(np.arange(1, periods + 1), units)
>>> d = (unit == 0) & (time >= 7)
>>> y = (unit * 0.4 + time * 0.2 + 3.0 * d
...      + rng.normal(0, 0.3, unit.size))
>>> df = pd.DataFrame({"y": y, "unit": unit, "time": time})
>>> res = sp.synthdid_estimate(
...     df, y="y", unit="unit", time="time",
...     treat_unit=0, treat_time=7, seed=42,
... )
>>> fig, ax = sp.synthdid_units_plot(res)
>>> type(ax).__name__
'Axes'

synthdid_rmse_plot ¶

synthdid_rmse_plot(result: CausalResult, ax: Any = None, figsize: Tuple[float, float] = (8, 5)) -> Tuple[Any, Any]

Pre-treatment RMSE of treated vs synthetic trajectory.

Returns:

Type	Description
`(fig, ax)`

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(42)
>>> units, periods = 12, 10
>>> unit = np.repeat(np.arange(units), periods)
>>> time = np.tile(np.arange(1, periods + 1), units)
>>> d = (unit == 0) & (time >= 7)
>>> y = (unit * 0.4 + time * 0.2 + 3.0 * d
...      + rng.normal(0, 0.3, unit.size))
>>> df = pd.DataFrame({"y": y, "unit": unit, "time": time})
>>> res = sp.synthdid_estimate(
...     df, y="y", unit="unit", time="time",
...     treat_unit=0, treat_time=7, seed=42,
... )
>>> fig, ax = sp.synthdid_rmse_plot(res)
>>> type(ax).__name__
'Axes'

california_prop99 ¶

california_prop99() -> DataFrame

California Proposition 99 tobacco control dataset.

Returns a balanced panel of per-capita cigarette sales for 39 US states, 1970-2000. California implemented Proposition 99 in 1989.

This is the canonical synthdid example dataset.

Returns:

Type	Description
`DataFrame`	Columns: `state`, `year`, `packspercapita`, `treated`.

Examples:

>>> import statspai as sp
>>> df = sp.california_prop99()
>>> list(df.columns)
['state', 'year', 'packspercapita', 'treated']
>>> result = sp.sdid(df, y='packspercapita', unit='state',
...                  time='year', treat_unit='California',
...                  treat_time=1989)
>>> bool(result.estimand == 'ATT')
True

german_reunification ¶

german_reunification() -> DataFrame

German reunification dataset (simulated).

Returns a balanced panel of GDP per capita for 17 OECD countries, 1960--2003. West Germany is the treated unit; treatment begins in 1990 (reunification).

The simulated trajectories reproduce the key stylised facts: Luxembourg has the highest GDP per capita (~40 000), Portugal the lowest (~10 000), and all countries share a common upward growth trend. Post-1990, West Germany exhibits an approximately 1 500 GDP-per-capita decline relative to its synthetic counterfactual.

References

Abadie, A., Diamond, A. & Hainmueller, J. (2015). "Comparative Politics and the Synthetic Control Method." American Journal of Political Science, 59(2), 495--510. [@abadie2015comparative]

Returns:

Type	Description
`DataFrame`	Columns: `country`, `year`, `gdppc`, `treated`.

Examples:

>>> import statspai as sp
>>> df = sp.german_reunification()
>>> result = sp.synth(df, outcome='gdppc', unit='country',
...                   time='year', treated_unit='West Germany',
...                   treatment_time=1990)
>>> bool(result.estimate is not None)
True

basque_terrorism ¶

basque_terrorism() -> DataFrame

Basque Country terrorism dataset (simulated).

Returns a balanced panel of GDP per capita (thousands of 1986 USD) for 17 Spanish regions, 1955--1997. The Basque Country is the treated unit; treatment begins in 1970 (onset of ETA terrorism).

The simulated data reproduce the gradual widening of an approximately 10 % GDP gap between the Basque Country and its synthetic counterfactual after 1970.

References

Abadie, A. & Gardeazabal, J. (2003). "The Economic Costs of Conflict: A Case Study of the Basque Country." American Economic Review, 93(1), 113--132. [@abadie2003economic]

Returns:

Type	Description
`DataFrame`	Columns: `region`, `year`, `gdppc`, `treated`.

Examples:

>>> import statspai as sp
>>> df = sp.basque_terrorism()
>>> result = sp.synth(df, outcome='gdppc', unit='region',
...                   time='year', treated_unit='Basque Country',
...                   treatment_time=1970)
>>> bool(result.estimate is not None)
True

california_tobacco ¶

california_tobacco() -> DataFrame

California Proposition 99 tobacco dataset (simulated, extended).

Returns a balanced panel of per-capita cigarette sales and covariates for 39 US states, 1970--2000. California is the treated unit; treatment begins in 1989 (Proposition 99).

This dataset extends the simpler california_prop99() panel with additional covariates (retail price, log income, youth population share, beer consumption), enabling covariate-matching SCM analyses.

References

Abadie, A., Diamond, A. & Hainmueller, J. (2010). "Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program." Journal of the American Statistical Association, 105(490), 493--505. [@abadie2010synthetic]

Returns:

Type	Description
`DataFrame`	Columns: `state`, `year`, `cigsale`, `retprice`, `lnincome`, `age15to24`, `beer`, `treated`.

Notes

cigsale : per-capita cigarette sales (packs).
retprice : average retail price per pack (cents, real).
lnincome : log of real per-capita personal income.
age15to24 : share of population aged 15--24 (percent).
beer : per-capita beer consumption (gallons).

Examples:

>>> import statspai as sp
>>> df = sp.california_tobacco()
>>> result = sp.synth(df, outcome='cigsale', unit='state',
...                   time='year', treated_unit='California',
...                   treatment_time=1989)
>>> bool(result.estimate is not None)
True
