`statspai.did`¶

did ¶

Difference-in-Differences (DID) module for StatsPAI.

Provides estimators for:

- Classic 2×2 DID (two groups, two periods)
- Triple Differences / DDD (two groups, two periods, within-unit subgroup)
- Callaway & Sant'Anna (2021): staggered DID with DR/IPW/REG
- Sun & Abraham (2021): interaction-weighted event study
- Synthetic DID (Arkhangelsky et al. 2021)
- Goodman-Bacon (2021): TWFE decomposition diagnostic
- Honest DID (Rambachan & Roth 2023): parallel trends sensitivity
- de Chaisemartin & D'Haultfoeuille (2020): DID with treatment switching
- Borusyak, Jaravel & Spiess (2024): imputation DID estimator
- Stacked DID (Cengiz, Dube, Lindner & Zipperer, 2019)
- did_analysis(): one-call DID workflow
- Wooldridge (2021): extended TWFE with cohort × time interactions
- Sant'Anna & Zhao (2020): doubly robust DID
- TWFE decomposition: Bacon (2021) + de Chaisemartin–D'Haultfoeuille (2020) weights

DIDAnalysis `dataclass` ¶

Bundled results from a full DID analysis workflow.

Returned by `sp.did_analysis`. Bundles the detected design, the main ATT estimate, the optional event study, Bacon decomposition, and honest-DID sensitivity results, and a step-by-step log. Use `.summary()` for a human-readable report and `.plot()` for the event study.

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> n = 200
>>> post = np.tile([0, 1], n)
>>> policy = np.repeat(rng.integers(0, 2, size=n), 2)
>>> wage = (2.0 + 0.5 * post + 1.0 * (policy * post)
...         + rng.normal(size=2 * n))
>>> df = pd.DataFrame({'post': post, 'policy': policy, 'wage': wage})
>>> report = sp.did_analysis(df, y='wage', treat='policy',
...                          time='post', run_sensitivity=False)
>>> isinstance(report, sp.DIDAnalysis)
True
>>> report.design
'2x2'

summary ¶

summary() -> str

Return the comprehensive analysis summary as a string.

plot ¶

plot(**kwargs: Any) -> Any

Plot event study if available, else main result.

HarvestDIDResult `dataclass` ¶

Bases: ResultProtocolMixin

Full diagnostic output of `harvest_did`.

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> import pandas as pd
>>> es = pd.DataFrame({
...     "relative_time": [0, 1], "att": [0.4, 0.5],
...     "se": [0.1, 0.1], "pvalue": [0.01, 0.02], "n_comparisons": [3, 3],
... })
>>> res = sp.HarvestDIDResult(
...     estimate=0.45, se=0.07, ci=(0.31, 0.59), alpha=0.05,
...     n_comparisons=6,
...     comparisons=pd.DataFrame({"att": [0.4]}),
...     event_study=es,
...     pretrend_test={"pvalue": 0.6},
... )
>>> float(res.estimate)
0.45

SensitivityResult `dataclass` ¶

Bases: ResultProtocolMixin

Result of Rambachan & Roth (2023) sensitivity analysis.

Attributes:

Name	Type	Description
`mbar_grid`	`ndarray`	Grid of M-bar values tested.
`ci_lower`	`ndarray`	Lower bound of the honest CI at each M-bar.
`ci_upper`	`ndarray`	Upper bound of the honest CI at each M-bar.
`breakdown_mbar`	`float`	Smallest M-bar for which the CI includes zero (sign reversal).
`att`	`float`	Point estimate of the ATT.
`att_se`	`float`	Standard error of the ATT.
`method`	`str`	Extrapolation method used (`'C-LF'`).
`alpha`	`float`	Significance level.

Methods:

Name	Description
`summary`	Return a formatted summary string.
`plot`	Matplotlib sensitivity plot (M-bar vs CI).

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> rows = []
>>> for i in range(60):
...     g = 5 if i < 30 else 0
...     for t in range(1, 9):
...         post = 1 if (g and t >= 5) else 0
...         y = 1.0 + 0.2 * t + i / 120 + 2.0 * post + rng.normal(0, 0.5)
...         rows.append({"unit": i, "time": t, "y": y, "g": g})
>>> df = pd.DataFrame(rows)
>>> result = sp.event_study(df, y="y", treat_time="g", time="time",
...                         unit="unit", window=(-3, 3))
>>> sens = sp.sensitivity_rr(result, Mbar=[0.0, 0.5, 1.0])
>>> type(sens).__name__
'SensitivityResult'
>>> bool(isinstance(sens.summary(), str))
True
>>> import matplotlib.pyplot as plt
>>> fig, ax = plt.subplots()
>>> ax = sens.plot(ax=ax)
>>> fig.savefig("sensitivity.png")

References

Rambachan, A. & Roth, J. (2023). [@rambachan2023more]

summary ¶

summary() -> str

Return a formatted summary string.

plot ¶

plot(ax: Any = None, figsize: tuple[float, float] = (8, 5), **kwargs: Any) -> Any

Sensitivity plot: M-bar on x-axis, honest CI band on y-axis.

Parameters:

Name	Type	Description	Default
`ax`	`matplotlib Axes`		`None`
`figsize`	`tuple`		`(8, 5)`
`**kwargs`	`Any`	Passed to `ax.fill_between`.	`{}`

Returns:

Type	Description
`Axes`

CSReport `dataclass` ¶

Structured output of `cs_report`.

Attributes are plain pandas objects so downstream users can export to LaTeX, Markdown, or Excel without any custom converters.

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> rows = []
>>> for u in range(60):
...     g = rng.choice([3, 4, 0])  # cohort: treated at t=3, t=4, or never
...     ui = rng.normal(0, 1)
...     for t in range(1, 7):
...         post = 1 if (g != 0 and t >= g) else 0
...         y = ui + 0.3 * t + 2.0 * post + rng.normal(0, 0.5)
...         rows.append({"id": u, "t": t, "g": g, "y": y})
>>> df = pd.DataFrame(rows)
>>> rpt = sp.cs_report(
...     df, y="y", g="g", t="t", i="id",
...     n_boot=50, random_state=42, verbose=False)
>>> isinstance(rpt, sp.CSReport)
True
>>> isinstance(rpt.dynamic, pd.DataFrame)
True
>>> isinstance(rpt.breakdown, pd.DataFrame)
True

to_text ¶

to_text() -> str

Return the human-readable report as a single string.

plot ¶

plot(figsize: Any = (14, 10), suptitle: Optional[str] = None) -> Any

Render a 2×2 summary figure of the report.

The four quadrants show:

Top-left: event study (dynamic) with uniform confidence band
Top-right: θ(g) per-cohort aggregation with uniform band
Bottom-left: θ(t) per-calendar-time aggregation
Bottom-right: Rambachan–Roth breakdown M* across post event times

Requires matplotlib. Returns (fig, axes).

to_markdown ¶

to_markdown(float_format: str = '%.4f') -> str

Render the report as GitHub-Flavoured Markdown.

Suitable for pasting directly into a pull request, blog post, or Jupyter notebook Markdown cell.

to_excel ¶

to_excel(path: Any, float_format: Optional[str] = '%.6f', engine: Optional[str] = None) -> str

Dump the report to a multi-sheet Excel workbook.

Creates one sheet per block — Summary, Dynamic, Group, Calendar, Breakdown, Meta — so downstream Excel consumers (policy briefs, regulatory reports) can link to or copy from the individual tables directly.

Parameters:

Name	Type	Description	Default
`path`	`str \| Path`	Destination `.xlsx` path.	required
`float_format`	`str`	Passed through to :meth:`pandas.DataFrame.to_excel`. Pass `None` to preserve full precision.	`'%.6f'`
`engine`	`str`	Excel writer engine (`'openpyxl'` or `'xlsxwriter'`). If `None` pandas picks an installed one; raises a clear ImportError here if none is available.	`None`

Returns:

Type	Description
`str`	The path written.

to_latex ¶

to_latex(float_format: str = '%.4f', caption: Optional[str] = None, label: Optional[str] = None) -> str

Render the report as a LaTeX fragment.

Uses the booktabs package for each sub-table and wraps the result in a single table float. Requires \usepackage{booktabs} in the preamble of the consuming document.

ParallelTrendsRobustnessResult `dataclass` ¶

Bases: ResultProtocolMixin

Bundled pre-trends power + honest-DiD sensitivity for one DiD result.

Attributes:

Name	Type	Description
`power_table`	`DataFrame`	One row per quantity from the Roth (2022) pre-test analysis (joint statistic, p-value, power, non-centrality).
`ci_grid`	`DataFrame`	Long-format robust-CI grid: `family`, `M`, `ci_lower`, `ci_upper`, `rejects_zero`.
`breakdown`	`dict`	`family -> Mbar*`, the largest violation magnitude at which the effect stays significant.
`verdict`	`str`	One-line plain-language reading of the table.
`att, att_se`	`float`	Point estimate and standard error at relative time `e`.
`e`	`int`	Relative time the sensitivity analysis targets.
`alpha`	`float`	Significance level.
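
As a rough illustration of how `ci_grid`, `rejects_zero`, and `breakdown` relate, the sketch below derives a per-family breakdown value from a hand-made grid. The rows, the family names, and the helper `breakdown_from_grid` are hypothetical; the library may interpolate between grid points rather than take the largest rejecting grid value.

```python
# Hypothetical robust-CI grid: one row per (family, M) pair.
ci_grid = [
    {"family": "smoothness", "M": 0.0, "ci_lower": 0.8, "ci_upper": 2.0},
    {"family": "smoothness", "M": 0.5, "ci_lower": 0.3, "ci_upper": 2.5},
    {"family": "smoothness", "M": 1.0, "ci_lower": -0.2, "ci_upper": 3.0},
    {"family": "relative_magnitude", "M": 0.0, "ci_lower": 0.8, "ci_upper": 2.0},
    {"family": "relative_magnitude", "M": 0.5, "ci_lower": -0.1, "ci_upper": 2.9},
    {"family": "relative_magnitude", "M": 1.0, "ci_lower": -1.0, "ci_upper": 3.8},
]


def breakdown_from_grid(rows):
    """Largest grid M per family at which the CI still excludes zero."""
    breakdown = {}
    stopped = set()
    for row in sorted(rows, key=lambda r: (r["family"], r["M"])):
        fam = row["family"]
        # The CI rejects zero when it lies entirely on one side of it.
        row["rejects_zero"] = row["ci_lower"] > 0 or row["ci_upper"] < 0
        if fam in stopped:
            continue
        if row["rejects_zero"]:
            breakdown[fam] = row["M"]
        else:
            # First non-rejection: larger M values cannot restore significance.
            stopped.add(fam)
            breakdown.setdefault(fam, None)
    return breakdown


breakdown = breakdown_from_grid(ci_grid)
print(breakdown)
```

With these made-up numbers the smoothness family tolerates violations up to the grid value 0.5, while the second family loses significance as soon as any violation is allowed.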

Examples:

>>> import statspai as sp
>>> df = sp.dgp_did(n_units=80, n_periods=8, staggered=False, seed=0)
>>> df["first_treat"] = df["first_treat"].fillna(0)
>>> es = sp.event_study(df, y="y", treat_time="first_treat",
...                     time="time", unit="unit", window=(-3, 3))
>>> rob = sp.parallel_trends_robustness(es)
>>> bool(isinstance(rob.summary(), str))
True

References

Rambachan & Roth (2023) [@rambachan2023more]; Roth (2022) [@roth2022pretest].

summary ¶

summary() -> str

Return a formatted multi-section summary string.

to_latex ¶

to_latex(*, caption: Optional[str] = None, label: Optional[str] = None) -> str

Booktabs LaTeX table of the robust-CI grid and breakdown values.

plot ¶

plot(ax: Any = None, figsize: Tuple[float, float] = (8, 5), **kwargs: Any) -> Any

Robust-CI bands against M, one band per restriction family.

Parameters:

Name	Type	Description	Default
`ax`	`matplotlib Axes`		`None`
`figsize`	`tuple`		`(8, 5)`
`**kwargs`	`Any`	Passed to `ax.fill_between`.	`{}`

Returns:

Type	Description
`Axes`

DIDInputTypeError ¶

Bases: MethodIncompatibility, TypeError

DID input type error that preserves the historical TypeError catch.
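
The point of the dual base classes is that existing `except TypeError` handlers keep working. The sketch below mirrors only the documented bases with stand-in classes; `MethodIncompatibility` is redefined here as a plain `Exception` subclass (an assumption about its real definition) and `check_column_name` is a hypothetical helper.

```python
class MethodIncompatibility(Exception):
    """Stand-in for the library's base class (hypothetical definition)."""


class DIDInputTypeError(MethodIncompatibility, TypeError):
    """Stand-in that mirrors the documented bases."""


def check_column_name(value):
    # Hypothetical validation step that raises the specialised error.
    if not isinstance(value, str):
        raise DIDInputTypeError(
            f"column name must be str, got {type(value).__name__}"
        )
    return value


caught_by = []

try:
    check_column_name(42)
except TypeError:  # legacy handler still catches the new error
    caught_by.append("TypeError")

try:
    check_column_name(42)
except MethodIncompatibility:  # newer, more specific handler also works
    caught_by.append("MethodIncompatibility")

print(caught_by)
```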

did_analysis ¶

did_analysis(data: DataFrame, y: str, treat: str, time: str, id: Optional[str] = None, covariates: Optional[List[str]] = None, method: str = 'auto', estimator: str = 'dr', control_group: str = 'nevertreated', run_bacon: bool = True, run_event_study: bool = True, run_sensitivity: bool = True, event_window: Optional[tuple[int, int]] = None, cluster: Optional[str] = None, robust: bool = True, alpha: float = 0.05, **kwargs: Any) -> DIDAnalysis

Comprehensive DID analysis workflow.

Runs the full DID analysis pipeline in one call: design detection, Bacon decomposition (staggered), estimation, event study, and honest_did sensitivity analysis.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Input dataset.	required
`y`	`str`	Outcome variable name.	required
`treat`	`str`	Treatment variable. For 2×2: binary (0/1). For staggered: first treatment period (0 = never treated).	required
`time`	`str`	Time period variable.	required
`id`	`str`	Unit identifier. Required for staggered designs.	`None`
`covariates`	`list of str`	Control variables.	`None`
`method`	`str`	Estimation method: 'auto', '2x2', 'cs', 'sa', 'bjs', 'sdid'.	`'auto'`
`estimator`	`str`	For CS: 'dr', 'ipw', or 'reg'.	`'dr'`
`control_group`	`str`	For CS/SA: 'nevertreated' or 'notyettreated'.	`'nevertreated'`
`run_bacon`	`bool`	Run Bacon decomposition for staggered designs.	`True`
`run_event_study`	`bool`	Run event study for dynamic effects + pre-trend test.	`True`
`run_sensitivity`	`bool`	Run honest_did sensitivity analysis.	`True`
`event_window`	`tuple of (int, int)`	Event study window, e.g. (-5, 5). Auto-detected if None.	`None`
`cluster`	`str`	Cluster variable for standard errors.	`None`
`robust`	`bool`	HC1 robust standard errors.	`True`
`alpha`	`float`	Significance level.	`0.05`

Returns:

Type	Description
`DIDAnalysis`	Bundled results with `.summary()`, `.plot()` methods.

Examples:

Classic 2×2:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> n = 200
>>> post = np.tile([0, 1], n)
>>> policy = np.repeat(rng.integers(0, 2, size=n), 2)
>>> wage = 2.0 + 0.5 * post + 1.0 * (policy * post) + rng.normal(size=2 * n)
>>> df = pd.DataFrame({'post': post, 'policy': policy, 'wage': wage})
>>> report = sp.did_analysis(df, y='wage', treat='policy', time='post')
>>> type(report).__name__
'DIDAnalysis'
>>> print(report.summary())

Staggered — full pipeline (design auto-detected from id):

>>> rows = []
>>> cohorts = rng.choice([0, 3, 4], size=60)
>>> for i in range(60):
...     g = cohorts[i]
...     for t in range(1, 7):
...         treated = 1 if (g > 0 and t >= g) else 0
...         earn = 1.0 + 0.3 * t + 1.5 * treated + rng.normal()
...         rows.append({'worker': i, 'year': t,
...                      'first_treat': g, 'earnings': earn})
>>> panel = pd.DataFrame(rows)
>>> report = sp.did_analysis(panel, y='earnings', treat='first_treat',
...                          time='year', id='worker')
>>> report.design
'staggered'
>>> report.plot()

Quick estimate only (skip diagnostics):

>>> report = sp.did_analysis(panel, y='earnings', treat='first_treat',
...                          time='year', id='worker',
...                          run_bacon=False, run_sensitivity=False)

bacon_decomposition ¶

bacon_decomposition(data: DataFrame, y: str, treat: str, time: str, id: str, alpha: float = 0.05) -> Dict[str, Any]

Goodman-Bacon (2021) decomposition of the TWFE DID estimator.

Decomposes the overall TWFE coefficient into a weighted sum of 2×2 DID comparisons between different treatment timing groups.
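
Each building block of the decomposition is an ordinary 2×2 DID: the change in the treated group's mean outcome minus the change in the comparison group's mean outcome. A minimal sketch with made-up cell data follows; the helper `did_2x2` and all numbers are hypothetical and not part of the library.

```python
from statistics import mean

# Hypothetical outcomes for one timing-group pair, before and after treatment.
treated_pre = [1.0, 1.2, 0.8]
treated_post = [3.1, 3.3, 2.9]
control_pre = [0.9, 1.1, 1.0]
control_post = [1.5, 1.4, 1.6]


def did_2x2(t_pre, t_post, c_pre, c_post):
    """(change in treated mean) minus (change in control mean)."""
    return (mean(t_post) - mean(t_pre)) - (mean(c_post) - mean(c_pre))


estimate = did_2x2(treated_pre, treated_post, control_pre, control_post)
print(round(estimate, 4))
```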

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Balanced panel data.	required
`y`	`str`	Outcome variable.	required
`treat`	`str`	Binary treatment indicator (0 before treatment, 1 after).	required
`time`	`str`	Time period variable.	required
`id`	`str`	Unit identifier.	required
`alpha`	`float`	Significance level.	`0.05`

Returns:

Type	Description
`dict`	Result dictionary with the keys listed below.

Keys:

- `beta_twfe`: overall TWFE estimate
- `decomposition`: `pd.DataFrame` with columns `[type, treated, control, estimate, weight]`
- `weighted_sum`: Σ(weight × estimate); equals `beta_twfe` under the same dyad conventions as R `bacondecomp` and Stata `bacondecomp`
- `n_comparisons`: number of 2×2 sub-comparisons
- `negative_weight_share`: fraction of signed weight mass on truly negative Bacon weights. This is usually zero; already-treated control comparisons are reported separately as `already_treated_control_weight_share`.

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> import pandas as pd
>>> rng = np.random.default_rng(0)
>>> cohorts = {1: 4, 2: 4, 3: 7, 4: 7, 5: 99, 6: 99}  # 99 = never treated
>>> rows = []
>>> for unit, g in cohorts.items():
...     for year in range(1, 11):
...         treated = int(year >= g)
...         y = unit + 0.3 * year + 2.0 * treated + rng.normal(0, 0.5)
...         rows.append({'unit': unit, 'year': year,
...                      'outcome': y, 'treated': treated})
>>> df = pd.DataFrame(rows)
>>> result = sp.bacon_decomposition(df, y='outcome', treat='treated',
...                                 time='year', id='unit')
>>> sorted(result['decomposition']['type'].unique())
['Earlier vs Later Treated', 'Later vs Earlier Treated', 'Treated vs Untreated']
>>> bool(abs(result['weighted_sum'] - result['beta_twfe']) < 1e-8)
True

Notes

The decomposition identifies three types of comparisons:

Earlier vs Later treated: Units treated at time g₁ vs units treated later at g₂ (g₁ < g₂). These are "good" comparisons.
Later vs Earlier treated: Units treated at g₂ vs already-treated units at g₁. These are "forbidden" — they use treated units as controls and can introduce negative weighting bias.
Treated vs Never treated: Always valid comparisons.

A large already_treated_control_weight_share signals that TWFE is relying heavily on comparisons with already-treated controls, so a heterogeneity-robust estimator (C&S, Sun-Abraham) should be used.

See Goodman-Bacon (2021, JEcon), Theorem 1.
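
To make the bookkeeping concrete, the sketch below recomputes a weighted sum and the weight share on already-treated controls from a small hand-written table. The rows are hypothetical, and treating the share as the weight on "Later vs Earlier Treated" rows divided by the total weight is an assumption about how such a share could be defined, not a statement of the library's exact formula.

```python
# Hypothetical decomposition rows; real output has one row per timing-group pair.
decomposition = [
    {"type": "Earlier vs Later Treated", "estimate": 2.1, "weight": 0.25},
    {"type": "Later vs Earlier Treated", "estimate": 1.4, "weight": 0.15},
    {"type": "Treated vs Untreated", "estimate": 2.0, "weight": 0.60},
]

# TWFE coefficient as the weighted sum of the 2x2 estimates.
weighted_sum = sum(r["weight"] * r["estimate"] for r in decomposition)

# Share of weight placed on comparisons that use already-treated units as controls.
total_weight = sum(r["weight"] for r in decomposition)
already_treated_share = (
    sum(r["weight"] for r in decomposition
        if r["type"] == "Later vs Earlier Treated")
    / total_weight
)

print(round(weighted_sum, 4), round(already_treated_share, 4))
```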

bjs_pretrend_joint ¶

bjs_pretrend_joint(result: CausalResult, data: DataFrame, y: str, group: str, time: str, first_treat: str, controls: Optional[List[str]] = None, cluster: Optional[str] = None, horizon: Optional[List[int]] = None, n_boot: int = 300, seed: Optional[int] = None) -> Dict[str, Any]

Cluster-bootstrap joint Wald test for BJS pre-treatment coefficients.

Parameters:

Name	Type	Description	Default
`result`	`CausalResult`	Output of :func:`did_imputation` on `data` with a non-trivial `horizon` that covers negative values. Only its `model_info['event_study']` frame is consulted, to look up the observed pre-period point estimates that we re-test with a covariance-aware statistic.	required
`data`	`DataFrame`	Same arguments you passed to the original :func:`did_imputation` call. Needed to re-run BJS on each cluster-bootstrap resample.	required
`y`	`str`	Same arguments you passed to the original :func:`did_imputation` call. Needed to re-run BJS on each cluster-bootstrap resample.	required
`group`	`str`	Same arguments you passed to the original :func:`did_imputation` call. Needed to re-run BJS on each cluster-bootstrap resample.	required
`time`	`str`	Same arguments you passed to the original :func:`did_imputation` call. Needed to re-run BJS on each cluster-bootstrap resample.	required
`first_treat`	`str`	Same arguments you passed to the original :func:`did_imputation` call. Needed to re-run BJS on each cluster-bootstrap resample.	required
`controls`	`list of str`	Same arguments you passed to the original :func:`did_imputation` call. Needed to re-run BJS on each cluster-bootstrap resample.	`None`
`cluster`	`str`	Same arguments you passed to the original :func:`did_imputation` call. Needed to re-run BJS on each cluster-bootstrap resample.	`None`
`horizon`	`list of int`	If omitted, inferred from `result.model_info['event_study']`.	`None`
`n_boot`	`int`	Cluster-bootstrap replications. Clusters are sampled with replacement; unit ids are reassigned in the resampled frame so BJS refits cleanly.	`300`
`seed`	`int`	RNG seed for reproducibility.	`None`

Returns:

Type	Description
`dict`	`{'statistic', 'df', 'pvalue', 'method', 'n_boot', 'pre_cov'}` where `pre_cov` is the bootstrap covariance matrix used for the Wald quadratic form.
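
The Wald quadratic form behind `statistic` can be sketched for the two-coefficient case, where the 2×2 covariance inverse and the chi-square tail (df = 2) both have closed forms. All numbers are hypothetical and the helpers `covariance_2x2` and `wald_2d` are illustrative only; in the real function the draws come from re-fitting BJS on each cluster-bootstrap resample, and far more than five draws are used.

```python
import math

# Hypothetical observed pre-period coefficients at horizons -2 and -1.
b = (0.10, -0.05)

# Hypothetical bootstrap draws of the same two coefficients.
draws = [(0.30, -0.25), (-0.10, 0.15), (0.20, 0.05), (0.00, -0.15), (0.10, -0.05)]


def covariance_2x2(samples):
    """Sample covariance (v00, v01, v11) of paired draws."""
    n = len(samples)
    m0 = sum(s[0] for s in samples) / n
    m1 = sum(s[1] for s in samples) / n
    v00 = sum((s[0] - m0) ** 2 for s in samples) / (n - 1)
    v11 = sum((s[1] - m1) ** 2 for s in samples) / (n - 1)
    v01 = sum((s[0] - m0) * (s[1] - m1) for s in samples) / (n - 1)
    return v00, v01, v11


def wald_2d(coefs, cov):
    """b' V^{-1} b using the closed-form inverse of a 2x2 matrix."""
    v00, v01, v11 = cov
    det = v00 * v11 - v01 ** 2
    quad = coefs[0] ** 2 * v11 - 2 * coefs[0] * coefs[1] * v01 + coefs[1] ** 2 * v00
    return quad / det


pre_cov = covariance_2x2(draws)
stat = wald_2d(b, pre_cov)
pvalue = math.exp(-stat / 2)  # chi-square survival function for df = 2
print(round(stat, 4), round(pvalue, 4))
```

Unlike testing each pre-period coefficient separately, the quadratic form accounts for the correlation between them through the off-diagonal covariance term.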

Notes

Cost: `n_boot` full BJS re-fits. On a 10 000-row balanced panel with |horizon| = 10, expect roughly n_boot × 0.3 s = 90 s for the default `n_boot=300`. The function is therefore opt-in and is not run by default inside `did_imputation`.

Examples:

>>> import statspai as sp
>>> df = sp.dgp_did(n_units=120, n_periods=8, staggered=True,
...                 seed=0)
>>> df['first_treat'] = df['first_treat'].fillna(0).astype(int)
>>> imp = sp.did_imputation(
...     df, y='y', group='unit', time='time',
...     first_treat='first_treat', horizon=[-3, -2, -1, 0, 1, 2])
>>> jt = sp.bjs_pretrend_joint(
...     imp, df, y='y', group='unit', time='time',
...     first_treat='first_treat', n_boot=50, seed=0)
>>> jt['df']
3
>>> jt['method']
'cluster-bootstrap'

cohort_anchored_event_study ¶

cohort_anchored_event_study(data: DataFrame, y: str, treat: str, time: str, id: str, leads: int = 4, lags: int = 4, cluster: Optional[str] = None, alpha: float = 0.05) -> CausalResult

Cohort-anchored event-study estimator (Rambachan-Roth successor).

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Long-format panel.	required
`y`	`str`		required
`treat`	`str`	First-treatment-period column (0 = never-treated).	required
`time`	`str`		required
`id`	`str`		required
`leads`	`int`	Number of pre-treatment event-time periods to estimate.	`4`
`lags`	`int`	Number of post-treatment event-time periods to estimate.	`4`
`cluster`	`str`	Cluster column for SE; defaults to `id`.	`None`
`alpha`	`float`		`0.05`

Returns:

Type	Description
`CausalResult`	`estimate` is the average post-treatment effect across event times 0..lags. Per-event-time coefficients in `model_info['event_study']` (DataFrame with columns `rel_time`, `att`, `se`, `ci_low`, `ci_high`).

References

arXiv 2509.01829, Cohort-Anchored Robust Inference for Event-Study with Staggered Adoption (2025).

Examples:

>>> import statspai as sp
>>> df = sp.dgp_did(n_units=120, n_periods=8, staggered=True,
...                 seed=0)
>>> df['first_treat'] = df['first_treat'].fillna(0).astype(int)
>>> res = sp.cohort_anchored_event_study(
...     df, y='y', treat='first_treat', time='time', id='unit',
...     leads=2, lags=2)
>>> round(float(res.estimate), 4)
0.3668
>>> list(res.model_info['event_study'].columns)
['rel_time', 'att', 'se', 'ci_low', 'ci_high']

design_robust_event_study ¶

design_robust_event_study(data: DataFrame, y: str, treat: str, time: str, id: str, leads: int = 4, lags: int = 4, cluster: Optional[str] = None, alpha: float = 0.05) -> CausalResult

Design-robust event-study with negative-weight diagnostics.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`		required
`y`	`str`	Same conventions as :func:`callaway_santanna`.	required
`treat`	`str`	Same conventions as :func:`callaway_santanna`.	required
`time`	`str`	Same conventions as :func:`callaway_santanna`.	required
`id`	`str`	Same conventions as :func:`callaway_santanna`.	required
`leads`	`int`	Event-time window.	`4`
`lags`	`int`	Event-time window.	`4`
`cluster`	`str`		`None`
`alpha`	`float`		`0.05`

Returns:

Type	Description
`CausalResult`	Headline = average post-treatment effect. `model_info['weights']` reports the implicit TWFE weights per (cohort, time); negative entries flag contamination.

References

Wright, C. S. (2026). arXiv 2601.18801. See design_robust_es2026 bibkey at the bottom of this module for the full citation.

Examples:

>>> import statspai as sp
>>> df = sp.dgp_did(n_units=120, n_periods=8, staggered=True,
...                 seed=0)
>>> df['first_treat'] = df['first_treat'].fillna(0).astype(int)
>>> res = sp.design_robust_event_study(
...     df, y='y', treat='first_treat', time='time', id='unit',
...     leads=2, lags=2)
>>> round(float(res.estimate), 4)
0.2886
>>> diag = res.model_info['diagnostics']
>>> diag['n_negative_weight_periods']
1

gardner_did ¶

gardner_did(data: DataFrame, y: str, group: str, time: str, first_treat: str, controls: Optional[List[str]] = None, event_study: bool = False, horizon: Optional[List[int]] = None, cluster: Optional[str] = None, alpha: float = 0.05, vce: str = 'analytic', n_boot: int = 199, boot_seed: int = 0) -> CausalResult

Gardner (2021) two-stage DID estimator.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Long-format panel.	required
`y`	`str`	Outcome column name.	required
`group`	`str`	Unit (panel-id) column.	required
`time`	`str`	Time column.	required
`first_treat`	`str`	First-treatment-period column. Never-treated units should be encoded as `0`, `NaN`, or `+inf`.	required
`controls`	`list of str`	Additional covariates included in both stages.	`None`
`event_study`	`bool`	If True, Stage 2 reports coefficients by relative time `k = t - first_treat_i`.	`False`
`horizon`	`list of int`	Relative-time leads/lags to report when `event_study=True`; defaults to `range(-5, 6)` intersected with available support.	`None`
`cluster`	`str`	Cluster variable for Stage-2 SEs. Defaults to `group`.	`None`
`alpha`	`float`	Two-sided CI level.	`0.05`
`vce`	`('analytic', 'bootstrap')`	Standard-error mode. `'analytic'` clusters the Stage-2 residuals (fast) but ignores the variance from estimating the Stage-1 fixed effects and is anti-conservative (empirically ~0.78 coverage at a nominal 95% level); a `UserWarning` recommends `'bootstrap'`. `'bootstrap'` resamples whole clusters and re-runs the full two-step procedure (Gardner 2021 / `did2s`), substantially improving coverage (≈0.90 vs ≈0.78 in simulations; it approaches nominal as the number of clusters grows). Point estimates are identical either way.	`'analytic'`
`n_boot`	`int`	Number of cluster-bootstrap replications when `vce='bootstrap'`.	`199`
`boot_seed`	`int`	Seed for the cluster bootstrap (deterministic results).	`0`

Returns:

Type	Description
`CausalResult`	`.estimate` is the overall ATT; `.model_info['event_study']` carries the event-study dict when requested. Supplies `.summary()`, `.cite()`, and is compatible with `sp.outreg2()`.
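
The relative-time index `k = t - first_treat_i` used when `event_study=True`, together with the three documented never-treated encodings (`0`, `NaN`, `+inf`), can be sketched as follows. The helper `relative_time` is hypothetical, and returning `None` for never-treated units is a choice made for this illustration only.

```python
import math


def relative_time(t, first_treat):
    """Return k = t - first_treat, or None for never-treated units."""
    never_treated = (
        first_treat is None
        or first_treat == 0
        or math.isnan(first_treat)
        or math.isinf(first_treat)
    )
    if never_treated:
        return None
    return int(t - first_treat)


# Hypothetical (time, first_treat) pairs.
examples = [(7, 5), (3, 5), (3, 0), (3, float("nan")), (3, float("inf"))]
ks = [relative_time(t, g) for t, g in examples]
print(ks)
```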

Notes

Identification requires the usual staggered-DID conditions (parallel trends, no anticipation) plus a linear two-way FE + additive covariate structure for the untreated potential outcome. Stage-2 standard errors cluster by unit — bootstrapping the whole two-step procedure gives a conservative covariance when covariate models are heavy.

References

Gardner, J. (2022). Two-stage differences in differences. Working paper. [@gardner2022stage]

Examples:

Staggered panel with a never-treated group (first_treat = 0). sp.did_2stage is an alias of this function.

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(42)
>>> n_units, n_periods = 40, 8
>>> unit = np.repeat(np.arange(n_units), n_periods)
>>> time = np.tile(np.arange(1, n_periods + 1), n_units)
>>> first = np.where(unit < 20, 5, 0)  # 0 = never treated
>>> d = ((first > 0) & (time >= first)).astype(float)
>>> y = (0.5 * unit + 0.3 * time + 2.0 * d
...      + rng.normal(0, 0.5, unit.size))
>>> df = pd.DataFrame(
...     {"y": y, "unit": unit, "time": time, "g": first}
... )
>>> res = sp.gardner_did(
...     df, y="y", group="unit", time="time", first_treat="g"
... )
>>> round(res.estimate, 2)  # true ATT = 2.0
2.03

harvest_did ¶

harvest_did(data: DataFrame, *, unit: str, time: str, outcome: str, treat: Optional[str] = None, cohort: Optional[str] = None, never_value: Any = 0, horizons: Optional[Sequence[int]] = None, reference: int = -1, alpha: float = 0.05, weighting: str = 'precision') -> CausalResult

Harvest every valid 2×2 DID comparison and aggregate them.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Long-format panel.	required
`unit`	`str`	Column names.	required
`time`	`str`	Column names.	required
`outcome`	`str`	Column names.	required
`treat`	`str`	Binary treatment indicator. If provided, the cohort (first treatment time) is inferred per unit.	`None`
`cohort`	`str`	Alternative to `treat`: a column containing the already-computed cohort (first treatment time) per unit.	`None`
`never_value`	`any`	Value that marks "never treated" in the `cohort` column. If you use `treat`, units without any treated observation are mapped to this value automatically.	`0`
`horizons`	`sequence of int`	Event-time horizons to evaluate. Defaults to `[-3, -2, -1, 0, 1, 2, 3, 4]`. Positive values are post-treatment, `0` is the first treated period, negative values are placebo/pre-trend.	`None`
`reference`	`int`	Pre-treatment reference horizon relative to each cohort's treatment time. `-1` = period immediately before treatment (standard event-study convention).	`-1`
`alpha`	`float`		`0.05`
`weighting`	`('precision', 'equal', 'n_treated')`	How to aggregate the harvested 2×2 estimates. `precision` uses inverse-variance weights (minimum-variance aggregate under independence). `equal` averages without weights. `n_treated` weights each comparison by its treated-unit count.	`'precision'`

Returns:

Type	Description
`CausalResult`	`estimand` is the aggregated post-treatment ATT (average over non-negative horizons). `detail` exposes the full 2×2 table; `model_info['event_study']` is the per-horizon aggregation.

Notes

Inference assumes independence across units within each cohort (unit-level cluster-robust SEs), but the cross-horizon covariance induced by shared units is ignored when aggregating the event study into a single ATT. For strict inference, wrap this call in `sp.inference.bootstrap` at the unit level, or use the per-comparison table to feed `sp.inference.multiway_cluster_vcov`.
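
The three `weighting` schemes can be sketched on a toy table of harvested 2×2 estimates. The rows and the helper `aggregate` are hypothetical; the standard error shown treats the comparisons as independent, which is the simplification the note above warns about.

```python
import math

# Hypothetical harvested 2x2 estimates with standard errors and treated counts.
comparisons = [
    {"att": 0.40, "se": 0.10, "n_treated": 30},
    {"att": 0.55, "se": 0.20, "n_treated": 10},
    {"att": 0.50, "se": 0.10, "n_treated": 20},
]


def aggregate(rows, weighting="precision"):
    """Weighted average of 2x2 estimates and its SE under independence."""
    if weighting == "precision":
        raw = [1.0 / r["se"] ** 2 for r in rows]  # inverse-variance weights
    elif weighting == "equal":
        raw = [1.0 for _ in rows]
    elif weighting == "n_treated":
        raw = [float(r["n_treated"]) for r in rows]
    else:
        raise ValueError(f"unknown weighting: {weighting!r}")
    total = sum(raw)
    weights = [x / total for x in raw]
    est = sum(w * r["att"] for w, r in zip(weights, rows))
    se = math.sqrt(sum((w * r["se"]) ** 2 for w, r in zip(weights, rows)))
    return est, se


for scheme in ("precision", "equal", "n_treated"):
    est, se = aggregate(comparisons, scheme)
    print(scheme, round(est, 4), round(se, 4))
```

Inverse-variance weights pull the aggregate toward the more precisely estimated comparisons, which is why `precision` gives the smallest standard error of the three when the independence assumption holds.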

Examples:

>>> import statspai as sp
>>> df = sp.utils.dgp_did(n_units=80, n_periods=12, seed=0)
>>> out = sp.harvest_did(
...     df, unit='unit', time='time', outcome='y',
...     treat='treated', horizons=range(-3, 5),
... )
>>> out.estimate

breakdown_m ¶

breakdown_m(result: CausalResult, e: int = 0, method: str = 'smoothness', alpha: float = 0.05) -> float

Compute the breakdown value of M.

The breakdown value M* is the largest violation magnitude under which the treatment effect at relative time e remains statistically significant. A larger M* means a more robust result.

Parameters:

Name	Type	Description	Default
`result`	`CausalResult`	DID result with event study.	required
`e`	`int`	Relative time period.	`0`
`method`	`str`		`'smoothness'`
`alpha`	`float`		`0.05`

Returns:

Type	Description
`float`	Breakdown value M*. The effect is significant for all M < M*.

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> rows = []  # staggered panel: half treated at t=5, half never
>>> for i in range(30):
...     g = 5 if i < 15 else 0
...     ui = rng.normal(0, 1)
...     for t in range(1, 9):
...         post = 1 if (g != 0 and t >= g) else 0
...         y = ui + 0.3 * t + 2.0 * post + rng.normal(0, 0.5)
...         rows.append({"unit": i, "time": t, "y": y, "g": g})
>>> df = pd.DataFrame(rows)
>>> r = sp.sun_abraham(df, y='y', g='g', t='time', i='unit')
>>> m_star = sp.breakdown_m(r, e=0)
>>> bool(m_star >= 0)  # parallel trends can deviate by up to M* per period
True

Notes

Formally, M* = sup{M : 0 ∉ CI(M)}.

For the smoothness restriction with n_drift periods: M* = (|θ̂| - z_{α/2} × SE) / n_drift

See Rambachan & Roth (2023, ReStud), Definition 2.
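
The closed-form expression for the smoothness restriction can be evaluated directly. The sketch below is a plain transcription of that formula with hypothetical inputs; the helper name `breakdown_smoothness` and the floor at zero for effects that are already insignificant at M = 0 are assumptions of this illustration.

```python
from statistics import NormalDist


def breakdown_smoothness(theta_hat, se, n_drift, alpha=0.05):
    """M* = (|theta_hat| - z_{alpha/2} * SE) / n_drift, floored at zero."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    m_star = (abs(theta_hat) - z * se) / n_drift
    return max(m_star, 0.0)


# Hypothetical point estimate and standard error at relative time e = 0.
print(round(breakdown_smoothness(2.0, 0.5, n_drift=1), 4))
print(round(breakdown_smoothness(2.0, 0.5, n_drift=2), 4))
print(breakdown_smoothness(0.5, 0.5, n_drift=1))
```

Doubling `n_drift` halves M*, since the same per-period violation accumulates over more periods before it reaches the target event time.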

aggte_from_influence ¶

aggte_from_influence(source: Union[DataFrame, str, Path], type: str = 'simple', **aggte_kwargs: Any) -> CausalResult

Aggregate group-time ATTs from exported influence functions.

The post-hoc half of the Stata `csdid` `saverif()` workflow: rebuild the ATT(g, t) grid and influence-function matrix from a frame written by `influence_functions` and run `statspai.aggte` on it, with no refit and no original data needed.

Parameters:

Name	Type	Description	Default
`source`	`DataFrame, str, or Path`	A frame produced by :func:`influence_functions`, or a path to one (`.parquet` or CSV).	required
`type`	`('simple', 'dynamic', 'group', 'calendar')`	Aggregation scheme, forwarded to :func:`statspai.aggte`.	`'simple'`
`**aggte_kwargs`	`Any`	Any other :func:`statspai.aggte` options — `min_e` / `max_e`, `balance_e`, `bstrap`, `n_boot`, `cband`, `alpha`, `random_state`.	`{}`

Returns:

Type	Description
`CausalResult`	Same shape as `sp.aggte` output.

Examples:

>>> import statspai as sp
>>> df = sp.dgp_did(n_units=60, n_periods=6, staggered=True, seed=1)
>>> df['first_treat'] = df['first_treat'].fillna(0)
>>> cs = sp.callaway_santanna(df, y='y', g='first_treat', t='time',
...                           i='unit')
>>> rif = sp.influence_functions(cs)
>>> es = sp.aggte_from_influence(rif, type='dynamic', n_boot=200,
...                              random_state=0)
>>> es.estimand
'ATT'

References

Callaway, B. and Sant'Anna, P. H. C. (2021). Difference-in-differences with multiple time periods. Journal of Econometrics, 225(2), 200-230. [@callaway2021difference]

influence_functions ¶

influence_functions(result: CausalResult, path: Optional[Union[str, Path]] = None) -> DataFrame

Export per-unit influence functions of a Callaway–Sant'Anna fit.

Equivalent to Stata `csdid, saverif()`: the returned frame is self-contained and carries everything `aggte_from_influence` needs to recompute any aggregation without refitting.

Parameters:

Name	Type	Description	Default
`result`	`CausalResult`	Output of :func:`statspai.callaway_santanna`.	required
`path`	`str or Path`	If given, also write the frame to disk — `.parquet` via `to_parquet`, anything else via `to_csv(index=False)`.	`None`

Returns:

Type	Description
`DataFrame`	Long format, one row per unit × (g, t) pair, with the following columns:

unit — unit identifier (observation index for RCS fits)
unit_cohort — the unit's own first-treatment cohort (0 = never treated); used to rebuild aggregation weights
group, time, relative_time — the (g, t) cell
att — the cell's point estimate (repeated across units)
influence — the unit's influence-function value ψᵢ(g, t)
cluster — only when the fit used clustervars
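
To show why such a frame is enough to re-aggregate without refitting, the sketch below combines cell-level ATTs and per-unit influence values into one aggregate and an analytic standard error. Everything here is hypothetical: the rows, the equal cell weights (real aggregations typically weight by cohort size and account for estimated weights), and the scaling convention se = sqrt(mean(ψ²) / n), which may differ from the one the library uses.

```python
import math

# Hypothetical long-format rows: one per unit x (g, t) cell.
rows = [
    {"unit": 1, "group": 3, "time": 3, "att": 1.0, "influence": 0.4},
    {"unit": 2, "group": 3, "time": 3, "att": 1.0, "influence": -0.4},
    {"unit": 1, "group": 3, "time": 4, "att": 2.0, "influence": 0.2},
    {"unit": 2, "group": 3, "time": 4, "att": 2.0, "influence": -0.2},
]

# Equal weights over (g, t) cells: an assumption made for this sketch.
cells = sorted({(r["group"], r["time"]) for r in rows})
weights = {cell: 1.0 / len(cells) for cell in cells}

# Aggregate point estimate: weighted average of the cell ATTs.
att = {(r["group"], r["time"]): r["att"] for r in rows}
simple_att = sum(weights[c] * att[c] for c in cells)

# Aggregate influence per unit: the same weighted combination of its cell values.
psi = {}
for r in rows:
    cell = (r["group"], r["time"])
    psi[r["unit"]] = psi.get(r["unit"], 0.0) + weights[cell] * r["influence"]

n = len(psi)
se = math.sqrt(sum(v ** 2 for v in psi.values()) / n / n)
print(round(simple_att, 4), round(se, 4))
```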

Examples:

>>> import statspai as sp
>>> df = sp.dgp_did(n_units=60, n_periods=6, staggered=True, seed=1)
>>> df['first_treat'] = df['first_treat'].fillna(0)
>>> cs = sp.callaway_santanna(df, y='y', g='first_treat', t='time',
...                           i='unit')
>>> rif = sp.influence_functions(cs)
>>> set(rif.columns) >= {'unit', 'group', 'time', 'influence'}
True

References

Callaway, B. and Sant'Anna, P. H. C. (2021). Difference-in-differences with multiple time periods. Journal of Econometrics, 225(2), 200-230. [@callaway2021difference]

did_misclassified ¶

did_misclassified(data: DataFrame, y: str, treat: str, time: str, id: str, pi_misclass: float = 0.0, anticipation_periods: int = 0, cluster: Optional[str] = None, alpha: float = 0.05) -> CausalResult

Staggered DiD robust to timing misclassification and anticipation.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`		required
`y`	`str`		required
`treat`	`str`		required
`time`	`str`		required
`id`	`str`		required
`pi_misclass`	`float in [0, 0.5]`	Probability that the recorded first-treatment period `g` is off by ±1 (symmetric). Pass 0 to skip this correction.	`0.0`
`anticipation_periods`	`int`	Number of leads to absorb as anticipation (subtracts the average of pre-event coefficients k = -1..-anticipation_periods from the post ATT estimate).	`0`
`cluster`	`str`		`None`
`alpha`	`float`		`0.05`

Returns:

Type	Description
`CausalResult`	`estimate` is the corrected ATT; `model_info` reports both the naive and the corrected estimate, plus the anticipation offset and misclassification adjustment factor.
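
The anticipation step described for `anticipation_periods` is simple arithmetic: average the lead coefficients k = -1..-anticipation_periods and subtract that average from the post-treatment ATT. The sketch below shows only that step. The helper name `anticipation_corrected_att` and the toy coefficients are hypothetical, and the misclassification adjustment is not reproduced because its formula is not given here.

```python
def anticipation_corrected_att(post_att, pre_coefs, anticipation_periods):
    """Subtract the mean of the leads k = -1..-anticipation_periods.

    `pre_coefs` maps relative time k (negative ints) to event-study
    coefficients. Returns (corrected ATT, offset).
    """
    if anticipation_periods == 0:
        return post_att, 0.0
    leads = [pre_coefs[-k] for k in range(1, anticipation_periods + 1)]
    offset = sum(leads) / len(leads)
    return post_att - offset, offset


# Toy pre-event coefficients: the two leads closest to treatment are non-zero.
pre = {-3: 0.02, -2: 0.10, -1: 0.30}
corrected, offset = anticipation_corrected_att(1.50, pre, anticipation_periods=2)
```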

References

arXiv 2507.20415, Staggered Adoption DiD Designs with Misclassification and Anticipation (2025).

Examples:

>>> import statspai as sp
>>> df = sp.dgp_did(n_units=120, n_periods=8, staggered=True,
...                 seed=0)
>>> df['first_treat'] = df['first_treat'].fillna(0).astype(int)
>>> res = sp.did_misclassified(df, y='y', treat='first_treat',
...                            time='time', id='unit')
>>> round(float(res.estimate), 4)
0.4354

dl_propensity_score ¶

dl_propensity_score(data: DataFrame, *, treatment: str, covariates: Sequence[str], hidden_sizes: Sequence[int] = (64, 32), max_iter: int = 300, random_state: int = 0) -> ndarray

Neural-net propensity score with balance-targeted loss.

Fits a small multi-layer perceptron e(X) = P(T=1 | X). If torch is available, it uses a proper MLP; otherwise it falls back to scikit-learn's :class:MLPClassifier (lbfgs optimiser, ReLU).

Parameters:

Name	Type	Default
`data`	`DataFrame`	required
`treatment`	`str`	required
`covariates`	`sequence of str`	required
`hidden_sizes`	`sequence of int`	`(64, 32)`
`max_iter`	`int`	`300`
`random_state`	`int`	`0`

Returns:

Type	Description
`ndarray of shape (n,)`	Estimated propensity scores clipped to (0.02, 0.98).

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> n = 200
>>> x1 = rng.normal(size=n)
>>> x2 = rng.normal(size=n)
>>> treat = rng.binomial(1, 1 / (1 + np.exp(-(0.5 * x1 + 0.3 * x2))))
>>> df = pd.DataFrame({"treat": treat, "x1": x1, "x2": x2})
>>> e = sp.dl_propensity_score(df, treatment="treat",
...                            covariates=["x1", "x2"], random_state=0)
>>> e.shape == (n,)
True
>>> bool(((e >= 0.02) & (e <= 0.98)).all())
True

References

Peng, Li, Wu & Li (arXiv:2404.04794, 2024). [@peng2024local]

overlap_weighted_did ¶

overlap_weighted_did(data: DataFrame, *, y: str, treat: str, time: str, covariates: Optional[Sequence[str]] = None, ps_model: Any = 'logit', alpha: float = 0.05) -> CausalResult

Overlap-weighted 2x2 DID.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Two-period panel with a binary `treat` indicator and a binary post/pre `time` indicator.	required
`y`	`str`		required
`treat`	`str`		required
`time`	`str`		required
`covariates`	`sequence of str`	Pre-treatment covariates for the propensity score. If omitted, reduces to standard (unweighted) 2x2 DID.	`None`
`ps_model`	`('logit', 'gbm', 'dl')`	How to estimate e(X) = P(treat=1 | X). `'dl'` uses :func:`dl_propensity_score`.	`'logit'`
`alpha`	`float`		`0.05`

Returns:

Type	Description
`CausalResult`	`estimand = 'ATT (overlap)'`. Uses a sandwich-style bootstrap-ready SE derived from weighted residuals.
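
As a rough illustration of the overlap-weighting idea, the sketch below applies the usual overlap weights (treated units weighted by 1 - e, controls by e) to unit-level outcome changes in a two-period panel. It is an assumption about the general technique rather than a copy of this function's internals; the helper name `overlap_did_sketch` and the toy numbers are hypothetical, and no standard error is computed.

```python
def overlap_did_sketch(delta_y, treat, e):
    """Overlap-weighted difference in mean outcome changes.

    delta_y: post minus pre outcome per unit
    treat:   0/1 treatment indicator per unit
    e:       propensity score per unit
    """
    num_t = den_t = num_c = den_c = 0.0
    for dy, d, p in zip(delta_y, treat, e):
        if d == 1:
            num_t += (1 - p) * dy  # treated units get weight 1 - e
            den_t += 1 - p
        else:
            num_c += p * dy        # control units get weight e
            den_c += p
    return num_t / den_t - num_c / den_c


delta_y = [2.0, 2.4, 0.5, 0.3]
treat = [1, 1, 0, 0]

att_overlap = overlap_did_sketch(delta_y, treat, e=[0.8, 0.5, 0.5, 0.2])
# With a constant propensity score the weights cancel and the plain
# (unweighted) 2x2 DID is recovered.
att_plain = overlap_did_sketch(delta_y, treat, e=[0.5, 0.5, 0.5, 0.5])
```

Units with propensity scores near 0 or 1 receive little weight, which is why the estimand is labelled as an overlap ATT rather than the ATT for all treated units.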

References

Li, Morgan & Zaslavsky (JASA 2018).

"Overlap-weighted difference-in-differences" (Economics Letters 2025).

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(42)
>>> n = 150
>>> x = rng.normal(0, 1, n)
>>> treat = rng.binomial(1, 1 / (1 + np.exp(-x)))
>>> base = 1.0 + 0.5 * x + rng.normal(0, 1, n)
>>> post = base + 0.4 + 1.5 * treat + rng.normal(0, 1, n)
>>> df = pd.DataFrame({
...     "y": np.concatenate([base, post]),
...     "treat": np.tile(treat, 2),
...     "time": np.repeat([0, 1], n),
...     "x": np.tile(x, 2),
... })
>>> res = sp.overlap_weighted_did(
...     df, y="y", treat="treat", time="time",
...     covariates=["x"],
... )
>>> round(res.estimate, 2)  # true effect = 1.5
1.8
>>> res.estimand
'ATT (overlap)'

bacon_plot ¶

bacon_plot(bacon_result: Dict[str, Any], ax: Any = None, figsize: Tuple[float, float] = (10, 6), title: Optional[str] = None, colors: Optional[Dict[str, str]] = None, **kwargs: Any) -> Tuple[Any, Any]

Scatter plot of Goodman-Bacon decomposition.

Each point is a 2×2 sub-comparison: x = weight, y = DD estimate. Color distinguishes comparison types (Treated vs Never-treated, Earlier vs Later, Later vs Already-treated).
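
The quantities on the two axes are tied together by the decomposition itself: the TWFE coefficient is the weight-weighted sum of the 2×2 estimates. A minimal sketch with made-up rows is shown below. The field names `type`, `weight`, and `estimate` are assumptions for illustration; the actual column names in the `'decomposition'` DataFrame may differ.

```python
from collections import defaultdict

# Toy decomposition rows; the weights sum to one.
decomposition = [
    {"type": "Treated vs Never-treated", "weight": 0.5, "estimate": 2.0},
    {"type": "Earlier vs Later", "weight": 0.3, "estimate": 1.5},
    {"type": "Later vs Already-treated", "weight": 0.2, "estimate": 0.5},
]

# The TWFE coefficient is the weighted sum of the 2x2 estimates.
beta_twfe = sum(r["weight"] * r["estimate"] for r in decomposition)

# Total weight carried by each comparison type.
weight_by_type = defaultdict(float)
for r in decomposition:
    weight_by_type[r["type"]] += r["weight"]
```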

Parameters:

Name	Type	Description	Default
`bacon_result`	`dict`	Output from `bacon_decomposition()`. Must contain `'decomposition'` DataFrame and `'beta_twfe'`.	required
`ax`	`matplotlib Axes`		`None`
`figsize`	`tuple`		`(10, 6)`
`title`	`str`		`None`
`colors`	`dict`	Map comparison type → color. Defaults provided.	`None`

Returns:

Type	Description
`(fig, ax)`

Examples:

>>> import statspai as sp
>>> df = sp.dgp_did(n_units=120, n_periods=8, staggered=True, seed=0)
>>> bacon = sp.bacon_decomposition(df, y='y', treat='treated',
...                                time='time', id='unit')
>>> fig, ax = sp.bacon_plot(bacon)
>>> type(fig).__name__
'Figure'

References

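Goodman-Bacon, A. (2021). Difference-in-differences with variation in treatment timing. Journal of Econometrics, 225(2), 254-277. [@goodmanbacon2021difference]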

cohort_event_study_plot ¶

cohort_event_study_plot(result: Any, ax: Any = None, figsize: Tuple[float, float] = (12, 7), title: Optional[str] = None, palette: Optional[List[str]] = None, show_aggregate: bool = True, aggregate_color: str = '#2C3E50', ci: bool = True, ci_alpha: float = 0.08, **kwargs: Any) -> Tuple[Any, Any]

Per-cohort event study plot (overlay).

Plots a separate event study line for each treatment cohort, showing heterogeneity in treatment effects across cohorts. Optionally overlays the aggregate event study.
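
Conceptually, the overlay needs one (relative_time, att) series per cohort. The sketch below groups toy rows that mimic the `group`, `relative_time`, and `att` columns of `detail`; the values are made up and the plotting itself is omitted.

```python
from collections import defaultdict

# Toy rows mimicking the columns the plot reads from `detail`.
detail = [
    {"group": 3, "relative_time": -1, "att": 0.0},
    {"group": 3, "relative_time": 0, "att": 1.0},
    {"group": 3, "relative_time": 1, "att": 1.5},
    {"group": 5, "relative_time": 0, "att": 0.6},
    {"group": 5, "relative_time": -1, "att": 0.1},
]

# One event-study line per cohort, ordered by relative time.
lines = defaultdict(list)
for row in detail:
    lines[row["group"]].append((row["relative_time"], row["att"]))
for cohort in lines:
    lines[cohort].sort()
```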

Parameters:

Name	Type	Description	Default
`result`	`CausalResult`	Result from `callaway_santanna()` or `did(method='cs')`. Must have `detail` with 'group', 'relative_time', 'att' columns, and `model_info['event_study']` for aggregate.	required
`ax`	`matplotlib Axes`		`None`
`figsize`	`tuple`		`(12, 7)`
`title`	`str`		`None`
`palette`	`list of str`	Colors for each cohort. Auto-generated if None.	`None`
`show_aggregate`	`bool`	Overlay the aggregate event study line.	`True`
`aggregate_color`	`str`	Color for aggregate line.	`'#2C3E50'`
`ci`	`bool`	Show confidence intervals for each cohort.	`True`
`ci_alpha`	`float`	CI band transparency.	`0.08`

Returns:

Type	Description
`(fig, ax)`

Examples:

>>> import statspai as sp
>>> df = sp.dgp_did(n_units=120, n_periods=8, staggered=True, seed=0)
>>> result = sp.did(df, y='y', treat='first_treat', time='time',
...                 id='unit', method='cs')
>>> fig, ax = sp.cohort_event_study_plot(result)
>>> type(fig).__name__
'Figure'

References

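Callaway, B. and Sant'Anna, P. H. C. (2021). Difference-in-differences with multiple time periods. Journal of Econometrics, 225(2), 200-230. [@callaway2021difference]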

did_plot ¶

did_plot(data: DataFrame, y: str, time: str, treat: str, treat_time: Optional[Union[int, float]] = None, show_counterfactual: bool = True, labels: Optional[Dict[str, str]] = None, colors: Optional[Tuple[str, str, str]] = None, ax: Any = None, figsize: Tuple[float, float] = (10, 6), title: Optional[str] = None, annotate_effect: bool = True, **kwargs: Any) -> Tuple[Any, Any]

Classic DID diagram showing treatment effect with counterfactual.

Plots group means over time and adds a dashed counterfactual line for the treatment group (extrapolated from pre-treatment trend parallel to the control group).
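
In the two-period case the dashed counterfactual reduces to simple arithmetic: the treated group's pre-period mean shifted by the control group's change. The sketch below uses made-up group means to show how the counterfactual endpoint and the annotated effect relate.

```python
# Toy group means for a two-period design.
means = {
    ("treat", "pre"): 10.0,
    ("treat", "post"): 14.0,
    ("control", "pre"): 8.0,
    ("control", "post"): 9.5,
}

# Under parallel trends the treated group would have moved like the controls.
control_change = means[("control", "post")] - means[("control", "pre")]
counterfactual_post = means[("treat", "pre")] + control_change

# The DID effect is the gap between the observed and counterfactual endpoints.
effect = means[("treat", "post")] - counterfactual_post
```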

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Input dataset.	required
`y`	`str`	Outcome variable.	required
`time`	`str`	Time period variable.	required
`treat`	`str`	Binary treatment group indicator (0/1).	required
`treat_time`	`int or float`	Treatment onset time. If None, inferred as the midpoint.	`None`
`show_counterfactual`	`bool`	Draw the dashed counterfactual line.	`True`
`labels`	`dict`	Custom labels: `{'treat': ..., 'control': ..., 'counterfactual': ...}`.	`None`
`colors`	`tuple`	(treatment, control, counterfactual) colors.	`None`
`ax`	`matplotlib Axes`		`None`
`figsize`	`tuple`		`(10, 6)`
`title`	`str`		`None`
`annotate_effect`	`bool`	Annotate the treatment effect arrow on the plot.	`True`

Returns:

Type	Description
`(fig, ax)`

Examples:

>>> import statspai as sp
>>> df = sp.dgp_did(n_units=120, n_periods=8, staggered=False, seed=1)
>>> fig, ax = sp.did_plot(df, y='y', time='time', treat='treated',
...                       treat_time=5)
>>> type(fig).__name__
'Figure'

did_summary_plot ¶

did_summary_plot(result: Any, ax: Any = None, figsize: Tuple[float, float] = (9, 5), color: str = '#2C3E50', highlight_color: str = '#C0392B', reference: Optional[float] = None, title: Optional[str] = None, sort_by: Optional[str] = None) -> Tuple[Any, Any]

Forest plot of DID method-robustness summary.

Plots each method's point estimate with its confidence interval as a horizontal errorbar. Designed to consume the CausalResult returned by :func:statspai.did.did_summary.

Parameters:

Name	Type	Description	Default
`result`	`CausalResult`	Output of :func:`did_summary`. Must have a `detail` DataFrame with columns `estimate`, `ci_low`, `ci_high`, and either `method` or `estimator`.	required
`ax`	`matplotlib Axes`	Existing axes to draw on. If `None` a new figure is created.	`None`
`figsize`	`tuple`	Figure size when creating a new figure.	`(9, 5)`
`color`	`str`	Color for point estimates and CIs.	`'#2C3E50'`
`highlight_color`	`str`	Color for the cross-method mean line.	`'#C0392B'`
`reference`	`float`	Horizontal reference value (e.g. 0 for 'no effect'). Defaults to `0`.	`None`
`title`	`str`	Plot title. Defaults to `"DID Method-Robustness Summary"`.	`None`
`sort_by`	`('estimate', None)`	If `'estimate'`, sort methods by point estimate ascending. Otherwise keep the order in `result.detail`.	`None`

Returns:

Type	Description
`(fig, ax)`	Matplotlib figure and axes.

Examples:

>>> import statspai as sp
>>> df = sp.dgp_did(n_units=120, n_periods=8, staggered=True, seed=0)
>>> out = sp.did_summary(df, y='y', time='time',
...                      first_treat='first_treat', group='unit')
>>> fig, ax = sp.did_summary_plot(out)
>>> type(fig).__name__
'Figure'

enhanced_event_study_plot ¶

enhanced_event_study_plot(result: Any, ax: Any = None, figsize: Tuple[float, float] = (10, 6), title: Optional[str] = None, color: str = '#2C3E50', sig_color: Optional[str] = '#E74C3C', ci_alpha: float = 0.15, shade_pre: bool = True, shade_post: bool = True, pre_color: str = '#EBF5FB', post_color: str = '#FDEDEC', show_zero: bool = True, marker: str = 'o', markersize: int = 6, alpha_level: float = 0.05, **kwargs: Any) -> Tuple[Any, Any]

Enhanced event study plot with pre/post shading and significance coloring.

Improvement over the basic CausalResult.event_study_plot() — adds optional background shading for pre/post periods and colors significant coefficients differently.

Parameters:

Name	Type	Description	Default
`result`	`CausalResult`	DID result with event study in `model_info['event_study']`.	required
`ax`	`matplotlib Axes`		`None`
`figsize`	`tuple`		`(10, 6)`
`title`	`str`		`None`
`color`	`str`	Default color for estimates.	`'#2C3E50'`
`sig_color`	`str or None`	Color for significant estimates. None disables coloring.	`'#E74C3C'`
`ci_alpha`	`float`	Confidence band transparency.	`0.15`
`shade_pre`	`bool`	Shade pre-treatment region.	`True`
`shade_post`	`bool`	Shade post-treatment region.	`True`
`pre_color`	`str`	Pre-treatment shading color.	`'#EBF5FB'`
`post_color`	`str`	Post-treatment shading color.	`'#FDEDEC'`
`show_zero`	`bool`	Show horizontal zero line.	`True`
`marker`	`str`		`'o'`
`markersize`	`int`		`6`
`alpha_level`	`float`		`0.05`

Returns:

Type	Description
`(fig, ax)`

Examples:

>>> import statspai as sp
>>> df = sp.dgp_did(n_units=120, n_periods=8, staggered=True, seed=0)
>>> result = sp.did(df, y='y', treat='first_treat', time='time',
...                 id='unit', method='cs')
>>> fig, ax = sp.enhanced_event_study_plot(result, shade_pre=True)
>>> type(fig).__name__
'Figure'

ggdid ¶

ggdid(result: Any, ax: Any = None, figsize: Tuple[float, float] = (10, 6), title: Optional[str] = None, point_color: str = '#2E86AB', band_color: str = '#F18F01', show_pointwise: bool = True, show_uniform: bool = True) -> Tuple[Any, Any]

Plot an aggte() result, mirroring R :func:did::ggdid.

Automatically dispatches on result.model_info['aggregation']:

simple : a single point with pointwise CI
dynamic : event-study line with pointwise CI and uniform band
group : horizontal bars of θ̂(g) per cohort
calendar : time-series of θ̂(t) per calendar period

Uniform bands (sup-t simultaneous confidence bands) are drawn from the cband_lower / cband_upper columns created by :func:aggte.
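
For intuition, a sup-t band replaces the pointwise critical value with a quantile of the maximum absolute t-statistic across event times. The sketch below estimates that quantile with a Rademacher multiplier bootstrap over per-unit influence values. It is a generic illustration under stated assumptions (toy influence values, hypothetical helper name `sup_t_critical_value`, variance scaled as mean(ψ²)/n), not a reproduction of what :func:aggte does internally.

```python
import random
from math import ceil, sqrt


def sup_t_critical_value(psi, alpha=0.05, n_boot=999, seed=0):
    """Bootstrap the (1 - alpha) quantile of max_k |t_k|.

    psi: per-unit influence vectors (n rows, one entry per event time).
    Returns the critical value and the pointwise standard errors.
    """
    rng = random.Random(seed)
    n, k = len(psi), len(psi[0])
    # SE of each estimate: sqrt(mean(psi^2) / n) = sqrt(sum(psi^2)) / n.
    se = [sqrt(sum(row[j] ** 2 for row in psi)) / n for j in range(k)]
    max_t = []
    for _ in range(n_boot):
        xi = [rng.choice((-1.0, 1.0)) for _ in range(n)]  # Rademacher multipliers
        draw = [sum(x * row[j] for x, row in zip(xi, psi)) / n for j in range(k)]
        max_t.append(max(abs(d) / s for d, s in zip(draw, se)))
    max_t.sort()
    return max_t[ceil((1 - alpha) * n_boot) - 1], se


# Toy influence values for 50 units and 3 event times.
gen = random.Random(1)
psi = [[gen.gauss(0.0, 1.0) for _ in range(3)] for _ in range(50)]
att = [0.1, 1.0, 1.4]

crit, se = sup_t_critical_value(psi)
uniform_band = [(a - crit * s, a + crit * s) for a, s in zip(att, se)]
```

Because the maximum over event times is at least as large as any single |t|, the uniform band is never narrower than a pointwise band built from the same bootstrap draws.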

Parameters:

Name	Type	Description	Default
`result`	`CausalResult`	Output of :func:`aggte`.	required
`ax`	`matplotlib Axes`		`None`
`figsize`	`tuple`		`(10, 6)`
`title`	`str`		`None`
`point_color`	`str`	Colour for the pointwise estimate.	`'#2E86AB'`
`band_color`	`str`	Colour for the uniform band.	`'#F18F01'`
`show_pointwise`	`bool`	Draw pointwise CI lines.	`True`
`show_uniform`	`bool`	Draw uniform band (shaded region).	`True`

Returns:

Type	Description
`(fig, ax)`

Examples:

>>> import statspai as sp
>>> df = sp.dgp_did(n_units=120, n_periods=8, staggered=True,
...                 seed=0)
>>> cs = sp.callaway_santanna(df, y='y', g='first_treat',
...                           t='time', i='unit')
>>> agg = sp.aggte(cs, type='dynamic')
>>> fig, ax = sp.ggdid(agg)
>>> type(fig).__name__
'Figure'

group_time_plot ¶

group_time_plot(result: Any, plot_type: str = 'dot', ax: Any = None, figsize: Tuple[float, float] = (12, 7), title: Optional[str] = None, color: str = '#2C3E50', sig_color: str = '#E74C3C', insig_color: str = '#BDC3C7', alpha_level: float = 0.05, **kwargs: Any) -> Tuple[Any, Any]

Plot group-time ATT estimates from Callaway-Sant'Anna.

Two modes:

'dot' : dot plot with CI error bars, colored by significance
'heatmap' : (group × time) heatmap of ATT magnitudes

Parameters:

Name	Type	Description	Default
`result`	`CausalResult`	Result from `callaway_santanna()` or `did(method='cs')`. Must have `detail` DataFrame with 'group', 'time', 'att' columns.	required
`plot_type`	`str`	'dot' or 'heatmap'.	`'dot'`
`ax`	`matplotlib Axes`		`None`
`figsize`	`tuple`		`(12, 7)`
`title`	`str`		`None`
`color`	`str`	Default color for dot plot.	`'#2C3E50'`
`sig_color`	`str`	Color for significant estimates.	`'#E74C3C'`
`insig_color`	`str`	Color for insignificant estimates.	`'#BDC3C7'`
`alpha_level`	`float`	Significance threshold.	`0.05`

Returns:

Type	Description
`(fig, ax)`

Examples:

>>> import statspai as sp
>>> df = sp.dgp_did(n_units=120, n_periods=8, staggered=True, seed=0)
>>> result = sp.did(df, y='y', treat='first_treat', time='time',
...                 id='unit', method='cs')
>>> fig, ax = sp.group_time_plot(result)
>>> fig2, ax2 = sp.group_time_plot(result, plot_type='heatmap')
>>> type(fig).__name__
'Figure'

References

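Callaway, B. and Sant'Anna, P. H. C. (2021). Difference-in-differences with multiple time periods. Journal of Econometrics, 225(2), 200-230. [@callaway2021difference]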

parallel_trends_plot ¶

parallel_trends_plot(data: DataFrame, y: str, time: str, treat: str, id: Optional[str] = None, treat_time: Optional[Union[int, float]] = None, agg: str = 'mean', labels: Optional[Dict[str, str]] = None, colors: Optional[Tuple[str, str]] = None, ci: bool = True, ax: Any = None, figsize: Tuple[float, float] = (10, 6), title: Optional[str] = None, **kwargs: Any) -> Tuple[Any, Any]

Plot raw outcome means over time for treatment and control groups.

The workhorse pre-analysis plot: shows whether parallel trends is plausible before running DID.
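
The plotted series are per-group, per-period summaries of the raw outcome. The sketch below computes the mean and a ±1.96 SE interval for each (treat, time) cell using only the standard library; the toy rows and the helper name `mean_with_ci` are illustrative assumptions.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

# Toy observations: (treat, time, y).
rows = [
    (1, 0, 1.0), (1, 0, 2.0), (1, 0, 3.0),
    (0, 0, 0.5), (0, 0, 1.5), (0, 0, 2.5),
    (1, 1, 4.0), (1, 1, 5.0), (1, 1, 6.0),
    (0, 1, 1.0), (0, 1, 2.0), (0, 1, 3.0),
]

# Collect outcomes by (group, period) cell.
cells = defaultdict(list)
for treat, time, y in rows:
    cells[(treat, time)].append(y)


def mean_with_ci(values, z=1.96):
    """Mean with a normal-approximation interval of +/- z standard errors."""
    m = mean(values)
    se = stdev(values) / sqrt(len(values))
    return m, m - z * se, m + z * se


summary = {cell: mean_with_ci(v) for cell, v in sorted(cells.items())}
```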

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Input dataset.	required
`y`	`str`	Outcome variable.	required
`time`	`str`	Time period variable.	required
`treat`	`str`	Treatment group indicator. Binary (0/1) for 2×2, or first-treatment-period for staggered (0 = never treated).	required
`id`	`str`	Unit identifier (for panel data).	`None`
`treat_time`	`int or float`	Treatment onset time. Draws a vertical line if provided.	`None`
`agg`	`str`	Aggregation function: 'mean' or 'median'.	`'mean'`
`labels`	`dict`	Custom labels, e.g. `{'treat': 'New Jersey', 'control': 'Pennsylvania'}`.	`None`
`colors`	`tuple of str`	Colors for (treatment, control). Default: ('#E74C3C', '#2C3E50').	`None`
`ci`	`bool`	Show 95% confidence intervals (±1.96 SE of mean).	`True`
`ax`	`matplotlib Axes`	Existing axes to plot on.	`None`
`figsize`	`tuple`	Figure size.	`(10, 6)`
`title`	`str`	Plot title.	`None`

Returns:

Type	Description
`(fig, ax)`

Examples:

>>> import statspai as sp
>>> df = sp.dgp_did(n_units=120, n_periods=8, staggered=True, seed=0)
>>> fig, ax = sp.parallel_trends_plot(df, y='y', time='time',
...                                   treat='treated', treat_time=3)
>>> type(fig).__name__
'Figure'

sensitivity_plot ¶

sensitivity_plot(sensitivity: DataFrame, original_ci: Optional[Tuple[float, float]] = None, original_estimate: Optional[float] = None, ax: Any = None, figsize: Tuple[float, float] = (10, 6), title: Optional[str] = None, color: str = '#2C3E50', breakdown_color: str = '#E74C3C', original_color: str = '#27AE60', **kwargs: Any) -> Tuple[Any, Any]

Plot Rambachan & Roth (2023) sensitivity analysis.

Shows how the robust confidence interval changes as the maximum allowed parallel trends violation (M) increases.
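
One common way to read such a plot is to look for the smallest M at which the robust interval first covers zero. The sketch below does that scan over toy rows with `M`, `ci_lower`, and `ci_upper` fields; the helper name `first_m_covering_zero` is hypothetical, and the library may define its breakdown marker differently.

```python
# Toy sensitivity table: the interval widens as M grows.
sensitivity = [
    {"M": 0.0, "ci_lower": 0.8, "ci_upper": 1.6},
    {"M": 0.5, "ci_lower": 0.3, "ci_upper": 2.1},
    {"M": 1.0, "ci_lower": -0.2, "ci_upper": 2.6},
    {"M": 1.5, "ci_lower": -0.7, "ci_upper": 3.1},
]


def first_m_covering_zero(rows):
    """Smallest M whose interval contains zero, or None if none does."""
    for row in sorted(rows, key=lambda r: r["M"]):
        if row["ci_lower"] <= 0.0 <= row["ci_upper"]:
            return row["M"]
    return None


breakdown = first_m_covering_zero(sensitivity)
```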

Parameters:

Name	Type	Description	Default
`sensitivity`	`DataFrame`	Output from `honest_did()`. Columns: M, ci_lower, ci_upper, rejects_zero.	required
`original_ci`	`tuple of (float, float)`	Original CI (at M=0) for comparison.	`None`
`original_estimate`	`float`	Original point estimate.	`None`
`ax`	`matplotlib Axes`		`None`
`figsize`	`tuple`		`(10, 6)`
`title`	`str`		`None`
`color`	`str`	CI band color.	`'#2C3E50'`
`breakdown_color`	`str`	Color for the breakdown point marker.	`'#E74C3C'`
`original_color`	`str`	Color for original estimate marker.	`'#27AE60'`

Returns:

Type	Description
`(fig, ax)`

Examples:

>>> import statspai as sp
>>> df = sp.dgp_did(n_units=120, n_periods=8, staggered=True, seed=0)
>>> result = sp.did(df, y='y', treat='first_treat', time='time',
...                 id='unit', method='cs')
>>> sens = sp.honest_did(result, e=0)
>>> fig, ax = sp.sensitivity_plot(sens, original_estimate=result.estimate,
...                               original_ci=result.ci)
>>> type(fig).__name__
'Figure'

References

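Rambachan, A. & Roth, J. (2023). A More Credible Approach to Parallel Trends. Review of Economic Studies, 90(5), 2555--2591. [@rambachan2023more]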

treatment_rollout_plot ¶

treatment_rollout_plot(data: DataFrame, time: str, treat: str, id: str, ax: Any = None, figsize: Tuple[float, float] = (12, 7), title: Optional[str] = None, treated_color: str = '#E74C3C', untreated_color: str = '#ECF0F1', never_color: str = '#BDC3C7', sort_by: str = 'treat_time', show_cohort_labels: bool = True, **kwargs: Any) -> Tuple[Any, Any]

Visualise staggered treatment adoption timing.

Draws a tile/heatmap where each row is a unit and each column is a time period. Treated periods are shaded, making the staggered rollout pattern immediately visible.
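
The grid behind such a plot can be derived from a first-treatment-period column alone. The sketch below builds a 0/1 treated indicator per unit and period and orders units by adoption time with never-treated units last; the toy units and periods are made up, and the exact ordering rule for never-treated units is an assumption.

```python
# Toy first-treatment periods; 0 marks a never-treated unit.
first_treat = {"b": 5, "a": 3, "c": 0}
periods = [1, 2, 3, 4, 5, 6]

# Earliest adopters first, never-treated units at the end.
order = sorted(first_treat, key=lambda u: (first_treat[u] == 0, first_treat[u]))

# 1 where the unit is treated in that period, 0 otherwise.
grid = {
    u: [int(first_treat[u] != 0 and t >= first_treat[u]) for t in periods]
    for u in order
}
```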

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Panel data with unit, time, and treatment columns.	required
`time`	`str`	Time period variable.	required
`treat`	`str`	First-treatment-period column (0 = never treated), or binary treatment indicator.	required
`id`	`str`	Unit identifier.	required
`ax`	`matplotlib Axes`		`None`
`figsize`	`tuple`		`(12, 7)`
`title`	`str`		`None`
`treated_color`	`str`	Color for treated unit-periods.	`'#E74C3C'`
`untreated_color`	`str`	Color for untreated unit-periods.	`'#ECF0F1'`
`never_color`	`str`	Color for never-treated units.	`'#BDC3C7'`
`sort_by`	`str`	Sort units by: 'treat_time' (earliest first), 'id', or 'random'.	`'treat_time'`
`show_cohort_labels`	`bool`	Annotate cohort boundaries on the y-axis.	`True`

Returns:

Type	Description
`(fig, ax)`

Examples:

>>> import statspai as sp
>>> df = sp.dgp_did(n_units=120, n_periods=8, staggered=True, seed=0)
>>> fig, ax = sp.treatment_rollout_plot(df, time='time',
...                                     treat='first_treat', id='unit')
>>> type(fig).__name__
'Figure'

pretrends_power ¶

pretrends_power(result: Any, delta: Optional[ndarray] = None, alpha: float = 0.05) -> Dict[str, Any]

Power of the pre-trend test against a hypothesised violation.

Implements the power calculation from Roth (2022, AER: Insights). A non-significant pre-trend test is uninformative when the test has low power against economically meaningful violations of parallel trends.
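
To make the low-power concern concrete, the sketch below computes the power of a two-sided z-test on a single pre-period coefficient against a violation `delta`. This is a deliberately simplified one-coefficient case, not the joint calculation the function performs, and the helper name `pretest_power_single` is hypothetical.

```python
from statistics import NormalDist


def pretest_power_single(delta, se, alpha=0.05):
    """Probability that a two-sided z-test rejects when the true value is delta."""
    nd = NormalDist()
    z = nd.inv_cdf(1 - alpha / 2)
    shift = delta / se
    # Reject if the z-statistic falls below -z or above +z.
    return nd.cdf(-z - shift) + 1 - nd.cdf(z - shift)


power_null = pretest_power_single(0.0, se=1.0)    # equals alpha
power_one_se = pretest_power_single(1.0, se=1.0)  # violation of one SE
power_three_se = pretest_power_single(3.0, se=1.0)
```

A violation of one standard error is detected only about 17% of the time in this setting, so failing to reject says little about whether such a violation is present.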

Parameters:

Name	Type	Description	Default
`result`	`CausalResult`	Event-study result with pre-treatment estimates and SEs.	required
`delta`	`array-like`	Hypothesised trend violation in the pre-period (length = number of pre-periods). Default: linear trend `delta[k] = (k+1) * min(|SE|)` -- a violation equal to one SE at the furthest lag, declining linearly to near-zero.	`None`
`alpha`	`float`	Significance level of the pre-trend test.	`0.05`

Returns:

Type	Description
`dict`	Keys: `power`, `noncentrality`, `df`, `delta`, `critical_value`, `warning`.

References

Roth, J. (2022). Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends. AER: Insights, 4(3), 305--322. [@roth2022pretest]

Examples:

>>> import statspai as sp, numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> rows = []
>>> for i in range(80):
...     cohort = 4 if i < 40 else 0          # 0 = never treated
...     for t in range(8):
...         post = cohort > 0 and t >= cohort
...         y = 0.3 * t + (2.0 if post else 0.0) + (i % 5) + rng.normal()
...         rows.append((i, t, cohort, y))
>>> df = pd.DataFrame(rows, columns=["id", "t", "cohort", "y"])
>>> es = sp.event_study(df, y="y", treat_time="cohort", time="t", unit="id")
>>> sp.pretrends_power(es)

pretrends_summary ¶

pretrends_summary(result: Any, delta: Optional[ndarray] = None, alpha: float = 0.05) -> str

Print a combined pre-trends diagnostic report.

Runs pretrends_test and pretrends_power and formats the output in a single table.

Parameters:

Name	Type	Description	Default
`result`	`CausalResult`	Event-study result.	required
`delta`	`array-like`	Passed to `pretrends_power`.	`None`
`alpha`	`float`	Significance level.	`0.05`

Returns:

Type	Description
`str`	Formatted report.

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> rows = []
>>> for i in range(60):
...     g = 5 if i < 30 else 0
...     for t in range(1, 9):
...         post = 1 if (g and t >= 5) else 0
...         y = 1.0 + 0.2 * t + i / 120 + 2.0 * post + rng.normal(0, 0.5)
...         rows.append({"unit": i, "time": t, "y": y, "g": g})
>>> df = pd.DataFrame(rows)
>>> result = sp.event_study(df, y="y", treat_time="g", time="time",
...                         unit="unit", window=(-3, 3))
>>> report = sp.pretrends_summary(result)  # also prints the report
>>> bool(isinstance(report, str))
True

pretrends_test ¶

pretrends_test(result: Any, type: str = 'wald', alpha: float = 0.05) -> Dict[str, Any]

Joint test of pre-treatment coefficients.

Tests H0: beta_pre = 0 (all pre-treatment event-study coefficients are jointly zero).
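
The Wald form of this test is the quadratic form W = b' V⁻¹ b, compared with a chi-squared distribution whose degrees of freedom equal the number of pre-period coefficients. The sketch below works the two-coefficient case by hand, where the 2×2 inverse and the chi-squared tail probability both have closed forms. The helper name `wald_two_pre_periods` and the toy numbers are illustrative assumptions.

```python
from math import exp


def wald_two_pre_periods(b, V):
    """Wald statistic and p-value for H0: b = 0 with two coefficients.

    b: (b1, b2) pre-period estimates
    V: 2x2 symmetric covariance matrix as nested tuples
    """
    b1, b2 = b
    v11, v12 = V[0]
    v22 = V[1][1]
    det = v11 * v22 - v12 * v12
    # b' V^{-1} b written out with the closed-form 2x2 inverse.
    w = (b1 * b1 * v22 - 2 * b1 * b2 * v12 + b2 * b2 * v11) / det
    pvalue = exp(-w / 2)  # chi-squared survival function with 2 df
    return w, pvalue


b = (0.1, -0.2)
w_diag, p_diag = wald_two_pre_periods(b, ((0.01, 0.0), (0.0, 0.04)))
w_full, p_full = wald_two_pre_periods(b, ((0.01, 0.005), (0.005, 0.04)))
```

The two calls differ only in the off-diagonal covariance, which shows how ignoring correlation between pre-period coefficients can change the statistic.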

Parameters:

Name	Type	Description	Default
`result`	`CausalResult`	Event-study result containing pre-treatment estimates and SEs.	required
`type`	`'wald'` or `'f'`	`'wald'`: chi-squared test statistic. `'f'`: scaled F-statistic (requires `df_resid` in model_info).	`'wald'`
`alpha`	`float`	Significance level.	`0.05`

Returns:

Type	Description
`dict`	Keys: `statistic`, `pvalue`, `df`, `type`, `reject`, `interpretation`.

References

Standard Wald test; see Roth (2022) for caveats on interpretation.

Examples:

>>> import statspai as sp, numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> rows = []
>>> for i in range(80):
...     cohort = 4 if i < 40 else 0          # 0 = never treated
...     for t in range(8):
...         post = cohort > 0 and t >= cohort
...         y = 0.3 * t + (2.0 if post else 0.0) + (i % 5) + rng.normal()
...         rows.append((i, t, cohort, y))
>>> df = pd.DataFrame(rows, columns=["id", "t", "cohort", "y"])
>>> es = sp.event_study(df, y="y", treat_time="cohort", time="t", unit="id")
>>> sp.pretrends_test(es)

sensitivity_rr ¶

sensitivity_rr(result: Any, Mbar: Optional[Union[ndarray, List[float]]] = None, method: str = 'C-LF', alpha: float = 0.05, n_grid: int = 20) -> SensitivityResult

Rambachan & Roth (2023) honest confidence intervals.

Computes confidence intervals for the ATT that are valid under bounded departures from parallel trends. The conditional linear-in-relative-time (C-LF) restriction assumes the post-treatment violation is bounded by a linear extrapolation of the pre-trend plus an additional M-bar of slack.

Parameters:

Name	Type	Description	Default
`result`	`CausalResult`	Event-study result with pre- and post-treatment estimates.	required
`Mbar`	`array-like`	Grid of M-bar values. Default: `np.linspace(0, 3 * max_pre_slope, n_grid)`.	`None`
`method`	`'C-LF'`	Extrapolation method. Currently only C-LF is implemented.	`'C-LF'`
`alpha`	`float`	Significance level.	`0.05`
`n_grid`	`int`	Number of grid points when `Mbar` is not supplied.	`20`

Returns:

Type	Description
`SensitivityResult`	Object with `.summary()`, `.plot()`, `.mbar_grid`, `.ci_lower`, `.ci_upper`, `.breakdown_mbar`.

Notes

Changed in version `next`: The pre-period trend is now fitted by generalised least squares using the full pre-period covariance from `result.model_info['vcv_pre']` when it is available (`sp.event_study` supplies it). Previously the fit always used diagonal `1/se**2` weights, i.e. it assumed the pre-treatment event-study coefficients were mutually independent. They are not, since they share the omitted reference period and the unit/time fixed effects. Breakdown Mbar values therefore move slightly relative to earlier releases. When no covariance is available the diagonal fallback is still used, but it now warns loudly.

References

Rambachan, A. & Roth, J. (2023). A More Credible Approach to Parallel Trends. Review of Economic Studies, 90(5), 2555--2591. [@rambachan2023more]

Examples:

>>> import statspai as sp, numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> rows = []
>>> for i in range(80):
...     cohort = 4 if i < 40 else 0          # 0 = never treated
...     for t in range(8):
...         post = cohort > 0 and t >= cohort
...         y = 0.3 * t + (2.0 if post else 0.0) + (i % 5) + rng.normal()
...         rows.append((i, t, cohort, y))
>>> df = pd.DataFrame(rows, columns=["id", "t", "cohort", "y"])
>>> es = sp.event_study(df, y="y", treat_time="cohort", time="t", unit="id")
>>> sens = sp.sensitivity_rr(es, Mbar=[0, 0.01, 0.02, 0.05])
>>> sens.summary()

cs_report ¶

cs_report(data_or_result: Union[DataFrame, CausalResult], y: Optional[str] = None, g: Optional[str] = None, t: Optional[str] = None, i: Optional[str] = None, x: Optional[List[str]] = None, estimator: str = 'dr', control_group: str = 'nevertreated', anticipation: int = 0, alpha: float = 0.05, n_boot: int = 1000, random_state: Optional[int] = 0, min_e: float = -inf, max_e: float = inf, rr_method: str = 'smoothness', verbose: bool = True, save_to: Optional[str] = None) -> CSReport

One-call staggered-DID workflow: estimate → aggregate → sensitivity.

Parameters:

Name	Type	Description	Default
`data_or_result`	`DataFrame | CausalResult`	Either a long-format panel (then `y, g, t, i` are required and :func:`callaway_santanna` is run first), or an already-fitted :func:`callaway_santanna` result.	required
`y`	`str`	Outcome column (required when `data_or_result` is a DataFrame).	`None`
`g`	`str`	Cohort column (required when `data_or_result` is a DataFrame).	`None`
`t`	`str`	Time column (required when `data_or_result` is a DataFrame).	`None`
`i`	`str`	Unit id column (required when `data_or_result` is a DataFrame).	`None`
`x`	`list of str`	Covariates for conditional parallel trends.	`None`
`estimator`	`('dr', 'ipw', 'reg')`		`'dr'`
`control_group`	`('nevertreated', 'notyettreated')`		`'nevertreated'`
`anticipation`	`int`		`0`
`alpha`	`float`		`0.05`
`n_boot`	`int`	Multiplier-bootstrap replications for :func:`aggte`.	`1000`
`random_state`	`int`	Seed for the bootstrap (set to `None` for a fresh, non-reproducible draw).	`0`
`min_e`	`float`	Lower bound of the event-time window passed to the dynamic aggregation.	`-inf`
`max_e`	`float`	Upper bound of the event-time window passed to the dynamic aggregation.	`inf`
`rr_method`	`('smoothness', 'relative_magnitude')`	Sensitivity restriction handed to :func:`breakdown_m`.	`'smoothness'`
`verbose`	`bool`	If `True`, print the report before returning.	`True`
`save_to`	`str`	When set, the value is treated as a path prefix and the report is written in every supported format in one call: `<prefix>.txt` (fixed-width plain-text report), `<prefix>.md` (GitHub-flavoured Markdown), `<prefix>.tex` (booktabs LaTeX fragment), `<prefix>.xlsx` (multi-sheet workbook), and `<prefix>.png` (2×2 summary figure; only if matplotlib is installed, silently skipped otherwise). Missing parent directories are created on the fly.	`None`

Returns:

Type	Description
`CSReport`	Structured container; call `.to_text()` to re-render.
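
Since `save_to` is a prefix rather than a file name, it can help to see which paths a given value maps to. The sketch below only derives the names listed for `save_to`; the helper `report_paths` is hypothetical and nothing is written to disk.

```python
from pathlib import Path

# Extensions listed for `save_to`; .png is skipped when matplotlib is missing.
EXTENSIONS = (".txt", ".md", ".tex", ".xlsx", ".png")


def report_paths(prefix):
    """Paths implied by a `save_to` prefix (hypothetical helper)."""
    return [Path(f"{prefix}{ext}") for ext in EXTENSIONS]


paths = report_paths("out/cs_minwage")
names = [p.name for p in paths]
```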

Examples:

>>> import statspai as sp, numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> rows = []
>>> for unit in range(80):
...     g = 4 if unit < 40 else 0             # cohort; 0 = never treated
...     for t in range(8):
...         post = g > 0 and t >= g
...         y = 0.3 * t + (2.0 if post else 0.0) + (unit % 5) + rng.normal()
...         rows.append((unit, t, g, y))
>>> df = pd.DataFrame(rows, columns=["id", "t", "g", "y"])
>>> rpt = sp.cs_report(df, y="y", g="g", t="t", i="id", random_state=42)
>>> rpt.dynamic           # event-study DataFrame w/ uniform bands

parallel_trends_robustness ¶

parallel_trends_robustness(result: Any, m_grid: Optional[Sequence[float]] = None, families: Sequence[str] = ('SD', 'RM'), alpha: float = 0.05, e: int = 0, delta: Optional[ndarray] = None) -> ParallelTrendsRobustnessResult

Run the full parallel-trends robustness pipeline on a DiD result.

Chains the joint pre-trend test, the Roth (2022) power calculation for that test, and Rambachan & Roth (2023) honest confidence intervals (plus their breakdown value Mbar*) for each requested restriction family, and reduces the whole thing to a one-line verdict.

Parameters:

Name	Type	Description	Default
`result`	`CausalResult`	A fitted DiD/event-study result carrying event-study estimates in `result.model_info['event_study']` (e.g. from `sp.event_study`, `sp.callaway_santanna`, `sp.sun_abraham`). Note: if the result does not carry a pre-period covariance matrix in `model_info['vcv_pre']`, the pre-trend test and power fall back to assuming the pre-period coefficients are mutually independent and warn loudly. `sp.event_study` computes the full cluster-robust covariance; pass `expose_pre_vcov=True` to it to have this pipeline use the correct covariance instead of the diagonal fallback.	required
`m_grid`	`sequence of float`	Grid of violation magnitudes. Default: the `honest_did` default, multiples of the standard error at `e`.	`None`
`families`	`sequence of str`	Restriction families. `"SD"` maps to `honest_did`'s `method='smoothness'` (bounded second differences); `"RM"` maps to `method='relative_magnitude'`.	`("SD", "RM")`
`alpha`	`float`	Significance level.	`0.05`
`e`	`int`	Relative time whose effect the sensitivity analysis targets.	`0`
`delta`	`array-like`	Hypothesised pre-trend violation passed to `pretrends_power`.	`None`

Returns:

Type	Description
`ParallelTrendsRobustnessResult`	With `.summary()`, `.plot()`, `.to_latex()`, and the `power_table` / `ci_grid` / `breakdown` / `verdict` fields.
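
To make the breakdown value concrete, here is a minimal sketch that takes it to be the largest violation magnitude on a grid at which the honest confidence interval still excludes zero. The `(M, ci_low, ci_high)` rows are made up, the helper `breakdown_from_grid` is hypothetical, and the real layout of `ci_grid` may differ.

```python
# Illustrative only: the (M, ci_low, ci_high) rows are made up, and the
# real layout of `ci_grid` in StatsPAI may differ.

def breakdown_from_grid(grid):
    """Largest M whose confidence interval still excludes zero.

    Scans M in increasing order and stops at the first interval that
    covers zero. Returns None if even the smallest M covers zero.
    """
    last_ok = None
    for m, lo, hi in sorted(grid):
        if lo <= 0.0 <= hi:
            break
        last_ok = m
    return last_ok

grid = [
    (0.0, 0.40, 1.20),
    (0.5, 0.15, 1.45),
    (1.0, -0.10, 1.70),
    (1.5, -0.35, 1.95),
]
mbar_star = breakdown_from_grid(grid)
```

A grid scan like this is only as fine as the grid itself; a library implementation may locate the crossing point more precisely.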

References

Rambachan, A. and Roth, J. (2023). A More Credible Approach to Parallel Trends. Review of Economic Studies, 90(5), 2555-2591. [@rambachan2023more]

Roth, J. (2022). Pretest with Caution: Event-Study Estimates after Testing for Parallel Trends. AER: Insights, 4(3), 305-322. [@roth2022pretest]

Examples:

>>> import statspai as sp
>>> df = sp.dgp_did(n_units=80, n_periods=8, staggered=False, seed=0)
>>> df["first_treat"] = df["first_treat"].fillna(0)
>>> es = sp.event_study(df, y="y", treat_time="first_treat",
...                     time="time", unit="unit", window=(-3, 3))
>>> rob = sp.parallel_trends_robustness(es, families=("SD", "RM"))
>>> sorted(rob.breakdown)
['RM', 'SD']
>>> bool("Mbar" in rob.verdict or "not robust" in rob.verdict)
True

did_report ¶

did_report(data: DataFrame, y: str, time: str, first_treat: str, group: str, save_to: str, methods: Union[str, List[str]] = 'auto', controls: Optional[List[str]] = None, cluster: Optional[str] = None, alpha: float = 0.05, include_sensitivity: bool = True, plot_sort_by: Optional[str] = 'estimate', verbose: bool = False) -> CausalResult

DID report bundle: fits selected methods and writes report artifacts.

Writes the following files to save_to:

did_summary.txt : text dump of result.summary().
did_summary.md : GitHub-Flavoured Markdown table.
did_summary.tex : LaTeX booktabs fragment.
did_summary.png : forest plot (requires matplotlib).
did_summary.json : detail table + model_info in JSON.
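
A short sketch for checking which of these files are present after a run. The file names come from the list above; the helper `missing_artifacts` is hypothetical and not part of StatsPAI.

```python
from pathlib import Path

# File names taken from the list above; the helper itself is hypothetical.
EXPECTED = [
    "did_summary.txt",
    "did_summary.md",
    "did_summary.tex",
    "did_summary.png",   # the forest plot requires matplotlib
    "did_summary.json",
]

def missing_artifacts(save_to):
    """Return the expected report files that are not present in save_to."""
    folder = Path(save_to)
    return [name for name in EXPECTED if not (folder / name).is_file()]
```

Calling `missing_artifacts(out_dir)` after `sp.did_report(..., save_to=out_dir)` would return an empty list when the whole bundle was written.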

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Same as :func:`did_summary`.	required
`y`	`str`	Same as :func:`did_summary`.	required
`time`	`str`	Same as :func:`did_summary`.	required
`first_treat`	`str`	Same as :func:`did_summary`.	required
`group`	`str`	Same as :func:`did_summary`.	required
`methods`	`str or list of str`	Same as :func:`did_summary`.	`'auto'`
`controls`	`list of str`	Same as :func:`did_summary`.	`None`
`cluster`	`str`	Same as :func:`did_summary`.	`None`
`alpha`	`float`	Same as :func:`did_summary`.	`0.05`
`save_to`	`str`	Directory path. Created if it does not exist.	required
`include_sensitivity`	`bool`	Whether to run Rambachan-Roth breakdown M*. Defaults to `True` in `did_report` (vs `False` in `did_summary`) because a report is expected to be comprehensive.	`True`
`plot_sort_by`	`'estimate' or None`	Sort the forest plot by estimate ascending.	`'estimate'`
`verbose`	`bool`	Print progress lines.	`False`

Returns:

Type	Description
`CausalResult`	The underlying :func:`did_summary` output. All side-effect files are written to `save_to` as a bundle.

Examples:

>>> import tempfile
>>> import statspai as sp
>>> df = sp.dgp_did(n_units=120, n_periods=8, staggered=True,
...                 seed=0)
>>> out_dir = tempfile.mkdtemp()
>>> res = sp.did_report(df, y='y', time='time',
...                     first_treat='first_treat', group='unit',
...                     save_to=out_dir, methods=['cs', 'bjs'],
...                     include_sensitivity=False)
>>> import os
>>> 'did_summary.md' in os.listdir(out_dir)
True

did_summary ¶

did_summary(data: DataFrame, y: str, time: str, first_treat: str, group: str, methods: Union[str, List[str]] = 'auto', controls: Optional[List[str]] = None, cluster: Optional[str] = None, alpha: float = 0.05, include_sensitivity: bool = False, verbose: bool = False) -> CausalResult

One-call method-robustness comparison for staggered DID.

Fits every requested estimator to the same data and returns a single `CausalResult` whose `detail` attribute is a tidy comparison table with one row per method and the columns `method`, `estimator`, `estimate`, `se`, `pvalue`, `ci_low`, `ci_high`, `n_obs`, and `note`.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Panel dataset (long format).	required
`y`	`str`	Outcome variable.	required
`time`	`str`	Time / period variable (integer-valued).	required
`first_treat`	`str`	First-treatment period per unit; NaN (or 0) for never-treated.	required
`group`	`str`	Unit identifier.	required
`methods`	`str or list of str`	Methods to run. Valid keys: `'cs'`, `'sa'`, `'bjs'`, `'etwfe'`, `'stacked'`, or `'all'` / `'auto'` for all.	``'auto'``
`controls`	`list of str`	Time-varying covariates passed to methods that support them.	`None`
`cluster`	`str`	Cluster variable for SE (defaults to `group` in each sub-method).	`None`
`alpha`	`float`	Significance level for confidence intervals.	`0.05`
`include_sensitivity`	`bool`	If `True` and `'cs'` is among the methods fit, compute the Rambachan–Roth (2023) breakdown M* — the largest relative violation of parallel trends under which the treatment effect is still significantly different from zero. The value is added to `model_info['breakdown_m']` and to the `breakdown_m` column of `detail` (CS row only; other methods leave `NaN`).	`False`
`verbose`	`bool`	Print progress for each method.	`False`

Returns:

Type	Description
`CausalResult`	`estimate` : mean of successfully-fit overall ATTs. `se` : standard deviation across methods (not a standard error — a crude dispersion measure). `detail` : comparison DataFrame described above. `model_info` : `{'methods_requested': [...], 'methods_fit': [...], 'methods_failed': {name: error_msg, ...}}`.
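
The two headline numbers are simple summaries across methods. A minimal sketch with made-up ATTs follows; whether StatsPAI uses the sample (n - 1) or population (n) standard deviation is not stated here, so the sample version is an assumption.

```python
import statistics

# Made-up overall ATTs, one per successfully fitted method.
atts = {"cs": 0.52, "sa": 0.48, "bjs": 0.50}
values = list(atts.values())

# `estimate`: plain mean of the per-method ATTs.
estimate = statistics.mean(values)

# `se`: dispersion across methods, not a sampling standard error.
# The sample (n - 1) formula is assumed for illustration.
spread = statistics.stdev(values)
```

Because this spread measures disagreement between estimators rather than sampling uncertainty, it should not be used to build a confidence interval for the mean.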

Notes

Each method's overall ATT has a slightly different interpretation:

CS aggte(type='simple') averages ATT(g, t) over post-treatment periods t ≥ g, weighted by cohort size × exposure length.
ETWFE reports the R/Stata treated-observation-weighted simple ATT.
SA / BJS / Stacked report their estimator-specific overall ATT aggregations.

Differences across methods are informative about heterogeneity, model specification, and the sensitivity of conclusions to the estimator choice. Large disagreement is a red flag that deserves further investigation (e.g., via sp.bacon_decomposition or sp.honest_did).

Examples:

>>> import statspai as sp
>>> df = sp.dgp_did(n_units=200, n_periods=10, staggered=True, seed=0)
>>> out = sp.did_summary(df, y='y', time='time',
...                      first_treat='first_treat', group='unit')
>>> out.summary()
>>> print(out.detail[['method', 'estimate', 'se', 'pvalue']])

did_summary_to_latex ¶

did_summary_to_latex(result: CausalResult, digits: int = 4, include_ci: bool = True, include_breakdown: bool = True, label: str = 'tab:did_summary', caption: str = 'DID method-robustness summary.') -> str

Render a :func:did_summary result as a LaTeX booktabs table.

Parameters:

Name	Type	Description	Default
`result`	`CausalResult`	Output of :func:`did_summary`.	required
`digits`	`int`	Decimal precision.	`4`
`include_ci`	`bool`	Include the 95 % CI column.	`True`
`include_breakdown`	`bool`	Include the Rambachan-Roth breakdown M* column when sensitivity was requested.	`True`
`label`	`str`	LaTeX label for the table.	``'tab:did_summary'``
`caption`	`str`	LaTeX caption.	``'DID method-robustness summary.'``

Returns:

Type	Description
`str`	Full `\begin{table} ... \end{table}` block using the `booktabs` package (`\toprule`, `\midrule`, `\bottomrule`).

Notes

Requires \usepackage{booktabs} in the LaTeX preamble.

Examples:

>>> import statspai as sp
>>> df = sp.dgp_did(n_units=120, n_periods=8, staggered=True,
...                 seed=0)
>>> out = sp.did_summary(df, y='y', time='time',
...                      first_treat='first_treat', group='unit',
...                      methods=['cs', 'bjs'])
>>> tex = sp.did_summary_to_latex(out)
>>> isinstance(tex, str)
True
>>> tex.splitlines()[0].startswith('\\begin{table}')
True

did_summary_to_markdown ¶

did_summary_to_markdown(result: CausalResult, digits: int = 4, include_ci: bool = True, include_breakdown: bool = True) -> str

Render a :func:did_summary result as a GitHub-Flavoured Markdown table.

Columns shown (in order): Method, Estimate, SE, 95 % CI, p-value, and optionally Breakdown M* (when sensitivity was requested).
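
A sketch of the table shape with those columns. This is a hypothetical renderer written for illustration, not StatsPAI's implementation; the row dictionary keys mirror the `detail` column names but are otherwise assumptions.

```python
# Hypothetical renderer, not StatsPAI's implementation: it only shows
# the shape of a GitHub-Flavoured Markdown table with these columns.

def to_gfm(rows, digits=4):
    header = ["Method", "Estimate", "SE", "95 % CI", "p-value"]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for r in rows:
        ci = f"[{r['ci_low']:.{digits}f}, {r['ci_high']:.{digits}f}]"
        cells = [
            r["method"],
            f"{r['estimate']:.{digits}f}",
            f"{r['se']:.{digits}f}",
            ci,
            f"{r['pvalue']:.{digits}f}",
        ]
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)

rows = [{"method": "cs", "estimate": 0.5, "se": 0.1,
         "ci_low": 0.304, "ci_high": 0.696, "pvalue": 0.0001}]
md = to_gfm(rows, digits=3)
```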

Parameters:

Name	Type	Description	Default
`result`	`CausalResult`	Output of :func:`did_summary`.	required
`digits`	`int`	Decimal precision for numeric columns.	`4`
`include_ci`	`bool`	Include the 95 % CI column.	`True`
`include_breakdown`	`bool`	Include the Rambachan-Roth breakdown M* column (CS row only, blank for others). Ignored if sensitivity was not requested.	`True`

Returns:

Type	Description
`str`	Multi-line Markdown table, ready to paste into notebooks or PRs.

Examples:

>>> import statspai as sp
>>> df = sp.dgp_did(n_units=120, n_periods=8, staggered=True,
...                 seed=0)
>>> out = sp.did_summary(df, y='y', time='time',
...                      first_treat='first_treat', group='unit',
...                      methods=['cs', 'bjs'])
>>> md = sp.did_summary_to_markdown(out)
>>> isinstance(md, str)
True
>>> md.splitlines()[0].startswith('| Method')
True

drdid ¶

drdid(data: DataFrame, y: str, group: str, time: str, covariates: Optional[List[str]] = None, method: str = 'imp', alpha: float = 0.05, n_boot: int = 500, random_state: Optional[int] = None, seed: Optional[int] = None, id: Optional[str] = None) -> CausalResult

Doubly Robust Difference-in-Differences (Sant'Anna & Zhao 2020).

Combines outcome regression with inverse probability weighting for 2×2 DID with covariates. Consistent if either the outcome model or the propensity score model is correctly specified.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Dataset with one row per unit-period in 2x2 design.	required
`y`	`str`	Outcome variable.	required
`group`	`str`	Binary treatment-group indicator (1 = treated, 0 = control).	required
`time`	`str`	Binary time indicator (1 = post, 0 = pre).	required
`covariates`	`list of str`	Covariate names. If `None`, runs a simple (un-adjusted) DID.	`None`
`method`	`str`	`'imp'` for the improved estimator (locally efficient); `'trad'` for the traditional DR-DID.	``'imp'``
`alpha`	`float`	Significance level.	`0.05`
`n_boot`	`int`	Number of bootstrap replications for inference.	`500`
`random_state`	`int`	Seed for bootstrap reproducibility.	`None`
`id`	`str`	Unit identifier for a true two-period panel. When supplied, the improved estimator uses the Sant'Anna-Zhao panel formula with calibrated propensity scores and influence-function standard errors, matching `DRDID::drdid_imp_panel` and Stata `drdid, drimp`.	`None`

Returns:

Type	Description
`CausalResult`	`estimate` is the DR-DID ATT. `detail` contains influence-function diagnostics.

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(42)
>>> n = 500
>>> G = rng.integers(0, 2, n)
>>> T = rng.integers(0, 2, n)
>>> x = rng.normal(0, 1, n)
>>> y_val = 1 + 0.5*x + 2*G + 3*T + 4*G*T + rng.normal(0, 1, n)
>>> df = pd.DataFrame({'y': y_val, 'treated': G, 'post': T, 'x': x})
>>> result = sp.drdid(df, y='y', group='treated', time='post',
...                   covariates=['x'])
>>> abs(result.estimate - 4.0) < 1.0
True
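
The data-generating process in this example has an interaction coefficient of 4, which is why the doctest compares the estimate with 4.0. The sketch below reproduces the underlying 2×2 arithmetic on noise-free cell means with the covariate `x` held at 0. It is plain DID arithmetic, not the doubly robust estimator, and `cell_mean` is a hypothetical helper.

```python
# Noise-free cell means implied by y = 1 + 0.5*x + 2*G + 3*T + 4*G*T
# with x fixed at 0. This is plain 2x2 DID arithmetic, not DR-DID.

def cell_mean(g, t):
    return 1 + 2 * g + 3 * t + 4 * g * t

treated_change = cell_mean(1, 1) - cell_mean(1, 0)   # 10 - 3
control_change = cell_mean(0, 1) - cell_mean(0, 0)   # 4 - 1
att = treated_change - control_change                # 7 - 3
```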

etwfe ¶

etwfe(data: DataFrame, y: str, group: str, time: str, first_treat: str, controls: Optional[List[str]] = None, cluster: Optional[str] = None, alpha: float = 0.05, xvar: Optional[Any] = None, panel: bool = True, cgroup: str = 'notyet') -> CausalResult

Public sp.etwfe entry point — see _dispatch_etwfe_impl for the full docstring on options and behaviour.

Thin wrapper around the 4-branch dispatcher (panel-with-xvar / panel-never-only / panel-notyet / repeated-cross-section) that attaches a :class:Provenance record to the returned result so downstream replication_pack / Quarto appendix / table footers can pick up the call without each branch having to opt in.

Examples:

>>> import statspai as sp
>>> df = sp.dgp_did(n_units=120, n_periods=8, staggered=True, seed=42)
>>> res = sp.etwfe(df, y='y', group='unit', time='time',
...                first_treat='first_treat')
>>> res.estimate > 0  # R/Stata simple ATT (true effect 0.5)
True
>>> res.detail is not None  # cohort-specific ATTs
True

etwfe_emfx ¶

etwfe_emfx(result: CausalResult, type: str = 'simple', alpha: float = 0.05, include_leads: bool = False, weighting: str = 'treated') -> CausalResult

R etwfe::emfx-style aggregated marginal effects for an ETWFE fit.

Takes the result of :func:etwfe / :func:wooldridge_did and returns one of four aggregations used in applied work:

type	Aggregation
`'simple'`	Overall treated-observation-weighted ATT (same as `result.estimate` for current `sp.etwfe` results).
`'group'`	ATT per treatment cohort g.
`'event'`	ATT per event time e = t - g, averaged across cohorts.
`'calendar'`	ATT per calendar time t, averaged across cohorts for which t >= g.
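
The four aggregation types can be sketched on a toy set of cohort-by-period effects. Everything here is an assumption made for illustration: the ATT(g, t) values and treated-observation counts are made up, the dictionary layout does not mirror StatsPAI internals, and treated-observation weights are used for every type.

```python
from collections import defaultdict

# Made-up ATT(g, t) values and treated-observation counts; the data
# layout is hypothetical and does not mirror StatsPAI internals.
att = {(3, 3): 0.4, (3, 4): 0.6, (4, 4): 0.5}
n_treated = {(3, 3): 10, (3, 4): 10, (4, 4): 20}

def weighted_mean(cells):
    """Average ATT over cells, weighted by treated-observation counts."""
    total = sum(n_treated[c] for c in cells)
    return sum(att[c] * n_treated[c] for c in cells) / total

def aggregate(key):
    """Group cells by key(g, t) and average within each group."""
    groups = defaultdict(list)
    for g, t in att:
        groups[key(g, t)].append((g, t))
    return {k: weighted_mean(v) for k, v in sorted(groups.items())}

simple = weighted_mean(list(att))            # 'simple'
by_group = aggregate(lambda g, t: g)         # 'group': one value per cohort g
by_event = aggregate(lambda g, t: t - g)     # 'event': e = t - g
by_calendar = aggregate(lambda g, t: t)      # 'calendar': one value per t
```

The same cells feed all four summaries; only the grouping key changes.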

Parameters:

Name	Type	Description	Default
`result`	`CausalResult`	Output of :func:`etwfe` or :func:`wooldridge_did`.	required
`type`	`{'simple', 'group', 'event', 'calendar'}`	Aggregation type.	`'simple'`
`alpha`	`float`	Significance level for confidence intervals.	`0.05`
`include_leads`	`bool`	For `type='event'` and `type='calendar'`, whether to include pre-treatment relative times (`rel_time < 0`) in the output. These coefficients identify pre-trends and are informative for parallel-trends inspection. Default `False` for backward compatibility with earlier versions; set `True` for full event-study output matching the R `etwfe::emfx(type='event')` default. `rel_time = -1` is always the reference category and is excluded.	`False`
`weighting`	`{'cohort', 'treated'}`	Aggregation weights for cohort-level marginal effects. `'treated'` uses the number of treated post-period observations, matching R `etwfe::emfx(type='simple')` and Stata `jwdid, estat simple`. `'cohort'` preserves the historical StatsPAI cohort-share weighting.	`'treated'`

Returns:

Type	Description
`CausalResult`	`estimate` is the overall ATT (for `type='simple'`) or the mean of the sub-aggregation (for the other types). `detail` contains one row per group/event-time/calendar-time with (estimate, se, pvalue, ci_low, ci_high).

Notes

For 'event' and 'calendar', the reported SE treats the per-cohort coefficients as independent, a standard approximation that matches R etwfe's default under classical vcov. Cluster-robust or fully general SEs require the full regression vcov; access to it through sp.wooldridge_did and the model_info matrix is deferred to a future release.
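
Under that independence approximation, the variance of a weighted average is the sum of squared weights times squared standard errors. A minimal sketch with made-up per-cohort numbers (the equal weights are an assumption for illustration):

```python
import math

# Made-up per-cohort estimates at one event time, with their SEs and
# aggregation weights (weights sum to 1).
est = [0.40, 0.55]
se = [0.10, 0.20]
w = [0.5, 0.5]

agg = sum(wi * bi for wi, bi in zip(w, est))

# Independence assumption: covariance terms between cohorts are dropped,
# so Var(sum w_i b_i) = sum w_i^2 * se_i^2.
agg_se = math.sqrt(sum((wi * si) ** 2 for wi, si in zip(w, se)))
```

If the cohort coefficients are positively correlated, as they can be when they come from one regression, this formula understates the true standard error.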

Examples:

>>> import statspai as sp
>>> df = sp.dgp_did(n_units=200, n_periods=10, staggered=True)
>>> fit = sp.etwfe(df, y='y', time='time',
...                first_treat='first_treat', group='unit')
>>> evt = sp.etwfe_emfx(fit, type='event')
>>> print(evt.detail)   # ATT by event time
>>> grp = sp.etwfe_emfx(fit, type='group')
>>> cal = sp.etwfe_emfx(fit, type='calendar')

twfe_decomposition ¶

twfe_decomposition(data: DataFrame, y: str, group: str, time: str, first_treat: str, alpha: float = 0.05) -> CausalResult

TWFE decomposition: Goodman-Bacon (2021) + de Chaisemartin–D'Haultfoeuille weights.

Decomposes the standard two-way fixed effects estimator into all pairwise 2×2 DID comparisons, showing the weight and estimate for each. Also computes de Chaisemartin–D'Haultfoeuille (2020) weights to diagnose whether negative weights are present.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Panel dataset in long format.	required
`y`	`str`	Outcome variable.	required
`group`	`str`	Unit identifier.	required
`time`	`str`	Time period variable.	required
`first_treat`	`str`	Treatment timing column (NaN or 0 for never-treated).	required
`alpha`	`float`	Significance level.	`0.05`

Returns:

Type	Description
`CausalResult`	`detail` DataFrame has columns: `type`, `treated_cohort`, `control_cohort`, `estimate`, `weight`, `weighted_est`. `model_info` includes summary statistics and dCDH weights.
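
In the Goodman-Bacon decomposition the TWFE coefficient equals the weighted sum of the 2×2 estimates, with weights that sum to one. A minimal sketch using the `estimate`, `weight`, and `weighted_est` column names from above; the rows and the `type` labels are made up.

```python
# Made-up decomposition rows using the `estimate` and `weight` columns
# named above; the `type` labels are hypothetical placeholders.
rows = [
    {"type": "treated_vs_never", "estimate": 0.60, "weight": 0.50},
    {"type": "early_vs_late", "estimate": 0.40, "weight": 0.30},
    {"type": "late_vs_early", "estimate": 0.10, "weight": 0.20},
]

# Contribution of each 2x2 comparison to the TWFE coefficient.
for r in rows:
    r["weighted_est"] = r["estimate"] * r["weight"]

twfe = sum(r["weighted_est"] for r in rows)
weights_sum = sum(r["weight"] for r in rows)
```

Sorting real output by `weight` shows which comparisons drive the TWFE estimate.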

Examples:

>>> import statspai as sp
>>> df = sp.dgp_did(n_units=200, n_periods=8, staggered=True, seed=0)
>>> result = sp.twfe_decomposition(df, y='y', group='unit',
...                                time='time',
...                                first_treat='first_treat')
>>> bool('weight' in result.detail.columns)  # 2x2 decomposition weights
True

did ¶

did(data: DataFrame, y: str, treat: str, time: str, id: Optional[str] = None, covariates: Optional[Any] = None, method: str = 'auto', estimator: str = 'dr', control_group: str = 'nevertreated', base_period: str = 'universal', cluster: Optional[str] = None, robust: bool = True, alpha: float = 0.05, weights: Optional[str] = None, subgroup: Optional[str] = None, treat_unit: Any = None, treat_time: Any = None, se_method: str = 'placebo', aggregation: Optional[str] = None, n_boot: int = 1000, random_state: Optional[int] = None, panel: bool = True, anticipation: int = 0, vce: Optional[str] = None, wild_reps: int = 999, wild_weight_type: str = 'rademacher', seed: Optional[int] = None, **kwargs: Any) -> CausalResult

Difference-in-Differences estimation.

Unified entry point that auto-detects design type and dispatches to the appropriate estimator.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Input dataset.	required
`y`	`str`	Outcome variable name.	required
`treat`	`str`	Treatment variable. The column semantics depend on the design, and confusing them is one of the most common pitfalls in DID. 2×2 DID (no `id`, two periods): a 0/1 group indicator (treated vs. control), identical to a standard DID dummy. Staggered DID (Callaway–Sant'Anna, Sun–Abraham, SDID, Borusyak–Jaravel–Spiess, de Chaisemartin–D'Haultfœuille, Wooldridge etwfe): the first treatment period for each unit (aka the cohort / g-variable in R's `did` package). Never-treated units must have `0` (or `NaN`), not `1`. A plain 0/1 indicator will silently be interpreted as "everyone was first treated in period 1," producing nonsense estimates. If you only have a 0/1 `treated` column, construct `first_treat` per unit and broadcast it to all of that unit's rows: take the first treated year per unit (NaN for never-treated units) with `first = df.loc[df['treated'] == 1].groupby('id')['year'].min()`, set `df['first_treat'] = df['id'].map(first).fillna(0).astype(int)`, then call `sp.did(df, y='y', treat='first_treat', time='year', id='id')`.	required
`time`	`str`	Time period variable.	required
`id`	`str`	Unit identifier. Required for staggered DID and SDID.	`None`
`covariates`	`list of str`	Covariate names for conditional parallel trends / controls.	`None`
`method`	`str`	`'auto'` — 2×2 if `id` is None and treatment is binary, else Callaway-Sant'Anna. `'2x2'` — classic two-period, two-group DID. `'ddd'` — triple differences (requires `subgroup`). `'callaway_santanna'` or `'cs'` — staggered DID. `'sun_abraham'`, `'sa'`, or `'sunab'` — IW event study. `'bjs'` or `'did_imputation'` — Borusyak-Jaravel-Spiess imputation DID. `'sdid'` — synthetic DID (Arkhangelsky et al. 2021).	`'auto'`
`estimator`	`str`	For staggered DID: `'dr'` (doubly robust), `'ipw'`, `'reg'`.	`'dr'`
`control_group`	`str`	For staggered DID: `'nevertreated'` or `'notyettreated'`.	`'nevertreated'`
`base_period`	`str`	For staggered DID: `'universal'` or `'varying'`.	`'universal'`
`cluster`	`str`	Cluster variable for standard errors.	`None`
`robust`	`bool`	HC1 robust standard errors (2×2 / DDD only).	`True`
`alpha`	`float`	Significance level for confidence intervals.	`0.05`
`weights`	`str`	Column name for analytical weights (e.g. population weights). Supported for `'2x2'`, `'ddd'`, and event study methods. Equivalent to Stata's `[aweight=...]`.	`None`
`subgroup`	`str`	For DDD: binary affected-subgroup indicator.	`None`
`treat_unit`	`optional`	For SDID: treated unit(s).	`None`
`treat_time`	`optional`	For SDID: treatment time.	`None`
`se_method`	`str`	For SDID: 'placebo', 'bootstrap', or 'jackknife'.	`'placebo'`
`aggregation`	`str`	When set and `method` is Callaway–Sant'Anna, the raw ATT(g,t) result is passed through :func:`aggte` with `type=aggregation` (`'simple'`, `'dynamic'`, `'group'`, or `'calendar'`), delivering the aggregated ATT with Mammen multiplier-bootstrap uniform confidence bands in a single call.	`None`
`n_boot`	`int`	Bootstrap replications for the multiplier bootstrap when `aggregation` is set.	`1000`
`random_state`	`int`	Seed for the multiplier bootstrap.	`None`
`panel`	`bool`	Forwarded to :func:`callaway_santanna`; set `panel=False` for repeated cross-sections.	`True`
`anticipation`	`int`	Forwarded to :func:`callaway_santanna`.	`0`
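
The `first_treat` construction described for `treat` can be checked on a toy panel. This sketch uses only pandas; the column names `id`, `year`, and `treated` follow the snippet in the parameter description, and the data are made up.

```python
import pandas as pd

# Toy long-format panel: unit 1 is treated from 2002, unit 2 never.
df = pd.DataFrame({
    "id":      [1, 1, 1, 2, 2, 2],
    "year":    [2001, 2002, 2003, 2001, 2002, 2003],
    "treated": [0, 1, 1, 0, 0, 0],
})

# First treated year per unit (missing for never-treated units).
first = df.loc[df["treated"] == 1].groupby("id")["year"].min()

# Broadcast to every row of the unit; never-treated units get 0.
df["first_treat"] = df["id"].map(first).fillna(0).astype(int)
```

Every row of unit 1 now carries 2002 and every row of unit 2 carries 0, which is the cohort coding the staggered estimators expect when `treat='first_treat'` is passed.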

Returns:

Type	Description
`CausalResult`	Estimation results with `.summary()`, `.plot()`, `.to_latex()`, `.cite()` methods.

References

Callaway, B. and Sant'Anna, P. H. C. (2021). Difference-in-differences with multiple time periods. Journal of Econometrics, 225(2), 200-230. [@callaway2021difference]

Sant'Anna, P. H. C. and Zhao, J. (2020). Doubly Robust Difference-in-Differences Estimators. Journal of Econometrics, 219(1), 101-122. [@santanna2020doubly]

Goodman-Bacon, A. (2021). Difference-in-differences with variation in treatment timing. Journal of Econometrics, 225(2), 254-277. [@goodmanbacon2021difference]

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)

Classic 2x2 DID (one binary treatment, two periods):

>>> n = 200
>>> df = pd.DataFrame({
...     'treated': np.repeat(rng.integers(0, 2, n), 2),
...     'post': np.tile([0, 1], n),
... })
>>> df['wage'] = (1.0 + 2.0 * df['treated'] + 1.5 * df['post']
...               + 3.0 * df['treated'] * df['post']
...               + rng.normal(0, 1, len(df)))
>>> result = sp.did(df, y='wage', treat='treated', time='post')
>>> bool(result.estimate > 0)
True

Triple Differences (a third dimension via subgroup):

>>> df['low_wage'] = np.repeat(rng.integers(0, 2, n), 2)
>>> ddd = sp.did(df, y='wage', treat='treated', time='post',
...              method='ddd', subgroup='low_wage')

Staggered DID (cohort column gives first-treatment time; 0 = never):

>>> rows = []
>>> for u in range(30):
...     first = int(rng.choice([2003, 2005, 0]))
...     for yr in range(2000, 2008):
...         on = 1 if (first != 0 and yr >= first) else 0
...         rows.append({'unit': u, 'year': yr, 'first_treat': first,
...                      'y': 5 + 2.0 * on + rng.normal(0, 1)})
>>> panel = pd.DataFrame(rows)
>>> staggered = sp.did(panel, y='y', treat='first_treat',
...                    time='year', id='unit')
>>> bool(staggered.estimate == staggered.estimate)  # finite
True
