statspai.did

did

Difference-in-Differences (DID) module for StatsPAI.

Provides estimators for:

  • Classic 2×2 DID (two groups, two periods)
  • Triple Differences / DDD (two groups, two periods, within-unit subgroup)
  • Callaway & Sant'Anna (2021): staggered DID with DR/IPW/REG
  • Sun & Abraham (2021): interaction-weighted event study
  • Synthetic DID (Arkhangelsky et al. 2021)
  • Goodman-Bacon (2021): TWFE decomposition diagnostic
  • Honest DID (Rambachan & Roth 2023): parallel trends sensitivity
  • de Chaisemartin & D'Haultfoeuille (2020): DID with treatment switching
  • Borusyak, Jaravel & Spiess (2024): imputation DID estimator
  • Stacked DID (Cengiz, Dube, Lindner & Zipperer, 2019)
  • did_analysis(): one-call DID workflow
  • Wooldridge (2021): extended TWFE with cohort × time interactions
  • Sant'Anna & Zhao (2020): doubly robust DID
  • TWFE decomposition: Bacon (2021) + de Chaisemartin–D'Haultfoeuille (2020) weights

CSReport dataclass

Structured output of cs_report().

Attributes are plain pandas objects so downstream users can export to LaTeX, Markdown, or Excel without any custom converters.

to_text

to_text() -> str

Return the human-readable report as a single string.

plot

plot(figsize=(14, 10), suptitle: Optional[str] = None)

Render a 2×2 summary figure of the report.

The four quadrants show:

  • Top-left: event study (dynamic) with uniform confidence band
  • Top-right: θ(g) per-cohort aggregation with uniform band
  • Bottom-left: θ(t) per-calendar-time aggregation
  • Bottom-right: Rambachan–Roth breakdown M* across post event times

Requires matplotlib. Returns (fig, axes).

to_markdown

to_markdown(float_format: str = '%.4f') -> str

Render the report as GitHub-Flavoured Markdown.

Suitable for pasting directly into a pull request, blog post, or Jupyter notebook Markdown cell.

to_excel

to_excel(path, float_format: Optional[str] = '%.6f', engine: Optional[str] = None) -> str

Dump the report to a multi-sheet Excel workbook.

Creates one sheet per block — Summary, Dynamic, Group, Calendar, Breakdown, Meta — so downstream Excel consumers (policy briefs, regulatory reports) can link to or copy from the individual tables directly.

Parameters:

Name Type Description Default
path str | Path

Destination .xlsx path.

required
float_format str

Passed through to pandas.DataFrame.to_excel. Pass None to preserve full precision.

'%.6f'
engine str

Excel writer engine ('openpyxl' or 'xlsxwriter'). If None, pandas picks an installed engine; a clear ImportError is raised if none is available.

None

Returns:

Type Description
str

The path written.

to_latex

to_latex(float_format: str = '%.4f', caption: Optional[str] = None, label: Optional[str] = None) -> str

Render the report as a LaTeX fragment.

Uses the booktabs package for each sub-table and wraps the result in a single table float. Requires \usepackage{booktabs} in the preamble of the consuming document.

DIDAnalysis dataclass

Bundled results from a full DID analysis workflow.

summary

summary() -> str

Print comprehensive analysis summary.

plot

plot(**kwargs)

Plot event study if available, else main result.

SensitivityResult dataclass

Result of Rambachan & Roth (2023) sensitivity analysis.

Attributes:

Name Type Description
mbar_grid ndarray

Grid of M-bar values tested.

ci_lower ndarray

Lower bound of the honest CI at each M-bar.

ci_upper ndarray

Upper bound of the honest CI at each M-bar.

breakdown_mbar float

Smallest M-bar for which the CI includes zero (sign reversal).

att float

Point estimate of the ATT.

att_se float

Standard error of the ATT.

method str

Extrapolation method used ('C-LF').

alpha float

Significance level.

Methods:

Name Description
summary

Print a formatted summary table.

plot

Matplotlib sensitivity plot (M-bar vs CI).

summary

summary() -> str

Return a formatted summary string.

plot

plot(ax=None, figsize=(8, 5), **kwargs)

Sensitivity plot: M-bar on x-axis, honest CI band on y-axis.

Parameters:

Name Type Description Default
ax matplotlib Axes
None
figsize tuple
(8, 5)
**kwargs

Passed to ax.fill_between.

{}

Returns:

Type Description
Axes

overlap_weighted_did

overlap_weighted_did(data: DataFrame, *, y: str, treat: str, time: str, covariates: Optional[Sequence[str]] = None, ps_model: Any = 'logit', alpha: float = 0.05) -> CausalResult

Overlap-weighted 2x2 DID.

Parameters:

Name Type Description Default
data DataFrame

Two-period panel with a binary treat indicator and a binary post/pre time indicator.

required
y str
required
treat str
required
time str
required
covariates sequence of str

Pre-treatment covariates for the propensity score. If omitted, reduces to standard (unweighted) 2x2 DID.

None
ps_model ('logit', 'gbm', 'dl')

How to estimate e(X) = P(treat=1 | X). 'dl' uses dl_propensity_score().

'logit'
alpha float
0.05

Returns:

Type Description
CausalResult

estimand = 'ATT (overlap)'. Uses a sandwich-style bootstrap-ready SE derived from weighted residuals.

References

Li, Morgan & Zaslavsky (JASA 2018). "Overlap-weighted difference-in-differences" (Economics Letters 2025).
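
As a rough illustration of the idea (not the library's implementation), the sketch below computes a 2x2 DID on unit-level outcome changes with overlap weights: treated units are weighted by 1 - e(X) and controls by e(X), following Li, Morgan & Zaslavsky (2018). The propensity scores are supplied by hand, and the helper name overlap_did_sketch and the toy data are hypothetical.

```python
def overlap_did_sketch(delta_y, treated, pscore):
    """Overlap-weighted difference in mean outcome changes.

    delta_y : post-minus-pre outcome change per unit
    treated : 1 for treated units, 0 for controls
    pscore  : estimated e(X) = P(treat=1 | X) per unit
    """
    num_t = den_t = num_c = den_c = 0.0
    for dy, d, e in zip(delta_y, treated, pscore):
        if d == 1:
            w = 1.0 - e          # treated units: weight 1 - e(X)
            num_t += w * dy
            den_t += w
        else:
            w = e                # control units: weight e(X)
            num_c += w * dy
            den_c += w
    return num_t / den_t - num_c / den_c


# Toy data: every treated unit changes by 3.0, every control by 1.0.
delta_y = [3.0, 3.0, 3.0, 1.0, 1.0, 1.0]
treated = [1, 1, 1, 0, 0, 0]
pscore = [0.8, 0.6, 0.5, 0.5, 0.3, 0.2]
att_overlap = overlap_did_sketch(delta_y, treated, pscore)
```

With a constant propensity score the weights cancel within each group and the estimate coincides with the unweighted 2x2 DID.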

dl_propensity_score

dl_propensity_score(data: DataFrame, *, treatment: str, covariates: Sequence[str], hidden_sizes: Sequence[int] = (64, 32), max_iter: int = 300, random_state: int = 0) -> ndarray

Neural-net propensity score with balance-targeted loss.

Fits a small multi-layer perceptron e(X) = P(T=1 | X). If torch is available, it uses a proper MLP; otherwise it falls back to scikit-learn's MLPClassifier (lbfgs optimiser, ReLU).

Parameters:

Name Type Description Default
data DataFrame
required
treatment str
required
covariates sequence of str
required
hidden_sizes sequence of int
(64, 32)
max_iter int
300
random_state int
0

Returns:

Type Description
ndarray of shape (n,)

Estimated propensity scores clipped to (0.02, 0.98).

References

Peng, Li, Wu & Li (arXiv:2404.04794, 2024). [@peng2024local]

cs_report

cs_report(data_or_result, y: Optional[str] = None, g: Optional[str] = None, t: Optional[str] = None, i: Optional[str] = None, x: Optional[List[str]] = None, estimator: str = 'dr', control_group: str = 'nevertreated', anticipation: int = 0, alpha: float = 0.05, n_boot: int = 1000, random_state: Optional[int] = 0, min_e: float = -inf, max_e: float = inf, rr_method: str = 'smoothness', verbose: bool = True, save_to: Optional[str] = None) -> CSReport

One-call staggered-DID workflow: estimate → aggregate → sensitivity.

Parameters:

Name Type Description Default
data_or_result DataFrame | CausalResult

Either a long-format panel (then y, g, t, i are required and callaway_santanna() is run first), or an already-fitted callaway_santanna() result.

required
y str

Outcome column (required when data_or_result is a DataFrame).

None
g str

Cohort column (required when data_or_result is a DataFrame).

None
t str

Time column (required when data_or_result is a DataFrame).

None
i str

Unit id column (required when data_or_result is a DataFrame).

None
x list of str

Covariates for conditional parallel trends.

None
estimator ('dr', 'ipw', 'reg')
'dr'
control_group ('nevertreated', 'notyettreated')
'nevertreated'
anticipation int
0
alpha float
0.05
n_boot int

Multiplier-bootstrap replications for aggte().

1000
random_state int

Seed for the bootstrap (set to None for a non-reproducible run).

0
min_e float

Lower bound of the event-time window passed to the dynamic aggregation.

-inf
max_e float

Upper bound of the event-time window passed to the dynamic aggregation.

inf
rr_method ('smoothness', 'relative_magnitude')

Sensitivity restriction handed to breakdown_m().

'smoothness'
verbose bool

If True, print the report before returning.

True
save_to str

When set, treats the value as a path prefix and writes the report in every supported format in one call:

  • <prefix>.txt — fixed-width plain-text report
  • <prefix>.md — GitHub-flavoured Markdown
  • <prefix>.tex — booktabs LaTeX fragment
  • <prefix>.xlsx — multi-sheet workbook
  • <prefix>.png — 2×2 summary figure (only if matplotlib is installed; silently skipped otherwise)

Missing parent directories are created on the fly.

None

Returns:

Type Description
CSReport

Structured container; call .to_text() to re-render.

Examples:

>>> import statspai as sp
>>> rpt = sp.did.cs_report(
...     df, y='y', g='g', t='t', i='id', random_state=42)
>>> rpt.dynamic           # event-study DataFrame w/ uniform bands
>>> rpt.breakdown         # R-R breakdown M* per post event time

bjs_pretrend_joint

bjs_pretrend_joint(result: CausalResult, data: DataFrame, y: str, group: str, time: str, first_treat: str, controls: Optional[List[str]] = None, cluster: Optional[str] = None, horizon: Optional[List[int]] = None, n_boot: int = 300, seed: Optional[int] = None) -> Dict[str, Any]

Cluster-bootstrap joint Wald test for BJS pre-treatment coefficients.

Parameters:

Name Type Description Default
result CausalResult

Output of did_imputation() on data with a non-trivial horizon that covers negative values. Only its model_info['event_study'] frame is consulted, to look up the observed pre-period point estimates, which are then re-tested with a covariance-aware statistic.

required
data DataFrame

Same data you passed to the original did_imputation() call. Needed to re-run BJS on each cluster-bootstrap resample.

required
y str

Same value you passed to the original did_imputation() call.

required
group str

Same value you passed to the original did_imputation() call.

required
time str

Same value you passed to the original did_imputation() call.

required
first_treat str

Same value you passed to the original did_imputation() call.

required
controls list of str

Same value you passed to the original did_imputation() call.

None
cluster str

Same value you passed to the original did_imputation() call.

None
horizon list of int

If omitted, inferred from result.model_info['event_study'].

None
n_boot int

Cluster-bootstrap replications. Clusters are sampled with replacement; unit ids are reassigned in the resampled frame so BJS refits cleanly.

300
seed int

RNG seed for reproducibility.

None

Returns:

Type Description
dict

{'statistic', 'df', 'pvalue', 'method', 'n_boot', 'pre_cov'} where pre_cov is the bootstrap covariance matrix used for the Wald quadratic form.

Notes

Cost: n_boot full BJS re-fits. On a 10 000-row balanced panel with |horizon|=10, expect roughly n_boot × 0.3 s ≈ 90 s for the default n_boot=300. The function is therefore opt-in and is not run by default inside did_imputation().
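
The resampling step described for n_boot (clusters drawn with replacement, unit ids reassigned so repeated draws stay distinct) can be sketched with the standard library alone. This is a minimal illustration under assumed data structures (a list of row dicts); the helper name cluster_resample and the relabelling scheme are hypothetical, not the library's internals.

```python
import random


def cluster_resample(rows, cluster_key, unit_key, rng):
    """Draw clusters with replacement and relabel units in each draw."""
    clusters = {}
    for row in rows:
        clusters.setdefault(row[cluster_key], []).append(row)
    labels = sorted(clusters)
    resampled = []
    for draw, label in enumerate(rng.choices(labels, k=len(labels))):
        for row in clusters[label]:
            new_row = dict(row)
            # Relabel so a cluster drawn twice yields distinct units.
            new_row[unit_key] = f"{draw}:{row[unit_key]}"
            resampled.append(new_row)
    return resampled


# Hypothetical panel: 3 units, 2 periods, clustered by unit.
panel = [{"unit": u, "t": t, "y": 10 * u + t} for u in (1, 2, 3) for t in (0, 1)]
boot = cluster_resample(panel, "unit", "unit", random.Random(0))
```

Each bootstrap replication would refit the estimator on a frame like boot; the covariance of the refitted pre-period coefficients across replications then feeds the Wald quadratic form.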

bacon_decomposition

bacon_decomposition(data: DataFrame, y: str, treat: str, time: str, id: str, alpha: float = 0.05) -> Dict[str, Any]

Goodman-Bacon (2021) decomposition of the TWFE DID estimator.

Decomposes the overall TWFE coefficient into a weighted sum of 2×2 DID comparisons between different treatment timing groups.

Parameters:

Name Type Description Default
data DataFrame

Balanced panel data.

required
y str

Outcome variable.

required
treat str

Binary treatment indicator (0 before treatment, 1 after).

required
time str

Time period variable.

required
id str

Unit identifier.

required
alpha float

Significance level.

0.05

Returns:

Type Description
dict

Keys:

  • beta_twfe: overall TWFE estimate
  • decomposition: pd.DataFrame with columns [type, treated, control, estimate, weight]
  • weighted_sum: Σ(weight × estimate); should equal beta_twfe
  • n_comparisons: number of 2×2 sub-comparisons
  • negative_weight_share: fraction of weight on comparisons where already-treated units serve as controls (the "forbidden" comparisons that can bias TWFE)

Examples:

>>> result = bacon_decomposition(df, y='outcome', treat='treated',
...                              time='year', id='unit')
>>> print(result['decomposition'])
>>> print(f"TWFE = {result['beta_twfe']:.4f}")
>>> print(f"Negative weight share = {result['negative_weight_share']:.1%}")

Notes

The decomposition identifies three types of comparisons:

  1. Earlier vs Later treated: Units treated at time g₁ vs units treated later at g₂ (g₁ < g₂). These are "good" comparisons.
  2. Later vs Earlier treated: Units treated at g₂ vs already-treated units at g₁. These are "forbidden" — they use treated units as controls and can introduce negative weighting bias.
  3. Treated vs Never treated: Always valid comparisons.

A large negative_weight_share signals that TWFE is unreliable and a heterogeneity-robust estimator (C&S, Sun-Abraham) should be used.

See Goodman-Bacon (2021, JEcon), Theorem 1.
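
To make the bookkeeping concrete, the sketch below recomputes the weighted sum and the share of weight on "forbidden" comparisons from a hand-written decomposition. The rows, the numbers, and the exact type labels are hypothetical; the real decomposition frame may label comparison types differently.

```python
# Hypothetical decomposition rows; numbers are illustrative only.
decomposition = [
    {"type": "Earlier vs Later treated", "estimate": 2.0, "weight": 0.30},
    {"type": "Later vs Earlier treated", "estimate": 0.5, "weight": 0.20},
    {"type": "Treated vs Never treated", "estimate": 1.8, "weight": 0.50},
]

# Weighted sum of the 2x2 estimates; should reproduce the TWFE coefficient.
weighted_sum = sum(r["weight"] * r["estimate"] for r in decomposition)

# Share of weight on comparisons that use already-treated units as controls.
forbidden_share = sum(
    r["weight"] for r in decomposition if r["type"] == "Later vs Earlier treated"
)

print(f"weighted sum = {weighted_sum:.3f}")
print(f"forbidden-comparison weight share = {forbidden_share:.1%}")
```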

breakdown_m

breakdown_m(result: CausalResult, e: int = 0, method: str = 'smoothness', alpha: float = 0.05) -> float

Compute the breakdown value of M.

The breakdown M is the largest violation magnitude under which the treatment effect at relative time e remains statistically significant. Larger M = more robust.

Parameters:

Name Type Description Default
result CausalResult

DID result with event study.

required
e int

Relative time period.

0
method str
'smoothness'
alpha float
0.05

Returns:

Type Description
float

Breakdown value M*. The effect is significant for all M < M*.

Examples:

>>> m_star = sp.breakdown_m(r, e=0)
>>> print(f"Breakdown M* = {m_star:.4f}")
>>> # Interpretation: parallel trends can deviate by up to M* per period
>>> # and the result remains significant

Notes

Formally, M* = sup{M : 0 ∉ CI(M)}.

For the smoothness restriction with n_drift periods: M* = (|θ̂| - z_{α/2} × SE) / n_drift

See Rambachan & Roth (2023, ReStud), Definition 2.
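
The closed form quoted above for the smoothness restriction is easy to evaluate by hand. The sketch below does so with the standard library; the function name is hypothetical, and flooring the result at zero is an assumption of this sketch rather than documented behaviour of breakdown_m.

```python
from statistics import NormalDist


def breakdown_m_smoothness(theta_hat, se, n_drift, alpha=0.05):
    """M* = (|theta_hat| - z_{alpha/2} * SE) / n_drift, floored at zero."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Flooring at zero is an assumption: an estimate that is already
    # insignificant under exact parallel trends has no slack to give.
    return max(0.0, (abs(theta_hat) - z * se) / n_drift)


m_star = breakdown_m_smoothness(theta_hat=2.0, se=0.5, n_drift=1)
```

With these illustrative inputs the conventional 95% interval is roughly 2.0 ± 0.98, so about 1.02 units of drift would be needed before the interval reaches zero.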

did_analysis

did_analysis(data: DataFrame, y: str, treat: str, time: str, id: Optional[str] = None, covariates: Optional[List[str]] = None, method: str = 'auto', estimator: str = 'dr', control_group: str = 'nevertreated', run_bacon: bool = True, run_event_study: bool = True, run_sensitivity: bool = True, event_window: Optional[tuple] = None, cluster: Optional[str] = None, robust: bool = True, alpha: float = 0.05, **kwargs) -> DIDAnalysis

Comprehensive DID analysis workflow.

Runs the full DID analysis pipeline in one call: design detection, Bacon decomposition (staggered), estimation, event study, and honest_did sensitivity analysis.

Parameters:

Name Type Description Default
data DataFrame

Input dataset.

required
y str

Outcome variable name.

required
treat str

Treatment variable. For 2×2: binary (0/1). For staggered: first treatment period (0 = never treated).

required
time str

Time period variable.

required
id str

Unit identifier. Required for staggered designs.

None
covariates list of str

Control variables.

None
method str

Estimation method: 'auto', '2x2', 'cs', 'sa', 'bjs', 'sdid'.

'auto'
estimator str

For CS: 'dr', 'ipw', or 'reg'.

'dr'
control_group str

For CS/SA: 'nevertreated' or 'notyettreated'.

'nevertreated'
run_bacon bool

Run Bacon decomposition for staggered designs.

True
run_event_study bool

Run event study for dynamic effects + pre-trend test.

True
run_sensitivity bool

Run honest_did sensitivity analysis.

True
event_window tuple of (int, int)

Event study window, e.g. (-5, 5). Auto-detected if None.

None
cluster str

Cluster variable for standard errors.

None
robust bool

HC1 robust standard errors.

True
alpha float

Significance level.

0.05

Returns:

Type Description
DIDAnalysis

Bundled results with .summary(), .plot() methods.

Examples:

Classic 2×2:

>>> report = did_analysis(df, y='wage', treat='policy', time='post')
>>> print(report.summary())

Staggered — full pipeline:

>>> report = did_analysis(df, y='earnings', treat='first_treat',
...                       time='year', id='worker')
>>> print(report.summary())
>>> report.plot()

Quick estimate only (skip diagnostics):

>>> report = did_analysis(df, y='y', treat='g', time='t', id='i',
...                       run_bacon=False, run_sensitivity=False)

gardner_did

gardner_did(data: DataFrame, y: str, group: str, time: str, first_treat: str, controls: Optional[List[str]] = None, event_study: bool = False, horizon: Optional[List[int]] = None, cluster: Optional[str] = None, alpha: float = 0.05) -> CausalResult

Gardner (2021) two-stage DID estimator.

Parameters:

Name Type Description Default
data DataFrame

Long-format panel.

required
y str

Outcome column name.

required
group str

Unit (panel-id) column.

required
time str

Time column.

required
first_treat str

First-treatment-period column. Never-treated units should be encoded as 0, NaN, or +inf.

required
controls list of str

Additional covariates included in both stages.

None
event_study bool

If True, Stage 2 reports coefficients by relative time k = t - first_treat_i.

False
horizon list of int

Relative-time leads/lags to report when event_study=True; defaults to range(-5, 6) intersected with available support.

None
cluster str

Cluster variable for Stage-2 SEs. Defaults to group.

None
alpha float

Two-sided CI level.

0.05

Returns:

Type Description
CausalResult

.estimate is the overall ATT; .model_info['event_study'] carries the event-study dict when requested. Supplies .summary(), .cite(), and is compatible with sp.outreg2().

Notes

Identification requires the usual staggered-DID conditions (parallel trends, no anticipation) plus a linear two-way FE + additive covariate structure for the untreated potential outcome. Stage-2 standard errors cluster by unit — bootstrapping the whole two-step procedure gives a conservative covariance when covariate models are heavy.

References

Gardner, J. (2022). Two-stage differences in differences. Working paper. [@gardner2022stage]

pretrends_test

pretrends_test(result, type: str = 'wald', alpha: float = 0.05) -> Dict[str, Any]

Joint test of pre-treatment coefficients.

Tests H0: beta_pre = 0 (all pre-treatment event-study coefficients are jointly zero).

Parameters:

Name Type Description Default
result CausalResult

Event-study result containing pre-treatment estimates and SEs.

required
type 'wald' or 'f'

'wald': chi-squared test statistic. 'f': scaled F-statistic (requires df_resid in model_info).

'wald'
alpha float

Significance level.

0.05

Returns:

Type Description
dict

Keys: statistic, pvalue, df, type, reject, interpretation.

References

Standard Wald test; see Roth (2022) for caveats on interpretation.

Examples:

>>> import statspai as sp
>>> result = sp.event_study(df, y='y', treat='g', time='t', id='i')
>>> sp.pretrends_test(result)
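
For intuition, a joint Wald test of H0: beta_pre = 0 can be sketched as follows. The sketch assumes the pre-period coefficients are uncorrelated, so the statistic reduces to a sum of squared t-ratios; whether pretrends_test uses a full covariance matrix is not stated here. The helper names and the inputs are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: finite Poisson-type sum.
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: normal tail plus a finite correction sum.
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total


def wald_pretrend(betas, ses):
    """Wald statistic, degrees of freedom, and p-value (diagonal covariance)."""
    stat = sum((b / s) ** 2 for b, s in zip(betas, ses))
    df = len(betas)
    return stat, df, chi2_sf(stat, df)


stat, df, pvalue = wald_pretrend([0.1, -0.2, 0.05], [0.1, 0.1, 0.1])
```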

pretrends_power

pretrends_power(result, delta: Optional[ndarray] = None, alpha: float = 0.05) -> Dict[str, Any]

Power of the pre-trend test against a hypothesised violation.

Implements the power calculation from Roth (2022, AER: Insights). A non-significant pre-trend test is uninformative when the test has low power against economically meaningful violations of parallel trends.

Parameters:

Name Type Description Default
result CausalResult

Event-study result with pre-treatment estimates and SEs.

required
delta array-like

Hypothesised trend violation in the pre-period (length = number of pre-periods). Default: linear trend delta[k] = (k+1) * min(|SE|) -- a violation equal to one SE at the furthest lag, declining linearly to near-zero.

None
alpha float

Significance level of the pre-trend test.

0.05

Returns:

Type Description
dict

Keys: power, noncentrality, df, delta, critical_value, warning.

References

Roth, J. (2022). Pre-test with Caution: Event-Study Estimates after Testing for Parallel Trends. AER: Insights, 4(3), 305–322. [@roth2022pretest]

Examples:

>>> import statspai as sp
>>> result = sp.event_study(df, y='y', treat='g', time='t', id='i')
>>> sp.pretrends_power(result)

sensitivity_rr

sensitivity_rr(result, Mbar: Optional[Union[ndarray, List[float]]] = None, method: str = 'C-LF', alpha: float = 0.05, n_grid: int = 20) -> SensitivityResult

Rambachan & Roth (2023) honest confidence intervals.

Computes confidence intervals for the ATT that are valid under bounded departures from parallel trends. The conditional linear-in-relative-time (C-LF) restriction assumes the post-treatment violation is bounded by a linear extrapolation of the pre-trend plus an additional M-bar of slack.

Parameters:

Name Type Description Default
result CausalResult

Event-study result with pre- and post-treatment estimates.

required
Mbar array-like

Grid of M-bar values. Default: np.linspace(0, 3 * max_pre_slope, n_grid).

None
method 'C-LF'

Extrapolation method. Currently only C-LF is implemented.

'C-LF'
alpha float

Significance level.

0.05
n_grid int

Number of grid points when Mbar is not supplied.

20

Returns:

Type Description
SensitivityResult

Object with .summary(), .plot(), .mbar_grid, .ci_lower, .ci_upper, .breakdown_mbar.

References

Rambachan, A. & Roth, J. (2023). A More Credible Approach to Parallel Trends. Review of Economic Studies, 90(5), 2555–2591. [@rambachan2023more]

Examples:

>>> import statspai as sp
>>> result = sp.event_study(df, y='y', treat='g', time='t', id='i')
>>> sens = sp.sensitivity_rr(result, Mbar=[0, 0.01, 0.02, 0.05])
>>> sens.summary()
>>> sens.plot()
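
The breakdown_mbar attribute is described as the smallest M-bar whose honest CI includes zero. Given the three grid arrays, that lookup can be sketched as below; the grid values are made up for illustration and the helper name is hypothetical.

```python
def breakdown_from_grid(mbar_grid, ci_lower, ci_upper):
    """Return the smallest M-bar whose CI contains zero, or None."""
    for m, lo, hi in zip(mbar_grid, ci_lower, ci_upper):
        if lo <= 0.0 <= hi:
            return m
    return None


# Hypothetical grid: a CI of 1.0 +/- 0.5 that widens by 2 * M-bar per side.
mbar_grid = [0.0, 0.1, 0.2, 0.3, 0.4]
ci_lower = [1.0 - 0.5 - 2 * m for m in mbar_grid]
ci_upper = [1.0 + 0.5 + 2 * m for m in mbar_grid]
breakdown = breakdown_from_grid(mbar_grid, ci_lower, ci_upper)
```

A return value of None means the sign of the effect survives every M-bar on the grid, which suggests extending the grid rather than concluding the result is unconditionally robust.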

pretrends_summary

pretrends_summary(result, delta=None, alpha: float = 0.05) -> str

Print a combined pre-trends diagnostic report.

Runs pretrends_test and pretrends_power and formats the output in a single table.

Parameters:

Name Type Description Default
result CausalResult

Event-study result.

required
delta array-like

Passed to pretrends_power.

None
alpha float

Significance level.

0.05

Returns:

Type Description
str

Formatted report.

etwfe

etwfe(data: DataFrame, y: str, group: str, time: str, first_treat: str, controls: Optional[List[str]] = None, cluster: Optional[str] = None, alpha: float = 0.05, xvar: Optional[Any] = None, panel: bool = True, cgroup: str = 'notyet') -> CausalResult

Public sp.etwfe entry point — see _dispatch_etwfe_impl for the full docstring on options and behaviour.

Thin wrapper around the 4-branch dispatcher (panel-with-xvar / panel-never-only / panel-notyet / repeated-cross-section) that attaches a Provenance record to the returned result, so downstream consumers (replication_pack, Quarto appendix, table footers) can pick up the call without each branch having to opt in.

etwfe_emfx

etwfe_emfx(result: CausalResult, type: str = 'simple', alpha: float = 0.05, include_leads: bool = False, weighting: str = 'cohort') -> CausalResult

R etwfe::emfx-style aggregated marginal effects for an ETWFE fit.

Takes the result of etwfe() / wooldridge_did() and returns one of four aggregations used in applied work:

================  ====================================================================
type              Aggregation
================  ====================================================================
'simple'          Overall cohort-size-weighted ATT (same as result.estimate).
'group'           ATT per treatment cohort g.
'event'           ATT per event time e = t - g, averaged across cohorts.
'calendar'        ATT per calendar time t, averaged across cohorts for which t >= g.
================  ====================================================================

Parameters:

Name Type Description Default
result CausalResult

Output of etwfe() or wooldridge_did().

required
type ('simple', 'group', 'event', 'calendar')

Aggregation type.

'simple'
alpha float

Significance level for confidence intervals.

0.05
include_leads bool

For type='event' and type='calendar', whether to include pre-treatment relative times (rel_time < 0) in the output. These coefficients identify pre-trends and are informative for parallel-trends inspection. Default False for backward compatibility with earlier versions; set True for full event-study output matching the R etwfe::emfx(type='event') default. rel_time = -1 is always the reference category and is excluded.

False
weighting ('cohort', 'treated')

Aggregation weights for cohort-level marginal effects. 'cohort' preserves the historical StatsPAI cohort-share weighting. 'treated' uses the number of treated post-period observations in each cohort, matching R etwfe::emfx(type='simple') on balanced staggered panels.

'cohort'

Returns:

Type Description
CausalResult

estimate is the overall ATT (for type='simple') or the mean of the sub-aggregation (for the other types). detail contains one row per group/event-time/calendar-time with (estimate, se, pvalue, ci_low, ci_high).

Notes

For 'event' and 'calendar', the reported SE treats the per-cohort coefficients as independent — a standard approximation that matches R etwfe's default under classical vcov. Cluster-robust or fully-general SEs require the full regression vcov, which can be requested via sp.wooldridge_did + the model_info matrix in a future release.

Examples:

>>> import statspai as sp
>>> df = sp.dgp_did(n_units=200, n_periods=10, staggered=True)
>>> fit = sp.etwfe(df, y='y', time='time',
...                first_treat='first_treat', group='unit')
>>> evt = sp.etwfe_emfx(fit, type='event')
>>> print(evt.detail)   # ATT by event time
>>> grp = sp.etwfe_emfx(fit, type='group')
>>> cal = sp.etwfe_emfx(fit, type='calendar')
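
As a hand-rolled illustration of the 'event' aggregation, the sketch below averages cohort-by-period effects ATT(g, t) within each event time e = t - g using cohort-size weights. It is a simplification under assumed inputs (a dict of cell effects and a dict of cohort sizes, both hypothetical) and skips leads; it is not the code path used by etwfe_emfx.

```python
from collections import defaultdict


def aggregate_by_event_time(att_gt, cohort_size):
    """Cohort-size-weighted average of ATT(g, t) within each e = t - g >= 0."""
    num = defaultdict(float)
    den = defaultdict(float)
    for (g, t), att in att_gt.items():
        e = t - g
        if e < 0:
            continue                     # leads are skipped in this sketch
        num[e] += cohort_size[g] * att
        den[e] += cohort_size[g]
    return {e: num[e] / den[e] for e in sorted(num)}


# Hypothetical cohort-by-period effects for cohorts first treated in 3 and 4.
att_gt = {(3, 3): 1.0, (3, 4): 2.0, (4, 4): 3.0}
cohort_size = {3: 10, 4: 30}
by_event = aggregate_by_event_time(att_gt, cohort_size)
```

Here event time 0 pools both cohorts, (10 × 1.0 + 30 × 3.0) / 40 = 2.5, while event time 1 is identified by the earlier cohort only.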

drdid

drdid(data: DataFrame, y: str, group: str, time: str, covariates: Optional[List[str]] = None, method: str = 'imp', alpha: float = 0.05, n_boot: int = 500, random_state: Optional[int] = None, seed: Optional[int] = None) -> CausalResult

Doubly Robust Difference-in-Differences (Sant'Anna & Zhao 2020).

Combines outcome regression with inverse probability weighting for 2×2 DID with covariates. Consistent if either the outcome model or the propensity score model is correctly specified.

Parameters:

Name Type Description Default
data DataFrame

Dataset with one row per unit-period in 2×2 design.

required
y str

Outcome variable.

required
group str

Binary treatment-group indicator (1 = treated, 0 = control).

required
time str

Binary time indicator (1 = post, 0 = pre).

required
covariates list of str

Covariate names. If None, runs a simple (un-adjusted) DID.

None
method str

'imp' for the improved estimator (locally efficient); 'trad' for the traditional DR-DID.

'imp'
alpha float

Significance level.

0.05
n_boot int

Number of bootstrap replications for inference.

500
random_state int

Seed for bootstrap reproducibility.

None

Returns:

Type Description
CausalResult

estimate is the DR-DID ATT. detail contains influence-function diagnostics.

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(42)
>>> n = 500
>>> G = rng.integers(0, 2, n)
>>> T = rng.integers(0, 2, n)
>>> x = rng.normal(0, 1, n)
>>> y_val = 1 + 0.5*x + 2*G + 3*T + 4*G*T + rng.normal(0, 1, n)
>>> df = pd.DataFrame({'y': y_val, 'treated': G, 'post': T, 'x': x})
>>> result = sp.drdid(df, y='y', group='treated', time='post',
...                   covariates=['x'])
>>> abs(result.estimate - 4.0) < 1.0
True
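
The double-robustness property can be seen in a small hand computation. The sketch below implements the traditional doubly robust DID moment for two-period panel data from Sant'Anna & Zhao (2020), taking the fitted propensity scores and the fitted control-group outcome change as given inputs. The function name and toy data are hypothetical, and sp.drdid may organise the computation differently.

```python
def dr_did_panel(delta_y, treated, pscore, mu0_delta):
    """Traditional doubly robust DID for two-period panel data.

    delta_y   : post-minus-pre outcome change per unit
    treated   : 1 for treated units, 0 for controls
    pscore    : fitted propensity score e(X) per unit
    mu0_delta : fitted control-group outcome change given X, per unit
    """
    resid = [dy - m for dy, m in zip(delta_y, mu0_delta)]
    # Treated units: equal weights, normalised to sum to one.
    w1 = [float(d) for d in treated]
    # Controls: odds weights e(X) / (1 - e(X)), normalised to sum to one.
    w0 = [(1 - d) * e / (1 - e) for d, e in zip(treated, pscore)]
    s1, s0 = sum(w1), sum(w0)
    return sum((a / s1 - b / s0) * r for a, b, r in zip(w1, w0, resid))


# Toy data with a true ATT of 4 and a correctly specified outcome model.
x = [-1.0, 0.0, 1.0, -1.0, 0.0, 1.0]
treated = [1, 1, 1, 0, 0, 0]
delta_y = [1 + xi + 4 * d for xi, d in zip(x, treated)]
mu0_delta = [1 + xi for xi in x]      # exact control-trend model
pscore = [0.5] * 6                    # deliberately uninformative
att_dr = dr_did_panel(delta_y, treated, pscore, mu0_delta)
```

Because the outcome model is exact in this toy example, the estimate equals 4 even though the propensity score carries no information, which is one half of the "consistent if either model is correct" claim.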

twfe_decomposition

twfe_decomposition(data: DataFrame, y: str, group: str, time: str, first_treat: str, alpha: float = 0.05) -> CausalResult

TWFE decomposition: Goodman-Bacon (2021) + de Chaisemartin–D'Haultfoeuille weights.

Decomposes the standard two-way fixed effects estimator into all pairwise 2×2 DID comparisons, showing the weight and estimate for each. Also computes de Chaisemartin–D'Haultfoeuille (2020) weights to diagnose whether negative weights are present.

Parameters:

Name Type Description Default
data DataFrame

Panel dataset in long format.

required
y str

Outcome variable.

required
group str

Unit identifier.

required
time str

Time period variable.

required
first_treat str

Treatment timing column (NaN or 0 for never-treated).

required
alpha float

Significance level.

0.05

Returns:

Type Description
CausalResult

detail DataFrame has columns: type, treated_cohort, control_cohort, estimate, weight, weighted_est. model_info includes summary statistics and dCDH weights.

Examples:

>>> import statspai as sp
>>> df = sp.dgp_did(n_units=200, n_periods=8, staggered=True)
>>> result = sp.twfe_decomposition(df, y='y', group='unit',
...                                time='period',
...                                first_treat='first_treat')
>>> result.summary()

cohort_anchored_event_study

cohort_anchored_event_study(data: DataFrame, y: str, treat: str, time: str, id: str, leads: int = 4, lags: int = 4, cluster: Optional[str] = None, alpha: float = 0.05) -> CausalResult

Cohort-anchored event-study estimator (Rambachan-Roth successor).

Parameters:

Name Type Description Default
data DataFrame

Long-format panel.

required
y str
required
treat str

First-treatment-period column (0 = never-treated).

required
time str
required
id str
required
leads int

Number of pre-treatment event-time periods to estimate.

4
lags int

Number of post-treatment event-time periods to estimate.

4
cluster str

Cluster column for SE; defaults to id.

None
alpha float
0.05

Returns:

Type Description
CausalResult

estimate is the average post-treatment effect across event times 0..lags. Per-event-time coefficients in model_info['event_study'] (DataFrame with columns rel_time, att, se, ci_low, ci_high).

References

arXiv 2509.01829, Cohort-Anchored Robust Inference for Event-Study with Staggered Adoption (2025).

design_robust_event_study

design_robust_event_study(data: DataFrame, y: str, treat: str, time: str, id: str, leads: int = 4, lags: int = 4, cluster: Optional[str] = None, alpha: float = 0.05) -> CausalResult

Design-robust event-study with negative-weight diagnostics.

Parameters:

Name Type Description Default
data DataFrame
required
y str

Same conventions as callaway_santanna().

required
treat str

Same conventions as callaway_santanna().

required
time str

Same conventions as callaway_santanna().

required
id str

Same conventions as callaway_santanna().

required
leads int

Event-time window.

4
lags int

Event-time window.

4
cluster str
None
alpha float
0.05

Returns:

Type Description
CausalResult

Headline = average post-treatment effect. model_info['weights'] reports the implicit TWFE weights per (cohort, time); negative entries flag contamination.

References

Wright, C. S. (2026). arXiv 2601.18801. See design_robust_es2026 bibkey at the bottom of this module for the full citation.

did_misclassified

did_misclassified(data: DataFrame, y: str, treat: str, time: str, id: str, pi_misclass: float = 0.0, anticipation_periods: int = 0, cluster: Optional[str] = None, alpha: float = 0.05) -> CausalResult

Staggered DiD robust to timing misclassification + anticipation.

Parameters:

Name Type Description Default
data DataFrame
required
y str
required
treat str
required
time str
required
id str
required
pi_misclass float in [0, 0.5]

Probability that the recorded first-treatment period g is off by ±1 (symmetric). Pass 0 to skip this correction.

0.0
anticipation_periods int

Number of leads to absorb as anticipation (subtracts the average of pre-event coefficients k = -1..-anticipation_periods from the post ATT estimate).

0
cluster str
None
alpha float
0.05

Returns:

Type Description
CausalResult

estimate is the corrected ATT; model_info reports both the naive and the corrected estimate, plus the anticipation offset and misclassification adjustment factor.
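
The anticipation correction described for anticipation_periods (subtract the average of the lead coefficients k = -1..-anticipation_periods from the post ATT) amounts to a few lines of arithmetic. The sketch below covers only that step, with hypothetical coefficients and a hypothetical helper name; the misclassification adjustment is not reproduced here.

```python
def anticipation_adjusted_att(post_att, event_study, anticipation_periods):
    """Subtract the mean of leads k = -1..-anticipation_periods from the ATT."""
    if anticipation_periods <= 0:
        return post_att, 0.0
    leads = [event_study[-k] for k in range(1, anticipation_periods + 1)]
    offset = sum(leads) / len(leads)
    return post_att - offset, offset


# Hypothetical event-study coefficients keyed by relative time.
event_study = {-3: 0.0, -2: 0.2, -1: 0.4, 0: 1.5, 1: 1.7}
corrected, offset = anticipation_adjusted_att(
    1.6, event_study, anticipation_periods=2
)
```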

References

arXiv 2507.20415, Staggered Adoption DiD Designs with Misclassification and Anticipation (2025).

did_summary

did_summary(data: DataFrame, y: str, time: str, first_treat: str, group: str, methods: Union[str, List[str]] = 'auto', controls: Optional[List[str]] = None, cluster: Optional[str] = None, alpha: float = 0.05, include_sensitivity: bool = False, verbose: bool = False) -> CausalResult

One-call method-robustness comparison for staggered DID.

Fits every requested estimator to the same data and returns a single :class:CausalResult whose detail attribute is a tidy comparison table — one row per method, columns (method, estimator, estimate, se, pvalue, ci_low, ci_high, n_obs, note).

Parameters:

Name Type Description Default
data DataFrame

Panel dataset (long format).

required
y str

Outcome variable.

required
time str

Time / period variable (integer-valued).

required
first_treat str

First-treatment period per unit; NaN (or 0) for never-treated.

required
group str

Unit identifier.

required
methods str or list of str

Methods to run. Valid keys: 'cs', 'sa', 'bjs', 'etwfe', 'stacked', or 'all' / 'auto' for all.

'auto'
controls list of str

Time-varying covariates passed to methods that support them.

None
cluster str

Cluster variable for SE (defaults to group in each sub-method).

None
alpha float

Significance level for confidence intervals.

0.05
include_sensitivity bool

If True and 'cs' is among the methods fit, compute the Rambachan–Roth (2023) breakdown M* — the largest relative violation of parallel trends under which the treatment effect is still significantly different from zero. The value is added to model_info['breakdown_m'] and to the breakdown_m column of detail (CS row only; other methods leave NaN).

False
verbose bool

Print progress for each method.

False

Returns:

Type Description
CausalResult

  • estimate: mean of the successfully fit overall ATTs.
  • se: standard deviation of those estimates across methods (a crude dispersion measure, not a standard error).
  • detail: the comparison DataFrame described above.
  • model_info: {'methods_requested': [...], 'methods_fit': [...], 'methods_failed': {name: error_msg, ...}}.

Notes

Each method's overall ATT has slightly different interpretation:

  • CS aggte(type='simple') averages ATT(g, t) over post-treatment cells (t ≥ g), weighted by cohort size × exposure length.
  • SA / ETWFE / BJS / Stacked report cohort-size-weighted averages by construction.

Differences across methods are informative about heterogeneity, model specification, and the sensitivity of conclusions to the estimator choice. Large disagreement is a red flag that deserves further investigation (e.g., via sp.bacon_decomposition or sp.honest_did).
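
The two weighting schemes in the notes above can give different numbers on the same ATT(g, t) cells. The sketch below is an illustration under simplified assumptions (known cohort sizes, a plain dict of cell estimates). The function names are hypothetical and the code is not taken from the library.

```python
def simple_att(att_gt, cohort_size):
    """Average all post-treatment cells, each weighted by its cohort's size.

    Cohorts with more post-treatment periods contribute more cells, so their
    effective weight is cohort size x exposure length.
    """
    cells = [(g, a) for (g, t), a in att_gt.items() if t >= g]
    total = sum(cohort_size[g] for g, _ in cells)
    return sum(cohort_size[g] * a for g, a in cells) / total


def cohort_weighted_att(att_gt, cohort_size):
    """Average each cohort's post-treatment path first, then weight by size."""
    per_cohort = {}
    for (g, t), a in att_gt.items():
        if t >= g:
            per_cohort.setdefault(g, []).append(a)
    total = sum(cohort_size[g] for g in per_cohort)
    weighted = sum(cohort_size[g] * sum(v) / len(v) for g, v in per_cohort.items())
    return weighted / total


# Cohort 2 (10 units) is observed for three post periods, cohort 4 (30 units) for one.
att_gt = {(2, 2): 1.0, (2, 3): 1.0, (2, 4): 1.0, (4, 4): 3.0}
cohort_size = {2: 10, 4: 30}
print(simple_att(att_gt, cohort_size), cohort_weighted_att(att_gt, cohort_size))
```

In this toy example the early cohort receives half of the total weight under the first scheme but only a quarter under the second, which is one reason the overall ATTs need not agree.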

Examples:

>>> import statspai as sp
>>> df = sp.dgp_did(n_units=200, n_periods=10, staggered=True, seed=0)
>>> out = sp.did_summary(df, y='y', time='time',
...                      first_treat='first_treat', group='unit')
>>> out.summary()
>>> print(out.detail[['method', 'estimate', 'se', 'pvalue']])
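
The headline estimate and se described under Returns are cross-method summaries, not inferential quantities. A minimal sketch of that aggregation follows. It assumes failed methods are represented by None and that the dispersion is the sample standard deviation (n - 1 denominator); neither detail is confirmed by this page, and headline_from_methods is a hypothetical name.

```python
from statistics import fmean, stdev


def headline_from_methods(atts):
    """Summarise per-method overall ATTs, skipping methods that failed (None)."""
    fit = {name: est for name, est in atts.items() if est is not None}
    if not fit:
        raise ValueError("no method produced an estimate")
    values = list(fit.values())
    # Assumption: sample standard deviation; undefined with a single method.
    spread = stdev(values) if len(values) > 1 else float("nan")
    return {"estimate": fmean(values), "se": spread, "methods_fit": list(fit)}


out = headline_from_methods({"cs": 1.0, "sa": 2.0, "bjs": 3.0, "etwfe": None})
print(out)
```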

did_summary_to_markdown

did_summary_to_markdown(result: CausalResult, digits: int = 4, include_ci: bool = True, include_breakdown: bool = True) -> str

Render a :func:did_summary result as a GitHub-Flavoured Markdown table.

Columns shown (in order): Method, Estimate, SE, 95 % CI, p-value, and optionally Breakdown M* (when sensitivity was requested).

Parameters:

Name Type Description Default
result CausalResult

Output of :func:did_summary.

required
digits int

Decimal precision for numeric columns.

4
include_ci bool

Include the 95 % CI column.

True
include_breakdown bool

Include the Rambachan-Roth breakdown M* column (CS row only, blank for others). Ignored if sensitivity was not requested.

True

Returns:

Type Description
str

Multi-line Markdown table, ready to paste into notebooks or PRs.
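
To make the output shape concrete, here is a dependency-free sketch of rendering comparison rows as a GitHub-Flavoured Markdown table. It only illustrates the format: rows_to_markdown is a hypothetical helper, it covers a subset of the columns, and the library's exact headers and spacing may differ.

```python
def rows_to_markdown(rows, digits=4):
    """Render a list of row dicts as a GitHub-Flavoured Markdown table."""
    header = ["Method", "Estimate", "SE", "p-value"]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in header) + " |",
    ]
    for row in rows:
        numbers = [f"{row[key]:.{digits}f}" for key in ("estimate", "se", "pvalue")]
        lines.append("| " + " | ".join([row["method"]] + numbers) + " |")
    return "\n".join(lines)


table = rows_to_markdown(
    [{"method": "cs", "estimate": 1.5, "se": 0.25, "pvalue": 0.0012}], digits=2
)
print(table)
```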

did_summary_to_latex

did_summary_to_latex(result: CausalResult, digits: int = 4, include_ci: bool = True, include_breakdown: bool = True, label: str = 'tab:did_summary', caption: str = 'DID method-robustness summary.') -> str

Render a :func:did_summary result as a LaTeX booktabs table.

Parameters:

Name Type Description Default
result CausalResult

Output of :func:did_summary.

required
digits int

Decimal precision.

4
include_ci bool

Include the 95 % CI column.

True
include_breakdown bool

Include the Rambachan-Roth breakdown M* column when sensitivity was requested.

True
label str

LaTeX label for the table.

'tab:did_summary'
caption str

LaTeX caption.

'DID method-robustness summary.'

Returns:

Type Description
str

Full \begin{table} ... \end{table} block using the booktabs package (\toprule, \midrule, \bottomrule).

Notes

Requires \usepackage{booktabs} in the LaTeX preamble.

did_report

did_report(data: DataFrame, y: str, time: str, first_treat: str, group: str, save_to: str, methods: Union[str, List[str]] = 'auto', controls: Optional[List[str]] = None, cluster: Optional[str] = None, alpha: float = 0.05, include_sensitivity: bool = True, plot_sort_by: Optional[str] = 'estimate', verbose: bool = False) -> CausalResult

DID report bundle: fits selected methods and writes report artifacts.

Writes the following files to save_to:

  • did_summary.txt : text dump of result.summary().
  • did_summary.md : GitHub-Flavoured Markdown table.
  • did_summary.tex : LaTeX booktabs fragment.
  • did_summary.png : forest plot (requires matplotlib).
  • did_summary.json : detail table + model_info in JSON.

Parameters:

Name Type Description Default
data DataFrame

Same as :func:did_summary.

required
y str

Same as :func:did_summary.

required
time str

Same as :func:did_summary.

required
first_treat str

Same as :func:did_summary.

required
group str

Same as :func:did_summary.

required
methods str or list of str

Same as :func:did_summary.

'auto'
controls list of str

Same as :func:did_summary.

None
cluster str

Same as :func:did_summary.

None
alpha float

Same as :func:did_summary.

0.05
save_to str

Directory path. Created if it does not exist.

required
include_sensitivity bool

Whether to run Rambachan-Roth breakdown M*. Defaults to True in did_report (vs False in did_summary) because a report is expected to be comprehensive.

True
plot_sort_by ('estimate', None)

If 'estimate', sort the forest plot by point estimate ascending; if None, keep the order of result.detail.

'estimate'
verbose bool

Print progress lines.

False

Returns:

Type Description
CausalResult

The underlying :func:did_summary output. All side-effect files are written to save_to as a bundle.
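
Because did_summary.png depends on matplotlib, a quick check of which bundle files were written can be useful after a run. The sketch below uses only the file names listed above. The helper missing_artifacts is hypothetical, and the temporary directory stands in for a real save_to folder.

```python
import tempfile
from pathlib import Path

# File names as listed in the did_report description.
EXPECTED_ARTIFACTS = [
    "did_summary.txt",
    "did_summary.md",
    "did_summary.tex",
    "did_summary.png",
    "did_summary.json",
]


def missing_artifacts(save_to):
    """Return the expected report files that are absent from save_to."""
    folder = Path(save_to)
    return [name for name in EXPECTED_ARTIFACTS if not (folder / name).is_file()]


# Simulate a run where the forest plot could not be produced.
with tempfile.TemporaryDirectory() as tmp:
    for name in EXPECTED_ARTIFACTS:
        if not name.endswith(".png"):
            (Path(tmp) / name).write_text("")
    missing = missing_artifacts(tmp)

print(missing)
```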

parallel_trends_plot

parallel_trends_plot(data: DataFrame, y: str, time: str, treat: str, id: Optional[str] = None, treat_time: Optional[Union[int, float]] = None, agg: str = 'mean', labels: Optional[Dict] = None, colors: Optional[Tuple[str, str]] = None, ci: bool = True, ax=None, figsize: tuple = (10, 6), title: Optional[str] = None, **kwargs)

Plot raw outcome means over time for treatment and control groups.

The workhorse pre-analysis plot: shows whether parallel trends is plausible before running DID.

Parameters:

Name Type Description Default
data DataFrame

Input dataset.

required
y str

Outcome variable.

required
time str

Time period variable.

required
treat str

Treatment group indicator. Binary (0/1) for 2×2, or first-treatment-period for staggered (0 = never treated).

required
id str

Unit identifier (for panel data).

None
treat_time int or float

Treatment onset time. Draws a vertical line if provided.

None
agg str

Aggregation function: 'mean' or 'median'.

'mean'
labels dict

Custom labels, e.g. {'treat': 'New Jersey', 'control': 'Pennsylvania'}.

None
colors tuple of str

Colors for (treatment, control). Default: ('#E74C3C', '#2C3E50').

None
ci bool

Show 95% confidence intervals (±1.96 SE of mean).

True
ax matplotlib Axes

Existing axes to plot on.

None
figsize tuple

Figure size.

(10, 6)
title str

Plot title.

None

Returns:

Type Description
(fig, ax)

Examples:

>>> parallel_trends_plot(df, y='wage', time='year', treat='treated',
...                      treat_time=2010)
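
The ci option is documented as ±1.96 SE of the mean. A sketch of the quantities behind each plotted point follows, for (group, period) cells of raw outcomes. It assumes the SE is the sample standard deviation divided by √n, which this page does not spell out, and the function names are hypothetical.

```python
from math import sqrt
from statistics import fmean, stdev


def mean_with_ci(values, z=1.96):
    """Return (mean, lower, upper) using a normal-approximation band."""
    m = fmean(values)
    se = stdev(values) / sqrt(len(values))  # needs at least two observations
    return m, m - z * se, m + z * se


def group_period_means(rows):
    """rows: iterable of (treat, time, y). Returns {(treat, time): (mean, lo, hi)}."""
    cells = {}
    for treat, time, y in rows:
        cells.setdefault((treat, time), []).append(y)
    return {key: mean_with_ci(vals) for key, vals in sorted(cells.items())}


rows = [
    (1, 2009, 1.0), (1, 2009, 2.0), (1, 2009, 3.0),
    (0, 2009, 2.0), (0, 2009, 2.0), (0, 2009, 2.0),
]
bands = group_period_means(rows)
print(bands)
```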

bacon_plot

bacon_plot(bacon_result: Dict[str, Any], ax=None, figsize: tuple = (10, 6), title: Optional[str] = None, colors: Optional[Dict[str, str]] = None, **kwargs)

Scatter plot of Goodman-Bacon decomposition.

Each point is a 2×2 sub-comparison: x = weight, y = DD estimate. Color distinguishes comparison types (Treated vs Never-treated, Earlier vs Later, Later vs Already-treated).

Parameters:

Name Type Description Default
bacon_result dict

Output from bacon_decomposition(). Must contain 'decomposition' DataFrame and 'beta_twfe'.

required
ax matplotlib Axes
None
figsize tuple
(10, 6)
title str
None
colors dict

Map comparison type → color. Defaults provided.

None

Returns:

Type Description
(fig, ax)

Examples:

>>> bacon = sp.bacon_decomposition(df, y='y', treat='d',
...                                 time='t', id='i')
>>> bacon_plot(bacon)
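
The scatter is easiest to read with the decomposition identity in mind: in the standard Goodman-Bacon setting (balanced panel, no covariates) the TWFE coefficient is the weighted sum of the 2×2 DD estimates, with weights that sum to one. A minimal sketch of that check follows. The tuple layout is an assumption, since the column names of the 'decomposition' DataFrame are not given here.

```python
def twfe_from_decomposition(components, tol=1e-8):
    """components: list of (weight, dd_estimate) pairs, one per 2x2 comparison."""
    total_weight = sum(w for w, _ in components)
    if abs(total_weight - 1.0) > tol:
        raise ValueError(f"weights sum to {total_weight}, expected 1")
    return sum(w * dd for w, dd in components)


components = [
    (0.50, 2.0),   # e.g. treated vs never-treated
    (0.25, 1.0),   # e.g. earlier vs later
    (0.25, -1.0),  # e.g. later vs already-treated
]
beta = twfe_from_decomposition(components)
print(beta)
```

In this toy example the negative already-treated comparison pulls the weighted sum down to 1.0, the kind of pattern the plot is meant to expose.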

group_time_plot

group_time_plot(result, plot_type: str = 'dot', ax=None, figsize: tuple = (12, 7), title: Optional[str] = None, color: str = '#2C3E50', sig_color: str = '#E74C3C', insig_color: str = '#BDC3C7', alpha_level: float = 0.05, **kwargs)

Plot group-time ATT estimates from Callaway-Sant'Anna.

Two modes:

  • 'dot': dot plot with CI error bars, colored by significance
  • 'heatmap': (group × time) heatmap of ATT magnitudes

Parameters:

Name Type Description Default
result CausalResult

Result from callaway_santanna() or did(method='cs'). Must have detail DataFrame with 'group', 'time', 'att' columns.

required
plot_type str

'dot' or 'heatmap'.

'dot'
ax matplotlib Axes
None
figsize tuple
(12, 7)
title str
None
color str

Default color for dot plot.

'#2C3E50'
sig_color str

Color for significant estimates.

'#E74C3C'
insig_color str

Color for insignificant estimates.

'#BDC3C7'
alpha_level float

Significance threshold.

0.05

Returns:

Type Description
(fig, ax)

Examples:

>>> result = sp.did(df, y='y', treat='g', time='t', id='i', method='cs')
>>> group_time_plot(result)
>>> group_time_plot(result, plot_type='heatmap')

did_plot

did_plot(data: DataFrame, y: str, time: str, treat: str, treat_time: Optional[Union[int, float]] = None, show_counterfactual: bool = True, labels: Optional[Dict] = None, colors: Optional[Tuple[str, str, str]] = None, ax=None, figsize: tuple = (10, 6), title: Optional[str] = None, annotate_effect: bool = True, **kwargs)

Classic DID diagram showing treatment effect with counterfactual.

Plots group means over time and adds a dashed counterfactual line for the treatment group (extrapolated from pre-treatment trend parallel to the control group).

Parameters:

Name Type Description Default
data DataFrame

Input dataset.

required
y str

Outcome variable.

required
time str

Time period variable.

required
treat str

Binary treatment group indicator (0/1).

required
treat_time int or float

Treatment onset time. If None, inferred as the midpoint.

None
show_counterfactual bool

Draw the dashed counterfactual line.

True
labels dict

Custom labels: {'treat': ..., 'control': ..., 'counterfactual': ...}.

None
colors tuple

(treatment, control, counterfactual) colors.

None
ax matplotlib Axes
None
figsize tuple
(10, 6)
title str
None
annotate_effect bool

Annotate the treatment effect arrow on the plot.

True

Returns:

Type Description
(fig, ax)

Examples:

>>> did_plot(df, y='wage', time='year', treat='treated',
...          treat_time=2010)
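
The dashed counterfactual follows the usual 2×2 logic: the treated group's pre-period mean plus the control group's change. The sketch below works directly on the four group means. It is an illustration of that arithmetic, not the plotting code itself, and did_2x2 is a hypothetical name.

```python
def did_2x2(treated_pre, treated_post, control_pre, control_post):
    """Return (counterfactual treated post-period mean, DID effect)."""
    # Under parallel trends the treated group would have moved like the controls.
    counterfactual_post = treated_pre + (control_post - control_pre)
    effect = treated_post - counterfactual_post
    return counterfactual_post, effect


counterfactual, effect = did_2x2(
    treated_pre=10.0, treated_post=15.0, control_pre=8.0, control_post=10.0
)
print(counterfactual, effect)
```

The gap between the observed treated post-period mean and the counterfactual is the quantity the effect annotation is meant to mark.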

enhanced_event_study_plot

enhanced_event_study_plot(result, ax=None, figsize: tuple = (10, 6), title: Optional[str] = None, color: str = '#2C3E50', sig_color: Optional[str] = '#E74C3C', ci_alpha: float = 0.15, shade_pre: bool = True, shade_post: bool = True, pre_color: str = '#EBF5FB', post_color: str = '#FDEDEC', show_zero: bool = True, marker: str = 'o', markersize: int = 6, alpha_level: float = 0.05, **kwargs)

Enhanced event study plot with pre/post shading and significance coloring.

Improvement over the basic CausalResult.event_study_plot() — adds optional background shading for pre/post periods and colors significant coefficients differently.

Parameters:

Name Type Description Default
result CausalResult

DID result with event study in model_info['event_study'].

required
ax matplotlib Axes
None
figsize tuple
(10, 6)
title str
None
color str

Default color for estimates.

'#2C3E50'
sig_color str or None

Color for significant estimates. None disables coloring.

'#E74C3C'
ci_alpha float

Confidence band transparency.

0.15
shade_pre bool

Shade pre-treatment region.

True
shade_post bool

Shade post-treatment region.

True
pre_color str

Pre-treatment shading color.

'#EBF5FB'
post_color str

Post-treatment shading color.

'#FDEDEC'
show_zero bool

Show horizontal zero line.

True
marker str
'o'
markersize int
6
alpha_level float
0.05

Returns:

Type Description
(fig, ax)

Examples:

>>> result = sp.did(df, y='y', treat='g', time='t', id='i')
>>> enhanced_event_study_plot(result, shade_pre=True)

treatment_rollout_plot

treatment_rollout_plot(data: DataFrame, time: str, treat: str, id: str, ax=None, figsize: tuple = (12, 7), title: Optional[str] = None, treated_color: str = '#E74C3C', untreated_color: str = '#ECF0F1', never_color: str = '#BDC3C7', sort_by: str = 'treat_time', show_cohort_labels: bool = True, **kwargs)

Visualise staggered treatment adoption timing.

Draws a tile/heatmap where each row is a unit and each column is a time period. Treated periods are shaded, making the staggered rollout pattern immediately visible.

Parameters:

Name Type Description Default
data DataFrame

Panel data with unit, time, and treatment columns.

required
time str

Time period variable.

required
treat str

First-treatment-period column (0 = never treated), or binary treatment indicator.

required
id str

Unit identifier.

required
ax matplotlib Axes
None
figsize tuple
(12, 7)
title str
None
treated_color str

Color for treated unit-periods.

'#E74C3C'
untreated_color str

Color for untreated unit-periods.

'#ECF0F1'
never_color str

Color for never-treated units.

'#BDC3C7'
sort_by str

Sort units by: 'treat_time' (earliest first), 'id', or 'random'.

'treat_time'
show_cohort_labels bool

Annotate cohort boundaries on the y-axis.

True

Returns:

Type Description
(fig, ax)

Examples:

>>> treatment_rollout_plot(df, time='year', treat='first_treat', id='state')
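
The tile layout can be thought of as a 0/1 grid built from the first-treatment period. A small sketch under the documented convention (0 = never treated) follows. Placing never-treated units last when sorting by treatment time is an assumption, and rollout_grid is a hypothetical helper.

```python
def rollout_grid(first_treat, periods):
    """first_treat: {unit: first treated period, 0 = never treated}.

    Returns (ordered units, grid) where grid[i][j] is 1 if unit i is treated
    in periods[j] and 0 otherwise.
    """
    periods = list(periods)
    # Earliest adopters first; never-treated units last (assumed ordering).
    order = sorted(first_treat, key=lambda u: (first_treat[u] == 0, first_treat[u], u))
    grid = [
        [int(first_treat[u] != 0 and t >= first_treat[u]) for t in periods]
        for u in order
    ]
    return order, grid


order, grid = rollout_grid({"A": 2003, "B": 0, "C": 2001}, range(2000, 2004))
for unit, row in zip(order, grid):
    print(unit, row)
```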

sensitivity_plot

sensitivity_plot(sensitivity: DataFrame, original_ci: Optional[Tuple[float, float]] = None, original_estimate: Optional[float] = None, ax=None, figsize: tuple = (10, 6), title: Optional[str] = None, color: str = '#2C3E50', breakdown_color: str = '#E74C3C', original_color: str = '#27AE60', **kwargs)

Plot Rambachan & Roth (2023) sensitivity analysis.

Shows how the robust confidence interval changes as the maximum allowed parallel trends violation (M) increases.

Parameters:

Name Type Description Default
sensitivity DataFrame

Output from honest_did(). Columns: M, ci_lower, ci_upper, rejects_zero.

required
original_ci tuple of (float, float)

Original CI (at M=0) for comparison.

None
original_estimate float

Original point estimate.

None
ax matplotlib Axes
None
figsize tuple
(10, 6)
title str
None
color str

CI band color.

'#2C3E50'
breakdown_color str

Color for the breakdown point marker.

'#E74C3C'
original_color str

Color for original estimate marker.

'#27AE60'

Returns:

Type Description
(fig, ax)

Examples:

>>> sens = sp.honest_did(result, e=0)
>>> sensitivity_plot(sens, original_estimate=result.estimate,
...                  original_ci=result.ci)
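
The breakdown point highlighted in the plot can be read off a sensitivity grid as the largest M whose robust CI still excludes zero. A sketch on plain dict rows follows, using the column names listed above. The result is only as fine as the M grid, and breakdown_m here is a hypothetical helper, not the library's routine.

```python
def breakdown_m(rows):
    """rows: list of dicts with keys 'M', 'ci_lower', 'ci_upper'.

    Returns the largest M, scanning upward from the smallest, whose CI
    excludes zero; None if zero is already covered at the smallest M.
    """
    last = None
    for row in sorted(rows, key=lambda r: r["M"]):
        excludes_zero = row["ci_lower"] > 0 or row["ci_upper"] < 0
        if not excludes_zero:
            break
        last = row["M"]
    return last


grid = [
    {"M": 0.0, "ci_lower": 0.5, "ci_upper": 1.5},
    {"M": 0.5, "ci_lower": 0.2, "ci_upper": 1.8},
    {"M": 1.0, "ci_lower": -0.1, "ci_upper": 2.1},
]
print(breakdown_m(grid))
```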

cohort_event_study_plot

cohort_event_study_plot(result, ax=None, figsize: tuple = (12, 7), title: Optional[str] = None, palette: Optional[List[str]] = None, show_aggregate: bool = True, aggregate_color: str = '#2C3E50', ci: bool = True, ci_alpha: float = 0.08, **kwargs)

Per-cohort event study plot (overlay).

Plots a separate event study line for each treatment cohort, showing heterogeneity in treatment effects across cohorts. Optionally overlays the aggregate event study.

Parameters:

Name Type Description Default
result CausalResult

Result from callaway_santanna() or did(method='cs'). Must have detail with 'group', 'relative_time', 'att' columns, and model_info['event_study'] for aggregate.

required
ax matplotlib Axes
None
figsize tuple
(12, 7)
title str
None
palette list of str

Colors for each cohort. Auto-generated if None.

None
show_aggregate bool

Overlay the aggregate event study line.

True
aggregate_color str

Color for aggregate line.

'#2C3E50'
ci bool

Show confidence intervals for each cohort.

True
ci_alpha float

CI band transparency.

0.08

Returns:

Type Description
(fig, ax)

Examples:

>>> result = sp.did(df, y='y', treat='g', time='t', id='i', method='cs')
>>> cohort_event_study_plot(result)

ggdid

ggdid(result, ax=None, figsize=(10, 6), title: Optional[str] = None, point_color: str = '#2E86AB', band_color: str = '#F18F01', show_pointwise: bool = True, show_uniform: bool = True)

Plot an aggte() result, mirroring R :func:did::ggdid.

Automatically dispatches on result.model_info['aggregation']:

  • simple : a single point with pointwise CI
  • dynamic : event-study line with pointwise CI and uniform band
  • group : horizontal bars of θ̂(g) per cohort
  • calendar : time-series of θ̂(t) per calendar period

Uniform bands (sup-t simultaneous confidence bands) are drawn from the cband_lower / cband_upper columns created by :func:aggte.

Parameters:

Name Type Description Default
result CausalResult

Output of :func:aggte.

required
ax matplotlib Axes
None
figsize tuple
(10, 6)
title str
None
point_color str

Colour for the pointwise estimates.

'#2E86AB'
band_color str

Colour for the uniform band.

'#F18F01'
show_pointwise bool

Draw pointwise CI lines.

True
show_uniform bool

Draw uniform band (shaded region).

True

Returns:

Type Description
(fig, ax)

did_summary_plot

did_summary_plot(result, ax=None, figsize: tuple = (9, 5), color: str = '#2C3E50', highlight_color: str = '#C0392B', reference: Optional[float] = None, title: Optional[str] = None, sort_by: Optional[str] = None)

Forest plot of DID method-robustness summary.

Plots each method's point estimate with its confidence interval as a horizontal errorbar. Designed to consume the CausalResult returned by :func:statspai.did.did_summary.

Parameters:

Name Type Description Default
result CausalResult

Output of :func:did_summary. Must have a detail DataFrame with columns estimate, ci_low, ci_high, and either method or estimator.

required
ax matplotlib Axes

Existing axes to draw on. If None a new figure is created.

None
figsize tuple

Figure size when creating a new figure.

(9, 5)
color str

Color for point estimates and CIs.

'#2C3E50'
highlight_color str

Color for the cross-method mean line.

'#C0392B'
reference float

Reference value marked on the estimate axis (e.g. 0 for 'no effect'). If None, 0 is used.

None
title str

Plot title. Defaults to "DID Method-Robustness Summary".

None
sort_by ('estimate', None)

If 'estimate', sort methods by point estimate ascending. Otherwise keep the order in result.detail.

None

Returns:

Type Description
(fig, ax) : matplotlib figure and axes.

Examples:

>>> out = sp.did_summary(df, y='y', time='time',
...                      first_treat='first_treat', group='unit')
>>> fig, ax = sp.did_summary_plot(out)

did

did(data: DataFrame, y: str, treat: str, time: str, id: Optional[str] = None, covariates: Optional[List[str]] = None, method: str = 'auto', estimator: str = 'dr', control_group: str = 'nevertreated', base_period: str = 'universal', cluster: Optional[str] = None, robust: bool = True, alpha: float = 0.05, weights: Optional[str] = None, subgroup: Optional[str] = None, treat_unit=None, treat_time=None, se_method: str = 'placebo', aggregation: Optional[str] = None, n_boot: int = 1000, random_state: Optional[int] = None, panel: bool = True, anticipation: int = 0, **kwargs) -> CausalResult

Difference-in-Differences estimation.

Unified entry point that auto-detects design type and dispatches to the appropriate estimator.

Parameters:

Name Type Description Default
data DataFrame

Input dataset.

required
y str

Outcome variable name.

required
treat str

Treatment variable. The column semantics depend on the design — one of the most common pitfalls in DID:

  • 2×2 DID (no id, two periods): a 0/1 group indicator (treated vs. control), identical to a standard DID dummy.
  • Staggered DID (Callaway–Sant'Anna, Sun–Abraham, SDID, Borusyak–Jaravel–Spiess, de Chaisemartin–D'Haultfœuille, Wooldridge etwfe): the first treatment period for each unit (aka the cohort / g-variable in R's did package). Never-treated units must have 0 (or NaN), not 1. A plain 0/1 indicator will silently be interpreted as "everyone was first treated in period 1," producing nonsense estimates.

If you only have a 0/1 treated column, construct first_treat per unit and broadcast it to all of that unit's rows:

# First treated year per unit. Never-treated units are absent from `first`,
# so map() yields NaN for them and fillna(0) recodes them as never-treated.
first = (df.loc[df['treated'] == 1]
           .groupby('id')['year'].min())
df['first_treat'] = df['id'].map(first).fillna(0).astype(int)
sp.did(df, y='y', treat='first_treat', time='year', id='id')
required
time str

Time period variable.

required
id str

Unit identifier. Required for staggered DID and SDID.

None
covariates list of str

Covariate names for conditional parallel trends / controls.

None
method str
  • 'auto' — 2×2 if id is None and treatment is binary, else Callaway-Sant'Anna.
  • '2x2' — classic two-period, two-group DID.
  • 'ddd' — triple differences (requires subgroup).
  • 'callaway_santanna' or 'cs' — staggered DID.
  • 'sun_abraham', 'sa', or 'sunab' — IW event study.
  • 'bjs' or 'did_imputation' — Borusyak-Jaravel-Spiess imputation DID.
  • 'sdid' — synthetic DID (Arkhangelsky et al. 2021).
'auto'
estimator str

For staggered DID: 'dr' (doubly robust), 'ipw', 'reg'.

'dr'
control_group str

For staggered DID: 'nevertreated' or 'notyettreated'.

'nevertreated'
base_period str

For staggered DID: 'universal' or 'varying'.

'universal'
cluster str

Cluster variable for standard errors.

None
robust bool

HC1 robust standard errors (2×2 / DDD only).

True
alpha float

Significance level for confidence intervals.

0.05
weights str

Column name for analytical weights (e.g. population weights). Supported for '2x2', 'ddd', and event study methods. Equivalent to Stata's [aweight=...].

None
subgroup str

For DDD: binary affected-subgroup indicator.

None
treat_unit optional

For SDID: treated unit(s).

None
treat_time optional

For SDID: treatment time.

None
se_method str

For SDID: 'placebo', 'bootstrap', or 'jackknife'.

'placebo'
aggregation str

When set and method is Callaway–Sant'Anna, the raw ATT(g,t) result is passed through :func:aggte with type=aggregation ('simple', 'dynamic', 'group', or 'calendar'), delivering the aggregated ATT with Mammen multiplier-bootstrap uniform confidence bands in a single call.

None
n_boot int

Bootstrap replications for the multiplier bootstrap when aggregation is set.

1000
random_state int

Seed for the multiplier bootstrap.

None
panel bool

Forwarded to :func:callaway_santanna; set panel=False for repeated cross-sections.

True
anticipation int

Forwarded to :func:callaway_santanna.

0

Returns:

Type Description
CausalResult

Estimation results with .summary(), .plot(), .to_latex(), .cite() methods.

Examples:

Classic 2×2 DID:

>>> result = did(df, y='wage', treat='treated', time='post')

Triple Differences:

>>> result = did(df, y='emp', treat='nj', time='post',
...             method='ddd', subgroup='low_wage')

Staggered DID (Callaway & Sant'Anna):

>>> result = did(df, y='earnings', treat='first_treat',
...             time='year', id='worker_id')

Synthetic DID:

>>> result = did(df, y='gdp', treat='first_treat', time='year',
...             id='state', method='sdid',
...             treat_unit='CA', treat_time=2000)
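
The 'auto' rule and the 0/1 pitfall described for treat can be combined into a small pre-flight check. The sketch below mirrors the documented rule (2×2 if id is None and the treatment is binary, otherwise Callaway-Sant'Anna). The function names and the warning text are hypothetical, and the library's own detection logic may differ.

```python
def is_binary(values):
    """True if every non-missing value is 0 or 1."""
    observed = {v for v in values if v is not None and v == v}  # drops None and NaN
    return observed <= {0, 1}


def resolve_auto(treat_values, id_col):
    """Documented 'auto' rule: 2x2 without an id and with a binary treatment."""
    if id_col is None and is_binary(treat_values):
        return "2x2"
    return "callaway_santanna"


def check_staggered_treat(treat_values, id_col):
    """Return a warning when a 0/1 column would be read as a first-treatment period."""
    if id_col is not None and is_binary(treat_values):
        return ("treat looks like a 0/1 indicator; staggered estimators expect "
                "the first treatment period per unit")
    return None


print(resolve_auto([0, 1, 1, 0], id_col=None))
print(resolve_auto([0, 2004, 2006], id_col="worker_id"))
print(check_staggered_treat([0, 1, 1, 0], id_col="worker_id"))
```

Running such a check before calling a staggered estimator surfaces the mistake that the treat description warns would otherwise pass silently.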