Choosing a DID estimator
StatsPAI ships 18 DID variants. This guide is a decision tree: read the first question, jump to the section it sends you to, and stop when you have a recommendation. Every answer is grounded in the published literature.
0. TL;DR flowchart

```text
Is your panel staggered (units get treated at different times)?
|
+-- NO  -> classic 2x2 DID (sp.did)
|          +-> Optional robustness: sp.honest_did, sp.drdid
|
+-- YES -> Is your treatment effect HOMOGENEOUS across cohorts?
           +-- UNKNOWN -> sp.bacon_decomposition to find out
           +-- YES     -> TWFE is fine, but CS/SA/Wooldridge also work
           +-- NO      -> Do NOT use TWFE. See "Staggered + heterogeneous"
```
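The first question in the flowchart can be answered directly from each unit's first treatment period. Below is a minimal plain-Python sketch; `is_staggered` and `recommend` are hypothetical helpers (not part of StatsPAI), and the "0 = never treated" convention is an assumption borrowed from the staggered pre-conditions further down this page.

```python
def is_staggered(first_treat):
    """True if treated units adopt treatment at more than one distinct period.

    first_treat maps unit id -> first treated period; 0 means never treated
    (hypothetical convention for this sketch)."""
    adoption_times = {g for g in first_treat.values() if g != 0}
    return len(adoption_times) > 1


def recommend(staggered, homogeneous=None):
    """Encode the flowchart; homogeneous=None stands for 'unknown'."""
    if not staggered:
        return "sp.did (robustness: sp.honest_did, sp.drdid)"
    if homogeneous is None:
        return "sp.bacon_decomposition to find out"
    if homogeneous:
        return "TWFE is fine; CS/SA/Wooldridge also work"
    return "avoid TWFE; see 2b. Staggered + heterogeneous effects"


units = {"a": 2004, "b": 2006, "c": 0, "d": 2004}  # hypothetical adoption years
print(is_staggered(units))             # True
print(recommend(is_staggered(units)))  # sp.bacon_decomposition to find out
```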
1. Two-period, two-group ("2x2 DID")
| Use case | Recommended call |
|---|---|
| Standard 2-period 2-group panel | `sp.did(df, y='y', treat='treated', time='t', post='post')` |
| With covariates, doubly-robust | `sp.drdid(df, y='y', d='d', post='post', covariates=[...])` |
| Repeated cross-section (no panel match) | `sp.drdid(..., panel=False)` or `sp.did(..., repeated_cs=True)` |
| Unit-by-time cell-level data (DDD) | `sp.ddd(df, y='y', t1='state', t2='age', t3='year', ...)` |
Minimum viable robustness suite for 2x2 DID:

```python
import statspai as sp

# df: a pandas DataFrame with columns y, treated, t, post
r = sp.did(df, y='y', treat='treated', time='t', post='post')
r.next_steps()               # model-specific checklist
sp.honest_did(r, max_M=0.2)  # Rambachan-Roth sensitivity
sp.pretrends_test(r)         # pre-treatment placebo
```
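For intuition, the 2x2 DID point estimate without covariates is the treated group's mean pre-to-post change minus the control group's mean change. A minimal sketch on hypothetical data, independent of `sp.did`:

```python
from statistics import mean

# Hypothetical observations: (treated, post, y)
rows = [
    (1, 0, 10.0), (1, 0, 12.0), (1, 1, 18.0), (1, 1, 20.0),
    (0, 0, 9.0), (0, 0, 11.0), (0, 1, 12.0), (0, 1, 14.0),
]


def cell_mean(treated, post):
    """Mean outcome in one of the four treated x post cells."""
    return mean(y for d, p, y in rows if d == treated and p == post)


# (treated post - treated pre) - (control post - control pre)
did = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))
print(did)  # 5.0
```

The treated group rises by 8 and the control group by 3; under parallel trends the control change stands in for the treated group's untreated change, leaving 5.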
2. Staggered adoption
Staggered adoption means units get treated at different calendar times. With staggered adoption, classic TWFE is biased whenever treatment effects are heterogeneous across cohorts or change with time since treatment (Goodman-Bacon 2021; de Chaisemartin & D'Haultfoeuille 2020). Diagnose first:

```python
bacon = sp.bacon_decomposition(df, y='y', treat='treat',
                               time='t', id='i')
# If much of the weight falls on comparisons that use already-treated
# (earlier-treated) units as controls ("Later vs Earlier Treated" in
# Goodman-Bacon's terminology), TWFE is contaminated.
```
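Why already-treated controls are a problem: in this hypothetical two-cohort example (no never-treated units, no time trend), the early cohort's effect grows from 1 to 3 after adoption, while the late cohort's true effect at adoption is 1. A plain-Python sketch of the two timing-based 2x2 comparisons (not StatsPAI code):

```python
def outcome(cohort_start, t, effect_path):
    """Untreated outcome is 0; once treated, add effect_path[event_time]."""
    event_time = t - cohort_start
    return effect_path[event_time] if event_time >= 0 else 0.0


early = {t: outcome(2, t, [1.0, 3.0]) for t in (1, 2, 3)}  # treated from t=2
late = {t: outcome(3, t, [1.0]) for t in (1, 2, 3)}         # treated from t=3


def did_2x2(treated, control, pre, post):
    """Treated change minus control change between two periods."""
    return (treated[post] - treated[pre]) - (control[post] - control[pre])


clean = did_2x2(early, late, pre=1, post=2)      # late cohort not yet treated
forbidden = did_2x2(late, early, pre=2, post=3)  # early cohort already treated
print(clean, forbidden)  # 1.0 -1.0
```

The clean comparison recovers the early cohort's effect, while the comparison that uses the early cohort as a control subtracts that cohort's effect growth and flips the sign of a true effect of 1. TWFE averages over both kinds of comparison.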
2a. Staggered + homogeneous effects
TWFE is fine here. But Callaway-Sant'Anna (CS), Sun-Abraham (SA), and Wooldridge's ETWFE are also unbiased and give you event-study flexibility for free, so there is little reason to prefer TWFE over them.
2b. Staggered + heterogeneous effects
| Scenario | Pick |
|---|---|
| You want group-time ATT(g,t) + event study | `sp.callaway_santanna(df, y, g, t, i)` |
| Heavy-weight covariates | `sp.callaway_santanna(..., x=[...], estimator='dr')` |
| Sun-Abraham interaction-weighted event | `sp.sun_abraham(df, y, g, t, i)` |
| Imputation-style (BJS untreated-only TWFE) | `sp.did(df, y='y', treat='first_treat', time='t', id='i', method='bjs')` |
| Two-stage regression (event study + covariate ix) | `sp.gardner_did(df, y=..., group=..., time=..., first_treat=..., event_study=True)` |
| One-call harvesting + precision-weighted | `sp.harvest_did(df, outcome=..., unit=..., time=..., cohort=...)` |
| Two-way Mundlak / ETWFE | `sp.wooldridge_did(df, y, group, time, first_treat)` |
| Always-treated + never-treated only | `sp.stacked_did(df, y, g, t, i, event_window=6)` |
| Continuous / dose treatment | `sp.continuous_did(df, y, d, t, i)` |
| Changes-in-changes (CIC, not DID-in-mean) | `sp.cic(df, y, g, t)` |
| de Chaisemartin-D'Haultfoeuille | `sp.did_multiplegt(df, y, treat, g, t, i)` |
Default recommendation when in doubt: `sp.callaway_santanna(..., estimator='dr')`.
Doubly-robust CS is a sensible "no-regret" default: it stays consistent if either the outcome model or the propensity-score model is correctly specified (it does not need both), and its aggregation weights (`sp.aggte`) let you switch between simple, group-weighted, calendar-weighted, and event-study ATT without refitting.
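To see why aggregation needs no refit, here is a minimal sketch of the group-time idea: estimate every ATT(g,t) cell once, then re-weight the cells. It is a simplified, unconditional version (never-treated controls, base period g-1, no covariates, so not the doubly-robust estimator) on hypothetical data, and it is not how `sp.aggte` is implemented internally.

```python
from statistics import mean

# Hypothetical panel: unit -> (first-treated period g, {period: outcome}); g = 0 is never treated.
panel = {
    "a": (2, {1: 1.0, 2: 3.0, 3: 5.0}),
    "b": (2, {1: 2.0, 2: 4.0, 3: 6.0}),
    "c": (3, {1: 0.0, 2: 1.0, 3: 4.5}),
    "n1": (0, {1: 0.0, 2: 1.0, 3: 2.0}),
    "n2": (0, {1: 1.0, 2: 2.0, 3: 3.0}),
}
periods = [1, 2, 3]


def att_gt(g, t):
    """Cohort g's mean change from base period g-1 to t, minus the
    never-treated units' mean change over the same two periods."""
    base = g - 1
    treated = [y[t] - y[base] for first, y in panel.values() if first == g]
    control = [y[t] - y[base] for first, y in panel.values() if first == 0]
    return mean(treated) - mean(control)


cohorts = sorted({first for first, _ in panel.values() if first > 0})
size = {g: sum(1 for first, _ in panel.values() if first == g) for g in cohorts}

# Estimate every post-treatment cell (t >= g) exactly once.
cells = {(g, t): att_gt(g, t) for g in cohorts for t in periods if t >= g}


def weighted_mean(pairs):
    """Weighted mean of (weight, value) pairs."""
    return sum(w * v for w, v in pairs) / sum(w for w, _ in pairs)


# Event-study aggregation: group cells by e = t - g, weight cohorts by size.
event_study = {
    e: weighted_mean([(size[g], att) for (g, t), att in cells.items() if t - g == e])
    for e in sorted({t - g for g, t in cells})
}
# Simple aggregation: all post-treatment cells, weighted by cohort size.
simple_att = weighted_mean([(size[g], att) for (g, _), att in cells.items()])

print(cells)        # {(2, 2): 1.0, (2, 3): 2.0, (3, 3): 2.5}
print(event_study)  # {0: 1.5, 1: 2.0}
print(simple_att)   # 1.7
```

Switching from event-study to simple aggregation only changes the weights applied to `cells`; the cell estimates themselves are reused.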
2c. Event study with TWFE (legacy)
If you must use TWFE event studies for a reviewer:

```python
sp.event_study(df, y='y', d='d', t='t', i='i',
               method='twfe',       # naive; prints a warning if adoption is staggered
               pretrend_test=True)
```

For staggered adoption, prefer `method='sun_abraham'` or run `sp.sun_abraham` directly.
3. Sensitivity and robustness
Always run the three-step robustness suite for a publication-quality DID result:
```python
# r: a fitted DID result (CausalResult), e.g. from sp.did or sp.callaway_santanna

# 1. Pre-trend test (are pre-treatment coefficients near zero?)
sp.pretrends_test(r, alpha=0.05)

# 2. Honest DID (Rambachan-Roth): how much pre-trend violation can the
#    causal conclusion survive?
hd = sp.honest_did(r, max_M=0.5, method='smoothness')

# 3. Full robustness report: combines pre-trend, placebo, leave-one-cohort-out
sp.robustness_report(r)
```
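The question in step 2 can be illustrated with Rambachan and Roth's relative-magnitudes restriction, swapped in here for the `smoothness` restriction used in the call above: the post-treatment trend violation may be at most `m_bar` times the largest pre-period jump. This sketch ignores sampling uncertainty, so it gives identified-set bounds rather than the honest confidence intervals `sp.honest_did` reports, and it covers only the first post-period. All names and numbers are hypothetical.

```python
def max_pre_jump(pre_coefs):
    """Largest absolute period-to-period change among pre-treatment
    coefficients, closing the path at the reference period (normalised to 0)."""
    path = list(pre_coefs) + [0.0]
    return max(abs(later - earlier) for earlier, later in zip(path, path[1:]))


def rm_identified_set(pre_coefs, first_post_coef, m_bar):
    """Bounds on the first post-period effect when its trend violation is at
    most m_bar times the largest pre-period jump (no sampling error)."""
    slack = m_bar * max_pre_jump(pre_coefs)
    return first_post_coef - slack, first_post_coef + slack


def breakdown_m_bar(pre_coefs, first_post_coef):
    """Smallest m_bar at which the identified set first includes zero."""
    jump = max_pre_jump(pre_coefs)
    return float("inf") if jump == 0 else abs(first_post_coef) / jump


pre = [-0.5, -0.25]  # hypothetical pre-period coefficients, oldest first
print(rm_identified_set(pre, 1.0, m_bar=1.0))  # (0.75, 1.25)
print(breakdown_m_bar(pre, 1.0))               # 4.0
```

Here the conclusion "the effect is positive" survives any post-treatment violation up to four times the largest pre-period jump.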
Optional but highly recommended for DID papers:
- `sp.bjs_pretrend_joint(r)`: Borusyak-Jaravel-Spiess joint pre-trend test (avoids the multiple-testing problem of separate per-lag tests).
- `sp.bacon_decomposition`: shows which 2x2 comparisons drive your TWFE estimate.
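The multiple-testing point can be made concrete: instead of checking each lag's z-statistic separately, a joint test asks whether all pre-period coefficients are zero at once. The sketch below is a generic two-coefficient Wald test in plain Python with hypothetical estimates and covariance; it is not the BJS procedure itself, which applies the same joint-testing logic to pre-period terms estimated on untreated observations.

```python
import math


def joint_pretrend_wald_2(b, v):
    """Wald test that two pre-period coefficients are jointly zero.

    b: (b1, b2) estimates; v: 2x2 covariance ((v11, v12), (v21, v22)).
    With 2 restrictions the chi-square(2) upper tail is exactly exp(-W/2)."""
    (v11, v12), (v21, v22) = v
    det = v11 * v22 - v12 * v21
    inv = ((v22 / det, -v12 / det), (-v21 / det, v11 / det))
    w = sum(b[i] * inv[i][j] * b[j] for i in range(2) for j in range(2))
    return w, math.exp(-w / 2)


w, p = joint_pretrend_wald_2((0.1, 0.2), ((0.01, 0.0), (0.0, 0.04)))
print(round(w, 6), round(p, 4))  # 2.0 0.3679
```

Each lag here has a z-statistic of 1; the joint test combines them into one statistic with a single p-value instead of two separate looks at the data.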
4. When to avoid DID entirely
DID is the wrong tool if:
- Treatment is confounded by pre-trends: use matched DID (`sp.drdid`) or synthetic control (`sp.synth`).
- Only one treated unit: use `sp.synth` or `sp.causal_impact`.
- Treatment is a continuous dose, not a 0/1 onset: use `sp.continuous_did` or `sp.bunching` (if at a threshold).
- Anticipation effects exist: use the `anticipation=h` parameter of Callaway-Sant'Anna (2021) to move the reference period back by h periods.
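Why backdating helps: if units respond before their official treatment date, the default base period g-1 is already contaminated. A minimal sketch, assuming the Callaway-Sant'Anna convention that anticipation h moves the base period to g-1-h; the data and the `att` helper are hypothetical, not StatsPAI code.

```python
# Hypothetical cohort treated at g = 4 that starts responding at t = 3.
treated = {1: 0.0, 2: 0.0, 3: 1.0, 4: 3.0}
control = {1: 0.0, 2: 0.0, 3: 0.0, 4: 0.0}


def att(g, t, anticipation=0):
    """2x2 comparison of period t against base period g - 1 - anticipation."""
    base = g - 1 - anticipation
    return (treated[t] - treated[base]) - (control[t] - control[base])


print(att(4, 4))                  # 2.0: base period t=3 is already affected
print(att(4, 4, anticipation=1))  # 3.0: base period t=2 is clean
```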
4.5 Frontier estimators (tracked, partial, or not-yet-landed)
Several post-2020 DiD advances are either partially shipped or on the roadmap. Using them today means knowing what you're getting:
| What you want | Current state | Tracked in |
|---|---|---|
| Continuous-dose DiD (heuristic) | `sp.continuous_did(method='att_gt')`: dose-quantile 2×2 rollup; not CGS (2024) ATT(d\|g,t) | `docs/rfc/continuous_did_cgs.md` |
| Continuous-dose DiD: CGS (2024) ATT(d\|g,t) / ACRT | `sp.continuous_did(method='cgs')` MVP exists; not yet reference-parity with R `contdid`, OR-only, bootstrap SE | `docs/rfc/continuous_did_cgs.md` |
| On/off switching: dCDH (2020) DID_M | `sp.did_multiplegt`: pair rollup + joint placebo + average cumulative (2024 overlay) | shipped |
| On/off switching: dCDH (2024) `_dyn` event study | `sp.did_multiplegt_dyn(...)` experimental MVP exists; not yet paper-parity, cluster-bootstrap SE, switch-on only | `docs/rfc/multiplegt_dyn.md` |
| LP-DiD (Dube, Girardi, Jordà, Taylor 2023) | Not yet implemented | `docs/rfc/did_roadmap_gap_audit.md` §4 |
| Triple difference, heterogeneity-robust | `sp.ddd` textbook only; Olden-Møen / Strezhnev variants pending | `docs/rfc/did_roadmap_gap_audit.md` §4 |
| Time-varying covariates DiD (Caetano et al. 2022) | Not yet implemented | `docs/rfc/did_roadmap_gap_audit.md` §4 |
Why dCDH (2020) and dCDH (2024) are not the same estimator: the 2024 `_dyn` version runs a direct long-difference event study with "not yet treated at horizon `l`" as the per-horizon control group, and has its own influence-function variance. `sp.did_multiplegt(dynamic=H)` is the 2020 pair rollup extended to H horizons: numerically close in simple DGPs, but different in identification, control construction, and inference. Use `sp.did_multiplegt_dyn` only when you explicitly accept its current experimental/MVP limitations.
Until frontier items land with reference-parity tests, do not cite their MVP outputs as fully paper-faithful CGS (2024) / dCDH (2024) estimates. The current stable heuristics remain dose-bin / pair-rollup estimators; the MVPs are useful for API and workflow development, but their identification details and variance formulas are still tracked in the RFCs with `[待核验]` ("pending verification") markers.
5. Reading the output
All DID estimators in StatsPAI return a `CausalResult`. The common interface:

```python
r.estimate     # Point estimate of the main estimand (usually ATT)
r.se           # Standard error (clustered at unit level by default)
r.ci           # (lower, upper) tuple for 95% CI
r.tidy()       # Long-format table (broom-compatible): main, event_study,
               # group_time rows all in one DataFrame
r.glance()     # One-row model-level summary (nobs, pretrend pvalue, etc.)
r.plot()       # Auto-selects event-study / trajectory / coefplot
r.summary()    # Human-readable summary
r.next_steps() # Prioritised robustness checklist
r.cite()       # BibTeX for the underlying paper
```
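If a result's interval is a normal-approximation interval (an assumption: bootstrap, wild-cluster, or honest intervals will differ), `r.ci` should match `r.estimate ± z * r.se`. A minimal stdlib sketch of that arithmetic with hypothetical numbers; `normal_ci` is a hypothetical helper, not part of StatsPAI.

```python
from statistics import NormalDist


def normal_ci(estimate, se, level=0.95):
    """Two-sided normal-approximation confidence interval."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    return estimate - z * se, estimate + z * se


lower, upper = normal_ci(2.0, 0.5)
print(round(lower, 2), round(upper, 2))  # 1.02 2.98
```

Comparing `normal_ci(r.estimate, r.se)` with `r.ci` is a quick way to see which kind of interval a given estimator reports.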
For Agents
Pre-conditions
- data is panel or repeated cross-section with a time column
- treat column is binary (0/1) for 2x2, or first-treatment-period (int) for staggered
- at least one pre-treatment period (≥ 2 periods for 2x2; ≥ 3 recommended for event study)
- for staggered designs: id column identifying units across time

Identifying assumptions
- Parallel trends: treated and control groups would have followed the same trajectory absent treatment
- No anticipation: outcomes in pre-treatment periods are unaffected by future treatment
- SUTVA: no spillovers between units
- For staggered / heterogeneous effects: use CS or SA, since TWFE can produce negative weights (Goodman-Bacon)
Failure modes → recovery
| Symptom | Exception | Remedy | Try next |
|---|---|---|---|
| Pre-trend joint test p < 0.05 (or underpowered at 0.10) | `AssumptionViolation` | Use `sp.sensitivity_rr` (Rambachan & Roth honest CI) or switch to `sp.callaway_santanna`. | `sp.sensitivity_rr` |
| Staggered treatment timing with TWFE method | `AssumptionWarning` | TWFE can give negative weights; use Callaway-Sant'Anna, Sun-Abraham, or BJS imputation. | `sp.callaway_santanna` |
| Pre-trend test underpowered (Roth 2022) | `AssumptionWarning` | Check `sp.pretrends_power`; if power is low, report an honest CI via `sp.sensitivity_rr`. | `sp.sensitivity_rr` |
| Few clusters at unit level | `AssumptionWarning` | Use the wild cluster bootstrap (`sp.wild_cluster_bootstrap`). | `sp.wild_cluster_bootstrap` |
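The "underpowered pre-trend test" row follows Roth (2022): a pre-test can easily miss a violation large enough to matter. A minimal sketch of the power of a two-sided z-test on a single pre-period coefficient, a simplification of the joint case; `pretest_power` is hypothetical and is not `sp.pretrends_power`.

```python
from statistics import NormalDist


def pretest_power(trend_bias, se, alpha=0.05):
    """Probability that a two-sided z-test on one pre-period coefficient
    rejects when the coefficient's true value is trend_bias."""
    std = NormalDist()
    z = std.inv_cdf(1 - alpha / 2)
    shift = trend_bias / se
    return std.cdf(-z + shift) + std.cdf(-z - shift)


print(round(pretest_power(0.0, 0.1), 3))  # 0.05 (size under no violation)
print(round(pretest_power(0.1, 0.1), 2))  # 0.17: a one-SE violation is usually missed
```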
Alternatives (ranked)
- sp.callaway_santanna
- sp.sun_abraham
- sp.did_imputation
- sp.sdid
- sp.synth
Typical minimum N: 50
For Agents
Pre-conditions
- panel data with unit × time × outcome
- g column is integer: first-treated period or 0 for never-treated
- at least one never-treated or late-treated control group
- ≥ 2 pre-treatment periods per cohort
- data is panel or repeated cross-section with a time column
- treat column is binary (0/1) for 2x2, or first-treatment-period (int) for staggered
- at least one pre-treatment period (≥ 2 periods for 2x2; ≥ 3 recommended for event study)
- for staggered designs: id column identifying units across time
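The staggered pre-conditions listed above can be screened before calling an estimator. A minimal sketch; `check_staggered_inputs` is a hypothetical helper (not part of StatsPAI) that treats a later-treated cohort as an acceptable control when no never-treated units exist.

```python
from collections import Counter


def check_staggered_inputs(first_treat, periods):
    """Return a list of pre-condition problems (empty list = none found).

    first_treat: unit id -> first treated period (0 = never treated).
    periods: calendar periods observed in the panel."""
    issues = []
    if not all(isinstance(g, int) for g in first_treat.values()):
        return ["g must be an integer first-treated period (0 = never treated)"]
    cohorts = Counter(g for g in first_treat.values() if g != 0)
    has_never_treated = any(g == 0 for g in first_treat.values())
    if not has_never_treated and len(cohorts) < 2:
        issues.append("no never-treated or later-treated control group")
    for g, n_units in sorted(cohorts.items()):
        n_pre = sum(1 for t in periods if t < g)
        if n_pre < 2:
            issues.append(f"cohort {g}: only {n_pre} pre-treatment period(s)")
        if n_units == 1:
            issues.append(f"cohort {g}: single unit, consider aggregating")
    return issues


print(check_staggered_inputs({"a": 3, "b": 3, "c": 5, "d": 0}, periods=[1, 2, 3, 4, 5]))
# ['cohort 5: single unit, consider aggregating']
```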
Identifying assumptions
- Parallel trends conditional on X (if covariates supplied)
- No anticipation (or adjust via the anticipation= parameter)
- Overlap: positive propensity for each cohort
- SUTVA
- Parallel trends: treated and control groups would have followed the same trajectory absent treatment
- No anticipation: outcomes in pre-treatment periods are unaffected by future treatment
- SUTVA: no spillovers between units
- For staggered / heterogeneous effects: use CS or SA, since TWFE can produce negative weights (Goodman-Bacon)
Failure modes → recovery
| Symptom | Exception | Remedy | Try next |
|---|---|---|---|
| Pre-trend test on aggregated ATT(g,t) rejects | `AssumptionViolation` | Use `sp.sensitivity_rr` for an honest CI, or add covariates for conditional parallel trends. | `sp.sensitivity_rr` |
| Cohort with only one unit (insufficient variation) | `DataInsufficient` | Aggregate small cohorts or drop them; check `sp.diagnose_result`. | |
| All units treated at the same time (no staggering) | `MethodIncompatibility` | Fall back to 2x2 DID via `sp.did(method='2x2')`. | `sp.did` |
| Pre-trend joint test p < 0.05 (or underpowered at 0.10) | `AssumptionViolation` | Use `sp.sensitivity_rr` (Rambachan & Roth honest CI) or switch to `sp.callaway_santanna`. | `sp.sensitivity_rr` |
| Staggered treatment timing with TWFE method | `AssumptionWarning` | TWFE can give negative weights; use Callaway-Sant'Anna, Sun-Abraham, or BJS imputation. | `sp.callaway_santanna` |
| Pre-trend test underpowered (Roth 2022) | `AssumptionWarning` | Check `sp.pretrends_power`; if power is low, report an honest CI via `sp.sensitivity_rr`. | `sp.sensitivity_rr` |
| Few clusters at unit level | `AssumptionWarning` | Use the wild cluster bootstrap (`sp.wild_cluster_bootstrap`). | `sp.wild_cluster_bootstrap` |
Alternatives (ranked)
- sp.sun_abraham
- sp.did_imputation
- sp.sdid
- sp.did
- sp.callaway_santanna
- sp.synth
Typical minimum N: 50