Skip to content

Staggered Difference-in-Differences — Callaway & Sant'Anna (2021)

StatsPAI implements the Callaway–Sant'Anna estimator from first principles, matching the R did package's core functionality while adding new convenience layers on top.

Basic usage

import statspai as sp

cs = sp.callaway_santanna(
    df,
    y='earnings',        # outcome
    g='first_treat',     # first-treatment period (0 = never-treated)
    t='year',            # time period
    i='worker_id',       # unit id
    estimator='dr',      # 'dr' (default), 'ipw', or 'reg'
    control_group='nevertreated',  # or 'notyettreated'
    anticipation=0,      # periods of anticipation (CS2021 §3.2)
)

print(cs.summary())
cs.detail              # one row per (group, time) with ATT + pointwise CI
cs.model_info['event_study']   # event-study aggregation
cs.model_info['pretrend_test'] # joint Wald pre-trend test

Aggregation with uniform bands

The raw callaway_santanna() result is a grid of ATT(g, t) estimates. Collapse to a scalar or an event-study curve with aggte(), which layers the Mammen (1993) multiplier bootstrap on top and returns simultaneous confidence bands:

es = sp.aggte(cs, type='dynamic',
              n_boot=500, random_state=0,
              balance_e=3)        # balance across cohorts for e ≤ 3

print(es.detail)
# relative_time  att  se  ci_lower  ci_upper  cband_lower  cband_upper ...

The cband_lower / cband_upper columns give a sup-t uniform band — valid for simultaneous inference across the entire event window, unlike the pointwise CI.

Other aggregation types:

type= Meaning
'simple' cohort-share-weighted overall ATT
'dynamic' event-study curve ATT(e)
'group' per-cohort average ATT(g)
'calendar' per-calendar-time ATT(t)

Repeated cross-sections

Pass panel=False when observations are not matched across time (e.g. CPS pooled cross-sections). The estimator switches to the unconditional 2×2 cell-mean DID with observation-level influence functions; downstream aggte, cs_report, ggdid, and honest_did all work unchanged.

cs_rcs = sp.callaway_santanna(
    survey_df,
    y='wage', g='first_treat', t='year', i='respondent_id',
    estimator='reg',         # only 'reg' supported in RCS mode
    x=['age', 'education'],  # optional covariate residualisation
    panel=False,
)

Sensitivity — Rambachan & Roth (2023)

Every event-study result (from CS, SA, BJS, or aggte) feeds into the Rambachan–Roth sensitivity framework:

sens = sp.honest_did(es, e=2)     # robust CI at e=2 across an M grid
m_star = sp.breakdown_m(es, e=2)  # largest M* under which effect is significant

One-call report

For a ready-to-publish summary — raw estimation + four aggregations with uniform bands + pre-trend Wald + R-R breakdown M* per post event time — call cs_report().

For Agents

Pre-conditions - panel data with unit × time × outcome - g column is integer: first-treated period or 0 for never-treated - at least one never-treated or late-treated control group - ≥ 2 pre-treatment periods per cohort - data is panel or repeated cross-section with a time column - treat column is binary (0/1) for 2x2, or first-treatment-period (int) for staggered - at least one pre-treatment period (≥ 2 periods for 2x2; ≥ 3 recommended for event study) - for staggered designs: id column identifying units across time

Identifying assumptions - Parallel trends conditional on X (if covariates supplied) - No anticipation (or adjust via anticipation= parameter) - Overlap: positive propensity for each cohort - SUTVA - Parallel trends: treated and control groups would have followed the same trajectory absent treatment - No anticipation: outcomes in pre-treatment periods are unaffected by future treatment - SUTVA: no spillovers between units - For staggered / heterogeneous effects: use CS or SA — TWFE can produce negative weights (Goodman-Bacon)

Failure modes → recovery

Symptom Exception Remedy Try next
Pre-trend test on aggregated ATT(g,t) rejects AssumptionViolation Use sp.sensitivity_rr for honest CI, or add covariates for conditional parallel trends. sp.sensitivity_rr
Cohort with only one unit — insufficient variation DataInsufficient Aggregate small cohorts or drop; check sp.diagnose_result.
All units treated at the same time (no staggering) MethodIncompatibility Fall back to 2x2 DID via sp.did(method='2x2'). sp.did
Pre-trend joint test p < 0.05 (or underpowered at 0.10) AssumptionViolation Use sp.sensitivity_rr (Rambachan & Roth honest CI) or switch to sp.callaway_santanna. sp.sensitivity_rr
Staggered treatment timing with TWFE method AssumptionWarning TWFE can give negative weights; use Callaway-Sant'Anna, Sun-Abraham, or BJS imputation. sp.callaway_santanna
Pre-trend test underpowered (Roth 2022) AssumptionWarning Check sp.pretrends_power — if low, report honest CI via sp.sensitivity_rr. sp.sensitivity_rr
Few clusters at unit level AssumptionWarning Use wild cluster bootstrap (sp.wild_cluster_bootstrap). sp.wild_cluster_bootstrap

Alternatives (ranked) - sp.sun_abraham - sp.did_imputation - sp.sdid - sp.did - sp.callaway_santanna - sp.synth

Typical minimum N: 50