Cookbook — recipes by research question¶
Find the method by the question you are actually asking, not by its textbook name. Each recipe is a minimal, runnable starting point; follow the linked guide or API reference for the full options.
Let StatsPAI choose
If you are unsure, sp.recommend(df, y=..., treat=...) and sp.detect_design(df) will suggest an estimator from the data shape.
"A policy turned on for different units at different times"¶
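Staggered-adoption difference-in-differences. Two-way fixed effects can be biased here when treatment effects differ across cohorts or change over time, because already-treated units implicitly serve as controls; use a heterogeneity-robust estimator.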
import statspai as sp
df = sp.datasets.mpdta()
r = sp.callaway_santanna(df, y="lemp", g="first_treat", t="year", i="countyreal")
r.summary()
→ Choosing a DID estimator · Callaway–Sant'Anna guide
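To make the idea behind group-time effects concrete, here is a minimal numpy sketch of the unconditional ATT(g, t) comparison against never-treated units. It is not StatsPAI's implementation; the toy panel, the `att_gt` helper, and the convention that `first_treat = 0` marks never-treated units are assumptions of this example.

```python
import numpy as np

# Hypothetical balanced panel: 6 units observed in 2003..2007.
# first_treat = 0 marks never-treated units (a convention assumed for this sketch).
periods = np.arange(2003, 2008)
first_treat = np.array([2004, 2004, 2006, 2006, 0, 0])

# Noise-free outcomes: unit effect + common trend + a cohort-specific effect
# that switches on at adoption (1.0 for the 2004 cohort, 3.0 for the 2006 cohort).
unit_effect = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
trend = 0.5 * (periods - periods[0])
cohort_effect = {2004: 1.0, 2006: 3.0}

y = unit_effect[:, None] + trend[None, :]
for i, g in enumerate(first_treat):
    if g > 0:
        y[i, periods >= g] += cohort_effect[int(g)]


def att_gt(y, periods, first_treat, g, t):
    """ATT(g, t): cohort g versus never-treated, relative to period g - 1."""
    base = np.where(periods == g - 1)[0][0]   # last pre-treatment period
    now = np.where(periods == t)[0][0]
    change = y[:, now] - y[:, base]           # long difference for every unit
    return change[first_treat == g].mean() - change[first_treat == 0].mean()


att_2004_2005 = att_gt(y, periods, first_treat, g=2004, t=2005)
att_2006_2007 = att_gt(y, periods, first_treat, g=2006, t=2007)
print(att_2004_2005, att_2006_2007)
```

Each cohort is only ever compared with units that are never treated, so the differing effect sizes across cohorts do not contaminate one another.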
"One unit got treated and I have many untreated comparison units"¶
Synthetic control — build a weighted combination of donors that tracks the treated unit before treatment.
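# df here stands for your own long-format panel (one row per unit-period);
# the column names below are placeholders, not columns of mpdta.
r = sp.synth(df, y="outcome", unit="state", time="year",
             treated="California", treat_period=1989)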
r.plot()
→ Synthetic control guide ·
sp.synth family
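The weighting step can be sketched as a constrained least-squares problem: find non-negative donor weights that sum to one and reproduce the treated unit's pre-treatment path. The numpy sketch below uses projected gradient descent on simulated data; `project_to_simplex` and `synth_weights` are hypothetical helpers written for this illustration and say nothing about the optimizer sp.synth uses.

```python
import numpy as np


def project_to_simplex(v):
    """Euclidean projection of v onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - 1.0) / k > 0)[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1)
    return np.maximum(v - theta, 0.0)


def synth_weights(x_treated, x_donors, n_iter=5000):
    """Projected gradient descent on 0.5 * ||x_donors @ w - x_treated||^2."""
    n_donors = x_donors.shape[1]
    w = np.full(n_donors, 1.0 / n_donors)            # start from equal weights
    step = 1.0 / np.linalg.norm(x_donors, 2) ** 2    # 1 / Lipschitz constant
    for _ in range(n_iter):
        grad = x_donors.T @ (x_donors @ w - x_treated)
        w = project_to_simplex(w - step * grad)
    return w


# Simulated pre-treatment outcomes: 20 periods (rows), 4 donors (columns).
rng = np.random.default_rng(0)
x_donors = rng.normal(size=(20, 4))
true_w = np.array([0.7, 0.3, 0.0, 0.0])
x_treated = x_donors @ true_w        # treated unit is an exact donor mix

w = synth_weights(x_treated, x_donors)
print(np.round(w, 3))
```

In real data the treated unit is rarely an exact convex combination of donors, so the pre-treatment fit should be inspected before reading anything into the post-treatment gap.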
"Treatment is endogenous but I have an instrument"¶
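Instrumental variables. Check the first stage before trusting the estimate: a weak instrument (a first-stage F statistic below about 10 is the conventional warning sign) pulls 2SLS back toward the biased OLS estimate.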
data = sp.datasets.card_1995()
r = sp.ivreg("lwage ~ (educ ~ nearc4) + exper + black + south", data=data)
r.summary()
# weak-instrument-robust reporting bundle:
sp.iv_diag(data, y="lwage", endog="educ", instruments="nearc4")
→ Choosing an IV estimator · IV reference
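As a rough illustration of what the first stage and second stage do, the following numpy sketch runs two-stage least squares by hand on simulated data with a known effect. The data-generating process and variable names are assumptions of this example, and the second-stage standard errors from such a manual procedure would be wrong, so only point estimates are shown.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
z = rng.normal(size=n)                        # instrument
u = rng.normal(size=n)                        # unobserved confounder
d = z + u + rng.normal(size=n)                # endogenous treatment
y = 2.0 * d + 3.0 * u + rng.normal(size=n)    # true effect of d is 2.0


def ols(X, y):
    """Least-squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


const = np.ones(n)
beta_ols = ols(np.column_stack([const, d]), y)[1]    # confounded by u

# First stage: regress the treatment on the instrument.
Z = np.column_stack([const, z])
d_hat = Z @ ols(Z, d)
rss_full = np.sum((d - d_hat) ** 2)
rss_null = np.sum((d - d.mean()) ** 2)
first_stage_f = (rss_null - rss_full) / (rss_full / (n - 2))  # one excluded instrument

# Second stage: regress the outcome on the fitted treatment.
beta_iv = ols(np.column_stack([const, d_hat]), y)[1]

print(f"OLS {beta_ols:.2f}  2SLS {beta_iv:.2f}  first-stage F {first_stage_f:.0f}")
```

OLS absorbs the confounder's contribution, while the 2SLS estimate uses only the variation in `d` that is driven by `z`.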
"Treatment is assigned by a cutoff on a running variable"¶
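Regression discontinuity: compare outcomes just above and just below the cutoff, where treatment status changes but units are otherwise similar.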
data = sp.datasets.lee_2008_senate()
r = sp.rdrobust(data["vote_t1"], data["margin"], c=0.0)
r.summary()
→ Choosing an RD estimator · RD reference
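The core of a sharp RD estimate is a pair of local linear fits, one on each side of the cutoff, evaluated at the cutoff. The sketch below shows that step with a triangular kernel and a fixed bandwidth on noise-free toy data; `local_linear_rd` is a hypothetical helper, and it omits the data-driven bandwidth choice and bias-corrected inference that rdrobust-style estimators provide.

```python
import numpy as np


def local_linear_rd(y, x, cutoff=0.0, bandwidth=0.5):
    """Sharp RD jump from triangular-kernel local linear fits on each side."""
    xc = x - cutoff

    def intercept_at_cutoff(side):
        xs, ys = xc[side], y[side]
        w = np.maximum(1.0 - np.abs(xs) / bandwidth, 0.0)   # triangular kernel
        keep = w > 0
        X = np.column_stack([np.ones(keep.sum()), xs[keep]])
        sw = np.sqrt(w[keep])                                # weighted least squares
        coef = np.linalg.lstsq(X * sw[:, None], ys[keep] * sw, rcond=None)[0]
        return coef[0]                                       # fitted value at the cutoff

    return intercept_at_cutoff(xc >= 0) - intercept_at_cutoff(xc < 0)


# Toy data: linear in the running variable with a jump of 2.0 at zero.
x = np.linspace(-1.0, 1.0, 401)
y = 1.0 + 0.5 * x + 2.0 * (x >= 0)

tau = local_linear_rd(y, x)
print(round(tau, 6))
```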
"I want the effect for everyone, not just the average (heterogeneity)"¶
Conditional average treatment effects (CATE) via meta-learners, causal forest, or double ML.
r = sp.dml(df, y="y", treat="d", covariates=["x1", "x2", "x3"], model="irm")
cate = sp.auto_cate(df, y="y", treat="d", covariates=["x1", "x2", "x3"])
→ Choosing an ML causal estimator
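One of the simplest meta-learners, the T-learner, fits a separate outcome model in each treatment arm and takes the difference of their predictions. The numpy sketch below uses linear outcome models on simulated, randomized data; it illustrates the idea only and is not a description of what sp.auto_cate does internally.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(-1.0, 1.0, size=n)
d = rng.integers(0, 2, size=n)              # randomized binary treatment
y = 1.0 + x + d * (2.0 + 3.0 * x)           # noise-free; true CATE(x) = 2 + 3x


def fit_linear(x, y):
    """Intercept and slope of a one-covariate least-squares fit."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0]


# T-learner: one outcome model per arm.
coef_treated = fit_linear(x[d == 1], y[d == 1])
coef_control = fit_linear(x[d == 0], y[d == 0])


def cate(x_new):
    """Predicted treated outcome minus predicted control outcome."""
    x_new = np.asarray(x_new, dtype=float)
    mu1 = coef_treated[0] + coef_treated[1] * x_new
    mu0 = coef_control[0] + coef_control[1] * x_new
    return mu1 - mu0


cate_at = cate([-1.0, 0.0, 1.0])
ate = cate(x).mean()                         # averaging the CATE recovers the ATE
print(np.round(cate_at, 3), round(ate, 3))
```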
"I have rich confounders and want a robust observational estimate"¶
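Double/debiased ML or TMLE: both are doubly robust (consistent if either the outcome model or the propensity model is right) and both need overlap (every unit has a real chance of being treated and of being untreated).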
r = sp.dml(df, y="y", treat="d", covariates=[...], model="irm", ml_g="rf", ml_m="rf")
r = sp.tmle(df, y="y", treat="d", covariates=[...])
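The doubly robust score shared by these estimators combines an outcome-model prediction with an inverse-propensity correction of its residual. Below is a minimal AIPW sketch on simulated data with a single binary confounder, where cell means serve as the nuisance models. It skips the cross-fitting that double ML uses with flexible learners, and all names and the data-generating process are assumptions of this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
x = rng.integers(0, 2, size=n)                       # binary confounder
d = rng.binomial(1, np.where(x == 1, 0.8, 0.2))      # treatment depends on x
y = 1.0 + 2.0 * x + 1.5 * d + rng.normal(size=n)     # true ATE is 1.5

naive = y[d == 1].mean() - y[d == 0].mean()          # confounded comparison

# Nuisance estimates: with one binary covariate, cell means are saturated models.
e_hat = np.empty(n)      # propensity score P(d = 1 | x)
mu1_hat = np.empty(n)    # E[y | d = 1, x]
mu0_hat = np.empty(n)    # E[y | d = 0, x]
for v in (0, 1):
    cell = x == v
    e_hat[cell] = d[cell].mean()
    mu1_hat[cell] = y[cell & (d == 1)].mean()
    mu0_hat[cell] = y[cell & (d == 0)].mean()

# AIPW score: outcome-model contrast plus weighted residual corrections.
score = (mu1_hat - mu0_hat
         + d * (y - mu1_hat) / e_hat
         - (1 - d) * (y - mu0_hat) / (1.0 - e_hat))
ate_aipw = score.mean()
se_aipw = score.std(ddof=1) / np.sqrt(n)

print(f"naive {naive:.2f}  AIPW {ate_aipw:.2f} (se {se_aipw:.3f})")
```

The division by `e_hat` and `1 - e_hat` is where overlap matters: propensities near 0 or 1 make the correction terms explode.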
"Why is the gap between two groups what it is?" (decomposition)¶
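Oaxaca–Blinder and recentered-influence-function (RIF) decompositions split the gap into a part explained by differences in characteristics and a part attributed to differences in coefficients.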
→ Decomposition family · Decomposition reference
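A twofold Oaxaca–Blinder decomposition of a mean gap can be sketched with two group-specific regressions. The numpy example below prices the difference in characteristics at group B's coefficients, which is one of several possible reference choices; the simulated wage data and variable names are hypothetical, and the StatsPAI call for this recipe is left to the linked reference.

```python
import numpy as np

rng = np.random.default_rng(0)


def simulate(n, mean_educ, intercept, slope):
    """Hypothetical group: log wage as a linear function of education."""
    educ = rng.normal(mean_educ, 2.0, size=n)
    wage = intercept + slope * educ + rng.normal(scale=0.5, size=n)
    X = np.column_stack([np.ones(n), educ])     # design matrix with intercept
    return X, wage


X_a, y_a = simulate(400, mean_educ=14.0, intercept=1.0, slope=0.10)
X_b, y_b = simulate(400, mean_educ=12.0, intercept=0.8, slope=0.08)

beta_a = np.linalg.lstsq(X_a, y_a, rcond=None)[0]
beta_b = np.linalg.lstsq(X_b, y_b, rcond=None)[0]

gap = y_a.mean() - y_b.mean()
# Explained part: difference in mean characteristics, priced at B's coefficients.
explained = (X_a.mean(axis=0) - X_b.mean(axis=0)) @ beta_b
# Unexplained part: difference in coefficients, evaluated at A's mean characteristics.
unexplained = X_a.mean(axis=0) @ (beta_a - beta_b)

print(f"gap {gap:.3f} = explained {explained:.3f} + unexplained {unexplained:.3f}")
```

Because each regression includes an intercept, the two parts add up to the raw gap exactly; swapping the reference group changes how the gap is split, not its total.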
"Match treated and control units on covariates"¶
Propensity-score / entropy-balancing / optimal matching.
ps = sp.propensity_score(df, treat="d", covariates=["x1", "x2"])
w = sp.ebalance(df, treat="d", covariates=["x1", "x2"]) # entropy balancing
sp.love_plot(sp.balance_diagnostics(df, treat="d", covariates=["x1", "x2"]))
→ Choosing a matching estimator
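Balance diagnostics usually come down to standardized mean differences (SMDs) computed before and after adjustment, which is what a love plot displays. The numpy sketch below computes an SMD for one covariate, raw and under inverse-probability weights. It assumes the true propensity is known, which real analyses do not have, and `smd` is a hypothetical helper rather than a StatsPAI function.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
e = 1.0 / (1.0 + np.exp(-x))        # true propensity, assumed known in this sketch
d = rng.binomial(1, e)              # units with larger x are treated more often


def smd(x, d, w=None):
    """Standardized mean difference, scaled by the unweighted pooled SD."""
    w = np.ones_like(x) if w is None else w
    m1 = np.average(x[d == 1], weights=w[d == 1])
    m0 = np.average(x[d == 0], weights=w[d == 0])
    pooled_sd = np.sqrt((x[d == 1].var(ddof=1) + x[d == 0].var(ddof=1)) / 2.0)
    return (m1 - m0) / pooled_sd


ipw = d / e + (1 - d) / (1.0 - e)   # inverse-probability weights for the ATE
smd_raw = smd(x, d)
smd_ipw = smd(x, d, ipw)

print(f"SMD raw {smd_raw:.2f}  SMD weighted {smd_ipw:.2f}")
```

A common rule of thumb treats absolute SMDs below 0.1 as acceptable balance.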
"Panel regression with many fixed effects"¶
reghdfe-style high-dimensional fixed effects.
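The idea behind absorbing many fixed effects is to sweep group means out of every variable, alternating across the fixed-effect dimensions until nothing changes, and then run OLS on the residualized data. The numpy sketch below does this for unit and period effects on an unbalanced simulated panel; `demean_by` and `absorb` are hypothetical helpers, and production implementations add acceleration, singleton handling, and degrees-of-freedom corrections.

```python
import numpy as np


def demean_by(values, groups):
    """Subtract group means; groups are integer codes 0..G-1."""
    sums = np.bincount(groups, weights=values)
    counts = np.bincount(groups)
    return values - (sums / counts)[groups]


def absorb(values, fe_list, tol=1e-12, max_iter=1000):
    """Alternating projections: repeat the demeaning sweeps until convergence."""
    out = values.astype(float).copy()
    for _ in range(max_iter):
        prev = out.copy()
        for groups in fe_list:
            out = demean_by(out, groups)
        if np.max(np.abs(out - prev)) < tol:
            break
    return out


rng = np.random.default_rng(0)
n_units, n_periods = 50, 10
unit = np.repeat(np.arange(n_units), n_periods)
period = np.tile(np.arange(n_periods), n_units)
keep = rng.random(unit.size) > 0.2            # drop about 20% of rows: unbalanced panel
unit, period = unit[keep], period[keep]

x = rng.normal(size=unit.size)
alpha = rng.normal(size=n_units)              # unit fixed effects
gamma = rng.normal(size=n_periods)            # period fixed effects
y = 1.5 * x + alpha[unit] + gamma[period]     # noise-free, true slope 1.5

x_tilde = absorb(x, [unit, period])
y_tilde = absorb(y, [unit, period])
beta = (x_tilde @ y_tilde) / (x_tilde @ x_tilde)   # OLS on residualized data
print(round(beta, 6))
```

On a balanced panel one sweep is enough; the iteration matters once the panel is unbalanced or there are more than two fixed-effect dimensions.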