Robustness workflow¶
A publication-quality causal inference result is not one estimate. It
is a package: the point estimate + a stack of robustness checks that
convince a reviewer the conclusion doesn't hinge on a single modelling
choice. StatsPAI's .next_steps() gives you a priority-ordered list.
This guide expands the full recipe for each design.
The three-layer defence¶
Layer 1 — IDENTIFICATION: is my design valid?
DID: parallel trends
RD: continuity, no manipulation
IV: relevance, exclusion, monotonicity
SOO: unconfoundedness, overlap
Layer 2 — SPECIFICATION: does the result survive alternative choices?
Bandwidth, polynomial order, covariate set, kernel, sample restriction
Layer 3 — SENSITIVITY: how much unobserved confounding would
kill the conclusion?
E-value, Oster, sensemakr, Rambachan-Roth
Layer 1 — Identification checks¶
DID¶
# Placebo pre-trends (per-period joint)
sp.pretrends_test(r)
sp.bjs_pretrend_joint(r) # Borusyak-Jaravel-Spiess
# Honest inference under parallel-trends violations
sp.honest_did(r, max_M=0.5, method='smoothness')
# Decomposition: which 2x2 cells drive TWFE?
sp.bacon_decomposition(df, y='y', treat='d', time='t', id='i')
RD¶
# Density continuity (McCrary 2008 / Cattaneo-Jansson-Ma 2020)
sp.rddensity(df, x='running_var', c=0.0)
sp.mccrary_test(df, x='running_var', c=0.0)
# Covariate balance
sp.rdbalance(df, x='running_var', c=0.0, covariates=[...])
# Placebo cutoffs
sp.rdplacebo(df, y='y', x='running_var',
true_cutoff=0.0, placebo_cutoffs=[-0.5, 0.5])
IV¶
# First-stage strength
r.diagnostics['First-stage F (d)'] # > 10 rule; > 30 comfortable
# Exclusion sensitivity
sp.plausibly_exogenous(df, y='y', d='d', z='z',
gamma_range=(-0.1, 0.1))
# Monotonicity (Imbens-Rubin / Kitagawa)
sp.kitagawa_test(df, y='y', d='d', z='z')
# Reduced form
sp.regress('y ~ z', data=df).tidy()
Selection-on-observables¶
# Overlap / common support
sp.overlap_plot(r)
sp.love_plot(r)
sp.ps_balance(r)
# E-value: how strong must unobserved U be to explain away effect?
sp.evalue_from_result(r)
# Oster delta bounds
sp.oster_bounds(r, delta=1.0)
# Sensemakr (Cinelli-Hazlett)
sp.sensemakr(r, benchmark_covariates=['X1'])
Layer 2 — Specification robustness¶
Specification curve / multiverse¶
spec = sp.spec_curve(
df, y='y', treat='d',
covariate_sets=[
['X1'], ['X1', 'X2'], ['X1', 'X2', 'X3'],
['X1', 'X2', 'X3', 'X4'],
],
estimators=['regress', 'ebalance', 'aipw'],
)
spec.plot() # classic Simonsohn-Simmons-Nelson spec curve
Bandwidth / kernel sensitivity (RD)¶
Subsample stability¶
This runs: leave-one-cohort-out (DID), leave-one-unit-out (synth), subsample reruns, and produces a single report.Layer 3 — Sensitivity to unobserved confounding¶
E-value (VanderWeele-Ding 2017)¶
For any causal estimate on observational data, the E-value is the minimum strength of association an unobserved confounder U would need to have with both treatment and outcome (on the risk-ratio scale) to fully explain away the observed effect.
# From a fitted causal result (interpreted as a standardised mean diff):
r = sp.dml(df, y='y', treat='d', covariates=[...])
ev = sp.evalue_from_result(r)
print(ev["evalue_estimate"]) # e.g. 2.3 -> U needs RR > 2.3 with both
print(ev["evalue_ci"]) # E-value for the CI limit nearest the null
# Or directly from a reported effect on any supported scale:
sp.evalue(estimate=1.8, ci=(1.2, 2.7), measure="OR") # rare=False by default
sp.evalue(estimate=1.5, ci=(1.1, 2.0), measure="HR", rare=False)
sp.evalue(estimate=0.5, se=0.1, sd=2.0, measure="OLS")
sp.evalue(estimate=2.5, ci=(1.8, 3.2), measure="RR", true=1.5) # non-null E-value
sp.evalue_rd(200, 150, 100, 250) # exact risk-difference E-value from a 2x2 table
sp.evalue returns a dict (evalue_estimate, evalue_ci, rr_estimate,
interpretation, ...) and reproduces the R EValue package to machine
precision across RR/OR/HR (rare and common outcomes), MD/SMD, OLS, and the
2x2-table risk difference. If the confidence interval already contains the
null (or the specified true value), its E-value is exactly 1.
Rule of thumb: E > 2 is "reasonably robust"; E > 3 is strong; E < 1.5 is fragile (any of the measured covariates already has that level of association).
Oster (2019) coefficient stability bounds¶
b = sp.oster_bounds(r, delta=1.0, r_max=1.3 * r.glance()['r_squared'].iloc[0])
print(b.lower, b.upper)
[lower, upper] is where the true effect must lie if
unobservables are as important as observables (δ=1). If 0 is in this
interval, the effect is not robust.
Sensemakr (Cinelli-Hazlett 2020)¶
s = sp.sensemakr(r, benchmark_covariates=['X1', 'X2'])
s.summary()
s.plot() # contour plot: partial R^2 of U with T and Y
Bounds (Manski-style)¶
When no assumption feels credible, compute worst-case bounds:
Full robustness report in one call¶
r = sp.callaway_santanna(df, y='y', g='g', t='t', i='i')
sp.robustness_report(r) # prints the full layered check
Returns a dict with:
{
'identification': {'pretrends_pvalue': 0.23, 'honest_did_M_breakdown': 0.34, ...},
'specification': {'spec_curve_sign_stability': 0.95, ...},
'sensitivity': {'evalue': 2.1, 'oster_upper': 0.42, 'oster_lower': 0.08, ...},
'verdict': 'ROBUST', # one of: ROBUST | MARGINAL | FRAGILE
}
Reporting template¶
For the final paper, present:
- Main table: point estimate, SE, 95% CI, N.
- Identification check: one test per identifying assumption (pre-trend plot, McCrary plot, first-stage F).
- Specification panel: point estimate varying key design choices (covariate sets, bandwidths, subsamples).
- Sensitivity panel: E-value, Oster bounds, sensemakr contour.
sp.regtable() auto-formats this for LaTeX/Word: