Automatic diagnostics — what StatsPAI checks for you¶

Every fitted StatsPAI result carries a self-audit. You do not have to remember which assumption each estimator leans on — the result knows, and will tell you:

import statspai as sp

res = sp.ivreg("y ~ (d ~ z)", data=df)

res.violations()          # structured list of flagged concerns (may be empty)
sp.audit(res)             # reviewer checklist: what's checked, passed, missing

res.violations() inspects diagnostics the estimator already computed — it never re-runs a test or touches your data, so it is instant. Each entry is a dict an agent can branch on:

{"kind": "assumption", "severity": "warning", "test": "weak_instrument",
 "value": 4.2, "threshold": 10.0,
 "message": "First-stage F = 4.20 < 10 (Stock-Yogo 5% bias) — weak instrument …",
 "recovery_hint": "Use sp.anderson_rubin_ci …",
 "alternatives": ["sp.anderson_rubin_ci", "sp.iv"]}

sp.audit(res) is a superset of violations(): it adds the robustness / sensitivity checks a referee would ask for (present, failed, or still missing) and folds in every live violation, so one call gives the full picture.

Two design commitments make these signals trustworthy:

Fit-time and structured API agree. Where an estimator warns at fit time (weak IV, few clusters, separation), the same concern appears in violations() — never one without the other.
Calibrated not to cry wolf. Thresholds are set so the field's canonical good examples stay silent (e.g. California Prop-99 does not trip the synthetic-control pre-fit check). A diagnostic that fires on the textbook example would erode trust, not build it.

The checklist¶

Family	Check	Fires when	Points to
DID	parallel trends	pre-trend joint test p < 0.10 (Roth 2022)	`sp.sensitivity_rr`, `sp.callaway_santanna`, `sp.did_imputation`
IV	weak instrument	first-stage F < 10 (Stock-Yogo)	`sp.anderson_rubin_ci`, `sp.iv(method='liml')`
Panel / OLS	few clusters	# clusters < 30 (Cameron-Gelbach-Miller 2008)	`sp.wild_cluster_bootstrap`, `sp.wild_cluster_ci_inv`
Synthetic control	poor pre-fit	pre-RMSPE / pre-period SD > 0.6	`sp.synth_compare`, `sp.augsynth`, `sp.synth_sensitivity`
RD	manipulation	McCrary density test p < 0.05	`sp.rddensity`, `sp.rdplotdensity`, `sp.rdrandinf`
Matching	residual imbalance	max post-match SMD > 0.25 (Stuart 2010)	`sp.ebalance`, `sp.cbps`, `sp.love_plot`
Matching / IPW	overlap	min propensity weight share < 0.05	`sp.trimming`
DML / AIPW	weak overlap	> 5% of units at the trimming bound	`sp.trimming`, `sp.overlap_weights`, `sp.cbps`
Logit / probit	(quasi-)separation	\|slope coef\| > 15 (Albert-Anderson 1984)	penalised logit / drop the separating predictor
Poisson	over-dispersion	Pearson dispersion > 1.5	`sp.nbreg`, robust SEs (quasi-Poisson)
Count	excess zeros	observed − predicted zeros > 0.05	`sp.zip_model`, `sp.zinb`
Heckman	no selection	inverse-Mills p > 0.10	`sp.regress` (more efficient)
Heckman	unstable rho	\|rho\| > 0.99 (weak exclusion restriction)	strengthen the exclusion; compare `sp.regress`
Tobit	extreme censoring	> 90% of observations censored	`sp.heckman`, report bounds
Cox	non-proportional hazards	Schoenfeld PH test p < 0.05	time interaction / stratify, `sp.aft`
Bayesian	non-convergence	r̂ > 1.01, bulk-ESS < 400, or divergences > 0	more draws / chains, reparameterise
Any	numerical	non-positive or non-finite standard errors	check collinearity (`sp.vif`), sandwich setup

The thresholds live in one place (statspai.core._agent_summary) and are shared by violations() and audit(), so the two can never disagree, and a future correctness fix that moves a cutoff moves both at once.

Using it in a workflow¶

res = sp.dml(df, y="y", treat="d", covariates=X, model="irm")

for v in res.violations():
    if v["severity"] in ("warning", "error"):
        print(v["message"])
        print("  → try:", ", ".join(v["alternatives"]))

# Or get the full reviewer view, sorted by how ready the result is:
report = sp.audit(res)
report["coverage"]        # passed / total, in [0, 1]
report["summary"]         # {passed, failed, missing, n_total}

For the checks that re-run against the data (rather than inspecting stored diagnostics), see sp.assumption_audit, and for the design-level robustness sweep see the robustness workflow guide.