StatsPAI¶

Validation-tiered Python workflows for causal inference and applied econometrics. A single import, import statspai as sp, exposes 1,147 registered functions across 87 submodules (live count: python scripts/registry_stats.py). Coverage spans classical regression, staggered DiD, regression discontinuity, synthetic control, decomposition, stochastic frontier, multilevel / mixed-effects, modern ML causal inference, the full three-school (Econometrics / Epidemiology / ML) toolkit, 2025-2026 research-frontier modules (bridging theorems, fairness, surrogates, PCMCI, TMLE survival, etc.), and manuscript/reporting output in Word / Excel / LaTeX / HTML.

Current release: v1.20.0 (2026-06-22) — a batch of ⚠️ inference correctness fixes (CUSUM linear boundary, genuine Imbens–Manski CIs for sp.lee_bounds, CCT kernel-squared robust RD variance, Hotelling-T² pre-trend test for sp.callaway_santanna, nested-FE cluster DOF in the native HDFE backend), building on v1.19.0's cross-engine validation (sp.cross_validate), data-MCP ingestion normalisers (sp.from_worldbank / sp.from_fred / sp.from_sdmx), and the sp.network module. See the changelog for detail.

import statspai as sp

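# One-call DiD pipeline with sensitivity + export.
# `data` is assumed to be a long-format panel already in memory, with columns
# named by y (outcome), g (treatment cohort), t (period), and i (unit id).
rpt = sp.cs_report(data, y='y', g='g', t='t', i='id',
                   n_boot=500, random_state=0,   # bootstrap draws and seed
                   save_to='~/study/cs_v1')      # where the report is exported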

What's inside¶

Release highlights (early era: v0.9.0 → v1.5.0)¶

For everything after v1.5.0 — the P1 agent-native surface (sp.causal_question, sp.paper(), LLM-DAG), cross-engine validation (sp.cross_validate), the parity index, and the v1.20.0 inference correctness batch — see the changelog.

Release	Focus	Headline
v1.5.0	Interference / Conformal / Mendelian family consolidation	Three family guides (`interference_family`, `conformal_family`, `mendelian_family`) covering all 36 functions; three unified dispatchers `sp.mr(method=...)` / `sp.conformal(kind=...)` / `sp.interference(design=...)` with 91 aliases in total; two silent-wrong-numbers fixes — `mr_egger` slope t(n−2) parity with `mr_pleiotropy_egger` (anti-conservative CIs at small `n_snps` before fix) and `mr_presso` MC p-value floor at `1/(B+1)` (no more `p = 0`). Breaking: `sp.mr` is now a dispatcher function, not a module alias. Registry coverage fixes for 5 previously-unregistered family functions.
v1.4.2	Correctness patches + Proximal / QTE / Causal-RL family guides	`sp.dml_model_averaging` √n SE scaling bug (CIs were √n × too wide) + `sp.gardner_did` event-study reference-category contamination (pre-trend bias ~0.3). Three family guides. No breaking changes.
v1.4.1	v3-frontier Sprint 3 (AKM SE + Claude thinking + test suites + docs)	`sp.shift_share_political_panel(cluster='shock')` — panel-extended Adão-Kolesár-Morales (2019) shock-cluster variance (Park-Xu 2026 §4.2); `sp.causal_llm.anthropic_client(thinking_budget=N)` — Claude 4.5 / Opus 4.7 extended-thinking API; 10-check assimilation parity suite + 11-test MAS integration suite with 3 Claude thinking block-splitter tests; two new MkDocs guides (`shift_share_political_panel`, `causal_mas`). Strictly additive over v1.4.0.
v1.4.0	v3-frontier Sprint 2 (extensions + LLM SDK + docs)	`shift_share_political_panel` (Park-Xu 2026 multi-period); real LLM adapters `openai_client` / `anthropic_client` / `echo_client` for Causal MAS; `particle_filter` backend for `assimilative_causal` (non-Gaussian / nonlinear); three new MkDocs guides (`synth_experimental`, `harvest_did`, `assimilative_ci`); 20 unused-import cleanups; CausalForest parity-test de-flake.
v1.3.0	v3-frontier sprint (Sprint 1 of Knowledge Map v3)	11 frontier methods: Abadie-Zhao inverse synthetic experimental design, CJM RBC bootstrap for `rdrobust`, Kwak-Pleasants evidence-without-injustice fairness test, JAMA/BMJ TARGET manuscript renderer, Borusyak harvest-DiD, Zorzetto ordinal / factor-exposure BCF, multi-agent causal discovery (`causal_mas`), Park-Xu political shift-share IV, state-space `causal_kalman`. 35 new tests, 869 registered functions, zero regressions. `tabulate` promoted to core dep.
v1.0.1	Post-review correctness + NEEDS_VERIFICATION closeout	All Critical / High / Medium findings from the independent code-review-expert pass on v1.0 frontier modules fixed and pinned by regression tests. Abadie κ-weighted complier QTE now implemented for `beyond_average_late`; `bridge.surrogate_pci` path B now a genuine dual-path (arm-specific counterfactual bridge), not OLS-tautology. 2,706+ tests passing.
v1.0.0	Research-frontier capstone	`sp.bridge` (6 bridging theorems), `sp.fairness`, `sp.surrogate`, `mr_multivariable`/`mr_mediation`/`mr_bma`, PCMCI/LPCMCI/DYNOTEARS, conformal frontiers (debiased/density/fair/multi-DP), proximal frontiers, sequential SDID, BCF longitudinal, LTMLE survival, ML bounds, JAMA/BMJ 2025 TARGET Statement 21-item reporting checklist.
v0.9.17	Three-school completion	`sp.epi` (OR/RR/MH/standardization/Bradford-Hill/ROC/kappa), `sp.longitudinal` (MSM/g-formula/IPW unified + safe regime DSL), `sp.question` (estimand-first DSL), full MR diagnostic suite, DAG `recommend_estimator()`, unified `result.sensitivity()`, `preregister()` + `load_preregister()`.
v0.9.3	Frontier + Multilevel + GLMM + Trinity	`sp.frontier` / `sp.xtfrontier` full Stata/R parity; `sp.zisf`, `sp.lcsf`, `sp.malmquist`; `sp.mixed` lme4-grade; GLMMs with AGHQ; `sp.dml(model='pliv')`, `sp.mixlogit`, `sp.ivqreg`; `sp.verify` posterior verification.
v0.9.2	Decomposition	18 methods under `sp.decompose(method=...)`.
v0.9.1	Regression discontinuity	18+ estimators across 14 modules.
v0.9.0	Synthetic control	20 SCM estimators + 6 inference strategies.
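
The v1.5.0 row mentions flooring the `mr_presso` Monte Carlo p-value at `1/(B+1)`. A minimal sketch of why the add-one estimator can never return zero is given here. It is a generic illustration, not StatsPAI's internal code; the helper name `mc_p_value` and the toy null distribution are assumptions.

```python
import random


def mc_p_value(observed, simulated):
    """Add-one Monte Carlo p-value: (1 + #{T_b >= T_obs}) / (B + 1)."""
    exceed = sum(1 for s in simulated if s >= observed)
    return (1 + exceed) / (len(simulated) + 1)


random.seed(0)
B = 999
# Toy null distribution: squared standard-normal draws (an assumption for illustration).
null_draws = [random.gauss(0, 1) ** 2 for _ in range(B)]

# An observed statistic far beyond every simulated draw hits the floor, not zero.
p_extreme = mc_p_value(1e6, null_draws)
print(p_extreme)  # 0.001, i.e. 1 / (B + 1)
```

The observed statistic counts as one of the `B + 1` draws, so the smallest attainable p-value is `1/(B+1)`; reporting `p = 0` would overstate the evidence that a finite number of simulations can provide.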

Methodological coverage¶

Regression & panel. OLS / IV / panel / GLM; high-dimensional fixed-effects estimation; GMM; quantile regression; instrumental-variable quantile regression (sp.ivqreg); mixed logit (sp.mixlogit).

Difference-in-differences (10+ variants). sp.callaway_santanna (DR/IPW/REG), sp.aggte with Mammen uniform bands, sp.sun_abraham, sp.bjs (Borusyak-Jaravel-Spiess imputation), sp.dcdh (de Chaisemartin-D'Haultfoeuille), sp.etwfe, sp.goodman_bacon; sensitivity via sp.honest_did, sp.breakdown_m; one-call sp.cs_report with Markdown / LaTeX / Excel export.
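
As a rough illustration of the building block behind group-time estimators such as Callaway-Sant'Anna, the sketch below computes an unconditional ATT(g, t) against never-treated units on simulated data. It uses only the standard library and is not StatsPAI's implementation: the helpers `simulate_panel` and `att_gt`, the data-generating process, and the convention that `g = 0` marks never-treated units are all assumptions.

```python
import random
from statistics import mean


def simulate_panel(n_units=600, periods=range(1, 7), effect=2.0):
    """Long-format rows with keys id, t, g, y (g = first treated period, 0 = never)."""
    rows = []
    for i in range(n_units):
        g = random.choice([0, 3, 5])      # never treated, or treated from t=3 / t=5
        alpha = random.gauss(0, 1)        # unit fixed effect
        for t in periods:
            treated = g != 0 and t >= g
            y = alpha + 0.5 * t + (effect if treated else 0.0) + random.gauss(0, 0.5)
            rows.append({"id": i, "t": t, "g": g, "y": y})
    return rows


def att_gt(rows, g, t):
    """ATT(g, t) with never-treated controls and base period g - 1."""
    y = {(r["id"], r["t"]): r["y"] for r in rows}
    cohort = {r["id"]: r["g"] for r in rows}

    def mean_change(group):
        # Average outcome change from the base period to period t within a cohort.
        return mean(y[(i, t)] - y[(i, g - 1)] for i, c in cohort.items() if c == group)

    return mean_change(g) - mean_change(0)


random.seed(0)
rows = simulate_panel()
print(round(att_gt(rows, g=3, t=4), 2))  # post-treatment: near the simulated effect of 2.0
print(round(att_gt(rows, g=3, t=1), 2))  # pre-treatment placebo: near 0
```

The rows carry the column names `y`, `g`, `t`, and `id` used in the sp.cs_report call at the top of this page, so wrapping them in a data frame gives a plausible test input. Whether sp.cs_report codes never-treated units as 0 should be checked against its documentation.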

Regression discontinuity (18+ estimators). sp.rdrobust (CCT sharp/fuzzy/kink with bias-corrected robust CI), sp.rd2d (2D/boundary), sp.rkd, sp.rdit, multi-cutoff and multi-score designs, sp.rdhonest (Armstrong-Kolesár), local randomization (sp.rdrandinf, sp.rdwinselect, sp.rdsensitivity), sp.cjm_density, ML-based CATE (sp.rd_forest, sp.rd_boost, sp.rd_lasso), Angrist-Rokkanen extrapolation, sp.rdpower, sp.rdsampsi, bundled sp.rdsummary dashboard.
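
For orientation, the conventional point estimate in a sharp design is the gap between two local linear fits at the cutoff. The sketch below implements that idea with a triangular kernel and a fixed bandwidth using only the standard library. It omits the data-driven bandwidth selection, bias correction, and robust inference that sp.rdrobust is described as providing; the helper names and the simulated data are assumptions.

```python
import random


def weighted_intercept(xs, ys, ws):
    """Intercept of the weighted least-squares line y = a + b * x."""
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    return y_bar - (sxy / sxx) * x_bar


def sharp_rd(x, y, cutoff=0.0, h=0.5):
    """Local linear sharp RD estimate: right intercept minus left intercept."""
    sides = {"left": ([], [], []), "right": ([], [], [])}
    for xi, yi in zip(x, y):
        u = (xi - cutoff) / h
        if abs(u) >= 1:
            continue                       # outside the bandwidth: zero kernel weight
        xs, ys, ws = sides["right" if xi >= cutoff else "left"]
        xs.append(xi - cutoff)             # centre the running variable at the cutoff
        ys.append(yi)
        ws.append(1 - abs(u))              # triangular kernel weight
    return weighted_intercept(*sides["right"]) - weighted_intercept(*sides["left"])


random.seed(0)
x = [random.uniform(-1, 1) for _ in range(2000)]
y = [0.5 * xi + 1.5 * (xi >= 0) + random.gauss(0, 0.3) for xi in x]
tau_hat = sharp_rd(x, y)
print(round(tau_hat, 2))  # should land near the simulated jump of 1.5
```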

Synthetic control (20 estimators). Classical SCM, SDID, Augmented SCM (ASCM), Bayesian SCM (MCMC), BSTS and CausalImpact (Kalman smoother), Penalized SCM (Abadie-L'Hour), Forward-DID, cluster SCM, sparse (LASSO) SCM, kernel and kernel-ridge SCM, staggered synthetic control, multi-outcome SCM; research workflow: sp.synth_compare, sp.synth_recommend, sp.synth_power, sp.synth_mde, sp.synth_sensitivity, sp.synth_report.

Decomposition analysis (18 methods). Mean: sp.oaxaca (5 reference coefficients), sp.gelbach, sp.fairlie, sp.bauer_sinning, sp.yun_nonlinear. Distributional: sp.rifreg, sp.ffl_decompose, sp.dfl_decompose, sp.machado_mata, sp.melly_decompose, sp.cfm_decompose. Inequality: sp.subgroup_decompose, sp.shapley_inequality, sp.source_decompose. Demographic: sp.kitagawa_decompose, sp.das_gupta. Causal: sp.gap_closing, sp.mediation_decompose, sp.disparity_decompose. Unified entry: sp.decompose(method=…).
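
To make the demographic branch concrete, the sketch below works through a Kitagawa-style decomposition by hand: the gap in crude rates between two populations is split into a rate effect and a composition effect. The function `kitagawa` and the example numbers are illustrative assumptions and do not reflect the signature or output of sp.kitagawa_decompose.

```python
import math


def kitagawa(rates_a, shares_a, rates_b, shares_b):
    """Split the crude-rate gap (A minus B) into rate and composition effects."""
    strata = list(zip(rates_a, shares_a, rates_b, shares_b))
    # Rate effect: stratum rate gaps weighted by the average composition.
    rate_effect = sum((ra - rb) * (wa + wb) / 2 for ra, wa, rb, wb in strata)
    # Composition effect: share gaps weighted by the average stratum rate.
    composition_effect = sum((wa - wb) * (ra + rb) / 2 for ra, wa, rb, wb in strata)
    return rate_effect, composition_effect


# Two strata; shares sum to one within each population (made-up numbers).
rates_a, shares_a = [0.10, 0.30], [0.5, 0.5]
rates_b, shares_b = [0.08, 0.20], [0.8, 0.2]

crude_a = sum(r * w for r, w in zip(rates_a, shares_a))
crude_b = sum(r * w for r, w in zip(rates_b, shares_b))
rate_effect, composition_effect = kitagawa(rates_a, shares_a, rates_b, shares_b)

# The two components add up to the crude gap exactly.
print(math.isclose(rate_effect + composition_effect, crude_a - crude_b))  # True
```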

Stochastic frontier (v0.9.3). sp.frontier cross-sectional with half-normal / exponential / truncated-normal, heteroskedastic usigma / vsigma, Battese-Coelli (1995) determinants emean, Battese-Coelli (1988) TE and JLMS, Kodde-Palm LR mixed-\(\bar\chi^2\) test, bootstrap unit-efficiency CI. sp.xtfrontier panel with Pitt-Lee (1981), BC92 time-decay, BC95, Greene (2005) TFE/TRE with Dhaene-Jochmans (2015) jackknife. sp.zisf, sp.lcsf, sp.malmquist (M = EC × TC), sp.translog_design.
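
The identity M = EC × TC can be checked numerically from four distance-function values. The sketch below uses the standard geometric-mean Malmquist index and its efficiency-change / technical-change split. The function `malmquist_components`, its argument names, and the input values are assumptions for illustration, not the interface of sp.malmquist.

```python
import math


def malmquist_components(d_t_xt, d_t_xt1, d_t1_xt, d_t1_xt1):
    """Return (M, EC, TC) from four distance-function values.

    d_a_xb is the distance of the period-b observation to the period-a frontier.
    """
    # Malmquist index: geometric mean of the period-t and period-(t+1) based ratios.
    m = math.sqrt((d_t_xt1 / d_t_xt) * (d_t1_xt1 / d_t1_xt))
    # Efficiency change: each observation measured against its own-period frontier.
    ec = d_t1_xt1 / d_t_xt
    # Technical change: geometric mean of the frontier shift at both observations.
    tc = math.sqrt((d_t_xt1 / d_t1_xt1) * (d_t_xt / d_t1_xt))
    return m, ec, tc


m, ec, tc = malmquist_components(d_t_xt=0.8, d_t_xt1=1.1, d_t1_xt=0.7, d_t1_xt1=0.9)
print(math.isclose(m, ec * tc))  # True
```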

Multilevel / mixed-effects (v0.9.3). sp.mixed linear mixed models with unstructured G default, three-level nested, BLUP posterior SEs, Nakagawa-Schielzeth \(R^2\). GLMMs (sp.melogit, sp.mepoisson, sp.meglm, sp.megamma, sp.menbreg, sp.meologit) via Laplace or adaptive Gauss-Hermite quadrature (nAGQ>1 matches Stata intpoints() and R lme4::glmer). sp.icc with delta-method CI; sp.lrtest with Self-Liang boundary correction.

Modern ML causal. Double/debiased ML (sp.dml with PLR / IRM / PLIV); causal forests; meta-learners (S / T / X / R / DR); TMLE and Super Learner; neural causal (TARNet, CFRNet, DragonNet); causal discovery (NOTEARS, PC, LiNGAM, GES); policy trees; Bayesian causal forests; matrix completion; conformal causal inference; dose-response; dynamic treatment regimes; interference / spillover.

Spatial, time-series, survival, survey, bunching, Mendelian. Spatial econometrics (weights, ESDA, ML/GMM, GWR/MGWR, spatial panel); time-series (ARIMA, VAR, BVAR, GARCH, cointegration, local projections, structural break); survival (Cox, AFT, frailty); survey calibration and complex-survey regression; bunching; Mendelian randomization.

Sensitivity analysis. Oster bounds; sensemakr; E-values; Rosenbaum bounds; Manski bounds; sp.spec_curve() specification curve analysis; sp.robustness_report() one-call battery.
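
Of the tools listed, the E-value has a closed form that is easy to verify by hand: for a risk ratio RR ≥ 1 it equals RR + sqrt(RR × (RR − 1)), and a protective estimate is inverted first. The sketch below applies that formula to a point estimate only. The function name `e_value` is an assumption and is not taken from the StatsPAI API.

```python
import math


def e_value(rr):
    """E-value for a risk-ratio point estimate (VanderWeele-Ding formula)."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1 / rr                        # protective effects: invert before applying
    return rr + math.sqrt(rr * (rr - 1))


print(round(e_value(2.0), 2))  # 3.41
print(round(e_value(0.5), 2))  # 3.41, same strength of confounding needed
```

An E-value of 3.41 means an unmeasured confounder would need risk-ratio associations of at least 3.41 with both treatment and outcome to fully explain away an observed risk ratio of 2.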

Smart Workflow¶

# Recommend estimators + run posterior verification
rec  = sp.recommend(df, outcome='y', treatment='d', verify=True)
rec.summary()                # ranked estimators with verify_score
rec.plot('verify_radar')     # visual stability check

Agent-facing API¶

Registered functions are discoverable programmatically:

sp.list_functions(category='did')        # enumerate methods
sp.describe_function('rdrobust')         # natural-language description
sp.function_schema('dml')                # JSON schema: args, types, returns
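
One common way to make functions discoverable like this is a decorator-based registry that is queried through introspection. The sketch below is a hypothetical, standard-library-only illustration of that pattern. It is not how StatsPAI implements its registry, and every name in it (`register`, `toy_did`, and the local `list_functions`, `describe_function`, and `function_schema` helpers) is an assumption.

```python
import inspect

_REGISTRY = {}


def register(category):
    """Decorator that records a function and its category in the registry."""
    def decorator(fn):
        _REGISTRY[fn.__name__] = {"fn": fn, "category": category}
        return fn
    return decorator


@register(category="did")
def toy_did(y_treat_post, y_treat_pre, y_ctrl_post, y_ctrl_pre):
    """Two-by-two difference-in-differences on four group means."""
    return (y_treat_post - y_treat_pre) - (y_ctrl_post - y_ctrl_pre)


def list_functions(category=None):
    """Names of registered functions, optionally filtered by category."""
    return sorted(n for n, meta in _REGISTRY.items()
                  if category is None or meta["category"] == category)


def describe_function(name):
    """Docstring of a registered function."""
    return inspect.getdoc(_REGISTRY[name]["fn"])


def function_schema(name):
    """Small machine-readable description built from the function signature."""
    meta = _REGISTRY[name]
    params = inspect.signature(meta["fn"]).parameters
    return {"name": name, "category": meta["category"], "args": list(params)}


print(list_functions(category="did"))
print(function_schema("toy_did"))
```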

Installation¶

pip install statspai                       # core
pip install 'statspai[plotting]'           # matplotlib + seaborn
pip install 'statspai[fixest]'             # pyfixest HDFE
pip install 'statspai[deepiv]'             # PyTorch (Deep IV, TARNet)
pip install 'statspai[text]'               # sentence-transformers for sbert

Citation¶

If you use StatsPAI in research, please cite this package together with the underlying papers implemented by each estimator you rely on. sp.citation() returns the package entry, and many result objects expose a .cite() method for the estimator-level reference. BibTeX entry for the package:

@software{wang2026statspai,
  author  = {Wang, Biaoyue and Rozelle, Scott},
  title   = {StatsPAI: Validation-Tiered Causal Inference and
             Econometrics Workflows for Python},
  year    = {2026},
  version = {1.20.0},
  url     = {https://github.com/brycewang-stanford/StatsPAI}
}