FAQ & troubleshooting
Installation & imports
Do I need PyTorch / JAX / PyMC?
No. The core install has no heavy ML dependencies. PyTorch, JAX, and PyMC
back specific estimators and are imported lazily, so you hit an ImportError
only when you call an estimator whose extra is not installed. The error names
the extra to add, e.g.:
pip install "StatsPAI[bayes]" # PyMC + ArviZ
pip install "StatsPAI[neural]" # PyTorch (neural causal, DeepIV)
pip install "StatsPAI[performance]" # JAX
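For readers curious how lazy imports of this kind usually work, here is a minimal sketch of the pattern. It is not StatsPAI's actual code: the `require` helper and the `EXTRA_FOR_MODULE` mapping are hypothetical names invented for illustration, and only the extras names are taken from the install commands above.

```python
import importlib

# Hypothetical mapping from an importable module to the pip extra that
# provides it. The extras names come from the install commands above;
# the mapping itself is an assumption made for this sketch.
EXTRA_FOR_MODULE = {"pymc": "bayes", "torch": "neural", "jax": "performance"}


def require(module_name):
    """Import a heavy dependency on first use, or explain how to install it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Fall back to the module name if no extra is registered for it.
        extra = EXTRA_FOR_MODULE.get(module_name, module_name)
        raise ImportError(
            f"{module_name} is not installed. "
            f'Try: pip install "StatsPAI[{extra}]"'
        ) from exc
```

Because the import happens inside the function that needs it, importing the package itself stays cheap and the error appears only at the call site that requires the missing extra.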
import statspai as sp — is there a second import I need?
No. Every public function is reachable as sp.<name>. If you cannot find
something, sp.search_functions("keyword") or sp.help(search="keyword")
will locate it.
Discovering functionality
How do I find the right function?
sp.recommend(df, y="y", treat="d") # suggest an estimator from the data
sp.detect_design(df) # what study design is this?
sp.search_functions("synthetic control")
sp.help("did") # category / function help
How do I see a function's assumptions before I run it?
sp.agent_card("callaway_santanna") # assumptions, pre-conditions, failure modes
sp.function_schema("did", agent_native=True) # same, as an LLM tool schema
Reading the output
What do the warnings mean?
StatsPAI fails loudly rather than returning silent NaNs. Common warnings:
AssumptionWarning / AssumptionViolation: an identifying assumption looks violated (e.g. pre-trends, overlap). Inspect result.diagnostics.
ConvergenceWarning: an iterative/MCMC fit did not converge cleanly (e.g. rhat > 1.01 or low ESS for Bayesian estimators). Increase iterations/draws.
WorkflowDegradedWarning: an orchestration step (sp.paper, smart workflows) degraded a section; the reason is recorded in result.degradations.
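If you want to log or assert on warnings instead of only seeing them on stderr, Python's standard `warnings` module can record them. The sketch below assumes nothing about where StatsPAI exposes its warning classes: `AssumptionWarning` here is a local stand-in, and `toy_fit` is a hypothetical function that plays the role of an estimator call.

```python
import warnings


class AssumptionWarning(UserWarning):
    """Stand-in for illustration only; not the library's class."""


def run_and_collect(fn, *args, **kwargs):
    """Run fn and return (result, list of warnings it emitted)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record every warning, even repeats
        result = fn(*args, **kwargs)
    return result, caught


def toy_fit():
    # Hypothetical estimator that flags a pre-trend problem.
    warnings.warn("pre-trends look violated", AssumptionWarning)
    return {"estimate": 0.12}


result, caught = run_and_collect(toy_fit)
for w in caught:
    print(w.category.__name__, "-", w.message)
```

In a batch pipeline, `warnings.simplefilter("error", SomeWarningClass)` is the stricter alternative: it turns that warning class into an exception, so a questionable fit stops the run.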
My DiD pre-trends look violated. Now what?
Quantify how sensitive the conclusion is with Rambachan–Roth honest bounds.
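To show what such a bound expresses, the sketch below computes a plug-in identified set for the first post-treatment period under the Rambachan–Roth "relative magnitudes" restriction: the post-period violation of parallel trends may be at most `m_bar` times the largest period-to-period change seen before treatment. This is a simplified illustration, not StatsPAI's routine. It ignores sampling uncertainty, which real honest confidence intervals account for, and the function names and numbers are made up.

```python
def _max_pre_step(pre_coefs):
    """Largest absolute change between consecutive pre-period coefficients."""
    if len(pre_coefs) < 2:
        raise ValueError("need at least two pre-period coefficients")
    return max(abs(b - a) for a, b in zip(pre_coefs, pre_coefs[1:]))


def relative_magnitude_bounds(pre_coefs, post_coef, m_bar):
    """Plug-in identified set for the first post-period effect.

    pre_coefs: event-study coefficients for the pre-periods in time order,
               ending with the reference period (normalized to 0.0).
    post_coef: coefficient for the first post-treatment period.
    m_bar:     allowed post-period violation, as a multiple of the worst
               pre-period step.
    """
    slack = m_bar * _max_pre_step(pre_coefs)
    return post_coef - slack, post_coef + slack


def breakdown_m_bar(pre_coefs, post_coef):
    """Smallest m_bar at which the identified set reaches zero."""
    step = _max_pre_step(pre_coefs)
    return float("inf") if step == 0 else abs(post_coef) / step


# Illustrative numbers only (not from any real study).
pre = [0.02, -0.01, 0.0]
low, high = relative_magnitude_bounds(pre, post_coef=0.12, m_bar=1.0)
m_star = breakdown_m_bar(pre, post_coef=0.12)
```

A large breakdown value means the sign of the effect survives post-period violations several times larger than anything observed before treatment. A value below 1 means a violation no bigger than the observed pre-trend wobble could already explain the estimate.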
My IV estimate has a huge confidence interval.
This is almost always a weak first stage. Check first-stage strength and switch to weak-instrument-robust inference.
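As a rough illustration of the first-stage check, the sketch below computes the first-stage F statistic for a single instrument using only the standard library. It is not StatsPAI's diagnostic: it assumes one instrument, no other covariates, and homoskedastic errors, and `first_stage_f` is a hypothetical name. A common rule of thumb treats F below about 10 as a sign of a weak instrument.

```python
def first_stage_f(z, d):
    """F statistic for z in an OLS regression of d on z plus an intercept.

    With a single instrument this equals the squared t statistic on z.
    """
    n = len(z)
    if n < 3 or n != len(d):
        raise ValueError("need at least 3 paired observations")
    z_bar = sum(z) / n
    d_bar = sum(d) / n
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    s_zd = sum((zi - z_bar) * (di - d_bar) for zi, di in zip(z, d))
    slope = s_zd / s_zz
    intercept = d_bar - slope * z_bar
    rss = sum((di - intercept - slope * zi) ** 2 for zi, di in zip(z, d))
    sigma2 = rss / (n - 2)  # residual variance under homoskedasticity
    return slope ** 2 / (sigma2 / s_zz)


# Toy data: the instrument moves the treatment strongly in the first
# case and barely at all in the second.
z = [0, 0, 1, 1]
f_strong = first_stage_f(z, [0.0, 0.1, 5.0, 5.1])
f_weak = first_stage_f(z, [0.0, 2.0, 1.0, 1.2])
```

When the statistic is small, conventional IV standard errors are unreliable, which is why the answer above points to weak-instrument-robust inference instead of the default interval.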
"Poor overlap" / extreme propensity scores. Treated and control covariate distributions barely overlap. Trim or restrict to the common-support region:
sp.trimming(df, treat="d", covariates=[...])
sp.overlap_plot(sp.ps_balance(df, treat="d", covariates=[...]))
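Conceptually, trimming drops units whose estimated propensity score is too close to 0 or 1. The sketch below shows that idea on precomputed scores, using the widely cited [0.1, 0.9] rule of thumb from Crump et al. (2009) as default thresholds. It is an illustration only: `trim_to_common_support` is a hypothetical helper, and the rule `sp.trimming` applies by default is not stated here.

```python
def trim_to_common_support(rows, scores, lower=0.1, upper=0.9):
    """Keep rows whose propensity score lies in [lower, upper]."""
    if not 0.0 <= lower < upper <= 1.0:
        raise ValueError("need 0 <= lower < upper <= 1")
    if len(rows) != len(scores):
        raise ValueError("rows and scores must have the same length")
    return [row for row, p in zip(rows, scores) if lower <= p <= upper]


# Toy unit labels and made-up propensity scores.
units = ["a", "b", "c", "d"]
scores = [0.02, 0.35, 0.90, 0.97]
kept = trim_to_common_support(units, scores)
```

Trimming changes the estimand: the effect now refers to the retained overlap population, so report how many units were dropped alongside the estimate.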
Reproducibility
How do I make results deterministic?
with sp.session(seed=42):
r = sp.callaway_santanna(df, y="lemp", g="first_treat", t="year", i="countyreal")
sp.session fixes the global RNG state for bootstrap / permutation / MCMC
draws inside the block.
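The general pattern behind a seeded block can be sketched with the standard library alone. This is not how `sp.session` is implemented: the `seeded` context manager below is a hypothetical stand-in that covers only Python's built-in `random` module, and restoring the previous state on exit is a design choice of this sketch, not documented behaviour of StatsPAI.

```python
import random
from contextlib import contextmanager


@contextmanager
def seeded(seed):
    """Seed the stdlib global RNG inside the block, then restore prior state."""
    saved = random.getstate()
    random.seed(seed)
    try:
        yield
    finally:
        random.setstate(saved)  # leave the caller's random stream untouched


with seeded(42):
    first = [random.random() for _ in range(3)]
with seeded(42):
    second = [random.random() for _ in range(3)]
# first and second hold identical draws because both blocks start from seed 42.
```

Libraries with their own generators (NumPy, PyTorch, JAX) keep separate state, so a seed that only touches `random` would not make their draws reproducible.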
Citations
How do I cite an estimator correctly?
StatsPAI never invents references — every citation is verified against
paper.bib.
Using StatsPAI from an agent / LLM
Every function returns structured results and carries a self-describing schema. The typical agent loop:
sp.detect_design(df) # 1. identify the design
sp.preflight(df, method="callaway_santanna") # 2. will it run?
r = sp.callaway_santanna(df, ...) # 3. fit
r.to_dict(detail="agent") # 4. structured payload
sp.audit(r) # 5. missing robustness checks
r.cite() # 6. verified citation
Numerical correctness
Did the numbers change between versions?
Correctness fixes are flagged with ⚠️ correctness in the
changelog and recorded in MIGRATION.md. If you are
reproducing an older analysis, check those notes for the modules you use.