statspai.workflow¶
workflow ¶
End-to-end causal-inference workflow orchestrator.
sp.causal(df, y=, treatment=, ...) stitches the full analysis
pipeline into one call: diagnose identification -> recommend an
estimator -> fit it -> run the standard robustness suite -> produce
an HTML / Markdown / LaTeX report.
This module materialises the agent-native workflow as an API while
keeping each stage's statistical assumptions explicit.
CausalWorkflow dataclass ¶
Holds state across the diagnose -> estimate -> report pipeline.
estimate ¶
Fit the top-recommended estimator.
Returns the result object (CausalResult or EconometricResults).
Uses the workflow's dataset and column mappings; when the top
recommendation is plain OLS with a treatment-only formula,
enrich the formula with the user's covariates so confounders
are actually adjusted for (sp.recommend by default leaves this
to the caller; the workflow takes responsibility).
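The formula enrichment described above can be sketched as follows. This is a minimal illustration, not StatsPAI's code: `enrich_formula` is a hypothetical helper, and the formula syntax (`y ~ x1 + x2`) is assumed to be the usual R-style string.

```python
from typing import List, Optional


def enrich_formula(y: str, treatment: str,
                   covariates: Optional[List[str]] = None) -> str:
    """Hypothetical helper: build 'y ~ treatment + covariates'.

    With no covariates this returns the treatment-only formula; otherwise
    the covariates are appended so they are adjusted for in the OLS fit.
    """
    # Skip a covariate that merely repeats the treatment column.
    extra = [c for c in (covariates or []) if c != treatment]
    return f"{y} ~ " + " + ".join([treatment] + extra)


bare = enrich_formula("wage", "training")
adjusted = enrich_formula("wage", "training", ["age", "educ"])
```

Here `bare` is the treatment-only formula a recommender might emit, and `adjusted` is the version the workflow would fit when the user supplied covariates.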
Sprint-B preference: when the caller supplied one of the new causal-method hints (proxy_z/proxy_w, tv_confounders, post_treat_strata, mediator), that hint wins over the design-based default in the top recommendation — the user signalled the estimand explicitly.
robustness ¶
Run design-appropriate robustness checks on the fitted result.
Delegates to :func:statspai.workflow._robustness.run_robustness_battery
— the shared battery that powers both the natural-language path
(sp.paper(data, question, ...)) and the estimand-first path
(sp.paper(CausalQuestion(...))). The battery never raises;
per-check failures land as severity='check_failed' findings.
Backwards compatibility: self.robustness_findings keeps the
flat Dict[str, Any] shape that callers (and the existing
:class:PaperDraft renderer) expect — populated from
:meth:RobustnessReport.to_dict. The structured per-finding
records are reachable via self.robustness_findings['_findings']
when a caller wants severity-aware rendering.
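The never-raises contract and the flat-plus-structured return shape can be sketched like this. It is an illustration under stated assumptions, not the library's implementation: `Finding`, `run_battery`, the two check functions, and the `'ok'` severity are hypothetical; only `severity='check_failed'` and the `'_findings'` key come from the description above.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Finding:
    """Hypothetical per-check record (not the StatsPAI class)."""
    name: str
    severity: str
    detail: Any = None


def run_battery(checks: Dict[str, Callable[[], Any]]) -> List[Finding]:
    """Run every check; turn exceptions into 'check_failed' findings."""
    findings: List[Finding] = []
    for name, check in checks.items():
        try:
            findings.append(Finding(name, "ok", check()))
        except Exception as exc:  # deliberately broad: the battery must not raise
            findings.append(Finding(name, "check_failed", repr(exc)))
    return findings


def placebo_check() -> float:
    return 0.02  # made-up placebo estimate, for illustration only


def broken_check() -> float:
    raise ValueError("not enough pre-periods")


findings = run_battery({"placebo": placebo_check, "pretrend": broken_check})

# Flat dict for legacy callers, structured records under '_findings'.
flat: Dict[str, Any] = {f.name: f.detail for f in findings}
flat["_findings"] = findings
```

A severity-aware renderer would iterate over `flat['_findings']`, while older callers keep reading the flat keys.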
report ¶
Generate an end-to-end report and optionally write to disk.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| path | str | Output path. If omitted, only returns the string. | None |
| fmt | str | One of 'html' (default) or 'markdown'. | 'html' |
Returns:
| Type | Description |
|---|---|
| str | The report content. |
compare_estimators ¶
Run a design-appropriate panel of estimators for robustness.
* For DiD: CS + SA + BJS + Wooldridge (+ Stacked when feasible).
* For IV: 2SLS + LIML + JIVE + GMM.
* For observational: OLS + Entropy Balancing + CBPS + SBW + DML.
* For RD: Sharp RDD (rdrobust) at MSE-optimal + CERD ±50% bandwidths.
Rows with a failed estimator carry NaN and an error string.
sensitivity_panel ¶
Stack E-value + Rosenbaum Γ + Oster δ for one-page sensitivity.
Only runs the tests whose required inputs are present:
* E-value requires a risk ratio or the primary estimate + SE.
* Rosenbaum Γ requires a matched/weighted comparison.
* Oster δ requires OLS residual variances.
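As a concrete reference point for the first test, the standard closed-form E-value for a risk ratio (VanderWeele and Ding) is sketched below. The function name `e_value` is illustrative, and how the panel converts an estimate + SE into a risk ratio is not covered here.

```python
import math


def e_value(rr: float) -> float:
    """E-value for a point-estimate risk ratio.

    For RR >= 1 the closed form is RR + sqrt(RR * (RR - 1)); a protective
    ratio (RR < 1) is inverted first so the same formula applies.
    """
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    if rr < 1:
        rr = 1.0 / rr
    return rr + math.sqrt(rr * (rr - 1.0))


ev = e_value(2.0)  # equals 2 + sqrt(2)
```

An E-value of 1 (returned for `rr=1.0`) means no unmeasured confounding is needed to explain the estimate away, because there is no association to explain.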
cate ¶
Quick heterogeneity pass: X-Learner + Causal Forest summary.
Skipped if the design is IV / RD / DiD with only the canonical columns and no covariates — heterogeneity needs an X matrix.
run ¶
Run the causal pipeline.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| full | bool | When True, also runs the v0.9.17 extended stages: | True |
PaperDraft dataclass ¶
Draft causal-analysis report assembled by :func:sp.paper.
Attributes:
| Name | Type | Description |
|---|---|---|
| question | str | The original natural-language question. |
| sections | dict[str, str] | Mapping |
| workflow | CausalWorkflow | The underlying workflow object; exposes the raw fitted result ( |
| fmt | str | Default output format ( |
| citations | list of str | BibTeX-style entries collected from each estimator's |
| parsed_hints | dict | What the question parser extracted, for transparency / debugging. |
| degradations | list of dict | Structured record of optional sub-steps that failed and were skipped (covariate balance, CI rendering, DAG appendix, citation extraction, provenance attachment, …). Each entry has at least |
to_tex ¶
Render to a LaTeX article skeleton.
Each section becomes \section{...}; markdown bullet lists
and code fences are translated to LaTeX equivalents.
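The section and bullet-list translation can be sketched as below. This is a simplified stand-in, not the actual renderer: `sections_to_tex` is a hypothetical name, only `- ` bullets are handled, and code fences and LaTeX escaping of special characters are left out.

```python
from typing import Dict, List


def sections_to_tex(sections: Dict[str, str]) -> str:
    """Hypothetical sketch: one \\section per entry, '- ' bullets to itemize."""
    out: List[str] = []
    for title, body in sections.items():
        out.append(f"\\section{{{title}}}")
        in_list = False
        for line in body.splitlines():
            if line.startswith("- "):
                if not in_list:  # first bullet opens the environment
                    out.append("\\begin{itemize}")
                    in_list = True
                out.append(f"  \\item {line[2:]}")
            else:
                if in_list:  # a non-bullet line closes the environment
                    out.append("\\end{itemize}")
                    in_list = False
                out.append(line)
        if in_list:  # body ended while still inside a list
            out.append("\\end{itemize}")
    return "\n".join(out)


tex = sections_to_tex(
    {"Results": "Main findings:\n- ATT is positive\n- Pre-trends are flat"}
)
```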
to_docx ¶
Write a Word document to path.
Uses the workflow's already-fit result's to_docx if available;
otherwise falls back to dropping a markdown file with a .docx
warning header (no python-docx hard dep).
to_qmd ¶
to_qmd(*, title: str = 'Causal Analysis Draft', author: Optional[str] = None, formats: Optional[List[str]] = None, bibliography: Optional[str] = None, csl: Optional[str] = None, include_provenance: bool = True) -> str
Render to a Quarto (.qmd) document.
Quarto is the multi-format manuscript default: a single source compiles
to PDF / HTML / DOCX / Beamer with cross-refs, citations (CSL),
and embedded code chunks. sp.paper() already produces all
the prose; this method just wraps it in the correct YAML
frontmatter so quarto render paper.qmd Just Works.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| title | str | | 'Causal Analysis Draft' |
| author | str | | None |
| formats | list of str | Output formats Quarto should support. Default | None |
| bibliography | str | Path Quarto should resolve for citation lookup, e.g. | None |
| csl | str | CSL style file (e.g. | None |
| include_provenance | bool | Append a Reproducibility appendix with :func: | True |
Returns:
| Type | Description |
|---|---|
| str | The complete |
Notes
- The body sections are the same as :meth:to_markdown, that is, standard markdown with ## H2 headers, which Quarto will render natively.
- Code chunks are not injected by default. When the calling script wants the .qmd to re-execute the analysis on each render, pass it through :func:sp.replication_pack, which writes both the .qmd and a code/script.py reproducer.
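To make the "wraps it in YAML frontmatter" step concrete, here is a minimal sketch. `wrap_qmd` is a hypothetical helper rather than the library's code; it mirrors a subset of the `to_qmd` keyword arguments and emits the standard Quarto keys `title`, `author`, `format`, `bibliography`, and `csl`. It omits `format` when `formats` is None, which is an assumption, and it does not escape quotes in the title or author.

```python
from typing import List, Optional


def wrap_qmd(body: str, *, title: str = "Causal Analysis Draft",
             author: Optional[str] = None,
             formats: Optional[List[str]] = None,
             bibliography: Optional[str] = None,
             csl: Optional[str] = None) -> str:
    """Hypothetical sketch: prepend Quarto YAML frontmatter to markdown."""
    lines = ["---", f'title: "{title}"']
    if author is not None:
        lines.append(f'author: "{author}"')
    if formats:
        lines.append("format:")
        # Each requested output format uses Quarto's defaults.
        lines.extend(f"  {fmt}: default" for fmt in formats)
    if bibliography is not None:
        lines.append(f"bibliography: {bibliography}")
    if csl is not None:
        lines.append(f"csl: {csl}")
    lines.append("---")
    return "\n".join(lines) + "\n\n" + body


qmd = wrap_qmd("## Results\n\nText.", formats=["html", "pdf"],
               bibliography="refs.bib")
```

Writing `qmd` to `paper.qmd` gives a file that `quarto render paper.qmd` can compile to both requested formats, provided `refs.bib` exists next to it.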
write ¶
Write the draft to disk in the format inferred from the path
extension (.md / .tex / .docx / .qmd).
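Inferring the format from the extension might look like the sketch below. `infer_format` and the format labels on the right-hand side are hypothetical; the case-insensitive match and the `ValueError` on unknown extensions are assumptions, not documented behaviour.

```python
from pathlib import Path

# Extensions listed in the description above, mapped to illustrative labels.
_FORMATS = {".md": "markdown", ".tex": "tex", ".docx": "docx", ".qmd": "qmd"}


def infer_format(path: str) -> str:
    """Hypothetical sketch: pick the output format from the file suffix."""
    suffix = Path(path).suffix.lower()
    if suffix not in _FORMATS:
        raise ValueError(f"unsupported extension: {suffix!r}")
    return _FORMATS[suffix]


fmt_md = infer_format("draft.md")
fmt_qmd = infer_format("out/paper.QMD")
```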
causal ¶
causal(data: DataFrame, y: str, treatment: Optional[str] = None, covariates: Optional[List[str]] = None, id: Optional[str] = None, time: Optional[str] = None, running_var: Optional[str] = None, instrument: Optional[str] = None, cutoff: Optional[float] = None, cohort: Optional[str] = None, cluster: Optional[str] = None, design: Optional[str] = None, dag=None, strict: bool = False, auto_run: bool = True, mediator: Optional[str] = None, tv_confounders: Optional[List[str]] = None, proxy_z: Optional[List[str]] = None, proxy_w: Optional[List[str]] = None, post_treat_strata: Optional[str] = None, allow_experimental: bool = False) -> CausalWorkflow
End-to-end causal-inference workflow.
One call that diagnoses identification, picks an estimator, fits it, runs the canonical robustness suite, and produces a report.
Orchestrate the registered StatsPAI stages while keeping each stage's statistical assumptions explicit.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | | required |
| y | str | | required |
| treatment | Optional[str] | | None |
| covariates | Optional[List[str]] | | None |
| id | Optional[str] | | None |
| time | Optional[str] | | None |
| running_var | Optional[str] | | None |
| instrument | Optional[str] | | None |
| cutoff | Optional[float] | Passed through to | None |
| cohort | Optional[str] | Passed through to | None |
| cluster | Optional[str] | Passed through to | None |
| design | Optional[str] | Passed through to | None |
| dag | | Passed through to | None |
| strict | bool | Passed through to | False |
| auto_run | bool | If True (default), immediately runs all 5 stages and returns the fully-populated workflow. If False, returns the workflow object with no stages executed; call | True |
| allow_experimental | bool | Forwarded to :func: | False |
Returns:
| Type | Description |
|---|---|
| CausalWorkflow | A workflow object with |
Examples:
One-call full analysis:
>>> import statspai as sp
>>> w = sp.causal(df, y='wage', treatment='training',
... id='worker', time='year', design='did')
>>> w.report('analysis.html')
Fine-grained control:
parse_question ¶
Heuristic parse of a natural-language causal question.
Returns a dict of hints the caller can fall back on when explicit column kwargs aren't provided. Never overrides explicit args.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| question | str | Natural-language question, e.g. | required |
| columns | list of str | Columns of the dataset; the parser only proposes column names present in this list. | required |
Returns:
| Type | Description |
|---|---|
| dict | Possible keys: |
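The two guarantees stated for the parser (it only proposes columns that exist, and it never overrides explicit arguments) can be illustrated with a toy sketch. This is not the library's heuristic: `propose_columns`, `resolve`, and the `mentioned_columns` key are hypothetical, and real parsing is presumably richer than whole-word matching.

```python
import re
from typing import Dict, List, Optional


def propose_columns(question: str, columns: List[str]) -> Dict[str, List[str]]:
    """Toy heuristic: dataset columns that appear as whole words."""
    tokens = set(re.findall(r"[a-z0-9_]+", question.lower()))
    # Only names from `columns` can ever be proposed.
    return {"mentioned_columns": [c for c in columns if c.lower() in tokens]}


def resolve(explicit: Optional[str], hint: Optional[str]) -> Optional[str]:
    """An explicit argument always wins; the hint is only a fallback."""
    return explicit if explicit is not None else hint


hints = propose_columns("What is the effect of training on wage?",
                        ["wage", "training", "age"])
```

Note that `age` is not proposed even though it is a substring of `wage`, because matching is done on whole tokens.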