`statspai.question`

question

Estimand-first causal-question DSL (`sp.causal_question`).

The article emphasizes "causal question precedes statistical model" as the common foundation of all three causal-inference schools: econometrics' identification, epidemiology's target trial protocol, and ML's estimand-aware learning.

This module lets a user declare a causal question in one place, then automatically:

- Identify the appropriate research design (IV / DiD / RD / backdoor).
- Suggest the right StatsPAI estimator.
- Run the analysis and attach diagnostics + sensitivity.
- Produce a reproducible Methods paragraph.

For example, given a panel DataFrame `df`:

>>> import statspai as sp
>>> q = sp.causal_question(
...     treatment="minimum_wage_hike",
...     outcome="employment",
...     estimand="ATT",
...     design="policy_shock",
...     data=df,
...     time_structure="panel",
...     covariates=["industry", "skill"],
... )
>>> plan = q.identify()
>>> r = q.estimate()
>>> md = q.report()
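With `estimand="ATT"` on a panel, the target in the simplest two-group, two-period case is the classic difference-in-differences of group means. The stdlib-only sketch below shows that arithmetic on simulated data; it is not StatsPAI's estimator, and the helper `did_2x2` and the simulated panel are hypothetical illustrations.

```python
import random
from statistics import mean


def did_2x2(rows):
    """Hypothetical 2x2 difference-in-differences helper.

    rows: iterable of (treated, post, y) tuples with treated, post in {0, 1}.
    Returns (E[y|T,post] - E[y|T,pre]) - (E[y|C,post] - E[y|C,pre]).
    """
    cells = {(g, t): [] for g in (0, 1) for t in (0, 1)}
    for treated, post, y in rows:
        cells[(treated, post)].append(y)
    m = {k: mean(v) for k, v in cells.items()}
    return (m[(1, 1)] - m[(1, 0)]) - (m[(0, 1)] - m[(0, 0)])


# Simulated panel: unit-level intercepts (treated units sit 0.8 higher),
# a common time trend of +1.0, and a true ATT of 0.5 in the post period.
rng = random.Random(0)
rows = []
for unit in range(400):
    treated = int(unit < 200)
    alpha = rng.gauss(0, 1) + 0.8 * treated  # level difference cancels in DiD
    for post in (0, 1):
        y = alpha + 1.0 * post + 0.5 * treated * post + rng.gauss(0, 1)
        rows.append((treated, post, y))

att = did_2x2(rows)
print(round(att, 3))
```

Differencing twice removes both the fixed level gap between groups and the shared time trend, which is why the parallel-trends assumption is what the user must defend for this design.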

CausalQuestion `dataclass`

Pre-registered causal question declaration.

Fields map directly onto the Target Trial Protocol (Hernán 2016) and the "PICOTS + identification" rubric the article describes.

Examples:

>>> import numpy as np, pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> n = 400
>>> d = (rng.uniform(size=n) < 0.5).astype(int)
>>> y = 1.0 + 0.5 * d + rng.normal(size=n)
>>> df = pd.DataFrame({"treat": d, "outcome": y})
>>> q = sp.causal_question("treat", "outcome", data=df, design="rct")
>>> isinstance(q, sp.CausalQuestion)
True
>>> q.identify().estimator
'regress'

save

save(filename: str | Path, *, fmt: str = 'auto', note: str = '') -> 'Path'

Save the question to a pre-registration file.

See `statspai.question.preregister.preregister` for details.

load `classmethod`

load(filename: str | Path) -> 'CausalQuestion'

Load a `CausalQuestion` from a pre-registration file.

to_yaml

to_yaml() -> str

Render the question as a YAML string (no file I/O).

identify

identify() -> IdentificationPlan

Choose an estimator based on the declared design and estimand, and return it as an `IdentificationPlan`.
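As a rough mental model only, this choice can be pictured as a lookup from the declared `design` to an estimator family, following the supported-design list documented under `causal_question`. In the sketch below, `DESIGN_TO_ESTIMATOR` and `pick_estimator` are hypothetical; only `'rct'` to `'regress'` is confirmed by the documented examples, the other labels are descriptive placeholders, and the `'auto'` rule is an illustrative guess, not StatsPAI's actual dispatch logic.

```python
from typing import Optional, Sequence

# Hypothetical design -> estimator-family table. Only 'rct' -> 'regress'
# is confirmed by the documented examples; the rest are placeholders.
DESIGN_TO_ESTIMATOR = {
    "rct": "regress",
    "iv": "2sls",
    "natural_experiment": "2sls",
    "regression_discontinuity": "local_polynomial_rd",
    "did": "did",
    "event_study": "did",
    "synthetic_control": "synthetic_control",
    "longitudinal_observational": "msm_or_gformula",
    "selection_on_observables": "aipw",
    "dml": "dml",
    "tmle": "tmle",
    "metalearner": "metalearner",
    "causal_forest": "causal_forest",
}


def pick_estimator(design: str = "auto", *,
                   instruments: Optional[Sequence[str]] = None,
                   running_variable: Optional[str] = None,
                   cutoff: Optional[float] = None,
                   time: Optional[str] = None,
                   id: Optional[str] = None) -> str:  # `id` mirrors the documented signature
    """Hypothetical dispatch: resolve 'auto' to a design, then look it up."""
    if design == "auto":
        # Illustrative heuristic: use whichever identifying fields were supplied.
        if instruments:
            design = "iv"
        elif running_variable is not None and cutoff is not None:
            design = "regression_discontinuity"
        elif time is not None and id is not None:
            design = "did"
        else:
            design = "selection_on_observables"
    try:
        return DESIGN_TO_ESTIMATOR[design]
    except KeyError:
        raise ValueError(f"unsupported design: {design!r}") from None


print(pick_estimator("rct"))                        # regress
print(pick_estimator(instruments=["z"]))            # 2sls
print(pick_estimator(time="year", id="worker_id"))  # did
```

Keeping the mapping in a plain table makes the identifying logic auditable: a reader can see which design was resolved and why, which is the information an identification plan is meant to record.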

estimate

estimate(**kwargs: Any) -> EstimationResult

Execute the identification plan against `self.data` and return an `EstimationResult`.

report

report(fmt: str = 'markdown') -> str

Render a Methods and Results narrative as a string (Markdown by default).

paper

paper(*, fmt: str = 'markdown', output_path: Optional[str] = None, dag: Any = None, include_robustness: bool = True, cite: bool = True, reviewer_mode: bool = False) -> Any

Build a full `PaperDraft` from this declared question.

Convenience wrapper around `statspai.paper_from_question`. Calls `identify()` and `estimate()` on demand, then assembles a draft with Question, Data, Identification, Estimator, Results, Robustness, and References sections. Renders to Markdown by default; pass `fmt='qmd'` for a Quarto document with statspai provenance and an automatically appended Reproducibility appendix.

Examples:

>>> q = sp.causal_question("trained", "wage", data=df, design="did",
...                        time="year", id="worker_id")
>>> draft = q.paper(fmt='qmd')
>>> draft.write("paper.qmd")

IdentificationPlan `dataclass`

Output of `CausalQuestion.identify`.

Describes which estimator is planned, why it is identifying, and which assumptions the user must defend.

Examples:

>>> import statspai as sp
>>> q = sp.causal_question("treat", "outcome", design="rct")
>>> plan = q.identify()
>>> isinstance(plan, sp.IdentificationPlan)
True
>>> plan.estimator
'regress'
>>> bool("random assignment" in plan.assumptions)
True

EstimationResult `dataclass`

Bases: ResultProtocolMixin

Unified view of a causal-question estimate.

Thin wrapper that preserves the underlying estimator's full result object while exposing a canonical `estimate` / `se` / `ci` interface.

Examples:

>>> import numpy as np, pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> n = 400
>>> d = (rng.uniform(size=n) < 0.5).astype(int)
>>> y = 1.0 + 0.5 * d + rng.normal(size=n)
>>> df = pd.DataFrame({"treat": d, "outcome": y})
>>> q = sp.causal_question("treat", "outcome", data=df, design="rct")
>>> res = q.estimate()
>>> isinstance(res, sp.EstimationResult)
True
>>> res.estimator
'regress'
>>> bool(res.ci[0] < res.estimate < res.ci[1])
True
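For a randomised binary treatment with no covariates, an OLS regression of the outcome on the treatment indicator reduces to a difference in means. The stdlib-only sketch below computes that point estimate with an unequal-variance (Neyman) standard error and a normal-approximation 95% interval, to show what the `estimate` / `se` / `ci` triple represents. The helper `diff_in_means` is hypothetical, and which SE flavour StatsPAI's `'regress'` reports (classical or heteroskedasticity-robust) is not stated here and may differ.

```python
import math
import random
from statistics import mean, variance


def diff_in_means(d, y, z=1.959964):
    """Hypothetical ATE helper for a randomised binary treatment.

    Returns (estimate, se, (ci_low, ci_high)) using the Neyman
    (unequal-variance) standard error and a normal 95% interval.
    """
    y1 = [yi for di, yi in zip(d, y) if di == 1]
    y0 = [yi for di, yi in zip(d, y) if di == 0]
    est = mean(y1) - mean(y0)
    se = math.sqrt(variance(y1) / len(y1) + variance(y0) / len(y0))
    return est, se, (est - z * se, est + z * se)


# Same data-generating process as the doctest above, using the stdlib RNG.
rng = random.Random(0)
n = 400
d = [int(rng.random() < 0.5) for _ in range(n)]
y = [1.0 + 0.5 * di + rng.gauss(0, 1) for di in d]

est, se, ci = diff_in_means(d, y)
print(f"ATE = {est:.3f}, SE = {se:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```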

causal_question

causal_question(treatment: str, outcome: str, *, data: Optional[DataFrame] = None, population: str = '', estimand: str = 'ATE', design: str = 'auto', time_structure: str = 'cross_section', time: Optional[str] = None, id: Optional[str] = None, covariates: Optional[Sequence[str]] = None, instruments: Optional[Sequence[str]] = None, running_variable: Optional[str] = None, cutoff: Optional[float] = None, cohort: Optional[str] = None, notes: str = '', engine: str = 'ols') -> CausalQuestion

Declare a causal question (see `CausalQuestion`).

Supported `design` values:

Classical / quasi-experimental:

- `'rct'`: randomised assignment; OLS ATE.
- `'iv'` / `'natural_experiment'`: 2SLS / LATE.
- `'regression_discontinuity'`: local polynomial RD.
- `'did'` / `'event_study'`: difference-in-differences.
- `'synthetic_control'`: convex-hull weighting.
- `'longitudinal_observational'`: MSM / g-formula.
- `'selection_on_observables'`: AIPW (default).

ML-based selection-on-observables (v1.13+):

- `'dml'`: Double/debiased ML for ATE / LATE [chernozhukov2018double].
- `'tmle'`: Targeted Maximum Likelihood with Super Learner [vanderlaan2006targeted].
- `'metalearner'`: S/T/X/R/DR-Learner for tau(x); population ATE summary via the AIPW influence function [kunzel2019metalearners; nie2021quasi].
- `'causal_forest'`: honest random forest for tau(x) [athey2019generalized; wager2018estimation]; population ATE inference uses cross-fit AIPW [vanderlaan2003unified; chernozhukov2018double].

All citation keys above resolve in `paper.bib`.
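The default `'selection_on_observables'` design is listed as AIPW. The sketch below shows the AIPW (doubly robust) score on simulated data with a single binary confounder, so both nuisance models (the propensity e(x) and the outcome means mu_0(x), mu_1(x)) can be fit as simple cell means. It is a teaching sketch under that simplifying assumption, not StatsPAI's implementation; the helper `aipw_ate` is hypothetical.

```python
import math
import random
from statistics import mean, pstdev


def aipw_ate(x, d, y):
    """Hypothetical AIPW estimate of the ATE with one binary covariate x.

    Nuisance models are cell means (saturated in x):
      e(x)    = P(D=1 | X=x)
      mu_k(x) = E[Y | D=k, X=x]
    Returns (estimate, se), with se from the empirical influence function.
    """
    e, mu1, mu0 = {}, {}, {}
    for v in set(x):
        cell = [(di, yi) for xi, di, yi in zip(x, d, y) if xi == v]
        e[v] = mean(di for di, _ in cell)
        mu1[v] = mean(yi for di, yi in cell if di == 1)
        mu0[v] = mean(yi for di, yi in cell if di == 0)
    # Per-unit AIPW score: outcome-model contrast plus IPW-weighted residuals.
    psi = [
        mu1[xi] - mu0[xi]
        + di * (yi - mu1[xi]) / e[xi]
        - (1 - di) * (yi - mu0[xi]) / (1 - e[xi])
        for xi, di, yi in zip(x, d, y)
    ]
    return mean(psi), pstdev(psi) / math.sqrt(len(psi))


# Confounded simulation: x raises both treatment probability and outcome.
rng = random.Random(1)
n = 2000
x = [int(rng.random() < 0.5) for _ in range(n)]
d = [int(rng.random() < 0.3 + 0.4 * xi) for xi in x]
y = [1.0 + 0.5 * di + 1.5 * xi + rng.gauss(0, 1) for xi, di in zip(x, d)]

naive = (mean(yi for di, yi in zip(d, y) if di)
         - mean(yi for di, yi in zip(d, y) if not di))
est, se = aipw_ate(x, d, y)
print(f"naive = {naive:.3f}, AIPW = {est:.3f} (SE {se:.3f})")
```

Here the naive difference in means absorbs the confounding through x (roughly 0.6 of bias by construction), while the AIPW score recovers a value near the true effect of 0.5.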

Examples:

>>> import numpy as np, pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> n = 400
>>> d = (rng.uniform(size=n) < 0.5).astype(int)
>>> x = rng.normal(size=n)
>>> y = 1.0 + 0.5 * d + 0.3 * x + rng.normal(size=n)
>>> df = pd.DataFrame({"treat": d, "outcome": y, "x": x})
>>> q = sp.causal_question("treat", "outcome", data=df,
...                        design="rct", covariates=["x"])
>>> plan = q.identify()
>>> plan.estimator
'regress'
>>> res = q.estimate()
>>> res.estimand
'ATE'
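The example above declares an RCT with `covariates=["x"]`. One standard way to use a covariate in an RCT is to add it to the OLS regression; by the Frisch-Waugh-Lovell theorem the treatment coefficient then equals the slope from regressing the covariate-residualised outcome on the covariate-residualised treatment. The stdlib sketch below computes that coefficient with hypothetical helpers `residualize` and `ols_treatment_coef`; whether StatsPAI's `'regress'` adjusts exactly this way (rather than, say, interacting treatment with centred covariates) is an assumption not confirmed here.

```python
import random
from statistics import mean


def residualize(v, x):
    """Residuals from a simple OLS regression of v on [1, x]."""
    mx, mv = mean(x), mean(v)
    sxx = sum((xi - mx) ** 2 for xi in x)
    beta = sum((xi - mx) * (vi - mv) for xi, vi in zip(x, v)) / sxx
    return [vi - mv - beta * (xi - mx) for xi, vi in zip(x, v)]


def ols_treatment_coef(d, y, x):
    """Coefficient on d in OLS of y on [1, d, x], via Frisch-Waugh-Lovell."""
    rd, ry = residualize(d, x), residualize(y, x)
    return sum(a * b for a, b in zip(rd, ry)) / sum(a * a for a in rd)


# Same data-generating process as the doctest above, using the stdlib RNG.
rng = random.Random(0)
n = 400
d = [int(rng.random() < 0.5) for _ in range(n)]
x = [rng.gauss(0, 1) for _ in range(n)]
y = [1.0 + 0.5 * di + 0.3 * xi + rng.gauss(0, 1) for di, xi in zip(d, x)]

print(round(ols_treatment_coef(d, y, x), 3))
```

Under randomisation the covariate is not needed for unbiasedness; adjusting for a prognostic covariate mainly tightens the standard error.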

load_preregister

load_preregister(filename: Union[str, Path]) -> CausalQuestion

Load a pre-registration file back into a `CausalQuestion`.

Recorded deviations and metadata are preserved on the returned object by concatenating them into its `.notes` field.

Examples:

>>> import statspai as sp
>>> import os, tempfile
>>> q = sp.causal_question(treatment="policy", outcome="employment",
...                        estimand="ATT", design="did")
>>> path = os.path.join(tempfile.mkdtemp(), "pap.yaml")
>>> _ = sp.preregister(q, path)
>>> q2 = sp.load_preregister(path)
>>> q2.treatment
'policy'
