statspai.question

question

Estimand-first causal question DSL (sp.causal_question).

The article emphasizes "causal question precedes statistical model" as the common foundation of all three causal-inference schools: econometrics' identification, epidemiology's target trial protocol, and ML's estimand-aware learning.

This module lets a user declare a causal question in one place, then automatically:

  1. Identify the appropriate research design (IV / DiD / RD / backdoor).
  2. Suggest the right StatsPAI estimator.
  3. Run the analysis and attach diagnostics + sensitivity.
  4. Produce a reproducible Methods paragraph.

>>> import statspai as sp
>>> q = sp.causal_question(
...     treatment="minimum_wage_hike",
...     outcome="employment",
...     estimand="ATT",
...     design="policy_shock",
...     data=df,
...     time_structure="panel",
...     covariates=["industry", "skill"],
... )
>>> q.identify()
>>> r = q.estimate()
>>> q.report()

CausalQuestion dataclass

Pre-registered causal question declaration.

Fields map directly onto the Target Trial Protocol (Hernán 2016) and the "PICOTS + identification" rubric the article describes.

save

save(filename, *, fmt: str = 'auto', note: str = '') -> 'Path'

Save the question to a pre-registration file.

See :func:statspai.question.preregister.preregister for details.

load classmethod

load(filename) -> 'CausalQuestion'

Load a CausalQuestion from a preregistration file.

to_yaml

to_yaml() -> str

Render the question as a YAML string (no file I/O).
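
As a rough illustration of "render to a string, no file I/O", the sketch below serialises a small dataclass to YAML-compatible text. QuestionSketch and to_yaml_sketch are hypothetical names invented for this example; they are not the StatsPAI implementation, and the real to_yaml() output layout is not documented on this page. The sketch relies on the fact that JSON scalars and flow sequences are valid YAML 1.2 values, so no YAML library is needed.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class QuestionSketch:
    """Hypothetical stand-in for CausalQuestion (assumed fields only)."""
    treatment: str
    outcome: str
    estimand: str = "ATE"
    design: str = "auto"
    covariates: List[str] = field(default_factory=list)


def to_yaml_sketch(q: QuestionSketch) -> str:
    """Render the declared fields as YAML text without touching the filesystem."""
    lines = []
    for key, value in asdict(q).items():
        # json.dumps yields double-quoted strings and [a, b] lists,
        # both of which a YAML 1.2 parser accepts as-is.
        lines.append(f"{key}: {json.dumps(value)}")
    return "\n".join(lines) + "\n"


q = QuestionSketch(
    "minimum_wage_hike",
    "employment",
    estimand="ATT",
    covariates=["industry", "skill"],
)
text = to_yaml_sketch(q)
print(text)
```

Returning a string rather than writing a file keeps the renderer easy to test and lets the caller decide where the text goes.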

identify

identify() -> IdentificationPlan

Choose an estimator based on the declared design / estimand.

estimate

estimate(**kwargs) -> EstimationResult

Execute the identification plan against self.data.

report

report(fmt: str = 'markdown') -> str

Render a Methods + Results narrative.

paper

paper(*, fmt: str = 'markdown', output_path: Optional[str] = None, dag: Any = None, include_robustness: bool = True, cite: bool = True, reviewer_mode: bool = False)

Build a full :class:PaperDraft from this declared question.

Convenience wrapper around :func:statspai.paper_from_question. Calls identify() and estimate() on demand, then assembles a Question / Data / Identification / Estimator / Results / Robustness / References draft. Renders to markdown by default; pass fmt='qmd' for a Quarto document with statspai provenance and an auto-appended Reproducibility appendix.

Examples:

>>> q = sp.causal_question("trained", "wage", data=df, design="did",
...                        time="year", id="worker_id")
>>> draft = q.paper(fmt='qmd')
>>> draft.write("paper.qmd")

IdentificationPlan dataclass

Output of :meth:CausalQuestion.identify.

Describes which estimator is planned, why it is identifying, and which assumptions the user must defend.
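
To make the three parts of a plan concrete (planned estimator, why it identifies, assumptions to defend), here is a minimal sketch of a design-to-plan lookup. PlanSketch, its field names, and identify_sketch are assumptions made for illustration; the actual IdentificationPlan fields are not listed on this page. The design labels and estimator descriptions come from the supported design values documented for causal_question, while the rationale and assumption wording is generic textbook phrasing.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class PlanSketch:
    """Hypothetical stand-in for IdentificationPlan (field names assumed)."""
    estimator: str
    rationale: str
    assumptions: Tuple[str, ...]


# Illustrative subset of designs; not the library's internal table.
_PLANS: Dict[str, PlanSketch] = {
    "rct": PlanSketch(
        "OLS ATE",
        "Random assignment balances potential outcomes across arms.",
        ("assignment was actually randomised", "no interference between units"),
    ),
    "iv": PlanSketch(
        "2SLS / LATE",
        "The instrument shifts treatment but affects the outcome only through it.",
        ("instrument relevance", "exclusion restriction", "monotonicity"),
    ),
    "did": PlanSketch(
        "difference-in-differences",
        "Untreated units trace the counterfactual trend of treated units.",
        ("parallel trends", "no anticipation"),
    ),
    "regression_discontinuity": PlanSketch(
        "local polynomial RD",
        "Units just either side of the cutoff are comparable.",
        ("continuity at the cutoff", "no manipulation of the running variable"),
    ),
}


def identify_sketch(design: str) -> PlanSketch:
    """Look up the plan for a declared design, failing loudly on unknown labels."""
    try:
        return _PLANS[design]
    except KeyError:
        raise ValueError(f"unsupported design in this sketch: {design!r}") from None


plan = identify_sketch("did")
print(plan.estimator)
for assumption in plan.assumptions:
    print("-", assumption)
```

Keeping the assumptions as data, rather than prose buried in a report, is what lets a later step list them in a Methods paragraph or check them in diagnostics.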

EstimationResult dataclass

Unified view of a causal-question estimate.

Thin wrapper that preserves the underlying estimator's full result object while exposing a canonical estimate / se / ci interface.
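
The "thin wrapper" idea can be sketched as follows: keep the backend's result untouched and expose only a small canonical surface. ResultSketch, the raw attribute, and the level parameter are hypothetical; only the estimate / se / ci names come from this page. The interval here assumes a normal approximation, which may differ from what a given StatsPAI estimator reports.

```python
from dataclasses import dataclass
from statistics import NormalDist
from typing import Any, Tuple


@dataclass
class ResultSketch:
    """Hypothetical stand-in for EstimationResult."""
    estimate: float
    se: float
    raw: Any = None        # the underlying estimator's result, kept as-is
    level: float = 0.95    # assumed confidence level

    @property
    def ci(self) -> Tuple[float, float]:
        """Normal-approximation confidence interval (an assumption of this sketch)."""
        z = NormalDist().inv_cdf(0.5 + self.level / 2)
        return (self.estimate - z * self.se, self.estimate + z * self.se)


# Made-up numbers standing in for whatever a backend estimator returns.
backend_output = {"coef": 1.5, "std_err": 0.5, "n_obs": 200}

res = ResultSketch(
    estimate=backend_output["coef"],
    se=backend_output["std_err"],
    raw=backend_output,
)
lo, hi = res.ci
print(f"estimate={res.estimate:.2f}, se={res.se:.2f}, ci=({lo:.2f}, {hi:.2f})")
print(res.raw["n_obs"])  # backend-specific detail still reachable
```

Downstream code can then depend on the three canonical attributes regardless of which estimator produced the result, and reach into raw only when it needs estimator-specific output.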

causal_question

causal_question(treatment: str, outcome: str, *, data: Optional[DataFrame] = None, population: str = '', estimand: str = 'ATE', design: str = 'auto', time_structure: str = 'cross_section', time: Optional[str] = None, id: Optional[str] = None, covariates: Optional[Sequence[str]] = None, instruments: Optional[Sequence[str]] = None, running_variable: Optional[str] = None, cutoff: Optional[float] = None, cohort: Optional[str] = None, notes: str = '') -> CausalQuestion

Declare a causal question (see :class:CausalQuestion).

Supported design values

Classical / quasi-experimental:

  - 'rct': randomised assignment; OLS ATE.
  - 'iv' / 'natural_experiment': 2SLS / LATE.
  - 'regression_discontinuity': local polynomial RD.
  - 'did' / 'event_study': difference-in-differences.
  - 'synthetic_control': convex-hull weighting.
  - 'longitudinal_observational': MSM / g-formula.
  - 'selection_on_observables': AIPW (default).

ML-based selection-on-observables (v1.13+):

  - 'dml': Double/debiased ML for ATE / LATE [chernozhukov2018double].
  - 'tmle': Targeted Maximum Likelihood with Super Learner [vanderlaan2006targeted].
  - 'metalearner': S/T/X/R/DR-Learner for tau(x); population ATE summary via AIPW influence function [kunzel2019metalearners; nie2021quasi].
  - 'causal_forest': honest random forest for tau(x) [athey2019generalized; wager2018estimation]; population ATE inference uses cross-fit AIPW [vanderlaan2003unified; chernozhukov2018double].

All bib keys above resolve in paper.bib.

load_preregister

load_preregister(filename: Union[str, Path]) -> CausalQuestion

Load a pre-registration file back into a :class:CausalQuestion.

Deviations and metadata are preserved on the returned object via the .notes field (concatenated).
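
The save-then-load round trip that pre-registration relies on can be illustrated with a small stand-in. This sketch uses JSON from the standard library purely for convenience; the real file format, and how deviations and metadata are folded into .notes, are not specified on this page. QuestionSketch, save_sketch, and load_sketch are hypothetical names.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class QuestionSketch:
    """Hypothetical stand-in for CausalQuestion (assumed fields only)."""
    treatment: str
    outcome: str
    estimand: str = "ATE"
    notes: str = ""


def save_sketch(q: QuestionSketch, filename) -> Path:
    """Write the declared question to disk and return the path written."""
    path = Path(filename)
    path.write_text(json.dumps(asdict(q), indent=2), encoding="utf-8")
    return path


def load_sketch(filename) -> QuestionSketch:
    """Rebuild the question from a file produced by save_sketch."""
    payload = json.loads(Path(filename).read_text(encoding="utf-8"))
    return QuestionSketch(**payload)


with tempfile.TemporaryDirectory() as tmp:
    original = QuestionSketch(
        "trained", "wage", estimand="ATT",
        notes="registered before data access",
    )
    path = save_sketch(original, Path(tmp) / "question.json")
    restored = load_sketch(path)

# Dataclass equality compares field by field, so this checks a lossless round trip.
print(restored == original)
```

A lossless round trip is the property that matters here: the question analysed later should be provably the one that was registered.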