
statspai.smart

smart

Smart Workflow Engine.

Registered workflow helpers for planning, diagnostics, sensitivity, and replication support:

  • recommend() — DAG + data → estimator selection
  • compare_estimators() — run multiple methods, compare, diagnose
  • assumption_audit() — comprehensive assumption testing by method
  • sensitivity_dashboard() — multi-dimensional sensitivity analysis
  • pub_ready() — journal-specific publication readiness checklist
  • replicate() — famous paper replication with built-in data

RecommendationResult

Result from the estimator recommendation engine.

to_latex

to_latex(caption: Optional[str] = None, label: str = 'tab:recommendation') -> str

Export recommendations as a booktabs LaTeX table.

If verify=True was used when calling recommend(), the table includes the stability-check columns (composite score, bootstrap stability, placebo pass-rate, subsample agreement).

IMPORTANT CAVEAT FOR AUTHORS: The stability score measures whether a method gives consistent estimates under resampling on the observed data — it does NOT establish identification validity or protect against unobserved confounding. A biased OLS on observational data will typically score high because the bias is stable across resamples. Do not cite this score as evidence that a method is "correct" for a given design; use it only to compare the stability of methods that already satisfy the design's identification assumptions.
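
The caveat above can be illustrated with a small simulation. This is a minimal sketch that uses only NumPy and does not call statspai. The data-generating process, the helper `naive_ols`, and the choice of 500 bootstrap draws are assumptions made for illustration, and the bootstrap spread is only a stand-in for the idea behind the stability score, not the library's actual computation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, true_effect = 2000, 1.0

# Hypothetical data-generating process: u is an unobserved confounder
# that drives both treatment uptake and the outcome.
u = rng.normal(size=n)
d = (u + rng.normal(size=n) > 0).astype(float)
y = true_effect * d + 2.0 * u + rng.normal(size=n)


def naive_ols(d, y):
    """Slope from regressing y on d with an intercept (u is omitted)."""
    return np.polyfit(d, y, 1)[0]


# Nonparametric bootstrap of the naive (confounded) estimate.
boot = np.empty(500)
for b in range(boot.size):
    idx = rng.integers(0, n, size=n)
    boot[b] = naive_ols(d[idx], y[idx])

boot_mean, boot_sd = boot.mean(), boot.std(ddof=1)
print(f"true effect      : {true_effect:.2f}")
print(f"bootstrap mean   : {boot_mean:.2f}")
print(f"bootstrap std dev: {boot_sd:.2f}")
```

The bootstrap standard deviation is small relative to the gap between the bootstrap mean and the true effect: the estimate is stable across resamples and still far from the truth, because every resample inherits the same confounding.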

Parameters:

Name Type Description Default
caption str

Table caption. Defaults to the detected design.

None
label str

LaTeX label for cross-referencing.

'tab:recommendation'

Returns:

Type Description
str

LaTeX source (booktabs + threeparttable).

run

run(which: int = 0, **kwargs)

Execute the recommended estimator.

Parameters:

Name Type Description Default
which int

Which recommendation to run (0 = top recommendation).

0
**kwargs

Override any parameters.

{}

run_all

run_all(**kwargs)

Run all recommended estimators and return a comparison.

ComparisonResult

Results from multi-estimator comparison.

plot

plot(ax=None, **kwargs)

Forest plot comparing estimates across methods.

AssumptionResult

Results from comprehensive assumption audit.

failed

failed() -> List[AssumptionCheck]

Return only failed checks.

passed_all

passed_all() -> bool

True if all checks passed.

SensitivityDashboard

Multi-dimensional sensitivity analysis results.

PubReadyResult

Publication readiness checklist results.

IdentificationReport dataclass

Report from check_identification.

verdict property

verdict: str

Overall verdict: BLOCKERS | WARNINGS | OK.

DiagnosticFinding dataclass

A single design-level finding.

IdentificationError

Bases: Exception

Raised by check_identification(strict=True) when a blocker is found.

Carries the full IdentificationReport on self.report so downstream code can still inspect the findings without re-running the checks.
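
The pattern of raising on blockers while keeping the report attached to the exception can be sketched as follows. This uses stand-in classes, not the library's own: `Finding`, `Report`, `DesignError`, and `check` are hypothetical names, and both the severity strings and the rule that derives the verdict from them are assumptions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Finding:
    """Stand-in for DiagnosticFinding (fields are assumed)."""
    severity: str  # assumed values: "blocker", "warning", "info"
    message: str


@dataclass
class Report:
    """Stand-in for IdentificationReport."""
    findings: List[Finding] = field(default_factory=list)

    @property
    def verdict(self) -> str:
        # Assumed rule: the worst severity present decides the verdict.
        severities = {f.severity for f in self.findings}
        if "blocker" in severities:
            return "BLOCKERS"
        if "warning" in severities:
            return "WARNINGS"
        return "OK"


class DesignError(Exception):
    """Stand-in for IdentificationError: keeps the report on self.report."""

    def __init__(self, report: Report):
        super().__init__("design has identification blockers")
        self.report = report


def check(findings: List[Finding], strict: bool = False) -> Report:
    """Build a report; in strict mode, raise if the verdict is BLOCKERS."""
    report = Report(list(findings))
    if strict and report.verdict == "BLOCKERS":
        raise DesignError(report)
    return report


try:
    check(
        [Finding("blocker", "no overlap"), Finding("warning", "few clusters")],
        strict=True,
    )
except DesignError as err:
    caught = err.report  # findings remain available without re-running
    for f in caught.findings:
        print(f"[{f.severity}] {f.message}")
```

Attaching the report to the exception lets an automated pipeline fail hard while still logging every finding from the single run that triggered the failure.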

compare_estimators

compare_estimators(data: DataFrame, y: str, treatment: str, methods: List[str] = None, covariates: List[str] = None, id: str = None, time: str = None, instrument: str = None, alpha: float = 0.05, method_hints: Optional[Dict[str, Dict[str, Any]]] = None) -> ComparisonResult

Run multiple estimators on the same data and compare.

Run selected estimators and return an agreement-diagnostics table for manual robustness review.

Parameters:

Name Type Description Default
data DataFrame
required
y str

Outcome variable.

required
treatment str

Treatment variable (binary).

required
methods list of str

Estimators to compare. Default auto-selects based on data. Classical options: 'ols', 'matching', 'ipw', 'aipw', 'dml', 'g_computation', 'causal_forest', 'did', 'panel_fe'.

Hint-driven Sprint-B options (require method_hints): 'proximal', 'msm', 'principal_strat', 'mediate', 'mediate_interventional', 'front_door'. Each needs method-specific kwargs the shared signature does not expose (proxy_z/proxy_w, time_varying, strata, mediator, etc.) — pass them through method_hints.

None
method_hints dict

Per-method keyword overrides, merged with the shared kwargs when dispatching each estimator. Structure:

{'proximal': {'proxy_z': ['z'], 'proxy_w': ['w']},
 'msm':      {'time_varying': ['L_lag']},
 'principal_strat': {'strata': 's'}}

Collision rule (docs/ROADMAP.md §6): per-method hints take precedence over the shared kwargs for the method they name. If the top-level covariates=['age'] disagrees with method_hints={'proximal': {'covariates': ['age', 'educ']}}, proximal uses the hint and every other method uses the shared arg. A UserWarning fires on conflict so the override is visible in the log.

None
covariates list of str
None
id str

Panel unit ID.

None
time str

Time variable.

None
instrument str
None
alpha float
0.05

Returns:

Type Description
ComparisonResult

With .summary(), .plot(), .results (dict of individual results).

Examples:

>>> import statspai as sp
>>> comp = sp.compare_estimators(
...     data=df, y='wage', treatment='training',
...     methods=['ols', 'matching', 'ipw', 'dml'],
...     covariates=['age', 'education'],
... )
>>> print(comp.summary())
>>> comp.plot()
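
The collision rule described for method_hints can be sketched in plain Python. This is an illustration of the documented behaviour under stated assumptions, not the library's dispatch code: `merge_kwargs` is a hypothetical helper, and treating a shared value of None as "not set" (so no warning fires) is an assumption.

```python
import warnings
from typing import Any, Dict, Optional


def merge_kwargs(
    method: str,
    shared: Dict[str, Any],
    method_hints: Optional[Dict[str, Dict[str, Any]]] = None,
) -> Dict[str, Any]:
    """Merge shared kwargs with the hints for one method; hints win."""
    merged = dict(shared)
    for name, value in (method_hints or {}).get(method, {}).items():
        # Warn only when a shared value was set and the hint disagrees.
        if shared.get(name) is not None and shared[name] != value:
            warnings.warn(
                f"method_hints[{method!r}][{name!r}] overrides shared "
                f"{name}={shared[name]!r}",
                UserWarning,
                stacklevel=2,
            )
        merged[name] = value
    return merged


shared = {"covariates": ["age"]}
hints = {"proximal": {"covariates": ["age", "educ"], "proxy_z": ["z"]}}

with warnings.catch_warnings(record=True) as log:
    warnings.simplefilter("always")
    proximal_kwargs = merge_kwargs("proximal", shared, hints)
    ols_kwargs = merge_kwargs("ols", shared, hints)

print(proximal_kwargs)  # hint values for proximal
print(ols_kwargs)       # shared values for every other method
print(len(log), "warning(s) recorded")
```

Only the method named in the hint sees the override. Other methods receive the shared arguments unchanged, and the single recorded warning makes the conflict visible.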

assumption_audit

assumption_audit(result, data: DataFrame = None, alpha: float = 0.05, verbose: bool = True) -> AssumptionResult

Comprehensive assumption audit for any estimated model.

Run the method's registered assumption checks and provide actionable remedies for failed or inconclusive diagnostics.

Parameters:

Name Type Description Default
result EconometricResults or CausalResult

Estimated model result.

required
data DataFrame

Original data (needed for some tests). Auto-extracted if available.

None
alpha float

Significance level for tests.

0.05
verbose bool

Print summary automatically.

True

Returns:

Type Description
AssumptionResult

With .summary(), .failed(), .passed_all() methods.

Examples:

>>> import statspai as sp
>>> result = sp.regress("wage ~ educ + exper", data=df)
>>> audit = sp.assumption_audit(result)
>>> print(audit.summary())
>>> if not audit.passed_all():
...     for fail in audit.failed():
...         print(f"  Fix: {fail.remedy}")

bib_for

bib_for(result: Any) -> Dict[str, Any]

Top-level structured citation for a fitted result.

Convenience entry point that pairs with result.cite(format="json"), so agents without direct access to the result method can pull the structured payload via sp.bib_for(...) instead.

Parameters:

Name Type Description Default
result CausalResult or EconometricResults

Any fitted result object exposing a .cite() method.

required

Returns:

Type Description
dict

Same shape as result.cite(format="json"): {type, key, authors, year, title, journal, volume, number, pages, publisher, fields}.

Examples:

>>> import statspai as sp
>>> r = sp.did(df, y='y', treat='treated', time='t', post='post')
>>> sp.bib_for(r)['key']
'angrist2009mostly'

render_citation

render_citation(bibtex: str, fmt: str = 'bibtex') -> Any

Render a stored BibTeX string in the requested format.

Parameters:

Name Type Description Default
bibtex str

Raw BibTeX entry as stored on the result class. May contain multiple @type{...} entries concatenated (some methods cite more than one paper); the renderer walks every entry.

required
fmt ('bibtex', 'apa', 'json')

Output format.

"bibtex"

Returns:

Type Description
str | dict | list
  • "bibtex": str (the raw string, multi-entry preserved as-is).
  • "apa": str (single entry) or str with entries joined by a blank line (multi-entry).
  • "json": dict for single-entry input (the original shape, preserved for backward compatibility) or list[dict] when the source contains multiple BibTeX entries.
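
The single-entry versus multi-entry return shape for "json" can be sketched as below. This is not the library's renderer: `split_entries`, `entry_to_dict`, and `to_json` are hypothetical helpers, the sample BibTeX strings are made up for illustration, and the parser only handles simple `field = {value}` pairs without nested braces.

```python
import re
from typing import Dict, List, Union


def split_entries(bibtex: str) -> List[str]:
    """Split concatenated BibTeX into entries by tracking brace depth."""
    entries, depth, start = [], 0, None
    for i, ch in enumerate(bibtex):
        if ch == "@" and depth == 0:
            start = i
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0 and start is not None:
                entries.append(bibtex[start:i + 1])
                start = None
    return entries


def entry_to_dict(entry: str) -> Dict[str, str]:
    """Parse one entry into {type, key, <field>: <value>, ...}."""
    head = re.match(r"@(\w+)\{([^,]+),", entry)
    fields = dict(re.findall(r"(\w+)\s*=\s*\{([^{}]*)\}", entry))
    return {"type": head.group(1), "key": head.group(2).strip(), **fields}


def to_json(bibtex: str) -> Union[Dict[str, str], List[Dict[str, str]]]:
    """Return a dict for one entry, a list of dicts for several."""
    parsed = [entry_to_dict(e) for e in split_entries(bibtex)]
    return parsed[0] if len(parsed) == 1 else parsed


# Made-up entries, used only to exercise the two return shapes.
one = "@article{doe2020example, title = {An Example}, year = {2020}}"
two = one + "\n@book{roe2021other, title = {Another Example}, year = {2021}}"

single = to_json(one)
multi = to_json(two)
print(type(single).__name__, single["key"])
print(type(multi).__name__, [e["key"] for e in multi])
```

Callers that may receive either shape can normalise with `entries = out if isinstance(out, list) else [out]` before iterating.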

sensitivity_dashboard

sensitivity_dashboard(result, data: DataFrame = None, dimensions: List[str] = None, alpha: float = 0.05, verbose: bool = True) -> SensitivityDashboard

Comprehensive multi-dimensional sensitivity analysis.

Test sensitivity across selected dimensions and produce an overall stability grade.

Parameters:

Name Type Description Default
result EconometricResults or CausalResult

Baseline estimated result.

required
data DataFrame

Original data (auto-extracted if possible).

None
dimensions list of str

Which dimensions to test. Default: all applicable. Options: 'sample', 'controls', 'functional_form', 'outliers', 'unobservables'.

None
alpha float
0.05
verbose bool
True

Returns:

Type Description
SensitivityDashboard

Examples:

>>> import statspai as sp
>>> result = sp.regress("y ~ x1 + x2", data=df)
>>> dash = sp.sensitivity_dashboard(result, data=df)
>>> print(dash.summary())

pub_ready

pub_ready(results: list = None, venue: str = 'top5_econ', design: str = None, has_balance: bool = False, has_pretrends: bool = False, has_robustness: bool = False, has_heterogeneity: bool = False, has_sensitivity: bool = False, has_placebo: bool = False, has_mht: bool = False) -> PubReadyResult

Publication readiness checklist.

Generate a venue-specific checklist for empirical paper submission.

Parameters:

Name Type Description Default
results list

List of estimated result objects.

None
venue str

Target venue: 'top5_econ', 'aej_applied', 'rct'.

'top5_econ'
design str

Research design: 'rct', 'did', 'rd', 'iv', 'observational'.

None
has_balance bool

Already have balance table.

False
has_pretrends bool

Already have pre-trend tests.

False
has_robustness bool

Already have robustness checks.

False
has_heterogeneity bool

Already have subgroup analysis.

False
has_sensitivity bool

Already have sensitivity analysis.

False
has_placebo bool

Already have placebo tests.

False
has_mht bool

Already have MHT correction.

False

Returns:

Type Description
PubReadyResult

Examples:

>>> import statspai as sp
>>> check = sp.pub_ready(results=[r1, r2], venue='top5_econ', design='did')
>>> print(check.summary())

list_replications

list_replications() -> DataFrame

List all available replication datasets and guides.

Returns:

Type Description
DataFrame

Columns: key, title, design, journal, n_obs, has_real_data, has_classic_track, has_modern_track.

Examples:

>>> import statspai as sp
>>> sp.list_replications()

check_identification

check_identification(data: DataFrame, y: str, treatment: Optional[str] = None, covariates: Optional[List[str]] = None, id: Optional[str] = None, time: Optional[str] = None, running_var: Optional[str] = None, instrument: Optional[str] = None, cluster: Optional[str] = None, cutoff: Optional[float] = None, design: Optional[str] = None, cohort: Optional[str] = None, dag=None, strict: bool = False) -> IdentificationReport

Run design-level identification diagnostics before fitting an estimator.

Reads the dataframe and the declared (or auto-detected) design and returns a prioritised list of pitfalls: bad controls, overlap violations, underpowered designs, small cohorts, and clustering ambiguity.

Parameters:

Name Type Description Default
data DataFrame
required
y str

Outcome column.

required
treatment str

Binary or continuous treatment column.

None
covariates list of str

Candidate control variables.

None
id str

Panel unit identifier column.

None
time str

Panel time identifier column.

None
running_var str

RD running variable.

None
instrument str

IV instrument.

None
cluster str

Clustering column for inference.

None
cutoff float

RD cutoff value.

None
design str

Override auto-detected design: one of 'rct', 'did', 'rd', 'iv', 'observational', 'panel'.

None
cohort str

First-treatment-period column (for staggered DID).

None
dag DAG

Causal DAG. If supplied, runs Cinelli-Forney-Pearl (2022) bad-control detection (mediator, descendant, collider, M-bias) and verifies that the covariate set satisfies a valid adjustment criterion. This replaces the default correlation heuristic with a principled, graph-based check.

None
strict bool

If True, raise IdentificationError when the report's verdict is 'BLOCKERS'. Use in CI / automated pipelines where you want a hard failure when the design is broken. The exception carries .report for post-mortem inspection.

False

Returns:

Type Description
IdentificationReport

With .summary(), .verdict, .findings, .by_category().

Examples:

>>> import statspai as sp
>>> report = sp.check_identification(
...     df, y='wage', treatment='training',
...     covariates=['age', 'education'],
...     id='worker', time='year', design='did',
... )
>>> print(report.summary())
>>> if report.verdict == 'BLOCKERS':
...     raise RuntimeError("Design has identification blockers.")