`statspai.crossval`¶

crossval ¶

Cross-engine validation for StatsPAI.

`sp.cross_validate` runs one estimand through several independent engines (StatsPAI native, pyfixest, linearmodels, DoubleML, R's fixest via Rscript, Stata via batch do) and reports whether they agree. It turns the cross-package reproducibility discipline of Scott Cunningham's "estimate it two ways and check they match" into a single callable for humans and agents.

Public surface

`cross_validate`: the dispatcher.
`CrossValidationResult`: the verdict plus the per-engine table.
`EstimandSpec`, `EngineEstimate`, `TolerancePolicy`: building blocks, exposed for advanced use and testing.

EngineEstimate `dataclass` ¶

One engine's answer for one focal coefficient.

Every backend adapter normalises its native result into this shape so the reconciliation logic never has to special-case a library. The `status` field keeps unavailable or failed engines in the data flow rather than dropping them.

Parameters:

Name	Type	Description	Default
`engine`	`str`	Backend label, e.g. `"statspai"`, `"pyfixest"`, `"linearmodels"`, `"R::fixest"`, `"Stata::reghdfe"`.	required
`estimand`	`str`	The estimand key that was requested (`"ols"`, `"iv"`, `"did"` …).	required
`term`	`str`	Name of the focal coefficient this estimate refers to.	`None`
`coef`	`float`	Point estimate of `term`. `None` when the engine did not produce one (`status` is not `ok`).	`None`
`se`	`float`	Standard error of `term`. `None` when the engine did not produce one (`status` is not `ok`).	`None`
`tstat`	`float`		`None`
`pvalue`	`float`		`None`
`ci_lower`	`float`		`None`
`ci_upper`	`float`		`None`
`nobs`	`int`		`None`
`vcov`	`str`	Variance estimator flavour actually used (`"iid"`, `"HC1"`, `"cluster"` …) so a SE mismatch can be attributed to a vcov difference rather than a genuine disagreement.	`None`
`status`	`str`	One of `ok` / `unavailable` / `error` / `skipped`.	`STATUS_OK`
`message`	`str`	Human-readable note (why it was unavailable, the exception text …).	`None`
`elapsed_s`	`float`		`None`
`extra`	`dict`	Free-form backend extras (first-stage F, n_folds, full coef table …).	`dict()`
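
The normalisation idea can be illustrated with a minimal sketch. `EngineEstimateSketch` and `run_engine` below are hypothetical stand-ins written for this illustration (they are not StatsPAI's classes or adapters), and only a subset of the documented fields is mirrored. The point is that a failing backend still yields a record, with `status` and `message` filled in and `coef` / `se` left as `None`.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional, Tuple

STATUS_OK = "ok"
STATUS_ERROR = "error"


@dataclass
class EngineEstimateSketch:
    """Hypothetical stand-in carrying a subset of the documented fields."""

    engine: str
    estimand: str
    term: Optional[str] = None
    coef: Optional[float] = None
    se: Optional[float] = None
    status: str = STATUS_OK
    message: Optional[str] = None
    extra: Dict[str, Any] = field(default_factory=dict)


def run_engine(
    engine: str,
    estimand: str,
    term: str,
    fit: Callable[[], Tuple[float, float]],
) -> EngineEstimateSketch:
    """Call a backend and always return a record, even when it fails."""
    try:
        coef, se = fit()
    except Exception as exc:
        # Keep the failed engine in the data flow instead of dropping it.
        return EngineEstimateSketch(
            engine, estimand, term, status=STATUS_ERROR, message=str(exc)
        )
    return EngineEstimateSketch(engine, estimand, term, coef=coef, se=se)


def failing_fit() -> Tuple[float, float]:
    """Simulated backend failure (assumption, for illustration only)."""
    raise RuntimeError("singular design matrix")


ok = run_engine("statspai", "ols", "x", lambda: (1.0, 0.5))
bad = run_engine("pyfixest", "ols", "x", failing_fit)
```

Here `ok.status` is `"ok"` with both numbers populated, while `bad` keeps its engine label, carries the exception text in `message`, and leaves `coef` and `se` unset.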

TolerancePolicy `dataclass` ¶

How close two engines should be on a focal coefficient, and why.

Attributes:

Name	Type	Description
`mode`	`str`	`"exact"` (judge coefficients on a relative-difference scale) or `"statistical"` (judge on a standard-error scale).
`coef_rtol, coef_atol`	`float`	Relative / absolute tolerance on the point estimate (`exact` mode).
`se_rtol`	`float`	Relative tolerance on the standard error (`exact` mode). SEs are looser than coefficients because dof corrections and default vcov flavours legitimately differ across libraries.
`se_band`	`float`	`statistical` mode: two estimates agree if `|Δcoef| <= se_band * max(se)`.
`rationale`	`str`	Plain-language justification, surfaced in the report so a reader can see why a given tolerance was applied.
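
A minimal sketch of how the two modes could judge a pair of estimates follows. `TolerancePolicySketch` and `coefs_agree` are hypothetical names, and the default values are placeholders chosen for this illustration. The `statistical` branch follows the rule stated in the table; the `exact` branch assumes the common `atol + rtol * scale` form, which the documentation above does not spell out.

```python
from dataclasses import dataclass


@dataclass
class TolerancePolicySketch:
    """Hypothetical stand-in; the defaults are illustrative placeholders."""

    mode: str = "exact"
    coef_rtol: float = 1e-6
    coef_atol: float = 1e-8
    se_band: float = 1.0


def coefs_agree(
    coef_a: float,
    se_a: float,
    coef_b: float,
    se_b: float,
    policy: TolerancePolicySketch,
) -> bool:
    """Return True when two point estimates agree under the policy."""
    diff = abs(coef_a - coef_b)
    if policy.mode == "statistical":
        # Documented rule: |delta coef| <= se_band * max(se).
        return diff <= policy.se_band * max(se_a, se_b)
    if policy.mode == "exact":
        # Assumed form: absolute tolerance plus relative tolerance
        # scaled by the larger coefficient magnitude.
        scale = max(abs(coef_a), abs(coef_b))
        return diff <= policy.coef_atol + policy.coef_rtol * scale
    raise ValueError(f"unknown tolerance mode: {policy.mode!r}")


exact = TolerancePolicySketch(mode="exact")
statistical = TolerancePolicySketch(mode="statistical", se_band=1.0)

agree_exact = coefs_agree(0.500, 0.10, 0.505, 0.12, exact)
agree_statistical = coefs_agree(0.500, 0.10, 0.505, 0.12, statistical)
```

With these placeholder tolerances the same pair of estimates fails the `exact` check but passes the `statistical` one, which is why the `rationale` field matters when reading a verdict.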

CrossValidationResult ¶

Outcome of cross-validating one estimand across several engines.

Attributes:

Name	Type	Description
`estimand`	`str`
`term`	`str`	Focal coefficient that was reconciled.
`estimates`	`list of EngineEstimate`	Every engine that was requested (including unavailable / errored ones).
`agreement`	`AgreementReport`	Verdict + spread diagnostics.
`spec`	`dict`	Serialised `EstimandSpec` (what was fit).
`provenance`	`dict`	Engine versions / environment captured for reproducibility.
`degradations`	`list of dict`	Structured records for every engine that could not contribute, mirrored from `statspai.workflow.record_degradation`.

Examples:

>>> import pandas as pd
>>> import statspai as sp
>>> df = pd.DataFrame(
...     {"y": [1.0, 2.0, 3.0, 4.0], "x": [0.0, 1.0, 0.0, 1.0]}
... )
>>> cv = sp.cross_validate(
...     df, "ols", formula="y ~ x", treatment="x", engines=["statspai"]
... )
>>> isinstance(cv, sp.CrossValidationResult)
True
>>> cv.term
'x'

estimates_table `property` ¶

estimates_table: DataFrame

One row per requested engine (ok and not-ok alike).

engine_status_counts `property` ¶

engine_status_counts: Dict[str, int]

Count engines by status, including unavailable/error entries.
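
As a rough illustration of this kind of tally, the sketch below counts statuses over plain `(engine, status)` pairs and also filters the ok-only engines. `count_statuses` and the sample data are hypothetical; the real property works on the result's `EngineEstimate` objects.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def count_statuses(estimates: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count engines by status, keeping not-ok statuses in the tally."""
    return dict(Counter(status for _engine, status in estimates))


# Made-up (engine, status) pairs for illustration.
requested: List[Tuple[str, str]] = [
    ("statspai", "ok"),
    ("pyfixest", "ok"),
    ("R::fixest", "unavailable"),
    ("Stata::reghdfe", "error"),
]

counts = count_statuses(requested)
ok_engines = [engine for engine, status in requested if status == "ok"]
```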

can_claim_cross_engine_agreement `property` ¶

can_claim_cross_engine_agreement: bool

Whether it is honest to report cross-engine agreement.

ok_table ¶

ok_table() -> DataFrame

Only the engines that produced an estimate.

plot ¶

plot(ax: Any = None, **kwargs: Any) -> Any

Forest plot of the engines' estimates with shared-range shading.

EstimandSpec `dataclass` ¶

Engine-neutral description of one estimand.

Either `formula` (fixest one- to three-part syntax) or the structured fields (`y` + `treatment` + `covariates` …) must pin down the model. The constructor helpers `from_kwargs` and `from_result` fill both representations so downstream adapters can pick whichever they prefer.

Parameters:

Name	Type	Description	Default
`estimand`	`str`	Canonical estimand key (`"ols"`, `"feols"`, `"iv"`, `"did"`, `"poisson"`, `"dml"` …).	required
`data`	`DataFrame`		required
`formula`	`str`	fixest-style: `"y ~ x1 + x2"` (OLS), `"y ~ x | fe1 + fe2"` (FE), `"y ~ x | fe | endog ~ z1 + z2"` (IV with FE).	`None`
`y`	`str`	Outcome variable.	`None`
`treatment`	`str`	Focal regressor (the coefficient cross-validation reconciles by default).	`None`
`covariates`	`list of str`		`list()`
`fixed_effects`	`list of str`		`list()`
`endog`	`list of str`	Endogenous regressors (IV).	`list()`
`instruments`	`list of str`		`list()`
`cluster`	`list of str`		`list()`
`weights`	`str`		`None`
`vcov`	`str`	Requested variance estimator (`"iid"`, `"HC1"`, `"cluster"` …).	`None`
`term`	`str`	Focal coefficient to reconcile. Defaults to `treatment` (or the first endogenous regressor for IV).	`None`
`extra`	`dict`	Estimand-specific extras forwarded verbatim (e.g. DiD `time` / `unit` / `gname` columns).	`dict()`

from_result `classmethod` ¶

from_result(result: Any) -> 'EstimandSpec'

Best-effort recovery of a spec from a fitted StatsPAI result.

Reads the metadata StatsPAI result objects commonly carry (estimand / formula / data / treatment column). Raises a clear error when the result does not expose enough to re-run it elsewhere — rather than guessing and silently cross-validating the wrong model.
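
The fail-loudly behaviour described here can be sketched as follows. `recover_spec_fields` is a hypothetical helper, and the attribute names it looks for (`estimand`, `formula`, `data`) are assumptions made for the illustration rather than a statement of what StatsPAI result objects actually expose.

```python
from types import SimpleNamespace
from typing import Any, Dict

# Assumed attribute names; real result objects may differ.
REQUIRED_FIELDS = ("estimand", "formula", "data")


def recover_spec_fields(result: Any) -> Dict[str, Any]:
    """Collect the metadata needed to re-run a model, or raise."""
    found = {name: getattr(result, name, None) for name in REQUIRED_FIELDS}
    missing = [name for name, value in found.items() if value is None]
    if missing:
        # Refuse to guess: a wrong spec would validate the wrong model.
        raise ValueError(
            "cannot rebuild a spec, result lacks: " + ", ".join(missing)
        )
    return found


complete = SimpleNamespace(estimand="ols", formula="y ~ x", data=[{"y": 1.0, "x": 0.0}])
fields = recover_spec_fields(complete)
```

Passing an object that carries only `estimand` would raise a `ValueError` naming `formula` and `data` as the missing pieces.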

focal_term ¶

focal_term() -> str

Name of the coefficient cross-validation reconciles.
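
One way to picture the fallback documented for `term` (defaults to `treatment`, or the first endogenous regressor for IV) is the sketch below. `resolve_focal_term` is a hypothetical helper, and the exact precedence between `treatment` and `endog` is an assumption.

```python
from typing import Optional, Sequence


def resolve_focal_term(
    term: Optional[str],
    treatment: Optional[str],
    endog: Sequence[str] = (),
) -> str:
    """Pick the coefficient to reconcile (assumed precedence order)."""
    if term:
        return term          # an explicit choice wins
    if treatment:
        return treatment     # documented default
    if endog:
        return endog[0]      # IV fallback: first endogenous regressor
    raise ValueError("no focal term: set term, treatment, or endog")


explicit = resolve_focal_term("x:post", "x")
default = resolve_focal_term(None, "x")
iv_fallback = resolve_focal_term(None, None, endog=["d", "d2"])
```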

build_formula ¶

build_formula() -> str

Construct a fixest-style formula from the structured fields.
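
A minimal sketch of assembling such a formula from structured fields is given below. `build_fixest_formula` is a hypothetical free function, not the method itself. It follows the three shapes listed for `formula` above, and it assumes that endogenous regressors are excluded from the exogenous part and that an empty right-hand side is written as `1`.

```python
from typing import List, Optional, Sequence


def build_fixest_formula(
    y: str,
    treatment: Optional[str] = None,
    covariates: Sequence[str] = (),
    fixed_effects: Sequence[str] = (),
    endog: Sequence[str] = (),
    instruments: Sequence[str] = (),
) -> str:
    """Assemble a one- to three-part fixest-style formula string."""
    if endog and not instruments:
        raise ValueError("endogenous regressors need instruments")
    candidates: List[str] = ([treatment] if treatment else []) + list(covariates)
    # Assumption: endogenous regressors belong to the IV part only.
    exog = [name for name in candidates if name not in endog]
    parts = [f"{y} ~ {' + '.join(exog) if exog else '1'}"]
    if fixed_effects:
        parts.append(" + ".join(fixed_effects))
    if endog:
        parts.append(f"{' + '.join(endog)} ~ {' + '.join(instruments)}")
    return " | ".join(parts)


ols = build_fixest_formula("y", "x", covariates=["w"])
fe = build_fixest_formula("y", "x", fixed_effects=["firm", "year"])
iv = build_fixest_formula(
    "y", "d", covariates=["w"], fixed_effects=["firm"],
    endog=["d"], instruments=["z1", "z2"],
)
```

The three calls produce an OLS formula, a fixed-effects formula, and an IV-with-fixed-effects formula in the same shapes as the examples in the `formula` row.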
