`statspai.robustness`¶

robustness ¶

Robustness analysis tools.

spec_curve: Specification Curve Analysis (Simonsohn et al. 2020)
robustness_report: Automated battery of robustness checks
subgroup_analysis: Subgroup heterogeneity analysis with forest plot

SpecCurveResult `dataclass` ¶

Bases: ResultProtocolMixin

Holds all specification curve outputs.

Returned by `sp.spec_curve`. Carries one row per specification in `results_df`, plus summary statistics (`median_estimate`, `share_significant`, `share_positive`) and a `.plot()` method that produces the two-panel Simonsohn-Simmons-Nelson figure.

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> n = 200
>>> df = pd.DataFrame({
...     "education": rng.normal(12, 3, size=n),
...     "experience": rng.normal(10, 5, size=n),
...     "female": rng.integers(0, 2, size=n),
... })
>>> df["wage"] = (5.0 + 0.4 * df["education"] + 0.1 * df["experience"]
...               - 0.5 * df["female"] + rng.normal(size=n))
>>> res = sp.spec_curve(
...     data=df, y="wage", x="education",
...     controls=[[], ["experience"], ["experience", "female"]],
...     se_types=["nonrobust", "hc1"],
... )
>>> type(res).__name__
'SpecCurveResult'
>>> res.n_specs                     # 3 control sets x 2 SE types
6
>>> res.results_df.shape[0] == res.n_specs
True

References

Simonsohn, Simmons & Nelson (Nature Human Behaviour, 2020). [@simonsohn2020specification]

results_df `instance-attribute` ¶

results_df: DataFrame

One row per specification with columns: spec_id, estimate, se, ci_lower, ci_upper, pvalue, significant, plus one column per choice dimension.
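As a rough illustration of how the summary statistics relate to the rows of `results_df`, the sketch below enumerates a control-set by SE-type grid and computes the median estimate, the share of significant estimates and the share of positive estimates. The `summarise` helper and the estimate / p-value pairs are hypothetical and are not produced by StatsPAI; only the column and attribute names come from the description above.

```python
from itertools import product
from statistics import median

# Choice dimensions mirroring the class example: 3 control sets x 2 SE types.
controls = [[], ["experience"], ["experience", "female"]]
se_types = ["nonrobust", "hc1"]
grid = list(product(range(len(controls)), se_types))

# Hypothetical (estimate, pvalue) pairs, one per specification (made-up numbers).
fake_fits = [(0.42, 0.001), (0.40, 0.002), (0.39, 0.030),
             (0.41, 0.004), (0.38, 0.060), (-0.02, 0.800)]

rows = [
    {
        "spec_id": i,
        "controls": "+".join(controls[c]) or "(none)",
        "se_type": se,
        "estimate": est,
        "pvalue": p,
    }
    for i, ((c, se), (est, p)) in enumerate(zip(grid, fake_fits))
]


def summarise(rows, alpha=0.05):
    """Summary statistics over specification rows (hypothetical helper)."""
    n = len(rows)
    return {
        "n_specs": n,
        "median_estimate": median(r["estimate"] for r in rows),
        "share_significant": sum(r["pvalue"] < alpha for r in rows) / n,
        "share_positive": sum(r["estimate"] > 0 for r in rows) / n,
    }


stats = summarise(rows)
```

With these invented numbers, four of the six specifications are significant at `alpha = 0.05` and five of the six are positive.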

summary ¶

summary(alpha: float = 0.05) -> str

Return a formatted text summary.

to_latex ¶

to_latex(*args: Any, caption: Optional[str] = 'Specification Curve Summary', label: Optional[str] = None) -> str

Export summary to a LaTeX table.

to_dataframe ¶

to_dataframe() -> DataFrame

Return the full results DataFrame.

plot ¶

plot(alpha: float = 0.05, color_sig: str = '#2C3E50', color_nonsig: str = '#BDC3C7', figsize: Optional[Tuple[float, float]] = None, title: Optional[str] = None, sort_by: str = 'estimate') -> Tuple[Any, Tuple[Any, Any]]

Draw the canonical two-panel specification curve plot.

Top panel: sorted point estimates with 95 % CIs, coloured by significance. Bottom panel: indicator matrix showing which analytical choices produced each specification.
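The data behind the two panels can be sketched without any plotting library. The example below sorts specifications by estimate for the top panel and builds a 0/1 indicator matrix for the bottom panel, with one matrix row per choice level and one column per specification. The `curve_panels` helper and the input rows are hypothetical; this is not how `plot()` is implemented internally, only an illustration of the layout it describes.

```python
# Hypothetical specification rows; only the column layout follows the docs above.
rows = [
    {"spec_id": 0, "estimate": 0.42, "pvalue": 0.001,
     "controls": "none", "se_type": "nonrobust"},
    {"spec_id": 1, "estimate": 0.38, "pvalue": 0.060,
     "controls": "none", "se_type": "hc1"},
    {"spec_id": 2, "estimate": 0.40, "pvalue": 0.002,
     "controls": "experience", "se_type": "nonrobust"},
    {"spec_id": 3, "estimate": 0.35, "pvalue": 0.090,
     "controls": "experience", "se_type": "hc1"},
]
choice_dims = ["controls", "se_type"]


def curve_panels(rows, choice_dims, sort_by="estimate", alpha=0.05):
    """Return top-panel points, choice levels and the indicator matrix."""
    ordered = sorted(rows, key=lambda r: r[sort_by])
    # Top panel: (estimate, is_significant) in sorted order.
    top = [(r["estimate"], r["pvalue"] < alpha) for r in ordered]
    # Bottom panel: one matrix row per (dimension, level) pair.
    levels = [
        (dim, level)
        for dim in choice_dims
        for level in sorted({r[dim] for r in rows})
    ]
    bottom = [[int(r[dim] == level) for r in ordered] for dim, level in levels]
    return top, levels, bottom


top, levels, bottom = curve_panels(rows, choice_dims)
```

Each column of `bottom` contains exactly one `1` per choice dimension, which is what makes the matrix readable as a record of the choices behind each estimate.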

Parameters:

Name	Type	Description	Default
`alpha`	`float`	Significance threshold for colouring.	`0.05`
`color_sig`	`str`	Colour for significant estimates.	`'#2C3E50'`
`color_nonsig`	`str`	Colour for non-significant estimates.	`'#BDC3C7'`
`figsize`	`tuple`	Figure size (width, height). Auto-sized if None.	`None`
`title`	`str`	Title for the top panel.	`None`
`sort_by`	`str`	Column to sort specifications by. Default `'estimate'`.	`'estimate'`

Returns:

Type	Description
`tuple`	`(fig, axes)`: the matplotlib Figure and the Axes of the two panels.

RobustnessResult `dataclass` ¶

Bases: ResultProtocolMixin

Container for robustness report results.

Returned by `robustness_report`. Holds one row per robustness check in `results_df` (alternative controls, clustering, winsorising, trimming, subsamples), alongside the `baseline_estimate` / `baseline_se` for the key variable `x`. Call `summary()` for a stability assessment.
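Two of the checks named above, winsorising and trimming, differ in what they do with extreme values: winsorising clips them to a quantile, trimming drops them. The sketch below shows the distinction in plain Python. The helpers `quantile`, `winsorize` and `trim` are hypothetical and use linear interpolation between order statistics; how `robustness_report` applies `winsor_levels` internally is not documented here, so treat this as an illustration of the general technique only.

```python
def quantile(sorted_vals, q):
    """Quantile by linear interpolation between order statistics."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)                       # floor, since pos >= 0
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def winsorize(values, level):
    """Clip values below the `level` quantile and above the `1 - level` quantile."""
    s = sorted(values)
    lo, hi = quantile(s, level), quantile(s, 1 - level)
    return [min(max(v, lo), hi) for v in values]


def trim(values, level):
    """Drop values outside the [`level`, `1 - level`] quantile range."""
    s = sorted(values)
    lo, hi = quantile(s, level), quantile(s, 1 - level)
    return [v for v in values if lo <= v <= hi]


# Made-up data: 100 ordinary values plus one extreme outlier.
values = [float(v) for v in range(100)] + [1000.0]
winsorized = winsorize(values, 0.05)
trimmed = trim(values, 0.05)
```

Winsorising keeps the sample size unchanged and pulls the outlier in to the upper quantile; trimming removes it and shrinks the sample.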

Examples:

>>> import statspai as sp
>>> df = sp.cps_wage()
>>> report = sp.robustness_report(
...     df,
...     formula="log_wage ~ education + experience + tenure",
...     x="education",
...     winsor_levels=[0.01],
... )
>>> type(report).__name__
'RobustnessResult'
>>> report.x
'education'
>>> bool(report.n_checks > 0 and "check" in report.results_df.columns)
True

results_df `instance-attribute` ¶

results_df: DataFrame

One row per robustness check.

summary ¶

summary() -> str

Formatted text summary.

to_latex ¶

to_latex(*, caption: Optional[str] = 'Robustness Checks', label: Optional[str] = None) -> str

Export to LaTeX table.

plot ¶

plot(figsize: Optional[Tuple[float, float]] = None, title: Optional[str] = None, color: str = '#2C3E50', baseline_color: str = '#E74C3C') -> Any

Forest-plot style visualization of robustness checks.

Returns:

Type	Description
`tuple`	`(fig, ax)`: matplotlib Figure and Axes.

SubgroupResult `dataclass` ¶

Bases: ResultProtocolMixin

Container for subgroup heterogeneity analysis.

Returned by `sp.subgroup_analysis`. Holds one row per subgroup in `results_df`, the pooled `overall_estimate` / `overall_se`, and per-grouping interaction Wald tests in `het_tests`.

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> n = 300
>>> df = pd.DataFrame({
...     "education": rng.normal(12, 3, size=n),
...     "experience": rng.normal(10, 5, size=n),
...     "female": rng.integers(0, 2, size=n),
... })
>>> df["wage"] = (5.0 + 0.4 * df["education"] + 0.1 * df["experience"]
...               + 0.3 * df["female"] * df["education"]
...               + rng.normal(size=n))
>>> res = sp.subgroup_analysis(
...     data=df,
...     formula="wage ~ education + experience",
...     x="education",
...     by={"Gender": "female"},
... )
>>> type(res).__name__
'SubgroupResult'
>>> res.x
'education'
>>> "Gender" in res.het_tests
True

results_df `instance-attribute` ¶

results_df: DataFrame

Columns: group_var, group_val, estimate, se, ci_lower, ci_upper, pvalue, nobs, label.

het_tests `instance-attribute` ¶

het_tests: Dict[str, Dict[str, float]]

Per group_var: chi2, pvalue, df.

to_latex ¶

to_latex(*, caption: Optional[str] = 'Subgroup Heterogeneity Analysis', label: Optional[str] = None) -> str

Export to LaTeX.

plot ¶

plot(figsize: Optional[Tuple[float, float]] = None, title: Optional[str] = None, color: str = '#2C3E50', overall_color: str = '#E74C3C') -> Any

Forest plot of subgroup estimates.

Returns:

Type	Description
`tuple`	`(fig, ax)`: matplotlib Figure and Axes.

SensitivityDashboard `dataclass` ¶

Result of a unified sensitivity analysis.

Always contains an e_value entry; other entries are optional depending on what the estimator provides.
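For orientation, the standard VanderWeele-Ding E-value for a risk ratio `RR` is `RR + sqrt(RR * (RR - 1))`, with ratios below 1 inverted first. The sketch below implements only that textbook formula. The function name `e_value` is chosen for illustration, and how `SensitivityDashboard` maps an estimator's output onto a risk-ratio scale is not described in this page, so the connection to the `e_value` entry is an assumption.

```python
import math


def e_value(rr: float) -> float:
    """VanderWeele-Ding E-value for a risk ratio (illustrative helper)."""
    if rr <= 0:
        raise ValueError("risk ratio must be positive")
    # Protective effects (RR < 1) are inverted so the formula applies.
    rr = max(rr, 1.0 / rr)
    return rr + math.sqrt(rr * (rr - 1.0))


ev_harmful = e_value(2.0)      # 2 + sqrt(2)
ev_protective = e_value(0.5)   # same as e_value(2.0) after inversion
ev_null = e_value(1.0)         # no confounding needed to explain a null effect
```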

FrontierSensitivityResult `dataclass` ¶

Bases: ResultProtocolMixin

Container for frontier sensitivity analysis.

Returned by `copula_sensitivity`, `survival_sensitivity`, and `calibrate_confounding_strength`.

Examples:

>>> import statspai as sp
>>> res = sp.copula_sensitivity(0.3, 0.1)
>>> isinstance(res, sp.FrontierSensitivityResult)
True
>>> res.method
'copula_gaussian'

subgroup_analysis ¶

subgroup_analysis(data: DataFrame, formula: str, x: str, by: Dict[str, str], robust: str = 'hc1') -> SubgroupResult

Run subgroup heterogeneity analysis with forest plot.

Estimate the effect of x on y within each subgroup defined by the variables in by, and test for heterogeneity using interaction-based Wald tests.
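A simplified version of the heterogeneity test can be written for the two-group case. The sketch below compares two subgroup coefficients with a one-degree-of-freedom Wald statistic and returns the same keys that `het_tests` reports (`chi2`, `pvalue`, `df`). It assumes the two subgroup estimates are independent; the function name `wald_difference_test` is hypothetical, and the library's interaction-based test may differ in detail.

```python
import math


def wald_difference_test(b1, se1, b2, se2):
    """Wald test of H0: b1 == b2 for two independent subgroup estimates."""
    chi2 = (b1 - b2) ** 2 / (se1 ** 2 + se2 ** 2)
    # Survival function of a chi-square with 1 df: P(Z^2 > x) = erfc(sqrt(x / 2)).
    pvalue = math.erfc(math.sqrt(chi2 / 2.0))
    return {"chi2": chi2, "pvalue": pvalue, "df": 1.0}


# Made-up subgroup estimates of the coefficient on `x`.
different = wald_difference_test(0.4, 0.05, 0.7, 0.05)
identical = wald_difference_test(0.4, 0.05, 0.4, 0.08)
```

A small `pvalue` indicates that the coefficient on `x` differs between the two subgroups by more than sampling noise would suggest.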

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Analysis dataset.	required
`formula`	`str`	Regression formula, e.g. `"wage ~ education + experience"`.	required
`x`	`str`	Key explanatory variable.	required
`by`	`dict[str, str]`	Mapping of display name → column name for grouping. Example: `{'Gender': 'female', 'Region': 'region'}`.	required
`robust`	`str`	Standard error type for subgroup regressions.	`'hc1'`

Returns:

Type	Description
`SubgroupResult`	Container with `.plot()`, `.summary()`, `.to_latex()`, `.results_df`.

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> n = 300
>>> df = pd.DataFrame({
...     "female": rng.integers(0, 2, size=n),
...     "region": rng.integers(0, 3, size=n),
...     "education": rng.normal(12, 3, size=n),
...     "experience": rng.normal(10, 5, size=n),
... })
>>> df["wage"] = (
...     5 + 0.4 * df["education"] + 0.1 * df["experience"]
...     - 0.5 * df["female"] + rng.normal(size=n)
... )
>>> result = sp.subgroup_analysis(
...     data=df,
...     formula="wage ~ education + experience",
...     x='education',
...     by={'Gender': 'female', 'Region': 'region'},
... )
>>> fig, ax = result.plot()
>>> type(result).__name__
'SubgroupResult'

copula_sensitivity ¶

copula_sensitivity(estimate: float, se: float, *, sigma_u: float = 1.0, sigma_y: float = 1.0, rho_grid: Optional[Sequence[float]] = None, alpha: float = 0.05) -> FrontierSensitivityResult

Gaussian-copula sensitivity to unobserved confounding.

Under a Gaussian copula with correlation rho between U (a single latent unit-level confounder) and Y (the outcome), the bias in an OLS / DML point estimate scales linearly with rho:

bias(rho) = rho * sigma_u * sigma_y / sigma_D²

which reduces to bias(rho) = rho * sigma_u * sigma_y under the normalisation sigma_D = 1. The adjusted estimate is estimate - bias(rho); rho is swept over a grid to find the breakpoint rho* at which the adjusted effect reaches zero.
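The bias formula can be traced by hand. The sketch below evaluates the adjusted estimate on the default grid of 21 correlations in [-0.5, 0.5] and computes the breakpoint in closed form under sigma_D = 1, namely rho* = estimate / (sigma_u * sigma_y). The helper `copula_curve` is hypothetical and uses the closed form rather than a grid search, so it is not a reproduction of `copula_sensitivity` itself.

```python
def copula_curve(estimate, sigma_u=1.0, sigma_y=1.0, rho_grid=None):
    """Adjusted estimates over a rho grid, assuming sigma_D = 1 (illustrative)."""
    if rho_grid is None:
        # 21 evenly spaced correlations from -0.5 to 0.5.
        rho_grid = [-0.5 + i / 20 for i in range(21)]
    scale = sigma_u * sigma_y
    # bias(rho) = rho * sigma_u * sigma_y; adjusted = estimate - bias(rho).
    curve = [(rho, estimate - rho * scale) for rho in rho_grid]
    # Correlation at which the adjusted estimate is exactly zero.
    rho_star = estimate / scale
    return curve, rho_star


curve, rho_star = copula_curve(0.3)
```

With `estimate = 0.3` and unit standard deviations, a confounder correlated at roughly 0.3 with the outcome would be enough to remove the effect. Note that `rho_star` can fall outside the grid, or outside [-1, 1], when the estimate is large relative to `sigma_u * sigma_y`.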

Parameters:

Name	Type	Description	Default
`estimate`	`float`	Point estimate of the effect (e.g. from OLS or DML).	required
`se`	`float`	Standard error of `estimate`.	required
`sigma_u`	`float`	Standard deviation of the latent confounder. With the default values of `sigma_u` and `sigma_y`, the bias coefficient is numerically equal to `rho`, matching Chernozhukov-Cinelli-Hazlett's "percentile scaling".	`1.0`
`sigma_y`	`float`	Standard deviation of the outcome. See `sigma_u` for the effect of the default values.	`1.0`
`rho_grid`	`sequence of float`	Correlation grid. Defaults to `np.linspace(-0.5, 0.5, 21)`.	`None`
`alpha`	`float`	Significance level.	`0.05`

Returns:

Type	Description
`FrontierSensitivityResult`

Examples:

>>> import statspai as sp
>>> res = sp.copula_sensitivity(0.3, 0.1)
>>> res.method
'copula_gaussian'
>>> int(len(res.curve))
21

References

Balgi, Braun, Peña & Daoud (arXiv:2508.08752, 2025). [@balgi2025sensitivity]

survival_sensitivity ¶

survival_sensitivity(log_hr: float, se_log_hr: float, *, gamma_grid: Optional[Sequence[float]] = None, baseline_survival_t: float = 0.5, alpha: float = 0.05) -> FrontierSensitivityResult

Nonparametric sensitivity for survival / hazard-ratio outcomes.

Extends Rosenbaum's Gamma bounds to hazard ratios and converts them into shifted survival differences at a chosen time t.

Given an observed log hazard ratio log_hr with SE se_log_hr, bound the worst-case log-HR at sensitivity parameter Γ:

log_hr_worst(Γ) = log_hr − log(Γ),  log_hr_best(Γ) = log_hr + log(Γ)

and translate the worst case into a survival shift at time t using the proportional-hazards identity S_1(t) = S_0(t) ^ exp(log_hr).
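The two formulas above translate directly into a few lines of arithmetic. The sketch below computes the worst-case and best-case log-HR over a Γ grid and the implied change in survival at time t relative to the baseline S_0(t). The helper `survival_bounds`, its output keys and the definition of the shift as S_1(t) − S_0(t) are assumptions made for illustration; the columns actually returned by `survival_sensitivity` are not documented here.

```python
import math


def survival_bounds(log_hr, baseline_survival_t=0.5, gamma_grid=None):
    """Gamma-indexed bounds on the log-HR and the implied survival shift."""
    if gamma_grid is None:
        # 21 evenly spaced Gamma values from 1.0 to 3.0.
        gamma_grid = [1.0 + 0.1 * i for i in range(21)]
    s0 = baseline_survival_t
    rows = []
    for gamma in gamma_grid:
        worst = log_hr - math.log(gamma)
        best = log_hr + math.log(gamma)
        # Proportional hazards: S_1(t) = S_0(t) ** exp(log_hr).
        s1_worst = s0 ** math.exp(worst)
        rows.append({
            "gamma": gamma,
            "log_hr_worst": worst,
            "log_hr_best": best,
            "delta_survival_worst": s1_worst - s0,
        })
    return rows


rows = survival_bounds(0.4)
# At Gamma = exp(log_hr) the worst-case log-HR is zero, so survival is unchanged.
at_breakpoint = survival_bounds(0.4, gamma_grid=[math.exp(0.4)])[0]
```

For a positive `log_hr`, the worst-case survival shift starts negative at Γ = 1, crosses zero at Γ = exp(log_hr), and turns positive beyond it.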

Parameters:

Name	Type	Description	Default
`log_hr`	`float`	Observed log hazard ratio.	required
`se_log_hr`	`float`	Standard error of `log_hr`.	required
`gamma_grid`	`sequence of float`	Gamma (≥ 1) values. Defaults to `np.linspace(1.0, 3.0, 21)`.	`None`
`baseline_survival_t`	`float`	Baseline S_0(t) used to report Δ survival at time t.	`0.5`
`alpha`	`float`	Significance level.	`0.05`

Examples:

>>> import statspai as sp
>>> res = sp.survival_sensitivity(0.4, 0.15)
>>> res.method
'survival_gamma'
>>> int(len(res.curve))
21

References

Hu & Westling (arXiv:2511.01412, 2025). [@hu2025nonparametric]

calibrate_confounding_strength ¶

calibrate_confounding_strength(estimate: float, se: float, *, observed_r2_outcome: float, observed_r2_treatment: float, alpha: float = 0.05, target_estimate: float = 0.0) -> FrontierSensitivityResult

Calibrate the strength an unobserved confounder would need in order to move the observed effect to a target value.

Follows the Cinelli-Hazlett (2020) and Zhang et al. (2025) "ml-calibrated E-value" generalisation: given the partial-R² of the observed covariates with the outcome and with the treatment, it computes the amount of residual variation that an unobserved U would need to share with Y and D to shift the effect to target_estimate.
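To make the benchmarking idea concrete, the sketch below uses the Cinelli-Hazlett (2020) bias expression, |bias| = se · sqrt(df) · sqrt(R²_Y · R²_D / (1 − R²_D)), and asks how large a multiple of the observed partial-R² values is needed to reach the target. Several pieces are assumptions made for illustration: the residual degrees of freedom `dof` (not an argument of `calibrate_confounding_strength`), the multiplier grid, the plain `k × observed R²` scaling, and the helper names. The library's actual computation may differ.

```python
import math


def ch_bias(se, dof, r2_outcome, r2_treatment):
    """Cinelli-Hazlett bias magnitude for given confounder partial-R2 values."""
    return se * math.sqrt(dof) * math.sqrt(
        r2_outcome * r2_treatment / (1.0 - r2_treatment)
    )


def calibration_curve(estimate, se, dof, observed_r2_outcome,
                      observed_r2_treatment,
                      multipliers=(0.5, 1.0, 2.0, 3.0), target_estimate=0.0):
    """One row per multiplier k; assumes estimate > target_estimate."""
    rows = []
    for k in multipliers:
        # Simplification: confounder strength is k times the observed partial-R2.
        r2_y = min(k * observed_r2_outcome, 0.99)
        r2_d = min(k * observed_r2_treatment, 0.99)
        bias = ch_bias(se, dof, r2_y, r2_d)
        adjusted = estimate - bias
        rows.append({
            "multiplier": k,
            "r2_outcome": r2_y,
            "r2_treatment": r2_d,
            "bias": bias,
            "adjusted": adjusted,
            "reaches_target": adjusted <= target_estimate,
        })
    return rows


rows = calibration_curve(0.3, 0.1, dof=100,
                         observed_r2_outcome=0.1, observed_r2_treatment=0.1)
first_k = next(r["multiplier"] for r in rows if r["reaches_target"])
```

Under these made-up inputs, a confounder about three times as strong as the observed covariates would be needed to bring the estimate down to zero.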

Parameters:

Name	Type	Description	Default
`estimate`	`float`	Point estimate of the effect.	required
`se`	`float`	Standard error of `estimate`.	required
`observed_r2_outcome`	`float in (0, 1)`	Partial-R² of the observed covariate(s) with the outcome Y. Used to benchmark "1x as confounding as observed" / "2x" etc.	required
`observed_r2_treatment`	`float in (0, 1)`	Partial-R² of the observed covariate(s) with the treatment D. Used to benchmark "1x as confounding as observed" / "2x" etc.	required
`alpha`	`float`	Significance level.	`0.05`
`target_estimate`	`float`	Effect value to explain away.	`0.0`

Examples:

>>> import statspai as sp
>>> res = sp.calibrate_confounding_strength(
...     0.3, 0.1, observed_r2_outcome=0.1, observed_r2_treatment=0.1)
>>> res.method
'calibrate_confounding_strength'
>>> list(res.curve.columns)[:3]
['multiplier', 'r2_outcome', 'r2_treatment']

References

Baitairian et al. (arXiv:2510.16560, 2025). [@baitairian2025calibrating]

Cinelli & Hazlett (JRSS-B, 2020).
