`statspai.diagnostics`¶

diagnostics ¶

Diagnostics and sensitivity analysis for StatsPAI.

Provides: - Oster (2019) coefficient stability bounds - McCrary (2008) density discontinuity test for RD manipulation

WeakRobustResult ¶

Bases: ResultProtocolMixin

Container holding the unified weak-IV-robust panel.

Returned by :func:sp.weakrobust. Supports dict-style lookup (panel["effective_F"]), .get(), .to_dict(), .to_frame() (Stata-style single table) and .summary().

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> n = 300
>>> z1 = rng.normal(size=n)
>>> z2 = rng.normal(size=n)
>>> u = rng.normal(size=n)
>>> educ = 0.7 * z1 + 0.5 * z2 + 0.5 * u + rng.normal(size=n)
>>> df = pd.DataFrame({
...     "wage": 1.0 + 0.4 * educ + u + rng.normal(size=n),
...     "educ": educ, "z1": z1, "z2": z2,
... })
>>> panel = sp.weakrobust(df, y="wage", endog="educ",
...                       instruments=["z1", "z2"],
...                       clr_simulations=2000, grid_size=101,
...                       random_state=0)
>>> type(panel).__name__
'WeakRobustResult'
>>> panel.endog
'educ'
>>> "effective_F" in panel.to_dict()
True

to_frame ¶

to_frame() -> DataFrame

Return the Stata-style single-table summary.

KitagawaResult `dataclass` ¶

Bases: ResultProtocolMixin

Result of the Kitagawa (2015) LATE validity test.

Attributes:

Name	Type	Description
`statistic`	`float`	Test statistic (maximum violation of the inequality conditions).
`p_value`	`float`	Bootstrap p-value.
`first_stage`	`float`	First-stage effect: P(D=1\|Z=1) - P(D=1\|Z=0).
`n_boot`	`int`	Number of bootstrap replications used.
`n_obs`	`int`	Number of observations.
`max_violation_treated`	`float`	Maximum violation of the treated-complier CDF condition.
`max_violation_untreated`	`float`	Maximum violation of the untreated-complier CDF condition.

Methods:

Name	Description
`summary`	Print a formatted summary.

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(4)
>>> z = rng.integers(0, 2, 400)
>>> d = ((0.3 + 0.5 * z + rng.normal(0, 0.2, 400)) > 0.5).astype(int)
>>> y = 1.0 + 0.8 * d + rng.normal(size=400)
>>> df = pd.DataFrame({"outcome": y, "treated": d, "assigned": z})
>>> res = sp.kitagawa_test(data=df, y="outcome", treatment="treated",
...                        instrument="assigned", n_boot=200, seed=42)
>>> type(res).__name__
'KitagawaResult'
>>> bool(0.0 <= res.p_value <= 1.0)
True

summary ¶

summary() -> str

Return a formatted summary string.

RosenbaumResult `dataclass` ¶

Bases: ResultProtocolMixin

Result container for :func:rosenbaum_bounds.

Examples:

>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> treated = rng.normal(1.0, 1.0, 80)
>>> control = rng.normal(0.0, 1.0, 80)
>>> res = sp.rosenbaum_bounds(treated, control)
>>> type(res).__name__
'RosenbaumResult'
>>> res.n_pairs
80

to_dict ¶

to_dict() -> dict

JSON-safe dict of every field (agent-native serialization).

oster_bounds ¶

oster_bounds(data: Optional[DataFrame] = None, y: Optional[str] = None, treat: Optional[str] = None, controls: Optional[List[str]] = None, r_max: Optional[float] = None, delta: float = 1.0, beta_short: Optional[float] = None, r2_short: Optional[float] = None, beta_long: Optional[float] = None, r2_long: Optional[float] = None, alpha: float = 0.05) -> Dict[str, Any]

Oster (2019) coefficient stability bounds.

Assesses robustness of a treatment effect estimate to omitted variable bias by computing how much unobservable selection (relative to observable selection) would be needed to explain away the result.

Can be called in two ways:

From data — runs short and long regressions internally::

result = oster_bounds(data, y='wage', treat='training', controls=['age', 'education'])
From pre-computed statistics — supply β and R² directly::

result = oster_bounds(beta_short=2.5, r2_short=0.15, beta_long=2.0, r2_long=0.45)

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Input data (required for method 1).	`None`
`y`	`str`	Outcome variable.	`None`
`treat`	`str`	Treatment variable.	`None`
`controls`	`list of str`	Control variables (included in the long but not short regression).	`None`
`r_max`	`float`	Maximum R² under full selection. Default: min(1.0, 1.3 × R²_long).	`None`
`delta`	`float`	Proportionality assumption: ratio of unobservable-to-observable selection. δ=1 means equal selection.	`1.0`
`beta_short`	`float`	Treatment coefficient from short regression (no controls).	`None`
`r2_short`	`float`	R² from short regression.	`None`
`beta_long`	`float`	Treatment coefficient from long regression (with controls).	`None`
`r2_long`	`float`	R² from long regression.	`None`
`alpha`	`float`	Significance level for the identified set.	`0.05`

Returns:

Type Description

dict

Keys: - beta_short, r2_short: short regression estimates - beta_long, r2_long: long regression estimates - r_max: maximum R² used - delta_for_zero: δ* such that adjusted β = 0 (robustness measure) - beta_adjusted: bias-adjusted β under (δ, R_max) - identified_set: [min(β_adjusted, β_long), max(β_adjusted, β_long)] - robust: True if identified set excludes zero - interpretation: human-readable interpretation

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> # From data
>>> rng = np.random.default_rng(1)
>>> n = 300
>>> age = rng.normal(40, 10, n)
>>> education = rng.normal(13, 3, n)
>>> training = (rng.normal(size=n) + 0.3 * (education - 13) > 0).astype(float)
>>> wage = 20 + 2.0 * training + 0.1 * age + 0.5 * education + rng.normal(size=n)
>>> df = pd.DataFrame({"wage": wage, "training": training,
...                    "age": age, "education": education})
>>> result = sp.oster_bounds(df, y='wage', treat='training',
...                          controls=['age', 'education'])
>>> bool('delta_for_zero' in result)
True

>>> # From statistics (e.g., read from a paper)
>>> result = sp.oster_bounds(beta_short=2.5, r2_short=0.15,
...                          beta_long=2.0, r2_long=0.45)
>>> round(result['delta_for_zero'], 1)
1.8

mccrary_test ¶

mccrary_test(data: DataFrame, x: str, c: float = 0, bw: Optional[float] = None, n_bins: Optional[int] = None, alpha: float = 0.05) -> CausalResult

McCrary (2008) density discontinuity test for RD manipulation.

Tests the null hypothesis that the density of the running variable is continuous at the cutoff. A rejection suggests possible manipulation of the running variable.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`		required
`x`	`str`	Running variable name.	required
`c`	`float`	RD cutoff.	`0`
`bw`	`float`	Bandwidth for local linear density estimation. Default: automatic (2 × Silverman rule).	`None`
`n_bins`	`int`	Number of histogram bins per side. Default: ceil(sqrt(n/2)).	`None`
`alpha`	`float`	Significance level.	`0.05`

Returns:

Type	Description
`CausalResult`	`estimate`: log density difference ln(f̂₊) - ln(f̂₋) `pvalue`: two-sided test for H₀: no discontinuity `model_info['density_left']`: estimated density from the left `model_info['density_right']`: estimated density from the right

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> # Smooth running variable (no manipulation at the cutoff)
>>> df = pd.DataFrame({"score": rng.normal(0, 1, size=500)})
>>> result = sp.mccrary_test(df, x='score', c=0)
>>> bool(result.pvalue > 0.05)   # no manipulation -> fail to reject
True

diagnose ¶

diagnose(data: DataFrame, y: str, x: List[str], print_results: bool = True) -> Dict[str, Any]

Comprehensive regression diagnostics in one call.

Equivalent to running Stata's estat hettest, estat ovtest, estat vif after a regression.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Data used in the regression.	required
`y`	`str`	Dependent variable name.	required
`x`	`list of str`	Independent variable names (excluding constant).	required
`print_results`	`bool`	Print formatted output.	`True`

Returns:

Type	Description
`dict`	Keys: `'het_test'`, `'reset_test'`, `'vif'`, each containing test results.

Examples:

Run the full diagnostic suite on a Mincer-style wage regression:

>>> import statspai as sp
>>> df = sp.cps_wage()
>>> diag = sp.diagnose(df, y='log_wage',
...                    x=['education', 'experience'],
...                    print_results=False)
>>> sorted(diag.keys())
['het_test', 'reset_test', 'vif']

References

breusch1979simple, ramsey1969tests, white1980heteroskedasticity

het_test ¶

het_test(data: DataFrame, y: str, x: List[str]) -> Dict[str, float]

Breusch-Pagan test for heteroskedasticity.

H0: Homoskedastic errors (constant variance). H1: Variance depends on regressors.

Equivalent to Stata's estat hettest.

Parameters:

Name	Type	Default
`data`	as in ``diagnose()``	required
`y`	as in ``diagnose()``	required
`x`	as in ``diagnose()``	required

Returns:

Type	Description
`dict`	`'statistic'`, `'df'`, `'pvalue'`.

References

Breusch, T.S. and Pagan, A.R. (1979). Econometrica, 47(5), 1287-1294. [@breusch1979simple]

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(42)
>>> df = pd.DataFrame({
...     "x1": rng.normal(size=200),
...     "x2": rng.normal(size=200),
... })
>>> df["y"] = (1.0 + 0.5 * df["x1"] - 0.3 * df["x2"]
...            + rng.normal(size=200))
>>> ht = sp.het_test(df, y="y", x=["x1", "x2"])
>>> int(ht["df"])
2
>>> ht["pvalue"] < 0.05   # homoskedastic data: do not reject H0
False

reset_test ¶

reset_test(data: DataFrame, y: str, x: List[str], powers: int = 3) -> Dict[str, float]

Ramsey RESET test for functional form misspecification.

Tests whether nonlinear combinations of fitted values help explain the dependent variable. Rejection suggests the model is missing important nonlinearities.

Equivalent to Stata's estat ovtest.

Parameters:

Name	Type	Description	Default
`data`	as in ``diagnose()``		required
`y`	as in ``diagnose()``		required
`x`	as in ``diagnose()``		required
`powers`	`int`	Include ŷ², ŷ³, ..., ŷ^powers in the auxiliary regression.	`3`

Returns:

Type	Description
`dict`	`'statistic'` (F), `'df1'`, `'df2'`, `'pvalue'`.

References

Ramsey, J.B. (1969). JRSS-B, 31(2), 350-371. [@ramsey1969tests]

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(42)
>>> df = pd.DataFrame({
...     "x1": rng.normal(size=200),
...     "x2": rng.normal(size=200),
... })
>>> df["y"] = (1.0 + 0.5 * df["x1"] - 0.3 * df["x2"]
...            + rng.normal(size=200))
>>> rt = sp.reset_test(df, y="y", x=["x1", "x2"])
>>> rt["df1"], rt["df2"]
(2, 195)
>>> rt["pvalue"] > 0.05   # linear form is correct: do not reject
True

vif ¶

vif(data: DataFrame, x: List[str]) -> DataFrame

Variance Inflation Factors for multicollinearity detection.

VIF > 10 is a common rule of thumb for concerning multicollinearity.

Equivalent to Stata's estat vif.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`		required
`x`	`list of str`	Independent variables.	required

Returns:

Type	Description
`DataFrame`	Columns: `'variable'`, `'VIF'`, `'1/VIF'`.

Notes

VIF_j = 1 / (1 - R²_j) where R²_j is from regressing x_j on all other x variables.

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(42)
>>> df = pd.DataFrame({
...     "x1": rng.normal(size=200),
...     "x2": rng.normal(size=200),
... })
>>> v = sp.vif(df, x=["x1", "x2"])
>>> list(v.columns)
['variable', 'VIF', '1/VIF']
>>> bool((v["VIF"] < 5).all())   # near-orthogonal: low VIFs
True

hausman_test ¶

hausman_test(data: DataFrame, y: str, x: List[str], id: str, time: str, alpha: float = 0.05) -> Dict[str, Any]

Hausman test for FE vs RE in panel data.

Equivalent to Stata's hausman fe re.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Panel data in long format.	required
`y`	`str`	Dependent variable.	required
`x`	`list of str`	Independent variables (time-varying).	required
`id`	`str`	Unit identifier.	required
`time`	`str`	Time period identifier.	required
`alpha`	`float`		`0.05`

Returns:

Type	Description
`dict`	`'statistic'`: Hausman chi² statistic `'df'`: degrees of freedom `'pvalue'`: p-value `'recommendation'`: 'FE' or 'RE' `'beta_fe'`, `'beta_re'`: coefficient vectors

Examples:

>>> import statspai as sp
>>> df = sp.dgp_panel(n_units=40, n_periods=8, seed=0)
>>> result = sp.hausman_test(df, y='y', x=['x'], id='unit', time='time')
>>> bool(result['recommendation'] in ('FE', 'RE'))
True
>>> bool(0.0 <= result['pvalue'] <= 1.0)
True

Notes

The test statistic is:

.. math:: H = (\hat{\beta}{FE} - \hat{\beta}{RE})' [V(\hat{\beta}{FE}) - V(\hat{\beta}{RE})]^{-1} (\hat{\beta}{FE} - \hat{\beta}{RE})

Under H0, H ~ χ²(k).

Reject H0 (p < 0.05) → Use Fixed Effects
Fail to reject → Use Random Effects (more efficient)

See Hausman (1978, Econometrica).

anderson_rubin_test ¶

anderson_rubin_test(data: DataFrame, y: str, endog: str, instruments: List[str], exog: Optional[List[str]] = None, h0: float = 0, alpha: float = 0.05, vcov: str = 'HC1') -> Dict[str, Any]

Anderson-Rubin (1949) test — size-correct under weak instruments.

Tests H0: β_endog = h0 and constructs a confidence set by inverting the test over a grid of candidate values. The AR test has correct size regardless of instrument strength and is the recommended inference procedure when F_eff < 10.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`		required
`y`	`str`	Outcome variable.	required
`endog`	`str`	Endogenous regressor (single).	required
`instruments`	`list of str`	Excluded instruments.	required
`exog`	`list of str`	Included exogenous controls.	`None`
`h0`	`float`	Null hypothesis value for the endogenous coefficient.	`0`
`alpha`	`float`	Significance level.	`0.05`
`vcov`	`(HC0, HC1, classic)`	Variance estimator used for the Olea-Pflueger effective F reported alongside AR.	`'HC0'`

Returns:

Type Description

dict

'ar_stat' : AR F statistic at h0. 'ar_df' : Degrees of freedom (k_z, n - k_w - k_z). 'ar_pvalue' : P-value of AR test at h0. 'ar_ci' : (low, high) AR confidence set (grid-inverted). 'beta_2sls' : 2SLS point estimate. 'first_stage_F' : Classical first-stage F. 'effective_F' : Olea-Pflueger robust F_eff. 'tF_critical_value' : Lee et al. (2022) adjusted 5 % two-sided critical value. None if alpha != 0.05. 'strength' : Instrument-strength interpretation. 'interpretation' : Human-readable summary.

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> n = 300
>>> z1 = rng.normal(size=n)
>>> z2 = rng.normal(size=n)
>>> u = rng.normal(size=n)
>>> educ = 0.7 * z1 + 0.5 * z2 + 0.5 * u + rng.normal(size=n)
>>> df = pd.DataFrame({
...     "wage": 1.0 + 0.4 * educ + u + rng.normal(size=n),
...     "educ": educ, "z1": z1, "z2": z2,
... })
>>> result = sp.anderson_rubin_test(df, y='wage', endog='educ',
...                                 instruments=['z1', 'z2'])
>>> bool(0.0 <= result['ar_pvalue'] <= 1.0)
True

Notes

Under H0: β = β₀, regress Y - β₀·D on Z and W. The F-test on Z gives the AR statistic. The confidence set is constructed by collecting all β₀ values not rejected at level alpha. If the AR set is unbounded on one or both sides, the returned (low, high) will be ±inf accordingly.

effective_f_test ¶

effective_f_test(data: DataFrame, endog: str, instruments: List[str], exog: Optional[List[str]] = None, vcov: str = 'HC1') -> Dict[str, Any]

Olea-Pflueger (2013) robust effective F statistic for weak instruments.

Computes the heteroskedasticity-robust effective F that is a pre-test for the concentration parameter of the first stage. Under homoskedasticity (vcov='classic') and a single instrument it reduces exactly to the standard first-stage F.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`		required
`endog`	`str`	Endogenous regressor (single endogenous variable).	required
`instruments`	`list of str`	Excluded instruments.	required
`exog`	`list of str`	Included exogenous controls (a constant is added automatically).	`None`
`vcov`	`(HC0, HC1, classic)`	Variance estimator for the first-stage residuals: `'classic'` — homoskedastic; F_eff equals first-stage F. `'HC0'` — White heteroskedasticity-robust. `'HC1'` — HC0 with small-sample correction `n/(n-k)`.	`'HC0'`

Returns:

Type	Description
`dict`	`'F_eff'` : Olea-Pflueger effective F. `'first_stage_F'` : Classical first-stage F (for comparison). `'n_instruments'` : Number of excluded instruments (`k_z`). `'n_obs'` : Sample size. `'strength'` : Interpretation string. `'stock_yogo_10pct'` : 23.1 (conventional threshold for <10 % bias).

Notes

The formula (Andrews-Stock-Sun 2019, eq. 4.13) is

.. math::

F_{\text{eff}} = \frac{\hat\pi' (\tilde Z' \tilde Z)\hat\pi}
    {\mathrm{tr}\!\left(\hat\Omega\,(\tilde Z' \tilde Z)^{-1}\right)}

where :math:\tilde Z, \tilde D are residualized after partialling out the exogenous controls, :math:\hat\pi is the first-stage OLS coefficient vector, and :math:\hat\Omega = \sum_i \hat\eta_i^2 \tilde z_i \tilde z_i' is the HC meat.

Under homoskedasticity :math:\hat\Omega \approx \hat\sigma_\eta^2 \tilde Z'\tilde Z, so the trace collapses to :math:k_z \hat \sigma_\eta^2 and :math:F_{\text{eff}} reduces to the first-stage F.

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> n = 300
>>> z1 = rng.normal(size=n)
>>> z2 = rng.normal(size=n)
>>> u = rng.normal(size=n)
>>> educ = 0.7 * z1 + 0.5 * z2 + 0.5 * u + rng.normal(size=n)
>>> df = pd.DataFrame({
...     "wage": 1.0 + 0.4 * educ + u + rng.normal(size=n),
...     "educ": educ, "z1": z1, "z2": z2,
... })
>>> res = sp.effective_f_test(df, endog='educ', instruments=['z1', 'z2'])
>>> bool(res['F_eff'] > res['stock_yogo_10pct'])  # strong instruments
True

tF_critical_value ¶

tF_critical_value(first_stage_F: float, alpha: float = 0.05) -> float

Lee–McCrary–Moreira–Porter (2022, AER) tF adjusted critical value.

Returns the adjusted two-sided t-ratio critical value c(F) such that |t| > c(F) is a valid 1 - alpha test of the 2SLS coefficient, robust to weak instruments.

Parameters:

Name	Type	Description	Default
`first_stage_F`	`float`	Observed first-stage F statistic (or Olea-Pflueger F_eff).	required
`alpha`	`float`	Significance level. Only `0.05` is implemented (the only level for which LMMP publish a complete table).	`0.05`

Returns:

Type	Description
`float`	Adjusted critical value. Returns `inf` when `F ≤ 3.84` (AR inference should be used instead). Converges to `1.96` as `F → ∞`.

Raises:

Type	Description
`ValueError`	If `alpha` is not 0.05.

Notes

The LMMP tF procedure gives exactly correct 5 % size for any first-stage strength. The standard 1.96 critical value over-rejects substantially when F < 104.7 (the value at which c = 1.96).

Examples:

>>> import statspai as sp
>>> # With first-stage F = 10, the adjusted critical value is ~3.16
>>> c = sp.tF_critical_value(10.0)
>>> print(f"Adjusted 5% critical value: {c:.2f}")

weakrobust ¶

weakrobust(data: DataFrame, y: str, endog: str, instruments: List[str], exog: Optional[List[str]] = None, *, h0: float = 0.0, alpha: float = 0.05, vcov: str = 'HC1', include_clr: bool = True, include_k: bool = True, clr_simulations: int = 20000, grid_size: int = 401, random_state: Optional[int] = None) -> WeakRobustResult

Stata-style unified weak-instrument-robust diagnostic panel.

Bundles in a single call:

Classical first-stage F and Olea–Pflueger (2013) effective F — instrument-strength pre-tests.
Kleibergen–Paap (2006) rk LM and Wald F — rank test of the reduced form, heteroskedasticity-robust Cragg-Donald analogue.
Anderson–Rubin (1949) F test of H0: β_endog = h0 and its weak-IV-robust confidence set.
Moreira (2003) Conditional LR (CLR) test and CLR confidence set by grid inversion (uniformly most powerful invariant for a single endogenous regressor).
Kleibergen (2002) K score test and K confidence set.
Lee–McCrary–Moreira–Porter (2022) tF adjusted critical value.

This entry point is the Python analogue of Stata 19's ivregress 2sls; estat weakrobust and weakiv / ivreg2 in Stata 17+, unifying tools that are scattered across ivmodel (R), linearmodels (Python), and the user-written packages weakiv, rivtest (Stata).

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`		required
`y`	`str`	Outcome and single endogenous regressor (column names in `data`).	required
`endog`	`str`	Outcome and single endogenous regressor (column names in `data`).	required
`instruments`	`list of str`	Excluded instruments.	required
`exog`	`list of str`	Included exogenous controls. An intercept is always added.	`None`
`h0`	`float`	Null value for the endogenous coefficient. All under-H0 tests (AR, CLR, K) are evaluated at `β = h0`.	`0.0`
`alpha`	`float`	Significance level for the robust confidence sets.	`0.05`
`vcov`	`(HC0, HC1, classic)`	Used by the Olea–Pflueger effective F.	`'HC0'`
`include_clr`	`bool`	Also run the CLR test and invert it for a CLR confidence set.	`True`
`include_k`	`bool`	Also run the Kleibergen K score test and K confidence set.	`True`
`clr_simulations`	`int`	Monte-Carlo draws for the CLR null distribution.	`20 000`
`grid_size`	`int`	Grid resolution used by AR/CLR/K confidence-set inversion.	`401`
`random_state`	`int`		`None`

Returns:

Type	Description
`WeakRobustResult`	Accepts `.summary()`, `.to_frame()` and dict-style lookup.

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> n = 300
>>> z1 = rng.normal(size=n)
>>> z2 = rng.normal(size=n)
>>> u = rng.normal(size=n)
>>> educ = 0.7 * z1 + 0.5 * z2 + 0.5 * u + rng.normal(size=n)
>>> df = pd.DataFrame({
...     "wage": 1.0 + 0.4 * educ + u + rng.normal(size=n),
...     "educ": educ, "z1": z1, "z2": z2,
... })
>>> panel = sp.weakrobust(df, y='wage', endog='educ',
...                       instruments=['z1', 'z2'],
...                       clr_simulations=2000, grid_size=101,
...                       random_state=0)
>>> bool("effective_F" in panel.to_dict())
True

References

Anderson–Rubin (1949) Ann. Math. Stat. 20, 46-63. Kleibergen (2002) Econometrica 70, 1781-1803. Moreira (2003) Econometrica 71, 1027-1048. Kleibergen–Paap (2006) J. Econom. 133, 97-126. Olea–Pflueger (2013) JBES 31, 358-369. Lee–McCrary–Moreira–Porter (2022) AER 112, 3260-3290. [@anderson1949estimation]

evalue_from_result ¶

evalue_from_result(result: Any, measure: str = 'SMD', rare: Optional[bool] = None, rare_outcome: Optional[bool] = None, true: Optional[float] = None) -> Dict[str, Any]

Compute an E-value from a StatsPAI CausalResult object.

Parameters:

Name	Type	Description	Default
`result`	`CausalResult`	Result from any StatsPAI causal estimator exposing a scalar `estimate` (and ideally `se` / `ci`).	required
`measure`	`str`	How to interpret `result.estimate` (for ATE/ATT on continuous outcomes `'SMD'` is appropriate; pass `'RR'` / `'OR'` / `'HR'` for ratio estimates).	`'SMD'`
`rare`	`Optional[bool]`	Passed through to :func:`evalue`.	`None`
`rare_outcome`	`Optional[bool]`	Passed through to :func:`evalue`.	`None`
`true`	`Optional[bool]`	Passed through to :func:`evalue`.	`None`

Returns:

Type	Description
`dict`	Same structure as :func:`evalue`.

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(42)
>>> rows = []
>>> for u in range(100):
...     treated = int(u < 50)
...     fe = rng.normal()
...     for post in (0, 1):
...         y = (1.0 + fe + 0.5 * post
...              + 0.8 * treated * post + rng.normal())
...         rows.append({"unit": u, "y": y,
...                      "treated": treated, "post": post})
>>> df = pd.DataFrame(rows)
>>> res = sp.did(df, y="y", treat="treated", time="post")
>>> ev = sp.evalue_from_result(res, measure="SMD")
>>> ev["measure"]
'SMD'
>>> ev["evalue_estimate"] > 1.0
True

evalue_rd ¶

evalue_rd(n11: float, n10: float, n01: float, n00: float, true: float = 0.0, alpha: float = 0.05, grid: float = 0.0001) -> Dict[str, Any]

Exact E-value for a risk difference from a 2x2 table.

Implements the Ding & VanderWeele (2016) risk-difference E-value, a faithful port of EValue::evalues.RD. The exposure must be coded so that the risk difference is non-negative.

Parameters:

Name	Type	Description	Default
`n11`	`float`	Exposed cases and exposed non-cases.	required
`n10`	`float`	Exposed cases and exposed non-cases.	required
`n01`	`float`	Unexposed cases and unexposed non-cases.	required
`n00`	`float`	Unexposed cases and unexposed non-cases.	required
`true`	`float`	Reference risk difference (must be <= the observed RD).	`0.0`
`alpha`	`float`	Significance level for the confidence-limit E-value.	`0.05`
`grid`	`float`	Step size of the bias-factor grid search for the CI E-value.	`1e-4`

Returns:

Type	Description
`dict`	`evalue_estimate` (point E-value), `evalue_ci` (E-value for the lower confidence limit; 1 if the CI already crosses `true`), `rd` (the observed risk difference), `bias_factor` and `measure='RD'`.

Examples:

>>> import statspai as sp
>>> round(sp.evalue_rd(200, 150, 100, 250)["evalue_estimate"], 4)
3.4142

bias_factor ¶

bias_factor(rr_eu: float, rr_ud: float) -> float

Confounding bias factor B (Ding & VanderWeele 2016).

Given the maximum risk ratios of an unmeasured confounder U with the exposure (rr_eu) and with the outcome (rr_ud), the factor by which they can bias a risk ratio is

B = (rr_eu * rr_ud) / (rr_eu + rr_ud - 1).

The E-value is the value e such that bias_factor(e, e) equals the observed risk ratio.

Examples:

>>> import statspai as sp
>>> round(sp.bias_factor(2.0, 2.0), 4)
1.3333
>>> round(sp.bias_factor(3.0, 3.0), 4)
1.8

diagnose_result ¶

diagnose_result(result: Any, print_results: bool = True, alpha: float = 0.05) -> Dict[str, Any]

One-call diagnostic battery, auto-selected by method type.

Parameters:

Name	Type	Description	Default
`result`	`EconometricResults or CausalResult`	A fitted result from any StatsPAI estimator.	required
`print_results`	`bool`	If True, print formatted diagnostics to stdout.	`True`
`alpha`	`float`	Significance level for tests.	`0.05`

Returns:

Type	Description
`dict`	Keys depend on the method. Always includes `'method_type'` and `'checks'` (list of individual test result dicts).

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(42)
>>> rows = []
>>> for u in range(100):
...     treated = int(u < 50)
...     fe = rng.normal()
...     for post in (0, 1):
...         y = (1.0 + fe + 0.5 * post
...              + 0.8 * treated * post + rng.normal())
...         rows.append({"unit": u, "y": y,
...                      "treated": treated, "post": post})
>>> df = pd.DataFrame(rows)
>>> res = sp.did(df, y="y", treat="treated", time="post")
>>> out = sp.diagnose_result(res, print_results=False)
>>> out["method_type"]
'did'
>>> sorted(out.keys())
['checks', 'method_type', 'next_steps', 'violations']

kitagawa_test ¶

kitagawa_test(data: DataFrame, y: str, treatment: str, instrument: str, n_boot: int = 1000, n_grid: int = 100, seed: Optional[int] = None) -> KitagawaResult

Kitagawa (2015) specification test for the validity of LATE.

Tests whether the data are consistent with the LATE assumptions (independence, exclusion restriction, monotonicity) by checking that the implied complier potential-outcome CDFs are proper distribution functions.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Input data.	required
`y`	`str`	Outcome variable name.	required
`treatment`	`str`	Endogenous binary treatment variable (D).	required
`instrument`	`str`	Binary instrument variable (Z).	required
`n_boot`	`int`	Number of bootstrap replications for the p-value.	`1000`
`n_grid`	`int`	Number of grid points for evaluating the CDF conditions.	`100`
`seed`	`int`	Random seed for reproducibility.	`None`

Returns:

Type	Description
`KitagawaResult`	Test results including statistic, p-value, and first-stage effect.

Raises:

Type	Description
`ValueError`	If instrument or treatment are not binary, or first stage is zero.

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> z = rng.integers(0, 2, size=800)  # binary instrument
>>> d = ((0.2 + 0.5 * z + rng.normal(scale=0.3, size=800)) > 0.5)
>>> d = d.astype(int)  # first stage: z shifts take-up
>>> y = 1.0 + 0.8 * d + rng.normal(size=800)
>>> df = pd.DataFrame(
...     {"outcome": y, "treated": d, "assigned": z})
>>> result = sp.kitagawa_test(
...     data=df,
...     y="outcome",
...     treatment="treated",
...     instrument="assigned",
...     n_boot=200,
...     seed=42,
... )
>>> type(result).__name__
'KitagawaResult'
>>> bool(0.0 <= result.p_value <= 1.0)
True

rosenbaum_bounds ¶

rosenbaum_bounds(treated: Optional[Sequence[float]] = None, control: Optional[Sequence[float]] = None, *, data: Optional[DataFrame] = None, y: Optional[str] = None, treat: Optional[str] = None, pair_id: Optional[str] = None, method: str = 'wilcoxon', alternative: str = 'greater', gamma_grid: Optional[Sequence[float]] = None, alpha: float = 0.05) -> RosenbaumResult

Compute Rosenbaum bounds on a paired observational study.

You can pass either two parallel arrays (treated, control) of matched outcomes, or a long-format DataFrame with columns y, treat, pair_id.

Parameters:

Name	Type	Description	Default
`treated`	`array - like`	Outcome in the treated / control unit of each matched pair (same length). Ignored if `data` is provided.	`None`
`control`	`array - like`	Outcome in the treated / control unit of each matched pair (same length). Ignored if `data` is provided.	`None`
`data`	`DataFrame`	Long-format data. Must contain exactly two rows per `pair_id`, one with `treat=1` and one with `treat=0`.	`None`
`method`	`('wilcoxon', 'sign')`	Wilcoxon signed-rank bound (continuous) or binomial sign test (robust / binary).	`"wilcoxon"`
`alternative`	`('greater', 'less', 'two-sided')`	Direction of the alternative hypothesis for the treatment effect.	`"greater"`
`gamma_grid`	`sequence of float`	Gamma values (>= 1) over which to compute bounding p-values. Default: `np.arange(1.0, 3.01, 0.1)`.	`None`
`alpha`	`float`	Significance level used to report `gamma_critical`.	`0.05`

Returns:

Type	Description
`RosenbaumResult`	`gamma_critical` is the smallest Gamma in the grid at which the upper-bound p-value exceeds `alpha`. It is `inf` if the study is insensitive across the grid, and `1.0` if already sensitive.

Examples:

Two parallel arrays of matched-pair outcomes (Wilcoxon bound):

>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> treated = rng.normal(1.0, 1.0, 80)
>>> control = rng.normal(0.0, 1.0, 80)
>>> res = sp.rosenbaum_bounds(treated, control)
>>> res.gamma_critical >= 1.0
True

A binary / robust sign-test bound on an insensitive example:

>>> import numpy as np
>>> t = np.array([2.0, 3.0, 4.0, 5.0])
>>> c = np.array([1.0, 1.0, 1.0, 1.0])
>>> bounds = sp.rosenbaum_bounds(t, c, method="sign", gamma_grid=[1.0, 2.0])
>>> bool(np.all(bounds.pvalue_upper >= bounds.pvalue_lower))
True

statspai.diagnostics¶

diagnostics ¶

WeakRobustResult ¶

to_frame ¶

KitagawaResult dataclass ¶

summary ¶

RosenbaumResult dataclass ¶

to_dict ¶

oster_bounds ¶

mccrary_test ¶

diagnose ¶

het_test ¶

reset_test ¶

vif ¶

hausman_test ¶

anderson_rubin_test ¶

effective_f_test ¶

tF_critical_value ¶

weakrobust ¶

evalue_from_result ¶

evalue_rd ¶

bias_factor ¶

diagnose_result ¶

kitagawa_test ¶

rosenbaum_bounds ¶

`statspai.diagnostics`¶

KitagawaResult `dataclass` ¶

RosenbaumResult `dataclass` ¶