statspai.diagnostics¶
diagnostics ¶
Diagnostics and sensitivity analysis for StatsPAI.
Provides: - Oster (2019) coefficient stability bounds - McCrary (2008) density discontinuity test for RD manipulation
WeakRobustResult ¶
Container holding the unified weak-IV-robust panel.
KitagawaResult
dataclass
¶
Result of the Kitagawa (2015) LATE validity test.
Attributes:
| Name | Type | Description |
|---|---|---|
statistic |
float
|
Test statistic (maximum violation of the inequality conditions). |
p_value |
float
|
Bootstrap p-value. |
first_stage |
float
|
First-stage effect: P(D=1|Z=1) - P(D=1|Z=0). |
n_boot |
int
|
Number of bootstrap replications used. |
n_obs |
int
|
Number of observations. |
max_violation_treated |
float
|
Maximum violation of the treated-complier CDF condition. |
max_violation_untreated |
float
|
Maximum violation of the untreated-complier CDF condition. |
Methods:
| Name | Description |
|---|---|
summary |
Print a formatted summary. |
RosenbaumResult
dataclass
¶
Result container for :func:rosenbaum_bounds.
oster_bounds ¶
oster_bounds(data: Optional[DataFrame] = None, y: Optional[str] = None, treat: Optional[str] = None, controls: Optional[List[str]] = None, r_max: Optional[float] = None, delta: float = 1.0, beta_short: Optional[float] = None, r2_short: Optional[float] = None, beta_long: Optional[float] = None, r2_long: Optional[float] = None, alpha: float = 0.05) -> Dict[str, Any]
Oster (2019) coefficient stability bounds.
Assesses robustness of a treatment effect estimate to omitted variable bias by computing how much unobservable selection (relative to observable selection) would be needed to explain away the result.
Can be called in two ways:
-
From data — runs short and long regressions internally::
result = oster_bounds(data, y='wage', treat='training', controls=['age', 'education'])
-
From pre-computed statistics — supply β and R² directly::
result = oster_bounds(beta_short=2.5, r2_short=0.15, beta_long=2.0, r2_long=0.45)
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
Input data (required for method 1). |
None
|
y
|
str
|
Outcome variable. |
None
|
treat
|
str
|
Treatment variable. |
None
|
controls
|
list of str
|
Control variables (included in the long but not short regression). |
None
|
r_max
|
float
|
Maximum R² under full selection. Default: min(1.0, 1.3 × R²_long). |
None
|
delta
|
float
|
Proportionality assumption: ratio of unobservable-to-observable selection. δ=1 means equal selection. |
1.0
|
beta_short
|
float
|
Treatment coefficient from short regression (no controls). |
None
|
r2_short
|
float
|
R² from short regression. |
None
|
beta_long
|
float
|
Treatment coefficient from long regression (with controls). |
None
|
r2_long
|
float
|
R² from long regression. |
None
|
alpha
|
float
|
Significance level for the identified set. |
0.05
|
Returns:
| Type | Description |
|---|---|
dict
|
Keys:
- |
Examples:
mccrary_test ¶
mccrary_test(data: DataFrame, x: str, c: float = 0, bw: Optional[float] = None, n_bins: Optional[int] = None, alpha: float = 0.05) -> CausalResult
McCrary (2008) density discontinuity test for RD manipulation.
Tests the null hypothesis that the density of the running variable is continuous at the cutoff. A rejection suggests possible manipulation of the running variable.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
|
required |
x
|
str
|
Running variable name. |
required |
c
|
float
|
RD cutoff. |
0
|
bw
|
float
|
Bandwidth for local linear density estimation. Default: automatic (2 × Silverman rule). |
None
|
n_bins
|
int
|
Number of histogram bins per side. Default: ceil(sqrt(n/2)). |
None
|
alpha
|
float
|
Significance level. |
0.05
|
Returns:
| Type | Description |
|---|---|
CausalResult
|
|
Examples:
diagnose ¶
Comprehensive regression diagnostics in one call.
Equivalent to running Stata's estat hettest, estat ovtest,
estat vif after a regression.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
Data used in the regression. |
required |
y
|
str
|
Dependent variable name. |
required |
x
|
list of str
|
Independent variable names (excluding constant). |
required |
print_results
|
bool
|
Print formatted output. |
True
|
Returns:
| Type | Description |
|---|---|
dict
|
Keys: |
Examples:
het_test ¶
Breusch-Pagan test for heteroskedasticity.
H0: Homoskedastic errors (constant variance). H1: Variance depends on regressors.
Equivalent to Stata's estat hettest.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
as in ``diagnose()``
|
|
required |
y
|
as in ``diagnose()``
|
|
required |
x
|
as in ``diagnose()``
|
|
required |
Returns:
| Type | Description |
|---|---|
dict
|
|
References
Breusch, T.S. and Pagan, A.R. (1979). Econometrica, 47(5), 1287-1294. [@breusch1979simple]
reset_test ¶
Ramsey RESET test for functional form misspecification.
Tests whether nonlinear combinations of fitted values help explain the dependent variable. Rejection suggests the model is missing important nonlinearities.
Equivalent to Stata's estat ovtest.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
as in ``diagnose()``
|
|
required |
y
|
as in ``diagnose()``
|
|
required |
x
|
as in ``diagnose()``
|
|
required |
powers
|
int
|
Include ŷ², ŷ³, ..., ŷ^powers in the auxiliary regression. |
3
|
Returns:
| Type | Description |
|---|---|
dict
|
|
References
Ramsey, J.B. (1969). JRSS-B, 31(2), 350-371. [@ramsey1969tests]
vif ¶
Variance Inflation Factors for multicollinearity detection.
VIF > 10 is a common rule of thumb for concerning multicollinearity.
Equivalent to Stata's estat vif.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
|
required |
x
|
list of str
|
Independent variables. |
required |
Returns:
| Type | Description |
|---|---|
DataFrame
|
Columns: |
Notes
VIF_j = 1 / (1 - R²_j) where R²_j is from regressing x_j on all other x variables.
hausman_test ¶
hausman_test(data: DataFrame, y: str, x: List[str], id: str, time: str, alpha: float = 0.05) -> Dict[str, Any]
Hausman test for FE vs RE in panel data.
Equivalent to Stata's hausman fe re.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
Panel data in long format. |
required |
y
|
str
|
Dependent variable. |
required |
x
|
list of str
|
Independent variables (time-varying). |
required |
id
|
str
|
Unit identifier. |
required |
time
|
str
|
Time period identifier. |
required |
alpha
|
float
|
|
0.05
|
Returns:
| Type | Description |
|---|---|
dict
|
|
Examples:
>>> result = sp.hausman_test(df, y='wage', x=['education', 'experience'],
... id='worker', time='year')
>>> print(f"chi2({result['df']}) = {result['statistic']:.2f}, "
... f"p = {result['pvalue']:.4f}")
>>> print(f"Recommendation: {result['recommendation']}")
Notes
The test statistic is:
.. math:: H = (\hat{\beta}{FE} - \hat{\beta}{RE})' [V(\hat{\beta}{FE}) - V(\hat{\beta}{RE})]^{-1} (\hat{\beta}{FE} - \hat{\beta}{RE})
Under H0, H ~ χ²(k).
- Reject H0 (p < 0.05) → Use Fixed Effects
- Fail to reject → Use Random Effects (more efficient)
See Hausman (1978, Econometrica).
anderson_rubin_test ¶
anderson_rubin_test(data: DataFrame, y: str, endog: str, instruments: List[str], exog: Optional[List[str]] = None, h0: float = 0, alpha: float = 0.05, vcov: str = 'HC1') -> Dict[str, Any]
Anderson-Rubin (1949) test — size-correct under weak instruments.
Tests H0: β_endog = h0 and constructs a confidence set by
inverting the test over a grid of candidate values. The AR test
has correct size regardless of instrument strength and is the
recommended inference procedure when F_eff < 10.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
|
required |
y
|
str
|
Outcome variable. |
required |
endog
|
str
|
Endogenous regressor (single). |
required |
instruments
|
list of str
|
Excluded instruments. |
required |
exog
|
list of str
|
Included exogenous controls. |
None
|
h0
|
float
|
Null hypothesis value for the endogenous coefficient. |
0
|
alpha
|
float
|
Significance level. |
0.05
|
vcov
|
(HC0, HC1, classic)
|
Variance estimator used for the Olea-Pflueger effective F reported alongside AR. |
'HC0'
|
Returns:
| Type | Description |
|---|---|
dict
|
|
Examples:
>>> result = sp.anderson_rubin_test(df, y='wage', endog='education',
... instruments=['parent_edu', 'distance'])
>>> print(result['interpretation'])
Notes
Under H0: β = β₀, regress Y - β₀·D on Z and W. The
F-test on Z gives the AR statistic. The confidence set is
constructed by collecting all β₀ values not rejected at level
alpha. If the AR set is unbounded on one or both sides, the
returned (low, high) will be ±inf accordingly.
effective_f_test ¶
effective_f_test(data: DataFrame, endog: str, instruments: List[str], exog: Optional[List[str]] = None, vcov: str = 'HC1') -> Dict[str, Any]
Olea-Pflueger (2013) robust effective F statistic for weak instruments.
Computes the heteroskedasticity-robust effective F that is a
pre-test for the concentration parameter of the first stage. Under
homoskedasticity (vcov='classic') and a single instrument it
reduces exactly to the standard first-stage F.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
|
required |
endog
|
str
|
Endogenous regressor (single endogenous variable). |
required |
instruments
|
list of str
|
Excluded instruments. |
required |
exog
|
list of str
|
Included exogenous controls (a constant is added automatically). |
None
|
vcov
|
(HC0, HC1, classic)
|
Variance estimator for the first-stage residuals:
|
'HC0'
|
Returns:
| Type | Description |
|---|---|
dict
|
|
Notes
The formula (Andrews-Stock-Sun 2019, eq. 4.13) is
.. math::
F_{\text{eff}} = \frac{\hat\pi' (\tilde Z' \tilde Z)\hat\pi}
{\mathrm{tr}\!\left(\hat\Omega\,(\tilde Z' \tilde Z)^{-1}\right)}
where :math:\tilde Z, \tilde D are residualized after partialling
out the exogenous controls, :math:\hat\pi is the first-stage OLS
coefficient vector, and :math:\hat\Omega = \sum_i \hat\eta_i^2
\tilde z_i \tilde z_i' is the HC meat.
Under homoskedasticity :math:\hat\Omega \approx \hat\sigma_\eta^2
\tilde Z'\tilde Z, so the trace collapses to :math:k_z \hat
\sigma_\eta^2 and :math:F_{\text{eff}} reduces to the
first-stage F.
Examples:
tF_critical_value ¶
Lee–McCrary–Moreira–Porter (2022, AER) tF adjusted critical value.
Returns the adjusted two-sided t-ratio critical value c(F) such
that |t| > c(F) is a valid 1 - alpha test of the 2SLS
coefficient, robust to weak instruments.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
first_stage_F
|
float
|
Observed first-stage F statistic (or Olea-Pflueger F_eff). |
required |
alpha
|
float
|
Significance level. Only |
0.05
|
Returns:
| Type | Description |
|---|---|
float
|
Adjusted critical value. Returns |
Raises:
| Type | Description |
|---|---|
ValueError
|
If |
Notes
The LMMP tF procedure gives exactly correct 5 % size for any
first-stage strength. The standard 1.96 critical value
over-rejects substantially when F < 104.7 (the value at which
c = 1.96).
Examples:
weakrobust ¶
weakrobust(data: DataFrame, y: str, endog: str, instruments: List[str], exog: Optional[List[str]] = None, *, h0: float = 0.0, alpha: float = 0.05, vcov: str = 'HC1', include_clr: bool = True, include_k: bool = True, clr_simulations: int = 20000, grid_size: int = 401, random_state: Optional[int] = None) -> WeakRobustResult
Stata-style unified weak-instrument-robust diagnostic panel.
Bundles in a single call:
- Classical first-stage F and Olea–Pflueger (2013) effective F — instrument-strength pre-tests.
- Kleibergen–Paap (2006) rk LM and Wald F — rank test of the reduced form, heteroskedasticity-robust Cragg-Donald analogue.
- Anderson–Rubin (1949) F test of
H0: β_endog = h0and its weak-IV-robust confidence set. - Moreira (2003) Conditional LR (CLR) test and CLR confidence set by grid inversion (uniformly most powerful invariant for a single endogenous regressor).
- Kleibergen (2002) K score test and K confidence set.
- Lee–McCrary–Moreira–Porter (2022) tF adjusted critical value.
This entry point is the Python analogue of Stata 19's
ivregress 2sls; estat weakrobust and weakiv / ivreg2 in
Stata 17+, unifying tools that are scattered across ivmodel (R),
linearmodels (Python), and the user-written packages weakiv,
rivtest (Stata).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
|
required |
y
|
str
|
Outcome and single endogenous regressor (column names in |
required |
endog
|
str
|
Outcome and single endogenous regressor (column names in |
required |
instruments
|
list of str
|
Excluded instruments. |
required |
exog
|
list of str
|
Included exogenous controls. An intercept is always added. |
None
|
h0
|
float
|
Null value for the endogenous coefficient. All under-H0 tests
(AR, CLR, K) are evaluated at |
0.0
|
alpha
|
float
|
Significance level for the robust confidence sets. |
0.05
|
vcov
|
(HC0, HC1, classic)
|
Used by the Olea–Pflueger effective F. |
'HC0'
|
include_clr
|
bool
|
Also run the CLR test and invert it for a CLR confidence set. |
True
|
include_k
|
bool
|
Also run the Kleibergen K score test and K confidence set. |
True
|
clr_simulations
|
int
|
Monte-Carlo draws for the CLR null distribution. |
20 000
|
grid_size
|
int
|
Grid resolution used by AR/CLR/K confidence-set inversion. |
401
|
random_state
|
int
|
|
None
|
Returns:
| Type | Description |
|---|---|
WeakRobustResult
|
Accepts |
Examples:
>>> panel = sp.weakrobust(df, y='wage', endog='educ',
... instruments=['nearc2','nearc4'],
... exog=['age','exper'])
>>> print(panel.summary())
>>> panel.to_frame()
References
Anderson–Rubin (1949) Ann. Math. Stat. 20, 46-63. Kleibergen (2002) Econometrica 70, 1781-1803. Moreira (2003) Econometrica 71, 1027-1048. Kleibergen–Paap (2006) J. Econom. 133, 97-126. Olea–Pflueger (2013) JBES 31, 358-369. Lee–McCrary–Moreira–Porter (2022) AER 112, 3260-3290. [@anderson1949estimation]
evalue_from_result ¶
evalue_from_result(result, measure: str = 'SMD', rare: Optional[bool] = None, rare_outcome: Optional[bool] = None, true: Optional[float] = None) -> Dict[str, Any]
Compute an E-value from a StatsPAI CausalResult object.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
result
|
CausalResult
|
Result from any StatsPAI causal estimator exposing a scalar
|
required |
measure
|
str
|
How to interpret |
'SMD'
|
rare
|
Optional[bool]
|
Passed through to :func: |
None
|
rare_outcome
|
Optional[bool]
|
Passed through to :func: |
None
|
true
|
Optional[bool]
|
Passed through to :func: |
None
|
Returns:
| Type | Description |
|---|---|
dict
|
Same structure as :func: |
evalue_rd ¶
evalue_rd(n11: float, n10: float, n01: float, n00: float, true: float = 0.0, alpha: float = 0.05, grid: float = 0.0001) -> Dict[str, Any]
Exact E-value for a risk difference from a 2x2 table.
Implements the Ding & VanderWeele (2016) risk-difference E-value,
a faithful port of EValue::evalues.RD. The exposure must be
coded so that the risk difference is non-negative.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
n11
|
float
|
Exposed cases and exposed non-cases. |
required |
n10
|
float
|
Exposed cases and exposed non-cases. |
required |
n01
|
float
|
Unexposed cases and unexposed non-cases. |
required |
n00
|
float
|
Unexposed cases and unexposed non-cases. |
required |
true
|
float
|
Reference risk difference (must be <= the observed RD). |
0.0
|
alpha
|
float
|
Significance level for the confidence-limit E-value. |
0.05
|
grid
|
float
|
Step size of the bias-factor grid search for the CI E-value. |
1e-4
|
Returns:
| Type | Description |
|---|---|
dict
|
|
Examples:
bias_factor ¶
Confounding bias factor B (Ding & VanderWeele 2016).
Given the maximum risk ratios of an unmeasured confounder U with
the exposure (rr_eu) and with the outcome (rr_ud), the
factor by which they can bias a risk ratio is
B = (rr_eu * rr_ud) / (rr_eu + rr_ud - 1).
The E-value is the value e such that bias_factor(e, e) equals
the observed risk ratio.
diagnose_result ¶
One-call diagnostic battery, auto-selected by method type.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
result
|
EconometricResults or CausalResult
|
A fitted result from any StatsPAI estimator. |
required |
print_results
|
bool
|
If True, print formatted diagnostics to stdout. |
True
|
alpha
|
float
|
Significance level for tests. |
0.05
|
Returns:
| Type | Description |
|---|---|
dict
|
Keys depend on the method. Always includes |
kitagawa_test ¶
kitagawa_test(data: DataFrame, y: str, treatment: str, instrument: str, n_boot: int = 1000, n_grid: int = 100, seed: Optional[int] = None) -> KitagawaResult
Kitagawa (2015) specification test for the validity of LATE.
Tests whether the data are consistent with the LATE assumptions (independence, exclusion restriction, monotonicity) by checking that the implied complier potential-outcome CDFs are proper distribution functions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
Input data. |
required |
y
|
str
|
Outcome variable name. |
required |
treatment
|
str
|
Endogenous binary treatment variable (D). |
required |
instrument
|
str
|
Binary instrument variable (Z). |
required |
n_boot
|
int
|
Number of bootstrap replications for the p-value. |
1000
|
n_grid
|
int
|
Number of grid points for evaluating the CDF conditions. |
100
|
seed
|
int
|
Random seed for reproducibility. |
None
|
Returns:
| Type | Description |
|---|---|
KitagawaResult
|
Test results including statistic, p-value, and first-stage effect. |
Raises:
| Type | Description |
|---|---|
ValueError
|
If instrument or treatment are not binary, or first stage is zero. |
Examples:
rosenbaum_bounds ¶
rosenbaum_bounds(treated: Optional[Sequence[float]] = None, control: Optional[Sequence[float]] = None, *, data: Optional[DataFrame] = None, y: Optional[str] = None, treat: Optional[str] = None, pair_id: Optional[str] = None, method: str = 'wilcoxon', alternative: str = 'greater', gamma_grid: Optional[Sequence[float]] = None, alpha: float = 0.05) -> RosenbaumResult
Compute Rosenbaum bounds on a paired observational study.
You can pass either two parallel arrays (treated, control) of
matched outcomes, or a long-format DataFrame with columns
y, treat, pair_id.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
treated
|
array - like
|
Outcome in the treated / control unit of each matched pair
(same length). Ignored if |
None
|
control
|
array - like
|
Outcome in the treated / control unit of each matched pair
(same length). Ignored if |
None
|
data
|
DataFrame
|
Long-format data. Must contain exactly two rows per |
None
|
method
|
('wilcoxon', 'sign')
|
Wilcoxon signed-rank bound (continuous) or binomial sign test (robust / binary). |
"wilcoxon"
|
alternative
|
('greater', 'less', 'two-sided')
|
Direction of the alternative hypothesis for the treatment effect. |
"greater"
|
gamma_grid
|
sequence of float
|
Gamma values (>= 1) over which to compute bounding p-values.
Default: |
None
|
alpha
|
float
|
Significance level used to report |
0.05
|
Returns:
| Type | Description |
|---|---|
RosenbaumResult
|
|