statspai.iv¶
iv ¶
statspai.iv — the unified Instrumental Variables namespace.
The goal of this subpackage is to be the single entry point for every IV-flavoured workflow in StatsPAI, regardless of which sub-module the underlying implementation lives in.
The subpackage itself is callable:
sp.iv("y ~ (d ~ z) + x", data=df) # 2SLS (default)
sp.iv("y ~ (d ~ z) + x", data=df, method="liml") # LIML
sp.iv(method="kernel", y=..., endog=..., instruments=..., data=df)
sp.iv.fit(...) # equivalent (fit = _dispatch alias)
sp.iv.kernel_iv(...) # individual estimators still reachable
That callable is provided by a tiny ModuleType subclass installed via
sys.modules[__name__].__class__ (the standard PEP 562-style trick).
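The class-swap mechanism can be illustrated in isolation. The sketch below is not StatsPAI's source: `demo_iv`, `_CallableModule`, and the stand-in `_dispatch` lambda are hypothetical names chosen for illustration, and the only thing taken from the text above is the idea of assigning a `ModuleType` subclass to the module's `__class__`.

```python
import sys
import types


class _CallableModule(types.ModuleType):
    """Module subclass whose instances can be called like a function."""

    def __call__(self, *args, **kwargs):
        # Forward the call to a dispatcher stored in the module namespace.
        return self._dispatch(*args, **kwargs)


# Build a throwaway module so the example needs no package on disk.
demo = types.ModuleType("demo_iv")
# Hypothetical dispatcher: it just echoes its arguments back.
demo._dispatch = lambda formula=None, **kw: {"formula": formula, **kw}
demo.fit = demo._dispatch  # mirrors the documented "fit = _dispatch" alias
sys.modules["demo_iv"] = demo

# The class swap. Inside a real package this line would use __name__.
sys.modules["demo_iv"].__class__ = _CallableModule

import demo_iv  # resolved from sys.modules, no file lookup needed

called = demo_iv("y ~ (d ~ z) + x", method="liml")
via_fit = demo_iv.fit("y ~ (d ~ z) + x", method="liml")
```

Both `called` and `via_fit` hold the same dictionary, which is the behaviour the `sp.iv(...)` / `sp.iv.fit(...)` pair is described as having.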
Sub-method coverage (method= keyword, all aliases lowercased):
- K-class formula path (regression.iv): 2sls/tsls/iv, liml, fuller, gmm, jive.
- Modern JIVE variants (iv.jive_variants): jive1, ujive, ijive, rjive.
- Many-weak (iv.many_weak): jive_mw, many_weak_ar.
- Lasso / post-Lasso: lasso (regression.advanced_iv.lasso_iv), post_lasso/bch (iv.post_lasso.bch_post_lasso_iv).
- ML / nonparametric: kernel (iv.kernel_iv), npiv (iv.npiv), ivdml (iv.ivdml), deepiv (deepiv.deepiv, optional).
- Bayesian (iv.bayesian_iv): bayes/bayesian.
- LATE / MTE: continuous_late (iv.continuous_late), mte (iv.mte), ivmte_bounds (iv.ivmte_lp).
- Quantile IV (regression.iv_quantile): ivqreg/quantile.
- Plausibly exogenous sensitivity (iv.plausibly_exogenous): plausibly_exog_uci/plausibly_exog_ltz.
- Shift-share (bartik): shift_share/bartik.
Diagnostics (anderson_rubin_test, effective_f_test,
kleibergen_paap_rk, sanderson_windmeijer, conditional_lr_test)
remain standalone — they are not estimators and intentionally do not show
up in the method= table.
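A rough sketch of how a lowercased alias table like the one above might be resolved is shown here. The dictionary contents come from the aliases listed in this section, but the canonical names on the right-hand side, the helper name `normalise_method`, and the error behaviour are all assumptions made for illustration, not the library's actual dispatcher.

```python
# Hypothetical alias table; only a subset of the documented aliases is shown.
_ALIASES = {
    "tsls": "2sls", "iv": "2sls",
    "bch": "post_lasso",
    "bayes": "bayesian",
    "ivqreg": "quantile",
    "shift_share": "bartik",
}
# Assumed canonical estimator names for this sketch.
_CANONICAL = {"2sls", "liml", "fuller", "gmm", "jive",
              "post_lasso", "bayesian", "quantile", "bartik"}
# Diagnostics are standalone functions, so they are rejected as methods.
_DIAGNOSTICS = {"anderson_rubin_test", "effective_f_test", "kleibergen_paap_rk",
                "sanderson_windmeijer", "conditional_lr_test"}


def normalise_method(method: str) -> str:
    """Lowercase the keyword, resolve aliases, and reject non-estimators."""
    key = method.strip().lower()
    if key in _DIAGNOSTICS:
        raise ValueError(f"{key!r} is a diagnostic, not an estimator")
    key = _ALIASES.get(key, key)
    if key not in _CANONICAL:
        raise ValueError(f"unknown IV method {key!r}")
    return key
```

For instance, `normalise_method(" TSLS ")` resolves to `'2sls'`, while passing a diagnostic name raises `ValueError`.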
Examples:
>>> import statspai as sp
>>> # Standard 2SLS with a rich diagnostic panel
>>> res = sp.iv("y ~ (d ~ z1 + z2) + x1", data=df)
>>> print(res.summary())
>>> print(res.diagnostics) # MOP F, KP rk, SW, AR CI
>>> # Sensitivity to exclusion-restriction violations
>>> chr = sp.iv(
... method="plausibly_exog_ltz",
... y="y", endog="d", instruments=["z1", "z2"],
... gamma_mean=0.0, gamma_var=0.01, data=df,
... )
>>> # Marginal treatment effects
>>> m = sp.iv(method="mte",
... y="y", endog="d", instruments=["z"], exog=["x"], data=df)
IVRegression ¶
Bases: BaseModel
Instrumental Variables regression model.
Supports multiple estimation methods via method parameter:
'2sls', 'liml', 'fuller', 'gmm', 'jive'.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| formula | str | Formula with IV syntax: | None |
| data | DataFrame | | None |
| method | str | Estimation method. | '2sls' |
| fuller_alpha | float | Fuller constant (only used when method='fuller'). | 1.0 |
| y | array-like | Alternative to formula interface. | None |
| X_exog | array-like | Alternative to formula interface. | None |
| X_endog | array-like | Alternative to formula interface. | None |
| Z | array-like | Alternative to formula interface. | None |
| var_names | array-like | Alternative to formula interface. | None |
first_stage property ¶
First-stage diagnostics for each endogenous variable.
sargan_test property ¶
Sargan/Hansen J overidentification test results.
fit ¶
fit(robust: str = 'nonrobust', cluster: Optional[str] = None, **kwargs) -> EconometricResults
Fit the IV model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| robust | str | Standard-error type ('nonrobust', 'hc0', 'hc1', 'hc2', 'hc3'). | 'nonrobust' |
| cluster | str | Variable name for clustering. | None |
Returns:
| Type | Description |
|---|---|
| EconometricResults | |
predict ¶
Generate predictions from the fitted IV model.
For a structural-form estimator, the natural forecast of y given
new data is X_exog β_exog + X_endog β_endog — i.e. we plug
observed values of the endogenous variables through the structural
equation. Instruments are not used at prediction time.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | New data at which to predict. Must contain all exogenous and endogenous variables referenced by the model's formula. If | None |
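The structural-form forecast described above is just a linear combination of observed regressors. The following sketch spells that out with plain lists; `structural_predict` and the coefficient values are made up for illustration and are not part of the IVRegression API.

```python
def structural_predict(x_exog, x_endog, beta_exog, beta_endog):
    """Return X_exog @ beta_exog + X_endog @ beta_endog, row by row.

    Instruments do not appear: only the observed exogenous and
    endogenous regressors enter the structural equation.
    """
    preds = []
    for exog_row, endog_row in zip(x_exog, x_endog):
        exog_part = sum(x * b for x, b in zip(exog_row, beta_exog))
        endog_part = sum(d * b for d, b in zip(endog_row, beta_endog))
        preds.append(exog_part + endog_part)
    return preds


# Two new observations: intercept plus one control, and one endogenous regressor.
x_exog = [[1.0, 2.0], [1.0, 3.0]]
x_endog = [[10.0], [12.0]]
beta_exog = [0.5, 1.0]   # illustrative coefficients, not estimates
beta_endog = [0.1]

y_hat = structural_predict(x_exog, x_endog, beta_exog, beta_endog)
```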
KleibergenPaapResult dataclass ¶
Container for Kleibergen-Paap rk test output.
SandersonWindmeijerResult dataclass ¶
Sanderson-Windmeijer conditional F for each endogenous variable.
CLRResult dataclass ¶
Moreira (2003) CLR test.
MTEResult dataclass ¶
Marginal treatment effects result.
WeakIVConfidenceSet dataclass ¶
as_intervals ¶
Return the CI as a list of (lo, hi) intervals (handles disconnection).
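To make the "handles disconnection" remark concrete, here is one way a grid of accept/reject decisions could be collapsed into a list of (lo, hi) intervals. This is a sketch under assumptions: `accepted_to_intervals` is a hypothetical helper, the grid is assumed sorted, and unbounded sets are not treated specially.

```python
def accepted_to_intervals(grid, accepted):
    """Collapse runs of accepted grid points into (lo, hi) intervals."""
    intervals = []
    start = None   # left end of the run currently being built
    prev = None    # previous grid point visited
    for beta, ok in zip(grid, accepted):
        if ok and start is None:
            start = beta                     # a new accepted run begins
        if not ok and start is not None:
            intervals.append((start, prev))  # run ended at the previous point
            start = None
        prev = beta
    if start is not None:
        intervals.append((start, prev))      # run extends to the grid edge
    return intervals


grid = [0, 1, 2, 3, 4, 5]
accepted = [True, True, False, False, True, True]
pieces = accepted_to_intervals(grid, accepted)
```

With the inputs above, the confidence set splits into two pieces, one on each side of the rejected region.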
PostLassoResult dataclass ¶
Return value of bch_post_lasso_iv.
BayesianIVResult dataclass ¶
Posterior from Bayesian IV.
NPIVResult dataclass ¶
Nonparametric IV estimation result.
KernelIVResult dataclass ¶
Output of kernel IV regression.
ContinuousLATEResult dataclass ¶
Continuous-instrument LATE on the maximal complier class.
IVDMLResult dataclass ¶
Output of IV × DML.
IVDiagResult dataclass ¶
Container for iv_diag output.
Attributes:
| Name | Type | Description |
|---|---|---|
| n | int | Sample size after listwise deletion. |
| n_endog, n_instruments, n_exog | int | Counts of endogenous regressors, excluded instruments, included exogenous controls (excluding intercept). |
| endog, instruments, exog | list[str] | Variable names. |
| beta_2sls, se_2sls, t_2sls, p_2sls | float | 2SLS point estimate, analytic SE, t-ratio, p-value (single endogenous regressor). |
| ci_analytic_2sls | tuple[float, float] | Analytic Wald CI based on |
| beta_ols, se_ols, ci_ols | (float, float, tuple[float, float]) | OLS counterpart (informative comparator; not causal). |
| first_stage_F | float | Classical first-stage F. |
| effective_F | float | Olea–Pflueger (2013) robust effective F. |
| tF_critical_value | float | Lee et al. (2022, AER) tF adjusted 5 % critical value at the observed first-stage F. |
| ar_stat, ar_pvalue | float | Anderson–Rubin (1949) F-statistic and p-value at |
| ar_ci | tuple[float, float] | AR confidence set (grid-inverted). |
| clr_ci, k_ci | tuple[float, float] \| None | Moreira (2003) CLR and Kleibergen (2002) K confidence sets. |
| kp_rk_lm, kp_rk_lm_pvalue, kp_rk_f | float \| None | Kleibergen–Paap (2006) rk LM, p-value, Wald F. |
| bootstrap_ci_analytic, bootstrap_ci_pairs, bootstrap_ci_wild | | Pair-/wild-bootstrap CI tuples (or |
| bootstrap_se_pairs, bootstrap_se_wild | float \| None | Bootstrap standard errors (matching CI sources). |
| bootstrap_n | int | Number of bootstrap replications actually used. |
| ltz_ci, ltz_warning | (tuple[float, float] \| None, str \| None) | Conley–Hansen–Rossi LTZ sensitivity CI under |
| tF_adjusted_ci | tuple[float, float] \| None | |
| tsls_late_caveat | str \| None | BBMT (2022/2025) / Słoczyński (2024) caveat text whenever the specification is at risk of negative-weight LATE pathologies. |
| diagnostics | dict | All numeric outputs in a flat dict (also returned by :meth: |
| raw | dict | Internal scratchpad with arrays (residuals, fitted values). |
to_latex ¶
to_latex(caption: Optional[str] = None, label: Optional[str] = None, float_format: str = '%.4f') -> str
Render the summary table as a LaTeX tabular string.
to_word ¶
Write the summary table to a .docx file (requires python-docx).
plot ¶
Dispatch to the statspai.iv.plot plotting helpers.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| kind | {'diagnostic', 'forest', 'weak_iv', 'first_stage'} | Which plot to render. | 'diagnostic' |
iv ¶
iv(formula: Optional[str] = None, data: Optional[DataFrame] = None, method: str = '2sls', robust: str = 'nonrobust', cluster: Optional[str] = None, fuller_alpha: float = 1.0, absorb: Optional[Union[str, List[str]]] = None, **kwargs) -> EconometricResults
Unified instrumental variables estimation.
Supports multiple methods through the method parameter:
- '2sls': Two-Stage Least Squares (default).
- 'liml': Limited Information Maximum Likelihood. Better finite-sample properties under weak instruments; approximately median-unbiased.
- 'fuller': Fuller (1977) modified LIML with finite-sample bias correction. fuller_alpha=1 removes first-order bias; fuller_alpha=4 minimises MSE under normality.
- 'gmm': Efficient two-step GMM. More efficient than 2SLS under heteroskedasticity when over-identified.
- 'jive': Jackknife IV (Angrist, Imbens & Krueger 1999). Reduces many-instrument bias by using leave-one-out fitted values.
For DeepIV (neural network IV) use sp.deepiv().
For Bartik shift-share IV use sp.bartik().
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| formula | str | IV formula: | None |
| data | DataFrame | Data containing all variables. | None |
| method | str | Estimation method: '2sls', 'liml', 'fuller', 'gmm', 'jive'. | '2sls' |
| robust | str | Standard-error type ('nonrobust', 'hc0', 'hc1', 'hc2', 'hc3'). | 'nonrobust' |
| cluster | str | Variable name for clustered standard errors. | None |
| fuller_alpha | float | Fuller modification constant (only used when | 1.0 |
| absorb | str or list of str | Column name(s) of high-dimensional fixed effects to partial out before fitting (e.g. | None |
Returns:
| Type | Description |
|---|---|
| EconometricResults | Fitted model results with integrated IV diagnostics: |
Examples:
>>> # Standard 2SLS
>>> result = sp.iv("wage ~ (education ~ parent_edu + distance) + experience",
... data=df)
>>> print(result.summary())
>>> # LIML (better with weak instruments)
>>> result = sp.iv("wage ~ (education ~ parent_edu + distance) + experience",
... data=df, method='liml')
>>> # Fuller with bias correction
>>> result = sp.iv("wage ~ (education ~ parent_edu) + experience",
... data=df, method='fuller', fuller_alpha=1)
>>> # Efficient GMM with robust SEs
>>> result = sp.iv("wage ~ (education ~ parent_edu + distance) + experience",
... data=df, method='gmm', robust='hc1')
>>> # JIVE (many instruments)
>>> result = sp.iv("wage ~ (education ~ z1 + z2 + z3 + z4 + z5) + experience",
... data=df, method='jive')
Notes
Which method to choose?
- Start with '2sls'. If first-stage F < 10, switch to 'liml' or 'fuller'.
- If you have many instruments (m >> k₂) and worry about bias, use 'jive' or 'liml'.
- If over-identified and you suspect heteroskedasticity, use 'gmm' for efficiency.
- For nonparametric / ML-based IV, see sp.deepiv().
Diagnostics included automatically:
- First-stage F < 10 triggers a weak-instrument warning.
- Sargan test (2SLS/LIML/Fuller/JIVE) or Hansen J (GMM) for overidentification.
- Durbin-Wu-Hausman test for endogeneity.
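The '2sls', 'liml' and 'fuller' options belong to the textbook k-class family, which differs only in a scalar κ. The sketch below shows that family for the simplest case of one endogenous regressor and one instrument; it is an illustration of the standard formula, not StatsPAI's implementation. The function name `k_class_beta` is hypothetical, all series are assumed to be demeaned so no intercept is needed, and the data-driven choice of κ used by LIML and Fuller is left out.

```python
def k_class_beta(y, d, z, kappa):
    """k-class estimate with one regressor d and one instrument z.

    Inputs are assumed demeaned. kappa = 0 gives OLS and kappa = 1
    gives 2SLS; LIML and Fuller pick kappa from the data (omitted here).
    """
    dz = sum(di * zi for di, zi in zip(d, z))
    zy = sum(zi * yi for zi, yi in zip(z, y))
    zz = sum(zi * zi for zi in z)
    dy = sum(di * yi for di, yi in zip(d, y))
    dd = sum(di * di for di in d)
    d_pz_y = dz * zy / zz   # d' P_Z y for a single instrument
    d_pz_d = dz * dz / zz   # d' P_Z d for a single instrument
    # d'(I - kappa * M_Z) y = (1 - kappa) * d'y + kappa * d'P_Z y
    num = (1.0 - kappa) * dy + kappa * d_pz_y
    den = (1.0 - kappa) * dd + kappa * d_pz_d
    return num / den


# Small demeaned toy sample (values are arbitrary).
z = [-2.0, -1.0, 1.0, 2.0]
d = [-3.0, 0.0, 1.0, 2.0]
y = [-4.0, -1.0, 2.0, 3.0]

beta_ols = k_class_beta(y, d, z, kappa=0.0)    # equals d'y / d'd
beta_2sls = k_class_beta(y, d, z, kappa=1.0)   # equals z'y / z'd
```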
References
- Wooldridge (2010), Ch. 5-8.
- Stock & Yogo (2005), for weak-instrument critical values.
- Fuller (1977), for the finite-sample correction.
- Hansen (1982), for GMM.
- Angrist, Imbens & Krueger (1999), for JIVE.
ivreg ¶
ivreg(formula: str, data: DataFrame, robust: str = 'nonrobust', cluster: Optional[str] = None, **kwargs) -> EconometricResults
Instrumental variables regression (2SLS).
Deprecated: use sp.iv(formula, data, method='2sls') instead. ivreg is kept for backward compatibility.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| formula | str | IV formula: | required |
| data | DataFrame | | required |
| robust | str | | 'nonrobust' |
| cluster | str | | None |
Returns:
| Type | Description |
|---|---|
| EconometricResults | |
liml ¶
liml(formula: str = None, data: DataFrame = None, y: str = None, x_endog: List[str] = None, x_exog: List[str] = None, z: List[str] = None, robust: str = 'nonrobust', cluster: str = None, fuller: float = None, alpha: float = 0.05) -> EconometricResults
Limited Information Maximum Likelihood (LIML) estimator.
More robust to weak instruments than 2SLS. Fuller's modification provides improved finite-sample properties.
Equivalent to Stata's ivregress liml y (x_endog = z) x_exog.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| formula | str | Formula: "y ~ x_exog \| x_endog \| z" or "y ~ x_exog + (x_endog ~ z)". | None |
| data | DataFrame | | None |
| y | str | Outcome variable. | None |
| x_endog | list of str | Endogenous regressors. | None |
| x_exog | list of str | Exogenous regressors (included instruments). | None |
| z | list of str | Excluded instruments. | None |
| robust | str | | 'nonrobust' |
| cluster | str | | None |
| fuller | float | Fuller's constant (typically 1 or 4). If None, pure LIML. | None |
| alpha | float | | 0.05 |
Returns:
| Type | Description |
|---|---|
| EconometricResults | |
Examples:
jive_legacy ¶
jive_legacy(data: DataFrame, y: str, x_endog: List[str], x_exog: List[str] = None, z: List[str] = None, robust: str = 'nonrobust', cluster: str = None, variant: str = 'jive1', alpha: float = 0.05) -> EconometricResults
Jackknife Instrumental Variables Estimation (JIVE).
Reduces finite-sample bias from many instruments by using leave-one-out fitted values as instruments.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | | required |
| y | str | | required |
| x_endog | list of str | | required |
| x_exog | list of str | | None |
| z | list of str | | None |
| robust | str | | 'nonrobust' |
| cluster | str | | None |
| variant | str | 'jive1' (Angrist et al. 1999) or 'jive2' (alternative). | 'jive1' |
| alpha | float | | 0.05 |
Returns:
| Type | Description |
|---|---|
| EconometricResults | |
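The leave-one-out idea behind JIVE can be sketched with one instrument and no intercept. The code below is a toy illustration rather than the jive_legacy implementation: the helper names are hypothetical, and the leverage shortcut it uses is the standard identity for leave-one-out OLS fitted values.

```python
def loo_fitted_bruteforce(d, z):
    """First-stage fitted value for each i, re-estimated without observation i."""
    fitted = []
    for i in range(len(z)):
        szd = sum(zj * dj for j, (zj, dj) in enumerate(zip(z, d)) if j != i)
        szz = sum(zj * zj for j, zj in enumerate(z) if j != i)
        fitted.append(z[i] * szd / szz)
    return fitted


def loo_fitted_leverage(d, z):
    """Same quantity via the leverage shortcut (fit_i - h_i * d_i) / (1 - h_i)."""
    szz = sum(zi * zi for zi in z)
    pi_hat = sum(zi * di for zi, di in zip(z, d)) / szz
    out = []
    for di, zi in zip(d, z):
        h = zi * zi / szz                       # leverage of observation i
        out.append((zi * pi_hat - h * di) / (1.0 - h))
    return out


def jive1_beta(y, d, z):
    """JIVE1-style estimate: use leave-one-out fitted values as the instrument."""
    f = loo_fitted_leverage(d, z)
    return sum(fi * yi for fi, yi in zip(f, y)) / sum(fi * di for fi, di in zip(f, d))


z = [-2.0, -1.0, 1.0, 2.0]
d = [-3.0, 0.0, 1.0, 2.0]
y = [2.0 * di for di in d]   # noiseless outcome with true coefficient 2

brute = loo_fitted_bruteforce(d, z)
fast = loo_fitted_leverage(d, z)
beta = jive1_beta(y, d, z)
```

Because observation i never contributes to its own fitted value, the first-stage overfitting that drives many-instrument bias in 2SLS does not feed back into the second stage.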
lasso_iv ¶
lasso_iv(data: DataFrame, y: str, x_endog: List[str], x_exog: List[str] = None, z: List[str] = None, robust: str = 'robust', cluster: str = None, penalty: str = 'bic', alpha: float = 0.05) -> EconometricResults
LASSO-selected instrumental variables.
Uses LASSO to select relevant instruments from a large set, then estimates IV/2SLS with selected instruments. Belloni, Chen, Chernozhukov & Hansen (2012).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | | required |
| y | str | | required |
| x_endog | list of str | | required |
| x_exog | list of str | | None |
| z | list of str | Full set of candidate instruments. | None |
| robust | str | | 'robust' |
| cluster | str | | None |
| penalty | str | Instrument selection criterion: 'bic', 'aic', 'cv'. | 'bic' |
| alpha | float | | 0.05 |
Returns:
| Type | Description |
|---|---|
| EconometricResults | |
Examples:
anderson_rubin_test ¶
anderson_rubin_test(data: DataFrame, y: str, endog: str, instruments: List[str], exog: Optional[List[str]] = None, h0: float = 0, alpha: float = 0.05, vcov: str = 'HC1') -> Dict[str, Any]
Anderson-Rubin (1949) test — size-correct under weak instruments.
Tests H0: β_endog = h0 and constructs a confidence set by
inverting the test over a grid of candidate values. The AR test
has correct size regardless of instrument strength and is the
recommended inference procedure when F_eff < 10.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | | required |
| y | str | Outcome variable. | required |
| endog | str | Endogenous regressor (single). | required |
| instruments | list of str | Excluded instruments. | required |
| exog | list of str | Included exogenous controls. | None |
| h0 | float | Null hypothesis value for the endogenous coefficient. | 0 |
| alpha | float | Significance level. | 0.05 |
| vcov | {'HC0', 'HC1', 'classic'} | Variance estimator used for the Olea-Pflueger effective F reported alongside AR. | 'HC1' |
Returns:
| Type | Description |
|---|---|
| dict | |
Examples:
>>> result = sp.anderson_rubin_test(df, y='wage', endog='education',
... instruments=['parent_edu', 'distance'])
>>> print(result['interpretation'])
Notes
Under H0: β = β₀, regress Y - β₀·D on Z and W. The
F-test on Z gives the AR statistic. The confidence set is
constructed by collecting all β₀ values not rejected at level
alpha. If the AR set is unbounded on one or both sides, the
returned (low, high) will be ±inf accordingly.
effective_f_test ¶
effective_f_test(data: DataFrame, endog: str, instruments: List[str], exog: Optional[List[str]] = None, vcov: str = 'HC1') -> Dict[str, Any]
Olea-Pflueger (2013) robust effective F statistic for weak instruments.
Computes the heteroskedasticity-robust effective F that is a
pre-test for the concentration parameter of the first stage. Under
homoskedasticity (vcov='classic') and a single instrument it
reduces exactly to the standard first-stage F.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | | required |
| endog | str | Endogenous regressor (single endogenous variable). | required |
| instruments | list of str | Excluded instruments. | required |
| exog | list of str | Included exogenous controls (a constant is added automatically). | None |
| vcov | {'HC0', 'HC1', 'classic'} | Variance estimator for the first-stage residuals: | 'HC1' |
Returns:
| Type | Description |
|---|---|
| dict | |
Notes
The formula (Andrews-Stock-Sun 2019, eq. 4.13) is

$$
F_{\text{eff}} = \frac{\hat\pi' (\tilde Z' \tilde Z)\,\hat\pi}{\mathrm{tr}\!\left(\hat\Omega\,(\tilde Z' \tilde Z)^{-1}\right)}
$$

where $\tilde Z$ and $\tilde D$ are residualized after partialling out the exogenous controls, $\hat\pi$ is the first-stage OLS coefficient vector, and $\hat\Omega = \sum_i \hat\eta_i^2 \tilde z_i \tilde z_i'$ is the HC meat.
Under homoskedasticity $\hat\Omega \approx \hat\sigma_\eta^2 \tilde Z'\tilde Z$, so the trace collapses to $k_z \hat\sigma_\eta^2$ and $F_{\text{eff}}$ reduces to the first-stage F.
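With a single instrument every matrix in the effective-F formula becomes a scalar, which makes the collapse under homoskedasticity easy to check by hand. The sketch below assumes the inputs are already residualized on the controls; the function names are hypothetical, the HC0 meat is used, and the "classic" comparison uses the residual variance without a degrees-of-freedom correction, which is a simplification.

```python
def effective_f_single(d, z):
    """Scalar version of F_eff: pi' (Z'Z) pi / tr(Omega (Z'Z)^-1)."""
    szz = sum(zi * zi for zi in z)
    pi_hat = sum(zi * di for zi, di in zip(z, d)) / szz
    eta = [di - pi_hat * zi for di, zi in zip(d, z)]        # first-stage residuals
    omega = sum(e * e * zi * zi for e, zi in zip(eta, z))   # HC0 meat
    return (pi_hat * szz * pi_hat) / (omega / szz)


def classic_f_single(d, z):
    """Homoskedastic counterpart: pi^2 * Z'Z / sigma^2 (no df correction)."""
    szz = sum(zi * zi for zi in z)
    pi_hat = sum(zi * di for zi, di in zip(z, d)) / szz
    eta = [di - pi_hat * zi for di, zi in zip(d, z)]
    sigma2 = sum(e * e for e in eta) / len(eta)
    return pi_hat * pi_hat * szz / sigma2


# Toy data whose residuals all have the same magnitude, so the two agree.
z = [-1.0, 1.0, -1.0, 1.0]
d = [-1.0, 3.0, -3.0, 1.0]

f_eff = effective_f_single(d, z)
f_classic = classic_f_single(d, z)
```

In this constructed sample the squared residuals are constant, so the robust and classical statistics coincide; with heteroskedastic residuals they would generally differ.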
Examples:
tF_critical_value ¶
Lee–McCrary–Moreira–Porter (2022, AER) tF adjusted critical value.
Returns the adjusted two-sided t-ratio critical value c(F) such
that |t| > c(F) is a valid 1 - alpha test of the 2SLS
coefficient, robust to weak instruments.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| first_stage_F | float | Observed first-stage F statistic (or Olea-Pflueger F_eff). | required |
| alpha | float | Significance level. Only | 0.05 |
Returns:
| Type | Description |
|---|---|
| float | Adjusted critical value. Returns |
Raises:
| Type | Description |
|---|---|
| ValueError | If |
Notes
The LMMP tF procedure gives exactly correct 5 % size for any
first-stage strength. The standard 1.96 critical value
over-rejects substantially when F < 104.7 (the value at which
c = 1.96).
Examples:
kleibergen_paap_rk ¶
kleibergen_paap_rk(endog: Union[ndarray, DataFrame], instruments: Union[ndarray, DataFrame], exog: Optional[Union[ndarray, DataFrame, List[str]]] = None, data: Optional[DataFrame] = None, cov_type: str = 'robust', cluster: Optional[Union[ndarray, Series]] = None, add_const: bool = True) -> KleibergenPaapResult
Kleibergen-Paap (2006) rk Wald / LM statistic.
Tests the null that the reduced-form coefficient matrix on the excluded
instruments has rank n_endog - 1 (under-identification) against the
alternative of full rank.
This is the heteroskedasticity- and cluster-robust generalisation of the
classical Cragg-Donald statistic. ivreg2 in Stata reports the
identical statistic.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| endog | array or DataFrame, shape (n, p) | Endogenous regressors. | required |
| instruments | array or DataFrame, shape (n, k) | Excluded instruments ( | required |
| exog | array, DataFrame or list of column names | Included exogenous regressors (controls). Intercept is added automatically when | None |
| data | DataFrame | Used only when | None |
| cov_type | {'nonrobust', 'robust', 'cluster'} | Covariance for the stacked reduced-form equations. | 'robust' |
| cluster | array-like | Required when | None |
| add_const | bool | Prepend a constant to the exogenous block. | True |
Returns:
| Type | Description |
|---|---|
| KleibergenPaapResult | |
sanderson_windmeijer ¶
sanderson_windmeijer(endog: Union[ndarray, DataFrame], instruments: Union[ndarray, DataFrame], exog: Optional[Union[ndarray, DataFrame, List[str]]] = None, data: Optional[DataFrame] = None, add_const: bool = True, endog_names: Optional[List[str]] = None) -> SandersonWindmeijerResult
Sanderson-Windmeijer (2016) conditional first-stage F.
For each endogenous regressor j, residualises all other
endogenous regressors out of both the outcome (that endog column) and
the instruments, then reports the first-stage F of the resulting
partial regression. This is the correct individual-endogenous weak-IV
diagnostic when multiple endogenous regressors are present.
When only one endogenous regressor is present, this reduces exactly to the standard first-stage F.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| endog | array or DataFrame, shape (n, p) | | required |
| instruments | array or DataFrame, shape (n, k) | | required |
| exog | array, DataFrame or list of column names | | None |
| data | DataFrame | | None |
| add_const | bool | | True |
| endog_names | list of str | Labels for endogenous columns when passing numpy arrays. | None |
Returns:
| Type | Description |
|---|---|
| SandersonWindmeijerResult | |
conditional_lr_test ¶
conditional_lr_test(y: Union[ndarray, Series, str], endog: Union[ndarray, Series, str], instruments: Union[ndarray, DataFrame, List[str]], exog: Optional[Union[ndarray, DataFrame, List[str]]] = None, data: Optional[DataFrame] = None, beta0: float = 0.0, add_const: bool = True, n_simulations: int = 20000, random_state: Optional[int] = None) -> CLRResult
Moreira (2003) Conditional Likelihood Ratio (CLR) test.
Tests H0: beta = beta0 in a single-endogenous-variable IV model.
Weak-IV-robust and uniformly most powerful invariant in the one
endogenous-variable case.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| y | array, Series or column name | Outcome and the single endogenous regressor. | required |
| endog | array, Series or column name | Outcome and the single endogenous regressor. | required |
| instruments | array, DataFrame or list of column names | | required |
| exog | array, DataFrame or list of column names | | None |
| data | DataFrame | | None |
| beta0 | float | Null-hypothesis value of beta on | 0.0 |
| add_const | bool | | True |
| n_simulations | int | Monte-Carlo draws for the conditional critical value. | 20000 |
| random_state | int | | None |
Returns:
| Type | Description |
|---|---|
| CLRResult | |
plausibly_exogenous_uci ¶
plausibly_exogenous_uci(y: Union[ndarray, Series, str], endog: Union[ndarray, Series, str], instruments: Union[ndarray, DataFrame, List[str]], gamma_grid: Union[ndarray, Iterable[float]], exog: Optional[Union[ndarray, DataFrame, List[str]]] = None, data: Optional[DataFrame] = None, add_const: bool = True, ci_level: float = 0.95) -> PlausiblyExogenousResult
Union-of-CIs (UCI) plausibly exogenous bounds (Conley, Hansen, Rossi 2012).
For each candidate γ in gamma_grid (one value per instrument), the
model y = D*beta + Z*gamma + W*alpha + u is estimated by 2SLS after
subtracting Z @ gamma from y; the final confidence set for
beta is the union over γ of the per-γ 2SLS CIs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| y | array, Series or column name | Outcome and a single endogenous regressor. | required |
| endog | array, Series or column name | Outcome and a single endogenous regressor. | required |
| instruments | array, DataFrame or list of str | | required |
| gamma_grid | array-like | Candidate direct-effect vectors. If | required |
| exog | | Usual options. | None |
| data | | Usual options. | None |
| add_const | | Usual options. | True |
| ci_level | | Usual options. | 0.95 |
Returns:
| Type | Description |
|---|---|
| PlausiblyExogenousResult | |
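The union-of-CIs construction can be sketched for the just-identified case with one instrument. This is an illustration under simplifying assumptions, not the library routine: inputs are taken to be demeaned, the standard error is the homoskedastic one with an n - 1 divisor, the critical value is normal, and the union is reported as the hull (min lower, max upper), which is a superset if the per-γ intervals happen not to overlap. The helper names are hypothetical.

```python
from statistics import NormalDist


def iv_ci(y, d, z, gamma, level=0.95):
    """Just-identified IV interval after subtracting gamma * z from y."""
    n = len(y)
    y_adj = [yi - gamma * zi for yi, zi in zip(y, z)]
    szd = sum(zi * di for zi, di in zip(z, d))
    szz = sum(zi * zi for zi in z)
    beta = sum(zi * yi for zi, yi in zip(z, y_adj)) / szd
    resid = [yi - beta * di for yi, di in zip(y_adj, d)]
    sigma2 = sum(u * u for u in resid) / (n - 1)   # assumed divisor
    se = (sigma2 * szz) ** 0.5 / abs(szd)
    crit = NormalDist().inv_cdf(0.5 + level / 2.0)
    return beta - crit * se, beta + crit * se


def union_of_cis(y, d, z, gamma_grid, level=0.95):
    """Hull of the per-gamma intervals."""
    cis = [iv_ci(y, d, z, g, level) for g in gamma_grid]
    return min(lo for lo, _ in cis), max(hi for _, hi in cis)


z = [-2.0, -1.0, 1.0, 2.0]
d = [-3.0, 0.0, 1.0, 2.0]
y = [-5.0, -1.0, 1.0, 5.0]

lo0, hi0 = iv_ci(y, d, z, gamma=0.0)           # exclusion restriction holds exactly
lo, hi = union_of_cis(y, d, z, [-0.2, 0.0, 0.2])
```

The interval at γ = 0 is the ordinary IV interval, and widening the grid of γ values can only widen the reported set.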
plausibly_exogenous_ltz ¶
plausibly_exogenous_ltz(y: Union[ndarray, Series, str], endog: Union[ndarray, Series, str], instruments: Union[ndarray, DataFrame, List[str]], gamma_mean: Union[float, ndarray] = 0.0, gamma_var: Union[float, ndarray] = 0.0, exog: Optional[Union[ndarray, DataFrame, List[str]]] = None, data: Optional[DataFrame] = None, add_const: bool = True, ci_level: float = 0.95) -> PlausiblyExogenousResult
Local-to-zero (LTZ) plausibly exogenous (Conley, Hansen, Rossi 2012).
Places a Gaussian prior on γ with mean gamma_mean and covariance
gamma_var * I (scalar) or the matrix gamma_var, then integrates
it out to yield a closed-form adjustment to β's asymptotic variance:
beta_LTZ = beta_2SLS - A * gamma_mean
Var(beta) = Var_2SLS + A * Omega * A'
where A = (X'P_Z X)^{-1} X'P_Z Z restricted to β's row.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| y | | Usual. | required |
| endog | | Usual. | required |
| instruments | | Usual. | required |
| exog | | Usual. | None |
| data | | Usual. | None |
| add_const | | Usual. | True |
| gamma_mean | float or array of shape (k,) | | 0.0 |
| gamma_var | float or array of shape (k, k) | Prior variance. | 0.0 |
| ci_level | float | | 0.95 |
Returns:
| Type | Description |
|---|---|
| PlausiblyExogenousResult | |
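In the scalar case (one instrument, one endogenous regressor) the two LTZ formulas above reduce to simple arithmetic. The sketch below applies them to numbers supplied by the caller; `ltz_interval` is a hypothetical helper, the inputs `beta_2sls`, `var_2sls` and `a` are assumed to have been computed elsewhere, and a normal critical value is used.

```python
from statistics import NormalDist


def ltz_interval(beta_2sls, var_2sls, a, gamma_mean=0.0, gamma_var=0.0,
                 ci_level=0.95):
    """Scalar local-to-zero adjustment.

    beta_LTZ  = beta_2SLS - A * gamma_mean
    Var(beta) = Var_2SLS  + A * Omega * A'   (Omega = gamma_var here)
    """
    beta_ltz = beta_2sls - a * gamma_mean
    var_ltz = var_2sls + a * gamma_var * a
    half = NormalDist().inv_cdf(0.5 + ci_level / 2.0) * var_ltz ** 0.5
    return beta_ltz, (beta_ltz - half, beta_ltz + half)


# Illustrative inputs only; they are not estimates from any data set.
b_exact, ci_exact = ltz_interval(2.0, 0.01, a=2.0)
b_prior, ci_prior = ltz_interval(2.0, 0.01, a=2.0, gamma_mean=0.1, gamma_var=0.01)
```

With gamma_mean = 0 and gamma_var = 0 the result is the ordinary Wald interval; a non-zero prior mean shifts the centre and a positive prior variance widens the interval.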
jive1 ¶
Angrist-Imbens-Krueger (1999) JIVE1.
ijive ¶
Ackerberg-Devereux (2009) IJIVE.
rjive ¶
Hansen-Kozbur (2014) Ridge JIVE.
ivmte_bounds ¶
ivmte_bounds(y: Union[ndarray, Series, str], treatment: Union[ndarray, Series, str], instruments: Union[ndarray, DataFrame, List[str]], exog: Optional[Union[ndarray, DataFrame, List[str]]] = None, data: Optional[DataFrame] = None, target: str = 'ate', late_bounds: Optional[Tuple[float, float]] = None, policy_prob: Optional[ndarray] = None, basis_degree: int = 3, n_propensity_bins: int = 8, bounds_outcome: Optional[Tuple[float, float]] = None, decreasing_mte: bool = False, add_const: bool = True, include_bmw_point: bool = True) -> IVMTEBounds
Sharp identified bounds for an MTE-type target parameter — MST (2018).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| y | | Usual IV arguments; treatment must be binary. | required |
| treatment | | Usual IV arguments; treatment must be binary. | required |
| instruments | | Usual IV arguments; treatment must be binary. | required |
| exog | | Usual IV arguments; treatment must be binary. | None |
| data | | Usual IV arguments; treatment must be binary. | None |
| target | {'ate', 'att', 'atu', 'late', 'prte'} | | 'ate' |
| late_bounds | | (p_lo, p_hi), only for target='late'. | None |
| policy_prob | | New propensity realisation, only for target='prte'. | None |
| basis_degree | | Polynomial order K for the MTR basis. | 3 |
| n_propensity_bins | | Number of propensity-score cells used for the reduced-form IV moments. | 8 |
| bounds_outcome | | (lo, hi) box-constraint on the MTR functions. If your outcome is in [0, 1] (e.g. employment), pass (0, 1). | None |
| decreasing_mte | | If True, impose non-increasing MTE (Heckman-Vytlacil). | False |
| include_bmw_point | | Also run sp.iv.mte with the same basis and return the point estimate for side-by-side reporting. | True |
Returns:
| Type | Description |
|---|---|
| IVMTEBounds | |
anderson_rubin_ci ¶
anderson_rubin_ci(y: Union[ndarray, Series, str], endog: Union[ndarray, Series, str], instruments: Union[ndarray, DataFrame, List[str]], exog: Optional[Union[ndarray, DataFrame, List[str]]] = None, data: Optional[DataFrame] = None, level: float = 0.95, n_grid: int = 401, beta_grid: Optional[ndarray] = None, add_const: bool = True) -> WeakIVConfidenceSet
Anderson-Rubin (1949) confidence set by grid inversion.
For each candidate β₀, compute the AR F-statistic
AR(β₀) = (u₀' P_Z u₀ / k) / (u₀' M_Z u₀ / (n - k - kW))
u₀ = y - β₀ · d (partialled out of exogenous controls)
and include β₀ in the CI whenever AR(β₀) ≤ F_{k, n-k-kW}^{1-α}.
Valid under any instrument strength. Under weak identification the set can be disconnected or unbounded — we flag both.
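The AR statistic and its grid inversion can be sketched for one instrument with demeaned data, so that the only partialled-out control is the intercept (kW = 1). The helper names are hypothetical, and because the standard library has no F quantile the critical value is passed in by the caller; 18.51 below is the tabulated 95 % quantile of F(1, 2), matching this toy sample of four observations.

```python
def ar_stat(beta0, y, d, z, k_w=1):
    """AR(beta0) = (u'P_Z u / k) / (u'M_Z u / (n - k - kW)) with k = 1."""
    n, k = len(y), 1
    u = [yi - beta0 * di for yi, di in zip(y, d)]
    zu = sum(zi * ui for zi, ui in zip(z, u))
    zz = sum(zi * zi for zi in z)
    uu = sum(ui * ui for ui in u)
    explained = zu * zu / zz      # u' P_Z u
    residual = uu - explained     # u' M_Z u
    return (explained / k) / (residual / (n - k - k_w))


def ar_accept(grid, y, d, z, crit):
    """True where the candidate beta0 is not rejected."""
    return [ar_stat(b, y, d, z) <= crit for b in grid]


z = [-2.0, -1.0, 1.0, 2.0]
d = [-3.0, 0.0, 1.0, 2.0]
y = [-5.0, -1.0, 1.0, 5.0]    # y = 2*d + e with e orthogonal to z

grid = [0.0, 1.0, 2.0, 3.0, 4.0]
accepted = ar_accept(grid, y, d, z, crit=18.51)
```

The accepted grid points can then be merged into intervals; in a real application the grid would be much finer (the signature above defaults to 401 points).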
conditional_lr_ci ¶
conditional_lr_ci(y: Union[ndarray, Series, str], endog: Union[ndarray, Series, str], instruments: Union[ndarray, DataFrame, List[str]], exog: Optional[Union[ndarray, DataFrame, List[str]]] = None, data: Optional[DataFrame] = None, level: float = 0.95, n_grid: int = 201, beta_grid: Optional[ndarray] = None, n_sim: int = 5000, add_const: bool = True, random_state: Optional[int] = None) -> WeakIVConfidenceSet
Moreira (2003) CLR confidence set by grid inversion.
At each candidate β₀, compute the CLR statistic and its conditional
critical value via Monte-Carlo given T'T; include β₀ iff
CLR(β₀) ≤ c(T'T, 1-α).
Uniformly most powerful invariant under normal errors with a single endogenous regressor. Tight under strong ID, wide under weak ID.
k_test_ci ¶
k_test_ci(y: Union[ndarray, Series, str], endog: Union[ndarray, Series, str], instruments: Union[ndarray, DataFrame, List[str]], exog: Optional[Union[ndarray, DataFrame, List[str]]] = None, data: Optional[DataFrame] = None, level: float = 0.95, n_grid: int = 401, beta_grid: Optional[ndarray] = None, add_const: bool = True) -> WeakIVConfidenceSet
Kleibergen (2002) K-test confidence set by grid inversion.
The K-statistic projects the AR score onto the (estimated) score direction of β, giving a 1-df χ²-valued pivot even under weak ID.
K(β₀) = n · (score_β|β₀)² / var
≈ (S'T)² / (T'T)
where S and T are the AR score and "T-statistic" from Moreira (2003). Faster than CLR but slightly less powerful; still weak-IV-robust.
bch_post_lasso_iv ¶
bch_post_lasso_iv(y: Union[ndarray, Series, str], endog: Union[ndarray, Series, str], instruments: Union[ndarray, DataFrame, List[str]], exog: Optional[Union[ndarray, DataFrame, List[str]]] = None, data: Optional[DataFrame] = None, alpha: float = 0.05, c: float = 1.1, add_const: bool = True, robust: bool = True, ensure_min_instruments: int = 1) -> PostLassoResult
Post-Lasso 2SLS with rigorous, data-driven penalty.
Recipe (BCH 2012 §3):
- Partial out controls exog from y, endog, and every column of instruments.
- Select relevant instruments by LASSO with rigorous penalty λ = 2 c √{2 n log(2 p / α)} and iterated heteroskedastic loadings (Algorithm 1).
- If fewer than ensure_min_instruments survive, add the instruments with largest univariate first-stage t-stat.
- Re-estimate the first stage by OLS on the selected subset (post-Lasso removes shrinkage bias).
- Plug the post-Lasso fitted value into 2SLS for β̂.
- Heteroskedasticity-robust (HC1) SEs by default.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| y | | Outcome and the single endogenous regressor. | required |
| endog | | Outcome and the single endogenous regressor. | required |
| instruments | | (Many) candidate instruments; p may exceed n (the method shines precisely there). | required |
| exog | | Controls (default: intercept only). | None |
| data | | DataFrame for string-name inputs. | None |
| alpha | | Penalty confidence level. | 0.05 |
| c | | Slack constant (BCH default 1.1). | 1.1 |
| add_const | | Whether to include a constant in the exogenous block. | True |
| robust | | Use HC1 standard errors (default True). | True |
| ensure_min_instruments | | If LASSO selects 0, force this many strong ones in. | 1 |
Returns:
| Type | Description |
|---|---|
| PostLassoResult | |
bch_lambda ¶
BCH (2012) rigorous penalty level: λ = 2 · c · √{2 n · log(2 p / α)}.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| n | int | | required |
| p | int | Number of candidate instruments. | required |
| alpha | float | Target confidence level (BCH recommend 0.05 / log(n)). | 0.05 |
| c | float | Slack constant (BCH 2012 recommend 1.1). | 1.1 |
Returns:
| Type | Description |
|---|---|
| float | |
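The penalty formula stated above is a one-liner. The sketch below simply evaluates λ = 2 · c · √{2 n · log(2 p / α)}; the name `bch_lambda_sketch` and the input checks are additions for illustration and may not match how the library function validates its arguments.

```python
import math


def bch_lambda_sketch(n, p, alpha=0.05, c=1.1):
    """Evaluate lambda = 2 * c * sqrt(2 * n * log(2 * p / alpha))."""
    if n <= 0 or p <= 0:
        raise ValueError("n and p must be positive")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return 2.0 * c * math.sqrt(2.0 * n * math.log(2.0 * p / alpha))


lam_50 = bch_lambda_sketch(n=100, p=50)      # 50 candidate instruments
lam_500 = bch_lambda_sketch(n=100, p=500)    # ten times as many candidates
```

The penalty grows only logarithmically in p, which is why the approach remains usable when the number of candidate instruments is large relative to n.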
bch_selected ¶
bch_selected(endog: ndarray, instruments: ndarray, exog: Optional[ndarray] = None, alpha: float = 0.05, c: float = 1.1, max_refit: int = 15) -> Tuple[List[int], ndarray, float]
BCH first-stage instrument selection.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| endog | (n,) array | Single endogenous regressor (partialled out of controls). | required |
| instruments | (n, p) array | Candidate instruments (partialled out of controls). | required |
| exog | | Unused; kept for API symmetry. | None |
| alpha | | Penalty-rule parameters. | 0.05 |
| c | | Penalty-rule parameters. | 1.1 |
| max_refit | | Refit iterations for penalty loadings. | 15 |
Returns:
| Type | Description |
|---|---|
| Tuple[List[int], ndarray, float] | (sel_indices, psi_final, lam) |
jive_mw ¶
jive_mw(data: DataFrame, y: str, endog: str, instruments: Sequence[str], exog: Optional[Sequence[str]] = None, alpha: float = 0.05) -> ManyWeakIVResult
Jackknife Instrumental Variables Estimator (AIK 1999; Phillips-Hale 2018 variant). Less biased than 2SLS when K/n is not small.
Returns:
| Type | Description |
|---|---|
| ManyWeakIVResult | |
many_weak_ar ¶
many_weak_ar(data: DataFrame, y: str, endog: str, instruments: Sequence[str], exog: Optional[Sequence[str]] = None, beta_grid: Optional[Sequence[float]] = None, alpha: float = 0.05) -> ManyWeakIVResult
Jackknife-Anderson-Rubin confidence set by grid inversion — valid under many-weak-IV (Mikusheva-Sun 2024, simplified).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | | required |
| y | str | | required |
| endog | str | | required |
| instruments | sequence of str | | required |
| exog | sequence of str | | None |
| beta_grid | sequence of float | Candidate beta values to invert. | None |
| alpha | float | | 0.05 |
continuous_iv_late ¶
continuous_iv_late(data: DataFrame, y: str, treat: str, instrument: str, n_quantiles: int = 4, alpha: float = 0.05, n_boot: int = 200, seed: int = 0) -> ContinuousLATEResult
LATE with a continuous instrument via quantile-bin Wald estimator.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | | required |
| y | str | | required |
| treat | str | | required |
| instrument | str | | required |
| n_quantiles | int | Number of quantile bins of the instrument; LATE is averaged across bins weighted by complier share. | 4 |
| alpha | float | | 0.05 |
| n_boot | int | | 200 |
| seed | int | | 0 |
Returns:
| Type | Description |
|---|---|
| ContinuousLATEResult | |
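One plausible reading of a "quantile-bin Wald estimator" is sketched below; the actual continuous_iv_late implementation may differ. The assumptions here are: observations are sorted by the instrument and cut into equal-sized bins, a Wald ratio is formed between each pair of adjacent bins, and each ratio is weighted by the jump in mean treatment between the bins as a stand-in for the complier share. The function name `binned_wald_late` is hypothetical.

```python
def binned_wald_late(y, treat, instrument, n_quantiles=4):
    """Weighted average of adjacent-bin Wald ratios (illustrative only)."""
    n = len(y)
    if n < n_quantiles:
        raise ValueError("need at least one observation per bin")
    order = sorted(range(n), key=lambda i: instrument[i])
    bins = [order[b * n // n_quantiles:(b + 1) * n // n_quantiles]
            for b in range(n_quantiles)]
    y_bar = [sum(y[i] for i in idx) / len(idx) for idx in bins]
    d_bar = [sum(treat[i] for i in idx) / len(idx) for idx in bins]

    walds, weights = [], []
    for b in range(n_quantiles - 1):
        jump = d_bar[b + 1] - d_bar[b]   # first-stage change between bins
        if jump <= 0:
            continue                     # no take-up change, ratio undefined
        walds.append((y_bar[b + 1] - y_bar[b]) / jump)
        weights.append(jump)
    if not weights:
        raise ValueError("instrument does not move treatment across bins")
    return sum(w * v for w, v in zip(weights, walds)) / sum(weights)


# Toy data: take-up rises with the instrument and the true effect is 3.
instrument = [1, 2, 3, 4, 5, 6, 7, 8]
treat = [0, 0, 0, 1, 1, 1, 1, 1]
y = [1 + 3 * t for t in treat]

late = binned_wald_late(y, treat, instrument, n_quantiles=4)
```

Bins between which mean take-up does not change carry no information about compliers and are skipped in this sketch.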
iv_compare ¶
iv_compare(formula: Optional[str] = None, data: Optional[DataFrame] = None, *, methods: Sequence[str] = ('2sls', 'liml', 'fuller', 'jive'), alpha: float = 0.05, endog_name: Optional[str] = None, **kwargs) -> DataFrame
Run several k-class / JIVE estimators and return a one-row-per-method comparison table — useful for quick sensitivity checks across estimators.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| formula | str | IV formula ( | None |
| data | DataFrame | | None |
| methods | sequence of str | Methods to dispatch through :func: | ('2sls', 'liml', 'fuller', 'jive') |
| alpha | float | For Wald-CI columns. | 0.05 |
Returns:
| Type | Description |
|---|---|
| DataFrame | columns |
fit ¶
fit(formula: Optional[str] = None, data: Any = None, *, method: str = '2sls', augmented_diagnostics: bool = True, **kwargs: Any)
Alias for _dispatch. See sp.iv.__doc__ for usage.