`statspai.iv`¶

iv ¶

statspai.iv — the unified Instrumental Variables namespace.

The goal of this subpackage is to be the single entry point for every IV-flavoured workflow in StatsPAI, regardless of which sub-module the underlying implementation lives in.

The subpackage itself is callable:

sp.iv("y ~ (d ~ z) + x", data=df)                   # 2SLS (default)
sp.iv("y ~ (d ~ z) + x", data=df, method="liml")    # LIML
sp.iv(method="kernel", y=..., endog=..., instruments=..., data=df)
sp.iv.fit(...)         # equivalent (fit = _dispatch alias)
sp.iv.kernel_iv(...)   # individual estimators still reachable

That callable is provided by a tiny ModuleType subclass installed via sys.modules[__name__].__class__ (the standard PEP 562-style trick).
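The callable-module pattern can be sketched with a throwaway module object. This is an illustration only, not StatsPAI's source: the names `CallableModule`, `_dispatch`, and `demo_iv` are assumptions, and a real package would assign `sys.modules[__name__].__class__` inside its own `__init__.py` rather than build a module by hand.

```python
import types


class CallableModule(types.ModuleType):
    """Module subclass whose instances can be called like a function."""

    def __call__(self, *args, **kwargs):
        # Forward the call to the module-level dispatcher.
        return self._dispatch(*args, **kwargs)


def _dispatch(formula=None, method="2sls", **kwargs):
    # Hypothetical stand-in for a real estimator router.
    return f"{method.lower()}: {formula}"


# Build a throwaway module object instead of touching sys.modules.
demo = types.ModuleType("demo_iv")
demo._dispatch = _dispatch
demo.fit = _dispatch             # alias, mirroring "fit = _dispatch alias"
demo.__class__ = CallableModule  # swap the module's class in place

print(demo("y ~ (d ~ z) + x"))
print(demo.fit("y ~ (d ~ z) + x", method="LIML"))
```

Because `__call__` is looked up on the type, swapping `__class__` is what makes the module object itself callable; ordinary attributes such as `fit` keep working unchanged.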

Sub-method coverage (method= keyword, all aliases lowercased):

K-class formula path (regression.iv): 2sls / tsls / iv, liml, fuller, gmm, jive.
Modern JIVE variants (iv.jive_variants): jive1, ujive, ijive, rjive.
Many-weak (iv.many_weak): jive_mw, many_weak_ar.
Lasso / post-Lasso: lasso (regression.advanced_iv.lasso_iv), rlasso / rigorous_lasso (rlasso.rlasso_iv — faithful hdm::rlassoIV port; instrument selection by default, double selection when exog= controls are given), post_lasso / bch (iv.post_lasso.bch_post_lasso_iv, deprecated — superseded by rlasso).
ML / nonparametric: kernel (iv.kernel_iv), npiv (iv.npiv), ivdml (iv.ivdml), deepiv (deepiv.deepiv, optional).
Bayesian (iv.bayesian_iv): bayes / bayesian.
LATE / MTE: continuous_late (iv.continuous_late), mte (iv.mte), ivmte_bounds (iv.ivmte_lp).
Quantile IV (regression.iv_quantile): ivqreg / quantile.
Plausibly exogenous sensitivity (iv.plausibly_exogenous): plausibly_exog_uci / plausibly_exog_ltz.
Shift-share (bartik): shift_share / bartik.

Diagnostics (anderson_rubin_test, effective_f_test, kleibergen_paap_rk, sanderson_windmeijer, conditional_lr_test) remain standalone — they are not estimators and intentionally do not show up in the method= table.

References

Angrist, J. D., Imbens, G. W. and Rubin, D. B. (1996). Identification of Causal Effects Using Instrumental Variables. Journal of the American Statistical Association. doi:10.1080/01621459.1996.10476902 [@angrist1996identification]

Angrist, J. D. and Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press. [@angrist2009mostly]

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> n = 200
>>> z1, z2, x1, u = (rng.normal(size=n), rng.normal(size=n),
...                  rng.normal(size=n), rng.normal(size=n))
>>> d = 0.7 * z1 + 0.5 * z2 + 0.3 * x1 + u + rng.normal(size=n)
>>> y = 1.0 + 1.5 * d + 0.4 * x1 + u + rng.normal(size=n)
>>> df = pd.DataFrame({"y": y, "d": d, "z1": z1, "z2": z2, "x1": x1})
>>> # Standard 2SLS with a rich diagnostic panel
>>> res = sp.iv("y ~ (d ~ z1 + z2) + x1", data=df)
>>> bool(res.params["d"] > 0)
True

>>> # Sensitivity to exclusion-restriction violations
>>> chr_res = sp.iv(  # avoid shadowing the builtin chr()
...     method="plausibly_exog_ltz",
...     y="y", endog="d", instruments=["z1", "z2"],
...     gamma_mean=0.0, gamma_var=0.01, data=df,
... )

>>> # Marginal treatment effects (binary endogenous treatment)
>>> prop = 1 / (1 + np.exp(-(0.8 * z1 + 0.3 * x1 + u)))
>>> dbin = (rng.uniform(size=n) < prop).astype(float)
>>> ybin = 1.0 + 1.5 * dbin + 0.4 * x1 + u + rng.normal(size=n)
>>> df_mte = pd.DataFrame({"y": ybin, "d": dbin, "z": z1, "x": x1})
>>> m = sp.iv(method="mte",
...          y="y", endog="d", instruments=["z"], exog=["x"], data=df_mte)

IVRegression ¶

Bases: BaseModel

Instrumental Variables regression model.

Supports multiple estimation methods via method parameter: '2sls', 'liml', 'fuller', 'gmm', 'jive'.

Parameters:

Name	Type	Description	Default
`formula`	`str`	Formula with IV syntax: `"y ~ (endog ~ z1 + z2) + exog1 + exog2"`	`None`
`data`	`DataFrame`		`None`
`method`	`str`	Estimation method.	`'2sls'`
`fuller_alpha`	`float`	Fuller constant (only used when method='fuller'). `alpha=1` gives the bias-corrected Fuller estimator; `alpha=4` minimises MSE under normal errors.	`1.0`
`y`	`array - like`	Alternative to formula interface.	`None`
`X_exog`	`array - like`	Alternative to formula interface.	`None`
`X_endog`	`array - like`	Alternative to formula interface.	`None`
`Z`	`array - like`	Alternative to formula interface.	`None`
`var_names`	`array - like`	Alternative to formula interface.	`None`
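To make the formula syntax concrete, the sketch below splits an IV formula of the form `"y ~ (endog ~ z1 + z2) + exog1 + exog2"` into its parts. It is not the library's parser: `parse_iv_formula` and `IV_FORMULA` are hypothetical names, and the regex assumes the parenthesised block comes first on the right-hand side.

```python
import re

# Hypothetical pattern: outcome ~ (endogenous ~ instruments) [+ controls]
IV_FORMULA = re.compile(
    r"^\s*(?P<y>\w+)\s*~\s*\(\s*(?P<endog>[^~()]+)~(?P<instr>[^()]+)\)"
    r"\s*(?:\+(?P<exog>.*))?\s*$"
)


def split_terms(text):
    """Split 'a + b + c' into ['a', 'b', 'c'], ignoring blanks."""
    return [term.strip() for term in text.split("+") if term.strip()]


def parse_iv_formula(formula):
    """Return the outcome, endogenous regressors, instruments and controls."""
    match = IV_FORMULA.match(formula)
    if match is None:
        raise ValueError(f"not an IV formula: {formula!r}")
    return {
        "y": match.group("y"),
        "endog": split_terms(match.group("endog")),
        "instruments": split_terms(match.group("instr")),
        "exog": split_terms(match.group("exog") or ""),
    }


print(parse_iv_formula("y ~ (endog ~ z1 + z2) + exog1 + exog2"))
```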

References

Angrist, J. D., Imbens, G. W. and Rubin, D. B. (1996). Identification of Causal Effects Using Instrumental Variables. Journal of the American Statistical Association. doi:10.1080/01621459.1996.10476902 [@angrist1996identification]

Angrist, J. D. and Pischke, J.-S. (2009). Mostly Harmless Econometrics: An Empiricist's Companion. Princeton University Press. [@angrist2009mostly]

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(2)
>>> z = rng.normal(size=300)
>>> u = rng.normal(size=300)
>>> x = 0.8 * z + u + rng.normal(size=300)
>>> y = 1.0 + 2.0 * x + u + rng.normal(size=300)
>>> df = pd.DataFrame({"y": y, "x": x, "z": z})
>>> model = sp.IVRegression("y ~ (x ~ z)", data=df, method="2sls")
>>> res = model.fit()
>>> bool(1.5 < float(res.params["x"]) < 2.5)
True

first_stage `property` ¶

first_stage: List[Dict[str, float]]

First-stage diagnostics for each endogenous variable.

sargan_test `property` ¶

sargan_test: Optional[Dict[str, float]]

Sargan/Hansen J overidentification test results.

hausman_test `property` ¶

hausman_test: Dict[str, float]

Durbin-Wu-Hausman endogeneity test results.

fit ¶

fit(robust: Any = 'nonrobust', cluster: Optional[str] = None, **kwargs: Any) -> EconometricResults

Fit the IV model.

Parameters:

Name	Type	Description	Default
`robust`	`str or bool`	Standard-error type. Accepts 'nonrobust' and 'hc0'–'hc3' (case-insensitive), plus the aliases `True` / `'robust'` (≡ HC1, matching Stata) and `'white'` (≡ HC0). Classical and robust SEs match `ivregress 2sls, small` / `..., robust small` (the finite-sample t convention).	`'nonrobust'`
`cluster`	`str`	Variable name for clustering.	`None`

Returns:

Type	Description
`EconometricResults`

predict ¶

predict(data: Optional[DataFrame] = None) -> ndarray

Generate predictions from the fitted IV model.

For a structural-form estimator, the natural forecast of y given new data is X_exog β_exog + X_endog β_endog — i.e. we plug observed values of the endogenous variables through the structural equation. Instruments are not used at prediction time.
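A minimal sketch of that plug-in rule, assuming coefficients are stored in a plain dict keyed by variable name with an `"Intercept"` entry (both the storage format and the helper name `structural_predict` are hypothetical, not the library's internals):

```python
def structural_predict(params, rows):
    """Plug observed regressors into the structural equation."""
    predictions = []
    for row in rows:
        y_hat = params.get("Intercept", 0.0)
        for name, coef in params.items():
            if name != "Intercept":
                # Endogenous and exogenous regressors are treated alike.
                y_hat += coef * row[name]
        predictions.append(y_hat)
    return predictions


params = {"Intercept": 1.0, "d": 1.5, "x1": 0.4}
new_rows = [
    {"d": 2.0, "x1": 1.0, "z1": 9.9},    # z1 is present but never read
    {"d": 0.0, "x1": -1.0, "z1": -3.0},
]
print(structural_predict(params, new_rows))
```

The instrument column `z1` is carried in the rows only to show that it has no effect on the prediction.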

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	New data at which to predict. Must contain all exogenous and endogenous variables referenced by the model's formula. If `None`, returns in-sample fitted values.	`None`

KleibergenPaapResult `dataclass` ¶

Bases: ResultProtocolMixin

Container for Kleibergen-Paap rk test output.

SandersonWindmeijerResult `dataclass` ¶

Bases: ResultProtocolMixin

Sanderson-Windmeijer conditional F for each endogenous variable.

CLRResult `dataclass` ¶

Bases: ResultProtocolMixin

Moreira (2003) CLR test.

MTEResult `dataclass` ¶

Bases: ResultProtocolMixin

Marginal treatment effects result.

WeakIVConfidenceSet `dataclass` ¶

as_intervals ¶

as_intervals() -> List[Tuple[float, float]]

Return the CI as a list of (lo, hi) intervals (handles disconnection).
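A weak-IV confidence set obtained by test inversion can be a union of disjoint pieces. The sketch below shows one way to collapse a boolean acceptance mask over a sorted grid into `(lo, hi)` runs; the helper `mask_to_intervals` is hypothetical and only illustrates the idea, not how `as_intervals` is implemented.

```python
def mask_to_intervals(grid, accepted):
    """Collapse an acceptance mask over a sorted grid into (lo, hi) runs."""
    intervals = []
    start = None   # left end of the run currently being built
    prev = None    # last grid value visited
    for value, ok in zip(grid, accepted):
        if ok and start is None:
            start = value                     # a new run begins
        elif not ok and start is not None:
            intervals.append((start, prev))   # the run ended at prev
            start = None
        prev = value
    if start is not None:
        intervals.append((start, prev))       # run reaches the grid edge
    return intervals


grid = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
accepted = [True, True, False, False, True, True]
print(mask_to_intervals(grid, accepted))
```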

PostLassoResult `dataclass` ¶

Bases: ResultProtocolMixin

Return type of bch_post_lasso_iv.

BayesianIVResult `dataclass` ¶

Bases: ResultProtocolMixin

Posterior from Bayesian IV.

NPIVResult `dataclass` ¶

Bases: ResultProtocolMixin

Nonparametric IV estimation result.

KernelIVResult `dataclass` ¶

Bases: ResultProtocolMixin

Output of kernel IV regression.

Returned by sp.kernel_iv. Holds the treatment grid, the estimated structural function h_hat on that grid, the uniform confidence band (ci_low / ci_high), and the bandwidth. Call .summary() for a formatted preview.

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(42)
>>> n = 300
>>> z = rng.normal(size=n)
>>> u = rng.normal(size=n)
>>> d = 0.8 * z + 0.5 * u + rng.normal(size=n)
>>> y = np.sin(d) + u + 0.3 * rng.normal(size=n)
>>> df = pd.DataFrame({'y': y, 'd': d, 'z': z})
>>> res = sp.kernel_iv(df, y='y', treat='d', instrument='z',
...                    n_boot=50)
>>> isinstance(res, sp.KernelIVResult)
True
>>> res.h_hat.shape  # structural function on a 30-point grid
(30,)

ContinuousLATEResult `dataclass` ¶

Bases: ResultProtocolMixin

Continuous-instrument LATE on the maximal complier class.

Returned by sp.continuous_iv_late. Holds the LATE estimate, bootstrap SE, CI, and the complier share of the maximal complier class. Call .summary() for a formatted report.

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(42)
>>> n = 500
>>> z = rng.normal(size=n)
>>> v = rng.normal(size=n)
>>> d = (0.9 * z + v > 0).astype(float)
>>> y = 1.2 * d + 0.5 * v + rng.normal(size=n)
>>> df = pd.DataFrame({'y': y, 'd': d, 'z': z})
>>> res = sp.continuous_iv_late(df, y='y', treat='d', instrument='z')
>>> isinstance(res, sp.ContinuousLATEResult)
True
>>> round(res.estimate, 2)
1.81

IVDMLResult `dataclass` ¶

Bases: ResultProtocolMixin

Output of IV × DML.

IVDiagResult `dataclass` ¶

Bases: ResultProtocolMixin

Container for iv_diag output.

Attributes:

Name	Type	Description
`n`	`int`	Sample size after listwise deletion.
`n_endog, n_instruments, n_exog`	`int`	Counts of endogenous regressors, excluded instruments, included exogenous controls (excluding intercept).
`endog, instruments, exog`	`list[str]`	Variable names.
`beta_2sls, se_2sls, t_2sls, p_2sls`	`float`	2SLS point estimate, analytic SE, t-ratio, p-value (single endogenous regressor).
`ci_analytic_2sls`	`tuple[float, float]`	Analytic Wald CI based on `se_2sls` (level = `1 - alpha`).
`beta_ols, se_ols, ci_ols`	`(float, float, tuple[float, float])`	OLS counterpart (informative comparator; not causal).
`first_stage_F`	`float`	Classical first-stage F.
`effective_F`	`float`	Olea–Pflueger (2013) robust effective F.
`tF_critical_value`	`float`	Lee et al. (2022, AER) tF adjusted 5 % critical value at the observed first-stage F. `inf` if F < 3.84.
`ar_stat, ar_pvalue`	`float`	Anderson–Rubin (1949) F-statistic and p-value at `h0`.
`ar_ci`	`tuple[float, float]`	AR confidence set (grid-inverted). `±inf` flags a one-/two-sided unbounded set.
`clr_ci, k_ci`	`tuple[float, float] | None`	Moreira (2003) CLR and Kleibergen (2002) K confidence sets. `None` if not requested.
`kp_rk_lm, kp_rk_lm_pvalue, kp_rk_f`	`float | None`	Kleibergen–Paap (2006) rk LM, p-value, Wald F.
`bootstrap_ci_analytic, bootstrap_ci_pairs, bootstrap_ci_wild`		Pair-/wild-bootstrap CI tuples (or `None`).
`bootstrap_se_pairs, bootstrap_se_wild`	`float | None`	Bootstrap standard errors (matching CI sources).
`bootstrap_n`	`int`	Number of bootstrap replications actually used.
`ltz_ci, ltz_warning`	`(tuple[float, float] | None, str | None)`	Conley–Hansen–Rossi LTZ sensitivity CI under `gamma_var = (gamma_sd) ** 2`.
`tF_adjusted_ci`	`tuple[float, float] | None`	`beta ± tF_critical_value × se`. Falls back to `±inf` when `F < 3.84` (per LMMP 2022).
`tsls_late_caveat`	`str | None`	BBMT (2022/2025) / Słoczyński (2024) caveat text whenever the specification is at risk of negative-weight LATE pathologies.
`diagnostics`	`dict`	All numeric outputs in a flat dict (also returned by `to_dict()`). Useful for downstream agent workflows.
`raw`	`dict`	Internal scratchpad with arrays (residuals, fitted values).

Examples:

>>> import statspai as sp
>>> df = sp.dgp_iv(n=300, n_instruments=2, first_stage=0.6, seed=0)
>>> r = sp.iv.iv_diag(df, y='y', endog='treatment',
...                   instruments=['instrument_1', 'instrument_2'],
...                   exog=['x1', 'x2'], n_boot=200, random_state=42)
>>> isinstance(r, sp.IVDiagResult)
True
>>> bool(r.effective_F > 0)        # Olea-Pflueger robust F
True
>>> r.to_frame().shape[0] > 0      # tidy diagnostic table
True

to_frame ¶

to_frame() -> DataFrame

Return a tidy summary table — one row per estimator/metric.

to_dict ¶

to_dict() -> Dict[str, Any]

Return all numeric diagnostics as a flat dict (jsonable).

to_latex ¶

to_latex(caption: Optional[str] = None, label: Optional[str] = None, float_format: str = '%.4f') -> str

Render the summary table as a LaTeX tabular string.

to_excel ¶

to_excel(path: str) -> None

Write the summary table to path (one sheet).

to_word ¶

to_word(path: str, title: Optional[str] = None) -> None

Write the summary table to a .docx file (requires python-docx).

plot ¶

plot(kind: str = 'diagnostic', **kwargs: Any) -> Any

Dispatch to the statspai.iv.plot plotting helpers.

Parameters:

Name	Type	Description	Default
`kind`	`('diagnostic', 'forest', 'weak_iv', 'first_stage')`	Which plot to render. `'diagnostic'` returns the 2x2 panel.	`'diagnostic'`

iv ¶

iv(formula: Optional[str] = None, data: Optional[DataFrame] = None, method: str = '2sls', robust: str = 'nonrobust', cluster: Optional[str] = None, fuller_alpha: float = 1.0, absorb: Optional[Union[str, List[str]]] = None, **kwargs: Any) -> EconometricResults

Unified instrumental variables estimation.

Supports multiple methods through the method parameter:

'2sls' — Two-Stage Least Squares (default).
'liml' — Limited Information Maximum Likelihood. Better finite-sample properties under weak instruments; approximately median-unbiased.
'fuller' — Fuller (1977) modified LIML with finite-sample bias correction. fuller_alpha=1 removes first-order bias; fuller_alpha=4 minimises MSE under normality.
'gmm' — Efficient two-step GMM. More efficient than 2SLS under heteroskedasticity when over-identified.
'jive' — Jackknife IV (Angrist, Imbens & Krueger 1999). Reduces many-instrument bias by using leave-one-out fitted values.

For DeepIV (neural network IV) use sp.deepiv(). For Bartik shift-share IV use sp.bartik().
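To make the default `'2sls'` method concrete, the following pure-Python sketch runs the two stages by hand on simulated data with one instrument and no controls. It is an illustration under those assumptions, not the library's estimator; `ols_line` is a hypothetical helper.

```python
import random


def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line y ~ 1 + x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


random.seed(0)
n = 2000
z = [random.gauss(0, 1) for _ in range(n)]   # instrument
u = [random.gauss(0, 1) for _ in range(n)]   # unobserved confounder
d = [0.8 * zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [1.0 + 1.5 * di + ui + random.gauss(0, 1) for di, ui in zip(d, u)]

# Stage 1: project the endogenous regressor on the instrument.
a1, b1 = ols_line(z, d)
d_hat = [a1 + b1 * zi for zi in z]

# Stage 2: regress the outcome on the first-stage fitted values.
_, beta_2sls = ols_line(d_hat, y)

# Naive OLS for comparison; it absorbs the confounder u.
_, beta_ols = ols_line(d, y)

# With one instrument, 2SLS equals reduced form divided by first stage.
_, reduced_form = ols_line(z, y)
beta_wald = reduced_form / b1

print(f"OLS  : {beta_ols:.3f}")
print(f"2SLS : {beta_2sls:.3f}")
print(f"Wald : {beta_wald:.3f}")
```

Only the point estimate carries over from this manual procedure. Standard errors taken from the second regression would be wrong, because its residuals are computed from `d_hat` rather than the observed `d`.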

Parameters:

Name	Type	Description	Default
`formula`	`str`	IV formula: `"y ~ (endog ~ z1 + z2) + exog1 + exog2"`. Inside the parentheses, variables before `~` are endogenous regressors and variables after `~` are excluded instruments; variables outside the parentheses are exogenous controls.	`None`
`data`	`DataFrame`	Data containing all variables.	`None`
`method`	`str`	Estimation method: '2sls', 'liml', 'fuller', 'gmm', 'jive'.	`'2sls'`
`robust`	`str`	Standard-error type ('nonrobust', 'hc0', 'hc1', 'hc2', 'hc3').	`'nonrobust'`
`cluster`	`str`	Variable name for clustered standard errors.	`None`
`fuller_alpha`	`float`	Fuller modification constant (only used when `method='fuller'`).	`1.0`
`absorb`	`str or list of str`	Column name(s) of high-dimensional fixed effects to partial out before fitting (e.g. `absorb="firm"` or `absorb=["firm", "year"]`). Routes `y`, exogenous controls, endogenous regressors, and instruments through :func:`sp.fast.demean` (Rust HDFE backend) and drops singletons, then runs 2SLS in residualised space. The intercept is dropped because the absorbed FEs span the constant. The residual DOF is adjusted by `sum(G_k - 1)`, mirroring :func:`sp.fast.feols(absorb=...)`. Currently only wired for `method='2sls'`; LIML / Fuller / GMM / JIVE raise `NotImplementedError` (Phase 3b).	`None`
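The `absorb` steps (drop singletons, demean within groups, adjust degrees of freedom) can be sketched for a single fixed effect in plain Python. This is a simplified illustration, not the `sp.fast.demean` backend: the helpers `drop_singletons` and `demean_by_group` are hypothetical, and multi-way fixed effects would need an iterative procedure that is not shown here.

```python
from collections import defaultdict


def drop_singletons(groups):
    """Return the row indices whose group has more than one observation."""
    counts = defaultdict(int)
    for g in groups:
        counts[g] += 1
    return [i for i, g in enumerate(groups) if counts[g] > 1]


def demean_by_group(values, groups):
    """Subtract each group's mean from its members (within transformation)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for v, g in zip(values, groups):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for v, g in zip(values, groups)]


firm = ["a", "a", "b", "b", "b", "c"]   # firm "c" is a singleton
y = [1.0, 3.0, 2.0, 4.0, 6.0, 10.0]

keep = drop_singletons(firm)
firm_kept = [firm[i] for i in keep]
y_kept = [y[i] for i in keep]

# Every variable (y, controls, endogenous regressors, instruments)
# would be passed through the same transformation.
y_tilde = demean_by_group(y_kept, firm_kept)

# One-way case of the sum(G_k - 1) degrees-of-freedom adjustment.
dof_absorbed = len(set(firm_kept)) - 1

print(keep, y_tilde, dof_absorbed)
```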

Returns:

Type	Description
`EconometricResults`	Fitted model results with integrated IV diagnostics: first-stage F-statistics and partial R²; Sargan/Hansen J overidentification test (when over-identified); Durbin-Wu-Hausman endogeneity test; weak-instrument warnings.

Examples:

>>> # Standard 2SLS
>>> result = sp.iv("wage ~ (education ~ parent_edu + distance) + experience",
...               data=df)
>>> print(result.summary())

>>> # LIML (better with weak instruments)
>>> result = sp.iv("wage ~ (education ~ parent_edu + distance) + experience",
...               data=df, method='liml')

>>> # Fuller with bias correction
>>> result = sp.iv("wage ~ (education ~ parent_edu) + experience",
...               data=df, method='fuller', fuller_alpha=1)

>>> # Efficient GMM with robust SEs
>>> result = sp.iv("wage ~ (education ~ parent_edu + distance) + experience",
...               data=df, method='gmm', robust='hc1')

>>> # JIVE (many instruments)
>>> result = sp.iv("wage ~ (education ~ z1 + z2 + z3 + z4 + z5) + experience",
...               data=df, method='jive')

Notes

Which method to choose?

Start with '2sls'. If first-stage F < 10, switch to 'liml' or 'fuller'.
If you have many instruments (m >> k₂) and worry about bias, use 'jive' or 'liml'.
If over-identified and you suspect heteroskedasticity, use 'gmm' for efficiency.
For nonparametric / ML-based IV, see sp.deepiv().

Diagnostics included automatically:

First-stage F < 10 triggers a weak-instrument warning.
Sargan test (2SLS/LIML/Fuller/JIVE) or Hansen J (GMM) for overidentification.
Durbin-Wu-Hausman test for endogeneity.

References

Wooldridge (2010), Ch. 5-8.
Stock & Yogo (2005), for weak-instrument critical values.
Fuller (1977), for the finite-sample correction.
Hansen (1982), for GMM.
Angrist, Imbens & Krueger (1999), for JIVE.

ivreg ¶

ivreg(formula: str, data: DataFrame, robust: str = 'nonrobust', cluster: Optional[str] = None, *, vce: Optional[str] = None, wild_reps: int = 999, wild_weight_type: str = 'rademacher', seed: Optional[int] = None, conley_lat: Optional[str] = None, conley_lon: Optional[str] = None, conley_cutoff: Optional[float] = None, **kwargs: Any) -> EconometricResults

Instrumental variables regression (2SLS).

Deprecated: use sp.iv(formula, data, method='2sls') instead. ivreg is kept for backward compatibility.

Parameters:

Name	Type	Description	Default
`formula`	`str`	IV formula: `"y ~ (endog ~ z1 + z2) + exog1 + exog2"`	required
`data`	`DataFrame`		required
`robust`	`str`		`'nonrobust'`
`cluster`	`str`		`None`
`vce`	`str`	Set `vce="wild"` (with `cluster=`) to run the WRE wild cluster bootstrap (Davidson-MacKinnon 2010) on the endogenous coefficient — pinned to Stata `boottest` after `ivreg2`. Otherwise `vce` is the canonical alias for `robust`.	`None`
`wild_reps`	`int`	Number of bootstrap replications for the `vce="wild"` path.	`999`
`wild_weight_type`	`str`	Bootstrap weight distribution for the `vce="wild"` path.	`'rademacher'`
`seed`	`int`	Random seed for the `vce="wild"` path.	`None`

Returns:

Type	Description
`EconometricResults`

References

Angrist, J. D., Imbens, G. W. and Rubin, D. B. (1996). Identification of Causal Effects Using Instrumental Variables. Journal of the American Statistical Association. doi:10.1080/01621459.1996.10476902 [@angrist1996identification]

Examples:

>>> import numpy as np, pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(42)
>>> n = 500
>>> z = rng.normal(size=n)
>>> u = rng.normal(size=n)
>>> x = 0.8 * z + u + rng.normal(size=n)        # endogenous regressor
>>> y = 1.5 * x + 2.0 * u + rng.normal(size=n)
>>> df = pd.DataFrame({'y': y, 'x': x, 'z': z})
>>> result = sp.ivreg("y ~ (x ~ z)", data=df)
>>> bool(abs(result.params['x'] - 1.5) < 0.2)  # 2SLS recovers the true effect
True

>>> # Preferred modern entry point:
>>> result = sp.iv("y ~ (x ~ z)", data=df, method='2sls')

liml ¶

liml(formula: Optional[str] = None, data: Optional[DataFrame] = None, y: Optional[str] = None, x_endog: Optional[List[str]] = None, x_exog: Optional[List[str]] = None, z: Optional[List[str]] = None, robust: str = 'nonrobust', cluster: Optional[str] = None, fuller: Optional[float] = None, alpha: float = 0.05) -> EconometricResults

Limited Information Maximum Likelihood (LIML) estimator.

More robust to weak instruments than 2SLS. Fuller's modification provides improved finite-sample properties.

Equivalent to Stata's ivregress liml y (x_endog = z) x_exog.

Parameters:

Name	Type	Description	Default
`formula`	`str`	Formula: "y ~ x_exog | x_endog | z" or "y ~ x_exog + (x_endog ~ z)".	`None`
`data`	`DataFrame`		`None`
`y`	`str`	Outcome variable.	`None`
`x_endog`	`list of str`	Endogenous regressors.	`None`
`x_exog`	`list of str`	Exogenous regressors (included instruments).	`None`
`z`	`list of str`	Excluded instruments.	`None`
`robust`	`str`		`'nonrobust'`
`cluster`	`str`		`None`
`fuller`	`float`	Fuller's constant (typically 1 or 4). If None, pure LIML.	`None`
`alpha`	`float`		`0.05`

Returns:

Type	Description
`EconometricResults`

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(42)
>>> n = 300
>>> z1, z2 = rng.normal(size=n), rng.normal(size=n)   # excluded instruments
>>> exper = rng.normal(size=n)                        # exogenous control
>>> u = rng.normal(size=n)                            # endogeneity
>>> educ = 0.6 * z1 + 0.5 * z2 + 0.3 * exper + u + rng.normal(size=n)
>>> lwage = 1.0 + 0.8 * educ + 0.2 * exper + 1.5 * u + rng.normal(size=n)
>>> df = pd.DataFrame({'lwage': lwage, 'educ': educ, 'exper': exper,
...                    'z1': z1, 'z2': z2})
>>> result = sp.liml(data=df, y='lwage', x_endog=['educ'],
...                  x_exog=['exper'], z=['z1', 'z2'])
>>> bool(abs(result.params['educ'] - 0.8) < 0.2)
True

jive_legacy ¶

jive_legacy(data: DataFrame, y: str, x_endog: List[str], x_exog: Optional[List[str]] = None, z: Optional[List[str]] = None, robust: str = 'nonrobust', cluster: Optional[str] = None, variant: str = 'jive1', alpha: float = 0.05) -> EconometricResults

Jackknife Instrumental Variables Estimation (JIVE).

Reduces finite-sample bias from many instruments by using leave-one-out fitted values as instruments.
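The leave-one-out idea can be sketched for the simplest case of one instrument plus an intercept, where the leverage shortcut avoids refitting the first stage n times. This is an illustration under those assumptions, not the `jive_legacy` implementation; `fit_line`, `loo_fitted`, and `iv_slope` are hypothetical helpers.

```python
import random


def fit_line(z, x):
    """Return (intercept, slope) of the least-squares line x ~ 1 + z."""
    n = len(z)
    mz, mx = sum(z) / n, sum(x) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    szx = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x))
    slope = szx / szz
    return mx - slope * mz, slope


def loo_fitted(z, x):
    """Leave-one-out first-stage fitted values via the leverage shortcut."""
    n = len(z)
    mz = sum(z) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    a, b = fit_line(z, x)
    out = []
    for zi, xi in zip(z, x):
        h = 1.0 / n + (zi - mz) ** 2 / szz          # leverage of row i
        out.append((a + b * zi - h * xi) / (1.0 - h))
    return out


def iv_slope(w, x, y):
    """Just-identified IV slope of y on x, using w as the instrument."""
    n = len(w)
    mw, mx, my = sum(w) / n, sum(x) / n, sum(y) / n
    swy = sum((wi - mw) * (yi - my) for wi, yi in zip(w, y))
    swx = sum((wi - mw) * (xi - mx) for wi, xi in zip(w, x))
    return swy / swx


random.seed(1)
n = 2000
z = [random.gauss(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [1.0 * xi + 1.5 * ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

x_loo = loo_fitted(z, x)            # row i never sees its own x_i
beta_jive = iv_slope(x_loo, x, y)

# Brute-force check of the shortcut for observation 0.
a0, b0 = fit_line(z[1:], x[1:])
brute_force_0 = a0 + b0 * z[0]

print(f"JIVE estimate: {beta_jive:.3f}")
print(f"shortcut vs refit: {x_loo[0]:.6f} vs {brute_force_0:.6f}")
```

With a single instrument the many-instrument bias is negligible, so the example only demonstrates the mechanics: each fitted value excludes its own observation, which removes the correlation between the constructed instrument and that observation's error.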

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`		required
`y`	`str`		required
`x_endog`	`list of str`		required
`x_exog`	`list of str`		`None`
`z`	`list of str`		`None`
`robust`	`str`		`'nonrobust'`
`cluster`	`str`		`None`
`variant`	`str`	'jive1' (Angrist et al. 1999) or 'jive2' (alternative).	`'jive1'`
`alpha`	`float`		`0.05`

Returns:

Type	Description
`EconometricResults`

Examples:

>>> import numpy as np, pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(42)
>>> n = 300
>>> Z = rng.normal(size=(n, 5))                  # many instruments
>>> u = rng.normal(size=n)
>>> x = Z @ np.full(5, 0.4) + u + rng.normal(size=n)
>>> y = 1.0 * x + 1.5 * u + rng.normal(size=n)
>>> df = pd.DataFrame(Z, columns=[f'z{i}' for i in range(5)])
>>> df['y'], df['x'] = y, x
>>> result = sp.jive(df, y='y', x_endog=['x'],
...                  z=[f'z{i}' for i in range(5)])
>>> bool(abs(result.params['x'] - 1.0) < 0.2)  # bias-reduced IV estimate
True

lasso_iv ¶

lasso_iv(data: DataFrame, y: str, x_endog: List[str], x_exog: Optional[List[str]] = None, z: Optional[List[str]] = None, robust: str = 'robust', cluster: Optional[str] = None, penalty: str = 'bic', alpha: float = 0.05) -> EconometricResults

LASSO-selected instrumental variables.

Uses LASSO to select relevant instruments from a large set, then estimates IV/2SLS with selected instruments. Belloni, Chen, Chernozhukov & Hansen (2012).

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`		required
`y`	`str`		required
`x_endog`	`list of str`		required
`x_exog`	`list of str`		`None`
`z`	`list of str`	Full set of candidate instruments.	`None`
`robust`	`str`		`'robust'`
`cluster`	`str`		`None`
`penalty`	`str`	Instrument selection criterion: 'bic', 'aic', 'cv'.	`'bic'`
`alpha`	`float`		`0.05`

Returns:

Type	Description
`EconometricResults`

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(7)
>>> n = 400
>>> # 20 candidate instruments, only the first 5 are relevant
>>> Z = rng.normal(size=(n, 20))
>>> u = rng.normal(size=n)                            # endogeneity
>>> educ = Z[:, :5] @ np.full(5, 0.5) + u + rng.normal(size=n)
>>> lwage = 1.0 + 0.7 * educ + 1.2 * u + rng.normal(size=n)
>>> df = pd.DataFrame(Z, columns=[f'z{i}' for i in range(20)])
>>> df['lwage'], df['educ'] = lwage, educ
>>> result = sp.lasso_iv(df, y='lwage', x_endog=['educ'],
...                      z=[f'z{i}' for i in range(20)])
>>> bool(abs(result.params['educ'] - 0.7) < 0.2)
True

anderson_rubin_test ¶

anderson_rubin_test(data: DataFrame, y: str, endog: str, instruments: List[str], exog: Optional[List[str]] = None, h0: float = 0, alpha: float = 0.05, vcov: str = 'HC1') -> Dict[str, Any]

Anderson-Rubin (1949) test — size-correct under weak instruments.

Tests H0: β_endog = h0 and constructs a confidence set by inverting the test over a grid of candidate values. The AR test has correct size regardless of instrument strength and is the recommended inference procedure when F_eff < 10.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`		required
`y`	`str`	Outcome variable.	required
`endog`	`str`	Endogenous regressor (single).	required
`instruments`	`list of str`	Excluded instruments.	required
`exog`	`list of str`	Included exogenous controls.	`None`
`h0`	`float`	Null hypothesis value for the endogenous coefficient.	`0`
`alpha`	`float`	Significance level.	`0.05`
`vcov`	`(HC0, HC1, classic)`	Variance estimator used for the Olea-Pflueger effective F reported alongside AR.	`'HC1'`

Returns:

Type	Description
`dict`	`'ar_stat'` : AR F statistic at `h0`. `'ar_df'` : Degrees of freedom `(k_z, n - k_w - k_z)`. `'ar_pvalue'` : P-value of AR test at `h0`. `'ar_ci'` : `(low, high)` AR confidence set (grid-inverted). `'beta_2sls'` : 2SLS point estimate. `'first_stage_F'` : Classical first-stage F. `'effective_F'` : Olea-Pflueger robust F_eff. `'tF_critical_value'` : Lee et al. (2022) adjusted 5 % two-sided critical value. `None` if `alpha != 0.05`. `'strength'` : Instrument-strength interpretation. `'interpretation'` : Human-readable summary.

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> n = 300
>>> z1 = rng.normal(size=n)
>>> z2 = rng.normal(size=n)
>>> u = rng.normal(size=n)
>>> educ = 0.7 * z1 + 0.5 * z2 + 0.5 * u + rng.normal(size=n)
>>> df = pd.DataFrame({
...     "wage": 1.0 + 0.4 * educ + u + rng.normal(size=n),
...     "educ": educ, "z1": z1, "z2": z2,
... })
>>> result = sp.anderson_rubin_test(df, y='wage', endog='educ',
...                                 instruments=['z1', 'z2'])
>>> bool(0.0 <= result['ar_pvalue'] <= 1.0)
True

Notes

Under H0: β = β₀, regress Y - β₀·D on Z and W. The F-test on Z gives the AR statistic. The confidence set is constructed by collecting all β₀ values not rejected at level alpha. If the AR set is unbounded on one or both sides, the returned (low, high) will be ±inf accordingly.
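The procedure in these notes can be sketched for one instrument and an intercept-only control set. The code is a simplified illustration, not `anderson_rubin_test` itself: `ar_stat` and `ar_confidence_set` are hypothetical helpers, the classical (homoskedastic) F is used, and the critical value is the large-sample chi-square(1) approximation rather than the exact F quantile.

```python
import math
import random
from statistics import NormalDist


def ar_stat(y, d, z, beta0):
    """F statistic on z in the regression of (y - beta0 * d) on 1 and z."""
    n = len(y)
    r = [yi - beta0 * di for yi, di in zip(y, d)]
    mz, mr = sum(z) / n, sum(r) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    szr = sum((zi - mz) * (ri - mr) for zi, ri in zip(z, r))
    slope = szr / szz
    rss = sum((ri - mr - slope * (zi - mz)) ** 2 for zi, ri in zip(z, r))
    return slope ** 2 * szz / (rss / (n - 2))


def ar_confidence_set(y, d, z, grid, alpha=0.05):
    """Collect grid values not rejected at level alpha; return their hull."""
    crit = NormalDist().inv_cdf(1 - alpha / 2) ** 2   # chi-square(1) quantile
    accepted = [b for b in grid if ar_stat(y, d, z, b) <= crit]
    if not accepted:
        return None
    # An accepted grid edge is reported as unbounded on that side.
    lo = -math.inf if accepted[0] == grid[0] else accepted[0]
    hi = math.inf if accepted[-1] == grid[-1] else accepted[-1]
    return lo, hi


random.seed(0)
n = 500
z = [random.gauss(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
d = [0.7 * zi + 0.5 * ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [1.0 + 0.4 * di + ui + random.gauss(0, 1) for di, ui in zip(d, u)]

grid = [i / 100 for i in range(-100, 201)]            # beta0 from -1 to 2
ci = ar_confidence_set(y, d, z, grid)

# The just-identified IV estimate sets the AR statistic to zero.
mz, md, my = sum(z) / n, sum(d) / n, sum(y) / n
beta_iv = (sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
           / sum((zi - mz) * (di - md) for zi, di in zip(z, d)))

print("AR set:", ci)
print("AR stat at IV estimate:", ar_stat(y, d, z, beta_iv))
```

Returning only the hull `(lo, hi)` is a simplification: when both grid edges are accepted the sketch reports the whole real line, even if some interior values are rejected.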

effective_f_test ¶

effective_f_test(data: DataFrame, endog: str, instruments: List[str], exog: Optional[List[str]] = None, vcov: str = 'HC1') -> Dict[str, Any]

Olea-Pflueger (2013) robust effective F statistic for weak instruments.

Computes the heteroskedasticity-robust effective F that is a pre-test for the concentration parameter of the first stage. Under homoskedasticity (vcov='classic') and a single instrument it reduces exactly to the standard first-stage F.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`		required
`endog`	`str`	Endogenous regressor (single endogenous variable).	required
`instruments`	`list of str`	Excluded instruments.	required
`exog`	`list of str`	Included exogenous controls (a constant is added automatically).	`None`
`vcov`	`(HC0, HC1, classic)`	Variance estimator for the first-stage residuals: `'classic'` — homoskedastic; F_eff equals first-stage F. `'HC0'` — White heteroskedasticity-robust. `'HC1'` — HC0 with small-sample correction `n/(n-k)`.	`'HC1'`

Returns:

Type	Description
`dict`	`'F_eff'` : Olea-Pflueger effective F. `'first_stage_F'` : Classical first-stage F (for comparison). `'n_instruments'` : Number of excluded instruments (`k_z`). `'n_obs'` : Sample size. `'strength'` : Interpretation string. `'stock_yogo_10pct'` : 23.1 (conventional threshold for <10 % bias).

Notes

The formula (Andrews-Stock-Sun 2019, eq. 4.13) is

.. math::

F_{\text{eff}} = \frac{\hat\pi' (\tilde Z' \tilde Z)\hat\pi}
    {\mathrm{tr}\!\left(\hat\Omega\,(\tilde Z' \tilde Z)^{-1}\right)}

where :math:\tilde Z, \tilde D are residualized after partialling out the exogenous controls, :math:\hat\pi is the first-stage OLS coefficient vector, and :math:\hat\Omega = \sum_i \hat\eta_i^2 \tilde z_i \tilde z_i' is the HC meat.

Under homoskedasticity :math:\hat\Omega \approx \hat\sigma_\eta^2 \tilde Z'\tilde Z, so the trace collapses to :math:k_z \hat \sigma_\eta^2 and :math:F_{\text{eff}} reduces to the first-stage F.
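For a single instrument with only a constant partialled out, the formula becomes a ratio of scalars and the homoskedastic reduction can be checked numerically. The sketch below is an illustration under those assumptions, not `effective_f_test` itself; `effective_f_single` and `first_stage_f` are hypothetical helpers, and the `n - 2` divisor in the classical variance is an assumed degrees-of-freedom convention.

```python
import random


def effective_f_single(d, z, vcov="HC0"):
    """Scalar version of pi' (Z'Z) pi / tr(Omega (Z'Z)^-1) for one instrument."""
    n = len(d)
    mz, md = sum(z) / n, sum(d) / n
    zt = [zi - mz for zi in z]            # partial out the constant
    dt = [di - md for di in d]
    szz = sum(v * v for v in zt)
    pi_hat = sum(a * b for a, b in zip(zt, dt)) / szz
    eta = [b - pi_hat * a for a, b in zip(zt, dt)]
    if vcov == "classic":
        omega = sum(e * e for e in eta) / (n - 2) * szz
    elif vcov == "HC0":
        omega = sum(e * e * a * a for a, e in zip(zt, eta))
    else:
        raise ValueError("vcov must be 'classic' or 'HC0'")
    return pi_hat ** 2 * szz / (omega / szz)


def first_stage_f(d, z):
    """Classical first-stage F for one instrument, computed from R^2."""
    n = len(d)
    mz, md = sum(z) / n, sum(d) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    sdd = sum((di - md) ** 2 for di in d)
    szd = sum((zi - mz) * (di - md) for zi, di in zip(z, d))
    r2 = szd ** 2 / (szz * sdd)
    return (n - 2) * r2 / (1.0 - r2)


random.seed(2)
n = 1000
z = [random.gauss(0, 1) for _ in range(n)]
# First-stage noise grows with |z|, so the errors are heteroskedastic.
d = [0.5 * zi + (0.5 + abs(zi)) * random.gauss(0, 1) for zi in z]

f_classic = effective_f_single(d, z, vcov="classic")
f_hc0 = effective_f_single(d, z, vcov="HC0")
f_first = first_stage_f(d, z)

print(f"classic F_eff : {f_classic:.2f}")
print(f"first-stage F : {f_first:.2f}")
print(f"HC0 F_eff     : {f_hc0:.2f}")
```

In this design the error variance rises with the magnitude of the instrument, so the robust statistic comes out smaller than the classical F, while the `'classic'` variant matches the first-stage F up to rounding.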

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> n = 300
>>> z1 = rng.normal(size=n)
>>> z2 = rng.normal(size=n)
>>> u = rng.normal(size=n)
>>> educ = 0.7 * z1 + 0.5 * z2 + 0.5 * u + rng.normal(size=n)
>>> df = pd.DataFrame({
...     "wage": 1.0 + 0.4 * educ + u + rng.normal(size=n),
...     "educ": educ, "z1": z1, "z2": z2,
... })
>>> res = sp.effective_f_test(df, endog='educ', instruments=['z1', 'z2'])
>>> bool(res['F_eff'] > res['stock_yogo_10pct'])  # strong instruments
True

tF_critical_value ¶

tF_critical_value(first_stage_F: float, alpha: float = 0.05) -> float

Lee–McCrary–Moreira–Porter (2022, AER) tF adjusted critical value.

Returns the adjusted two-sided t-ratio critical value c(F) such that rejecting when |t| > c(F) gives a valid level-alpha test of the 2SLS coefficient (equivalently, a 1 - alpha confidence interval), robust to weak instruments.

Parameters:

Name	Type	Description	Default
`first_stage_F`	`float`	Observed first-stage F statistic (or Olea-Pflueger F_eff).	required
`alpha`	`float`	Significance level. Only `0.05` is implemented (the only level for which LMMP publish a complete table).	`0.05`

Returns:

Type	Description
`float`	Adjusted critical value. Returns `inf` when `F ≤ 3.84` (AR inference should be used instead). Converges to `1.96` as `F → ∞`.

Raises:

Type	Description
`ValueError`	If `alpha` is not 0.05.

Notes

The LMMP tF procedure gives exactly correct 5 % size for any first-stage strength. The standard 1.96 critical value over-rejects substantially when F < 104.7 (the value at which c = 1.96).
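Once a critical value c(F) is available, the adjusted interval is the usual `beta ± c × se` with c in place of 1.96. The helper below is a hypothetical sketch of that arithmetic; it takes the critical value as an input (for example, the output of `tF_critical_value`) instead of reproducing the LMMP table, and the numbers are made up for illustration.

```python
import math


def tf_adjusted_ci(beta, se, critical_value):
    """Return (lo, hi) = beta +/- critical_value * se."""
    if math.isinf(critical_value):
        # Very weak first stage: the interval is the whole real line.
        return (-math.inf, math.inf)
    half_width = critical_value * se
    return (beta - half_width, beta + half_width)


beta, se = 1.5, 0.2                     # illustrative 2SLS estimate and SE

print(tf_adjusted_ci(beta, se, 1.96))       # large-F limit: the usual Wald CI
print(tf_adjusted_ci(beta, se, math.inf))   # critical value reported as inf
```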

Examples:

>>> import statspai as sp
>>> # With first-stage F = 10, the adjusted critical value is ~3.16
>>> c = sp.tF_critical_value(10.0)
>>> print(f"Adjusted 5% critical value: {c:.2f}")

kleibergen_paap_rk ¶

kleibergen_paap_rk(endog: Union[ndarray, DataFrame], instruments: Union[ndarray, DataFrame], exog: Optional[Union[ndarray, DataFrame, List[str]]] = None, data: Optional[DataFrame] = None, cov_type: str = 'robust', cluster: Optional[Union[ndarray, Series]] = None, add_const: bool = True) -> KleibergenPaapResult

Kleibergen-Paap (2006) rk Wald / LM statistic.

Tests the null that the reduced-form coefficient matrix on the excluded instruments has rank n_endog - 1 (under-identification) against the alternative of full rank.

This is the heteroskedasticity- and cluster-robust generalisation of the classical Cragg-Donald statistic. ivreg2 in Stata reports the identical statistic.

Parameters:

Name	Type	Description	Default
`endog`	`(array or DataFrame, shape(n, p))`	Endogenous regressors.	required
`instruments`	`(array or DataFrame, shape(n, k))`	Excluded instruments (`k >= p`).	required
`exog`	`array, DataFrame or list of column names`	Included exogenous regressors (controls). Intercept is added automatically when `add_const=True`.	`None`
`data`	`DataFrame`	Used only when `exog` is a list of column names.	`None`
`cov_type`	`('nonrobust', 'robust', 'cluster')`	Covariance for the stacked reduced-form equations.	`'robust'`
`cluster`	`array - like`	Required when `cov_type='cluster'`.	`None`
`add_const`	`bool`	Prepend a constant to the exogenous block.	`True`

Returns:

Type	Description
`KleibergenPaapResult`

sanderson_windmeijer ¶

sanderson_windmeijer(endog: Union[ndarray, DataFrame], instruments: Union[ndarray, DataFrame], exog: Optional[Union[ndarray, DataFrame, List[str]]] = None, data: Optional[DataFrame] = None, add_const: bool = True, endog_names: Optional[List[str]] = None) -> SandersonWindmeijerResult

Sanderson-Windmeijer (2016) conditional first-stage F.

For each endogenous regressor j, residualises all other endogenous regressors out of both the outcome (that endog column) and the instruments, then reports the first-stage F of the resulting partial regression. This is the correct individual-endogenous weak-IV diagnostic when multiple endogenous regressors are present.

When only one endogenous regressor is present, this reduces exactly to the standard first-stage F.

Parameters:

Name	Type	Description	Default
`endog`	`array or DataFrame, shape (n, p)`	Endogenous regressors.	required
`instruments`	`array or DataFrame, shape (n, k)`	Excluded instruments.	required
`exog`	`array, DataFrame or list of column names`		`None`
`data`	`DataFrame`		`None`
`add_const`	`bool`		`True`
`endog_names`	`list of str`	Labels for endogenous columns when passing numpy arrays.	`None`

Returns:

Type	Description
`SandersonWindmeijerResult`

conditional_lr_test ¶

conditional_lr_test(y: Union[ndarray, Series, str], endog: Union[ndarray, Series, str], instruments: Union[ndarray, DataFrame, List[str]], exog: Optional[Union[ndarray, DataFrame, List[str]]] = None, data: Optional[DataFrame] = None, beta0: float = 0.0, add_const: bool = True, n_simulations: int = 20000, random_state: Optional[int] = None) -> CLRResult

Moreira (2003) Conditional Likelihood Ratio (CLR) test.

Tests H0: beta = beta0 in a single-endogenous-variable IV model. Weak-IV-robust and uniformly most powerful invariant in the one endogenous-variable case.

Parameters:

Name	Type	Description	Default
`y`	`array, Series or column name`	Outcome.	required
`endog`	`array, Series or column name`	The single endogenous regressor.	required
`instruments`	`array, DataFrame or list of column names`		required
`exog`	`array, DataFrame or list of column names`		`None`
`data`	`DataFrame`		`None`
`beta0`	`float`	Null-hypothesis value of beta on `endog`.	`0.0`
`add_const`	`bool`		`True`
`n_simulations`	`int`	Monte-Carlo draws for the conditional critical value.	`20000`
`random_state`	`int`		`None`

Returns:

Type	Description
`CLRResult`

plausibly_exogenous_uci ¶

plausibly_exogenous_uci(y: Union[ndarray, Series, str], endog: Union[ndarray, Series, str], instruments: Union[ndarray, DataFrame, List[str]], gamma_grid: Union[ndarray, Iterable[float]], exog: Optional[Union[ndarray, DataFrame, List[str]]] = None, data: Optional[DataFrame] = None, add_const: bool = True, ci_level: float = 0.95) -> PlausiblyExogenousResult

Union-of-CIs (UCI) plausibly exogenous bounds (Conley, Hansen, Rossi 2012).

For each candidate γ in gamma_grid (one value per instrument), the model y = D*beta + Z*gamma + W*alpha + u is estimated by 2SLS after subtracting Z @ gamma from y; the final confidence set for beta is the union over γ of the per-γ 2SLS CIs.

Parameters:

Name	Type	Description	Default
`y`	`array, Series or column name`	Outcome.	required
`endog`	`array, Series or column name`	The single endogenous regressor.	required
`instruments`	`array, DataFrame or list of str`		required
`gamma_grid`	`array-like`	Candidate direct-effect vectors. If `instruments` has `k` columns, `gamma_grid` may be (m,) for k=1 or (m, k) for k>1.	required
`exog`	`array, DataFrame or list of str`	Included exogenous controls.	`None`
`data`	`DataFrame`	DataFrame for column-name inputs.	`None`
`add_const`	`bool`	Whether to add a constant to the exogenous block.	`True`
`ci_level`	`float`	Confidence level of the per-γ intervals.	`0.95`

Returns:

Type	Description
`PlausiblyExogenousResult`
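
The union-of-CIs idea can be sketched without the library. The block below is a minimal, standard-library-only illustration for one instrument and an intercept as the only control. The helper names `iv_ci` and `uci` are hypothetical, the standard errors are homoskedastic with a normal critical value, and the union is reported as its convex hull; none of this is claimed to match StatsPAI's internals.

```python
import random
from statistics import NormalDist, fmean


def iv_ci(y, d, z, level=0.95):
    """Just-identified IV estimate with a normal-approximation CI."""
    n = len(y)
    ybar, dbar, zbar = fmean(y), fmean(d), fmean(z)
    zt = [zi - zbar for zi in z]                     # demeaning absorbs the intercept
    szd = sum(a * (b - dbar) for a, b in zip(zt, d))
    szy = sum(a * (b - ybar) for a, b in zip(zt, y))
    beta = szy / szd
    resid = [(yi - ybar) - beta * (di - dbar) for yi, di in zip(y, d)]
    sigma2 = sum(r * r for r in resid) / (n - 2)
    se = (sigma2 * sum(a * a for a in zt)) ** 0.5 / abs(szd)
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return beta - crit * se, beta + crit * se


def uci(y, d, z, gamma_grid, level=0.95):
    """Hull of the per-gamma CIs obtained after subtracting z * gamma from y."""
    cis = [iv_ci([yi - g * zi for yi, zi in zip(y, z)], d, z, level)
           for g in gamma_grid]
    return min(lo for lo, _ in cis), max(hi for _, hi in cis)


# Simulated data (illustrative only): true beta = 1, instrument strength 0.8.
rng = random.Random(0)
n = 400
z = [rng.gauss(0, 1) for _ in range(n)]
v = [rng.gauss(0, 1) for _ in range(n)]
d = [0.8 * zi + vi for zi, vi in zip(z, v)]
y = [1.0 * di + 0.5 * vi + rng.gauss(0, 1) for di, vi in zip(d, v)]

exact = uci(y, d, z, [0.0])                            # exact exogeneity
relaxed = uci(y, d, z, [-0.1, -0.05, 0.0, 0.05, 0.1])  # allow small direct effects
print(exact, relaxed)
```

With `gamma_grid=[0.0]` the result is the ordinary IV interval; widening the grid can only widen the reported set.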

plausibly_exogenous_ltz ¶

plausibly_exogenous_ltz(y: Union[ndarray, Series, str], endog: Union[ndarray, Series, str], instruments: Union[ndarray, DataFrame, List[str]], gamma_mean: Union[float, ndarray] = 0.0, gamma_var: Union[float, ndarray] = 0.0, exog: Optional[Union[ndarray, DataFrame, List[str]]] = None, data: Optional[DataFrame] = None, add_const: bool = True, ci_level: float = 0.95) -> PlausiblyExogenousResult

Local-to-zero (LTZ) plausibly exogenous (Conley, Hansen, Rossi 2012).

Places a Gaussian prior on γ with mean gamma_mean and covariance gamma_var * I (scalar) or the matrix gamma_var, then integrates it out to yield a closed-form adjustment to β's asymptotic variance:

beta_LTZ   = beta_2SLS - A * gamma_mean
Var(beta)  = Var_2SLS + A * Omega * A'

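where A = (X'P_Z X)^{-1} X'P_Z Z restricted to β's row, and Omega is the prior covariance of γ (gamma_var * I for a scalar gamma_var, or the matrix gamma_var itself).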

Parameters:

Name	Type	Description	Default
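`y`	`array, Series or column name`	As in `plausibly_exogenous_uci`.	required
`endog`	`array, Series or column name`	As in `plausibly_exogenous_uci`.	required
`instruments`	`array, DataFrame or list of str`	As in `plausibly_exogenous_uci`.	required
`exog`	`array, DataFrame or list of str`	As in `plausibly_exogenous_uci`.	`None`
`data`	`DataFrame`	As in `plausibly_exogenous_uci`.	`None`
`add_const`	`bool`	As in `plausibly_exogenous_uci`.	`True`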
`gamma_mean`	`float or array of shape (k,)`		`0.0`
`gamma_var`	`float or array of shape (k, k)`	Prior variance. `0` reproduces exact exogeneity → 2SLS result.	`0.0`
`ci_level`	`float`		`0.95`

Returns:

Type	Description
`PlausiblyExogenousResult`

jive1 ¶

jive1(y: Any, endog: Any, instruments: Any, exog: Any = None, data: Optional[DataFrame] = None, add_const: bool = True) -> JIVEResult

Angrist-Imbens-Krueger (1999) JIVE1.

ujive ¶

ujive(y: Any, endog: Any, instruments: Any, exog: Any = None, data: Optional[DataFrame] = None, add_const: bool = True) -> JIVEResult

Kolesár (2013) UJIVE.

ijive ¶

ijive(y: Any, endog: Any, instruments: Any, exog: Any = None, data: Optional[DataFrame] = None, add_const: bool = True) -> JIVEResult

Ackerberg-Devereux (2009) IJIVE.

rjive ¶

rjive(y: Any, endog: Any, instruments: Any, exog: Any = None, data: Optional[DataFrame] = None, add_const: bool = True, ridge: float = 1.0) -> JIVEResult

Hansen-Kozbur (2014) Ridge JIVE.

ivmte_bounds ¶

ivmte_bounds(y: Union[ndarray, Series, str], treatment: Union[ndarray, Series, str], instruments: Union[ndarray, DataFrame, List[str]], exog: Optional[Union[ndarray, DataFrame, List[str]]] = None, data: Optional[DataFrame] = None, target: str = 'ate', late_bounds: Optional[Tuple[float, float]] = None, policy_prob: Optional[ndarray] = None, basis_degree: int = 3, n_propensity_bins: int = 8, bounds_outcome: Optional[Tuple[float, float]] = None, decreasing_mte: bool = False, add_const: bool = True, include_bmw_point: bool = True) -> IVMTEBounds

Sharp identified bounds for an MTE-type target parameter — MST (2018).

Parameters:

Name	Type	Description	Default
`y`	`array, Series or column name`	Usual IV argument.	required
`treatment`	`array, Series or column name`	Usual IV argument; must be binary.	required
`instruments`	`array, DataFrame or list of column names`	Usual IV argument.	required
`exog`	`array, DataFrame or list of column names`	Usual IV argument.	`None`
`data`	`DataFrame`	Usual IV argument.	`None`
`target`	`{'ate', 'att', 'atu', 'late', 'prte'}`		`'ate'`
`late_bounds`	`tuple of float`	(p_lo, p_hi), only for `target='late'`.	`None`
`policy_prob`	`ndarray`	New propensity realisation, only for `target='prte'`.	`None`
`basis_degree`	`int`	Polynomial order K for the MTR basis.	`3`
`n_propensity_bins`	`int`	Number of propensity-score cells used for the reduced-form IV moments.	`8`
`bounds_outcome`	`tuple of float`	(lo, hi) box constraint on the MTR functions. If your outcome is in [0, 1] (e.g. employment), pass (0, 1).	`None`
`decreasing_mte`	`bool`	If True, impose non-increasing MTE (Heckman-Vytlacil).	`False`
`include_bmw_point`	`bool`	Also run `sp.iv.mte` with the same basis and return the point estimate for side-by-side reporting.	`True`

Returns:

Type	Description
`IVMTEBounds`

anderson_rubin_ci ¶

anderson_rubin_ci(y: Union[ndarray, Series, str], endog: Union[ndarray, Series, str], instruments: Union[ndarray, DataFrame, List[str]], exog: Optional[Union[ndarray, DataFrame, List[str]]] = None, data: Optional[DataFrame] = None, level: float = 0.95, n_grid: int = 401, beta_grid: Optional[ndarray] = None, add_const: bool = True) -> WeakIVConfidenceSet

Anderson-Rubin (1949) confidence set by grid inversion.

For each candidate β₀, compute the AR F-statistic

AR(β₀) = (u₀' P_Z u₀ / k) / (u₀' M_Z u₀ / (n - k - kW))
u₀ = y - β₀ · d  (partialled out of exogenous controls)

and include β₀ in the CI whenever AR(β₀) ≤ F_{k, n-k-kW}^{1-α}.

Valid under any instrument strength. Under weak identification the set can be disconnected or unbounded — we flag both.
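
The grid inversion can be illustrated with a small standard-library sketch for one instrument and an intercept as the only control (k = 1, kW = 1). The names `ar_stat` and `ar_confidence_set` are hypothetical, and the χ²(1) quantile is used as a large-sample stand-in for the exact F(1, n-2) quantile, so this is an approximation to the rule above rather than the library's code.

```python
import random
from statistics import NormalDist, fmean


def ar_stat(beta0, y, d, z):
    """AR F-statistic for H0: beta = beta0 (one instrument, intercept partialled out)."""
    n, k, k_w = len(y), 1, 1
    u = [yi - beta0 * di for yi, di in zip(y, d)]
    ubar, zbar = fmean(u), fmean(z)
    u = [ui - ubar for ui in u]
    zt = [zi - zbar for zi in z]
    explained = sum(a * b for a, b in zip(zt, u)) ** 2 / sum(a * a for a in zt)  # u'P_Z u
    residual = sum(ui * ui for ui in u) - explained                              # u'M_Z u
    return (explained / k) / (residual / (n - k - k_w))


def ar_confidence_set(y, d, z, grid, level=0.95):
    """Keep every grid point that the AR test does not reject."""
    crit = NormalDist().inv_cdf(0.5 + level / 2) ** 2  # chi2(1) quantile, large-n approximation
    return [b for b in grid if ar_stat(b, y, d, z) <= crit]


# Simulated data (illustrative only): true beta = 1, strong instrument.
rng = random.Random(0)
n = 400
z = [rng.gauss(0, 1) for _ in range(n)]
v = [rng.gauss(0, 1) for _ in range(n)]
d = [0.8 * zi + vi for zi, vi in zip(z, v)]
y = [1.0 * di + 0.5 * vi + rng.gauss(0, 1) for di, vi in zip(d, v)]

grid = [i / 100 for i in range(-100, 301)]  # 401 candidate values
accepted = ar_confidence_set(y, d, z, grid)
print(accepted[0], accepted[-1])
```

With a strong instrument the accepted points form one bounded run of the grid. If the accepted points reach either end of the grid, the set may be unbounded and the grid should be widened before reporting.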

conditional_lr_ci ¶

conditional_lr_ci(y: Union[ndarray, Series, str], endog: Union[ndarray, Series, str], instruments: Union[ndarray, DataFrame, List[str]], exog: Optional[Union[ndarray, DataFrame, List[str]]] = None, data: Optional[DataFrame] = None, level: float = 0.95, n_grid: int = 201, beta_grid: Optional[ndarray] = None, n_sim: int = 5000, add_const: bool = True, random_state: Optional[int] = None) -> WeakIVConfidenceSet

Moreira (2003) CLR confidence set by grid inversion.

At each candidate β₀, compute the CLR statistic and its conditional critical value via Monte-Carlo given T'T; include β₀ iff CLR(β₀) ≤ c(T'T, 1-α).

Uniformly most powerful invariant under normal errors with a single endogenous regressor. Tight under strong ID, wide under weak ID.
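
The conditional critical value c(T'T, 1-α) can be simulated directly: under H0 and given T, the vector S is standard normal, so only its component along T and the squared length of the remainder matter. The sketch below uses only the standard library; `clr_stat` and `clr_critical_value` are hypothetical names, and the simulation details are an assumption rather than StatsPAI's implementation.

```python
import math
import random


def clr_stat(qs, qt, qst):
    """Moreira's LR statistic from Q_S = S'S, Q_T = T'T and Q_ST = S'T."""
    disc = (qs + qt) ** 2 - 4.0 * (qs * qt - qst ** 2)
    return 0.5 * (qs - qt + math.sqrt(max(disc, 0.0)))


def clr_critical_value(qt, k, level=0.95, n_sim=5000, seed=0):
    """Monte-Carlo quantile of the CLR statistic under H0, given T'T = qt."""
    rng = random.Random(seed)
    draws = []
    for _ in range(n_sim):
        s1 = rng.gauss(0.0, 1.0)                                    # component of S along T
        rest = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k - 1))  # orthogonal components
        draws.append(clr_stat(s1 * s1 + rest, qt, math.sqrt(qt) * s1))
    draws.sort()
    return draws[min(n_sim - 1, math.ceil(level * n_sim) - 1)]


weak = clr_critical_value(qt=0.0, k=5)    # near the chi2(5) quantile, about 11.1
strong = clr_critical_value(qt=1e4, k=5)  # near the chi2(1) quantile, about 3.84
print(weak, strong)
```

The critical value falls as T'T grows, which is why the CLR set is wide under weak identification and tight under strong identification.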

k_test_ci ¶

k_test_ci(y: Union[ndarray, Series, str], endog: Union[ndarray, Series, str], instruments: Union[ndarray, DataFrame, List[str]], exog: Optional[Union[ndarray, DataFrame, List[str]]] = None, data: Optional[DataFrame] = None, level: float = 0.95, n_grid: int = 401, beta_grid: Optional[ndarray] = None, add_const: bool = True) -> WeakIVConfidenceSet

Kleibergen (2002) K-test confidence set by grid inversion.

The K-statistic projects the AR score onto the (estimated) score direction of β, giving a 1-df χ²-valued pivot even under weak ID.

K(β₀)  =  n · (score_β|β₀)² / var
       ≈  (S'T)² / (T'T)

where S and T are the AR score and "T-statistic" from Moreira (2003). Faster than CLR but slightly less powerful; still weak-IV-robust.
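
As a small illustration of the approximation K ≈ (S'T)² / (T'T), the sketch below evaluates it for hand-picked vectors. `k_stat` is a hypothetical helper and the vectors are made up; constructing S and T from data is not shown.

```python
def k_stat(s, t):
    """K statistic (S'T)^2 / (T'T) from the standardised vectors S and T."""
    st = sum(a * b for a, b in zip(s, t))
    tt = sum(b * b for b in t)
    return st * st / tt


t = [3.0, 0.0, 4.0]
generic = k_stat([1.0, 2.0, -0.5], t)  # S'T = 1, T'T = 25
aligned = k_stat([6.0, 0.0, 8.0], t)   # S parallel to T, so K equals S'S = 100
print(generic, aligned)
```

By the Cauchy-Schwarz inequality K never exceeds S'S, with equality only when S is parallel to T; this is the sense in which K uses a single direction of the score.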

bch_post_lasso_iv ¶

bch_post_lasso_iv(y: Union[ndarray, Series, str], endog: Union[ndarray, Series, str], instruments: Union[ndarray, DataFrame, List[str]], exog: Optional[Union[ndarray, DataFrame, List[str]]] = None, data: Optional[DataFrame] = None, alpha: float = 0.05, c: float = 1.1, add_const: bool = True, robust: bool = True, ensure_min_instruments: int = 1) -> PostLassoResult

Post-Lasso 2SLS with rigorous, data-driven penalty.

Note: This is StatsPAI's original reconstruction of the BCH (2012) pipeline. For numerical agreement with R's hdm package, including the canonical eminent-domain application and selection on both instruments and controls, use `statspai.rlasso_iv` (a faithful, parity-tested port of hdm::rlassoIV). The two differ in penalty construction (asymptotic 2c√{2n log(2p/α)} here vs. hdm's exact 2c√n·Φ⁻¹(1−γ/2p)) and in control selection, so their numbers are not interchangeable.

Recipe (BCH 2012 §3):

1. Partial out controls exog from y, endog, and every column of instruments.
2. Select relevant instruments by LASSO with rigorous penalty λ = 2 c √{2 n log(2 p / α)} and iterated heteroskedastic loadings (Algorithm 1).
3. If fewer than ensure_min_instruments survive, add the instruments with largest univariate first-stage t-stat.
4. Re-estimate the first stage by OLS on the selected subset; post-Lasso removes shrinkage bias.
5. Plug the post-Lasso fitted value into 2SLS for β̂.
6. Heteroskedasticity-robust (HC1) SEs by default.

Parameters:

Name	Type	Description	Default
`y`	`array, Series or column name`	Outcome.	required
`endog`	`array, Series or column name`	The single endogenous regressor.	required
`instruments`	`array, DataFrame or list of column names`	(Many) candidate instruments; p may exceed n (the method shines precisely there).	required
`exog`	`array, DataFrame or list of column names`	Controls (default: intercept only).	`None`
`data`	`DataFrame`	DataFrame for string-name inputs.	`None`
`alpha`	`float`	Penalty confidence level.	`0.05`
`c`	`float`	Slack constant (BCH default 1.1).	`1.1`
`add_const`	`bool`	Whether to include a constant in the exogenous block.	`True`
`robust`	`bool`	Use HC1 standard errors.	`True`
`ensure_min_instruments`	`int`	If LASSO selects 0, force this many strong ones in.	`1`

Returns:

Type	Description
`PostLassoResult`

bch_lambda ¶

bch_lambda(n: int, p: int, alpha: float = 0.05, c: float = 1.1) -> float

BCH (2012) rigorous penalty level: λ = 2 · c · √{2 n · log(2 p / α)}.

Parameters:

Name	Type	Description	Default
`n`	`int`		required
`p`	`int`	Number of candidate instruments.	required
`alpha`	`float`	Target confidence level (BCH recommend 0.05 / log(n)).	`0.05`
`c`	`float`	Slack constant (BCH 2012 recommend 1.1).	`1.1`

Returns:

Type	Description
`float`
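
The penalty formula is simple enough to check by hand. The following sketch recomputes it with the standard library; `bch_lambda_sketch` is a hypothetical stand-in written from the formula above, not the library function itself.

```python
import math


def bch_lambda_sketch(n, p, alpha=0.05, c=1.1):
    """lambda = 2 * c * sqrt(2 * n * log(2 * p / alpha))."""
    return 2.0 * c * math.sqrt(2.0 * n * math.log(2.0 * p / alpha))


few = bch_lambda_sketch(n=100, p=50)     # moderate number of candidate instruments
many = bch_lambda_sketch(n=100, p=5000)  # many more candidates, larger penalty
print(few, many)
```

The penalty grows only logarithmically in p but with the square root of n, so quadrupling the sample size doubles λ while adding instruments raises it slowly.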

bch_selected ¶

bch_selected(endog: ndarray, instruments: ndarray, exog: Optional[ndarray] = None, alpha: float = 0.05, c: float = 1.1, max_refit: int = 15) -> Tuple[List[int], ndarray, float]

BCH first-stage instrument selection.

Parameters:

Name	Type	Description	Default
`endog`	`ndarray, shape (n,)`	Single endogenous regressor (partialled out of controls).	required
`instruments`	`ndarray, shape (n, p)`	Candidate instruments (partialled out of controls).	required
`exog`	`ndarray`	Unused; kept for API symmetry.	`None`
`alpha`	`float`	Penalty-rule parameter.	`0.05`
`c`	`float`	Penalty-rule parameter.	`1.1`
`max_refit`	`int`	Refit iterations for penalty loadings.	`15`

Returns:

Type	Description
`(sel_indices, psi_final, lam)`

jive_mw ¶

jive_mw(data: DataFrame, y: str, endog: str, instruments: Sequence[str], exog: Optional[Sequence[str]] = None, alpha: float = 0.05) -> ManyWeakIVResult

Jackknife Instrumental Variables Estimator (AIK 1999; Phillips-Hale 2018 variant). Less biased than 2SLS when K/n is not small.

Returns:

Type	Description
`ManyWeakIVResult`

many_weak_ar ¶

many_weak_ar(data: DataFrame, y: str, endog: str, instruments: Sequence[str], exog: Optional[Sequence[str]] = None, beta_grid: Optional[Sequence[float]] = None, alpha: float = 0.05) -> ManyWeakIVResult

Jackknife-Anderson-Rubin confidence set by grid inversion — valid under many-weak-IV (Mikusheva-Sun 2024, simplified).

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`		required
`y`	`str`		required
`endog`	`str`		required
`instruments`	`sequence of str`		required
`exog`	`sequence of str`		`None`
`beta_grid`	`sequence of float`	Candidate beta values to invert.	`None`
`alpha`	`float`		`0.05`

continuous_iv_late ¶

continuous_iv_late(data: DataFrame, y: str, treat: str, instrument: str, n_quantiles: int = 4, alpha: float = 0.05, n_boot: int = 200, seed: int = 0) -> ContinuousLATEResult

LATE with a continuous instrument via quantile-bin Wald estimator.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`		required
`y`	`str`		required
`treat`	`str`		required
`instrument`	`str`		required
`n_quantiles`	`int`	Number of quantile bins of the instrument; LATE is averaged across bins weighted by complier share.	`4`
`alpha`	`float`		`0.05`
`n_boot`	`int`		`200`
`seed`	`int`		`0`

Returns:

Type	Description
`ContinuousLATEResult`

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(42)
>>> n = 500
>>> z = rng.normal(size=n)
>>> v = rng.normal(size=n)
>>> d = (0.9 * z + v > 0).astype(float)
>>> y = 1.2 * d + 0.5 * v + rng.normal(size=n)
>>> df = pd.DataFrame({'y': y, 'd': d, 'z': z})
>>> res = sp.continuous_iv_late(df, y='y', treat='d',
...                             instrument='z')
>>> round(res.estimate, 2)
1.81
>>> round(res.complier_share, 2)  # maximal complier class
0.35

iv_compare ¶

iv_compare(formula: Optional[str] = None, data: Optional[DataFrame] = None, *, methods: Sequence[str] = ('2sls', 'liml', 'fuller', 'jive'), alpha: float = 0.05, endog_name: Optional[str] = None, **kwargs: Any) -> DataFrame

Run several k-class / JIVE estimators and return a one-row-per-method comparison table — useful for quick sensitivity checks across estimators.

Parameters:

Name	Type	Description	Default
`formula`	`str`	IV formula (`"y ~ (d ~ z) + x"`).	`None`
`data`	`DataFrame`		`None`
`methods`	`sequence of str`	Methods to dispatch through `sp.iv` (and therefore the unified IV dispatcher).	`('2sls', 'liml', 'fuller', 'jive')`
`alpha`	`float`	For Wald-CI columns.	`0.05`

Returns:

Type	Description
`DataFrame`	columns `method`, `estimate`, `SE`, `CI lower`, `CI upper`, `first_stage_F`, `effective_F`.

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(42)
>>> n = 500
>>> x = rng.normal(size=n)
>>> z1 = rng.normal(size=n)
>>> z2 = rng.normal(size=n)
>>> u = rng.normal(size=n)
>>> d = 0.6 * z1 + 0.4 * z2 + 0.3 * x + u + rng.normal(size=n)
>>> y = 1.5 * d + 0.5 * x + u + rng.normal(size=n)
>>> df = pd.DataFrame(
...     {'y': y, 'd': d, 'z1': z1, 'z2': z2, 'x': x})
>>> tab = sp.iv_compare('y ~ (d ~ z1 + z2) + x', data=df)
>>> list(tab['method'])
['2sls', 'liml', 'fuller', 'jive']
>>> [round(b, 2) for b in tab['estimate']]  # true beta 1.5
[1.41, 1.41, 1.42, 1.33]

fit ¶

fit(formula: Optional[str] = None, data: Any = None, *, method: str = '2sls', augmented_diagnostics: bool = True, **kwargs: Any) -> Any

Alias for `_dispatch`. See `sp.iv.__doc__` for usage.
