Instrumental variables

statspai.iv — the unified IV namespace: a fixest-style formula front end (sp.iv / sp.ivreg), a modern single-endogenous reporting bundle (sp.iv_diag), a k-class / JIVE estimator panel (sp.iv_compare), and two frontier estimators (sp.kernel_iv, sp.continuous_iv_late).

See also the decision guide: Choosing an IV estimator, and the exhaustive auto-generated listing under Full API reference → iv.

The formula front end — sp.iv / sp.ivreg

IV models use the fixest convention: endogenous regressors and their instruments go in parentheses, (endog ~ instruments). Everything outside the parentheses is exogenous.

import statspai as sp

data = sp.datasets.card_1995()

# Card (1995) returns to schooling: educ instrumented by college proximity.
r = sp.ivreg(
    "lwage ~ (educ ~ nearc4) + exper + expersq + black + south + smsa",
    data=data,
)
print(r.summary())

# Multiple excluded instruments + an exogenous control:
r = sp.iv("y ~ (d ~ z1 + z2) + x1", data=df)

# IV with high-dimensional fixed effects absorbed (reghdfe / ivreghdfe style):
r = sp.iv("y ~ (d ~ z1 + z2) + x1", data=df, absorb="firm + year")

sp.iv and sp.ivreg share the same formula syntax. sp.ivreg always runs two-stage least squares and is kept as a backward-compatible, Stata-flavoured alias; new code should call sp.iv, whose default method is also 2SLS.
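To make the formula convention concrete, the sketch below splits a formula string into its four roles. It is purely illustrative: split_iv_formula is a hypothetical helper written for this page, not the parser statspai uses, and it assumes a single parenthesised block and plain "+"-separated terms.

```python
import re


def split_iv_formula(formula: str) -> dict:
    """Split 'y ~ (endog ~ z1 + z2) + x1' into outcome, endog, instruments, exog.

    Hypothetical helper: handles one '(endog ~ instruments)' block only.
    """
    outcome, rhs = (part.strip() for part in formula.split("~", 1))
    match = re.search(r"\(([^()~]+)~([^()]+)\)", rhs)
    if match is None:
        raise ValueError("expected an '(endog ~ instruments)' block")

    def terms(text: str) -> list:
        # Split on '+' and drop the empty pieces left around the IV block.
        return [t.strip() for t in text.split("+") if t.strip()]

    exog_text = rhs[:match.start()] + rhs[match.end():]
    return {
        "outcome": outcome,
        "endog": terms(match.group(1)),
        "instruments": terms(match.group(2)),
        "exog": terms(exog_text),
    }


parts = split_iv_formula("lwage ~ (educ ~ nearc4) + exper + expersq + black")
print(parts)
```

Everything inside the parentheses is routed to the first stage; everything outside enters both stages as a control.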

Modern reporting bundle — sp.iv_diag

For the common single-endogenous case, sp.iv_diag returns the full post-2022 reporting standard in one object: the point estimate, first-stage strength (effective F), weak-instrument-robust confidence intervals (Anderson–Rubin and, optionally, conditional-likelihood-ratio and k-class inversions), and bootstrap alternatives.

res = sp.iv_diag(
    data, y="lwage", endog="educ", instruments="nearc4",
    exog=["exper", "expersq", "black", "south", "smsa"],
    cluster="region",
    include_clr_ci=True,   # conditional likelihood-ratio CI
    include_k_ci=True,     # k-class CI
)
res.summary()
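The Anderson–Rubin interval mentioned above can be obtained by test inversion: for each candidate value of the coefficient, test whether the instrument still predicts the implied residual, and keep the values that are not rejected. The following is a minimal standard-library sketch for one instrument and no controls, on simulated data. The function ar_stat, the grid limits, and the chi-square(1) cutoff of 3.84 are choices made for this illustration and are not taken from the statspai implementation.

```python
import random


def ar_stat(y, d, z, beta0):
    """Squared t-ratio on z when (y - beta0 * d) is regressed on z and a constant."""
    n = len(y)
    e = [yi - beta0 * di for yi, di in zip(y, d)]
    z_bar, e_bar = sum(z) / n, sum(e) / n
    szz = sum((zi - z_bar) ** 2 for zi in z)
    sze = sum((zi - z_bar) * (ei - e_bar) for zi, ei in zip(z, e))
    slope = sze / szz
    rss = sum((ei - e_bar - slope * (zi - z_bar)) ** 2 for zi, ei in zip(z, e))
    return slope ** 2 * szz / (rss / (n - 2))


# Simulated data: true effect 2.0, u confounds d and y, z is a valid instrument.
rng = random.Random(0)
n = 1000
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
d = [0.8 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [2.0 * di + ui + rng.gauss(0, 1) for di, ui in zip(d, u)]

# Just-identified IV estimate: cov(z, y) / cov(z, d).
z_bar, d_bar, y_bar = sum(z) / n, sum(d) / n, sum(y) / n
beta_iv = (sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
           / sum((zi - z_bar) * (di - d_bar) for zi, di in zip(z, d)))

# Keep every candidate whose AR statistic is below the 5% chi-square(1) cutoff.
grid = [1.0 + 0.005 * k for k in range(401)]   # 401 points on [1, 3]
accepted = [b for b in grid if ar_stat(y, d, z, b) <= 3.84]
ar_ci = (min(accepted), max(accepted))
print(f"beta_iv = {beta_iv:.3f}, AR 95% set = [{ar_ci[0]:.3f}, {ar_ci[1]:.3f}]")
```

With a weak instrument the accepted set can reach the edge of the grid or be disconnected, which is why an unbounded set has to be reported as such rather than truncated.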

Always check the first stage

A weak first stage biases 2SLS toward OLS and makes conventional standard errors unreliable. iv_diag reports the effective F-statistic and weak-IV-robust intervals so that you do not have to interpret a 2SLS point estimate in the dark. As a rule of thumb, treat a first-stage F below 10 (the level at which sp.iv raises its weak-instrument warning) as a signal to report the Anderson–Rubin interval rather than the 2SLS CI.
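As a rough illustration of what the first-stage check measures, the sketch below computes the classical (homoskedastic) first-stage F for a single instrument, which equals the squared t-ratio of the first-stage slope. This is not the Olea–Pflueger effective F that iv_diag reports; first_stage_f and the simulated data are assumptions made for this example.

```python
import random


def first_stage_f(d, z):
    """Classical F for one instrument: squared t-ratio of the slope of d on z."""
    n = len(d)
    z_bar, d_bar = sum(z) / n, sum(d) / n
    szz = sum((zi - z_bar) ** 2 for zi in z)
    szd = sum((zi - z_bar) * (di - d_bar) for zi, di in zip(z, d))
    pi_hat = szd / szz
    rss = sum((di - d_bar - pi_hat * (zi - z_bar)) ** 2 for zi, di in zip(z, d))
    return pi_hat ** 2 * szz / (rss / (n - 2))


rng = random.Random(1)
n = 500
z = [rng.gauss(0, 1) for _ in range(n)]
strong = [1.0 * zi + rng.gauss(0, 1) for zi in z]    # first-stage slope 1.0
weak = [0.02 * zi + rng.gauss(0, 1) for zi in z]     # first-stage slope 0.02

for label, d in (("strong", strong), ("weak", weak)):
    f = first_stage_f(d, z)
    verdict = "weak-instrument warning" if f < 10 else "ok"
    print(f"{label}: F = {f:.1f} ({verdict})")
```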

Comparing estimators — sp.iv_compare

With several (possibly many) instruments, 2SLS is biased toward OLS in finite samples, and the bias grows with the number of instruments. sp.iv_compare runs a panel of k-class and jackknife estimators side by side so the many-instrument bias is visible:

tbl = sp.iv_compare(
    "y ~ (d ~ z1 + z2 + z3) + x1", data=df,
    methods=("2sls", "liml", "fuller", "jive"),
)
tbl   # one row per estimator: coefficient, SE, CI
  • 2SLS — the workhorse; minimum-bias only with strong instruments.
  • LIML — limited-information ML; better-centred under many instruments.
  • Fuller — a finite-sample-adjusted LIML with finite moments.
  • JIVE — jackknife IV; removes the own-observation bias of 2SLS.
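For reference, the first three estimators in the panel are members of the textbook k-class family, written out below. The notation is introduced here and is not taken from the statspai source: X stacks the endogenous and exogenous regressors, Z stacks all instruments (excluded instruments plus exogenous controls), L is the number of columns of Z, the LIML value of k is the usual LIML eigenvalue, and alpha is assumed to correspond to the fuller_alpha argument.

```latex
\hat\beta(k) = \bigl[X^\top (I - k\,M_Z)\,X\bigr]^{-1} X^\top (I - k\,M_Z)\,y,
\qquad M_Z = I - Z\,(Z^\top Z)^{-1} Z^\top
% k = 0                              : OLS
% k = 1                              : 2SLS
% k = \hat\kappa                     : LIML
% k = \hat\kappa - \alpha / (n - L)  : Fuller
```

JIVE is not a k-class estimator: it keeps the 2SLS form but replaces each first-stage fitted value with its leave-one-out counterpart.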

Frontier estimators

# Nonparametric IV via kernel ridge in RKHS (Y on D instrumented by Z):
k = sp.kernel_iv(data, y="y", treat="d", instrument="z")

# LATE with a continuous instrument (quantile-bin Wald estimator):
late = sp.continuous_iv_late(data, y="y", treat="d", instrument="z")
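The idea behind a quantile-bin Wald estimator can be sketched in a few lines: sort observations by the instrument, cut them into bins, form a Wald ratio between each pair of adjacent bins, and average the ratios. The sketch below is one possible reading of that description, not the statspai implementation. In particular, weighting each ratio by the first-stage jump between bins is an assumed stand-in for the "complier share" weights, and quantile_bin_wald is a hypothetical name.

```python
import random


def quantile_bin_wald(y, d, z, n_quantiles=4):
    """Return (adjacent-bin Wald ratios, their weighted average)."""
    n = len(z)
    order = sorted(range(n), key=lambda i: z[i])       # indices sorted by instrument
    bins = [order[k * n // n_quantiles:(k + 1) * n // n_quantiles]
            for k in range(n_quantiles)]
    y_bar = [sum(y[i] for i in b) / len(b) for b in bins]
    d_bar = [sum(d[i] for i in b) / len(b) for b in bins]
    walds, weights = [], []
    for k in range(1, n_quantiles):
        dy = y_bar[k] - y_bar[k - 1]
        dd = d_bar[k] - d_bar[k - 1]    # first-stage jump between adjacent bins
        walds.append(dy / dd)           # fails if the first stage is flat (dd == 0)
        weights.append(dd)              # assumed proxy for the complier share
    late = sum(w * b for w, b in zip(weights, walds)) / sum(weights)
    return walds, late


# Simulated data: true effect 2.0, u confounds treatment and outcome.
rng = random.Random(0)
n = 4000
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
d = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [2.0 * di + 1.5 * ui + rng.gauss(0, 1) for di, ui in zip(d, u)]

walds, late = quantile_bin_wald(y, d, z, n_quantiles=4)
print([round(w, 2) for w in walds], round(late, 2))
```

With these particular weights the average telescopes to the Wald ratio between the top and bottom bins; other weighting schemes would not.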

Identifying assumptions (agent-native)

Every IV estimator ships an agent card you can inspect before estimating:

sp.agent_card("kernel_iv")["assumptions"]
# instrument relevance · exclusion restriction · exogeneity · (LATE) monotonicity

The core requirements are relevance (a non-zero first stage), exclusion (the instrument affects the outcome only through the treatment), exogeneity / independence of the instrument, and — for a LATE interpretation — monotonicity (no defiers).

Method-level API

The public top-level entry points used in examples and agent workflows are documented here explicitly; the exhaustive namespace listing remains available under Full API reference → iv.

sp.iv(...)

iv

iv(formula: Optional[str] = None, data: Optional[DataFrame] = None, method: str = '2sls', robust: str = 'nonrobust', cluster: Optional[str] = None, fuller_alpha: float = 1.0, absorb: Optional[Union[str, List[str]]] = None, **kwargs) -> EconometricResults

Unified instrumental variables estimation.

Supports multiple methods through the method parameter:

  • '2sls' — Two-Stage Least Squares (default).
  • 'liml' — Limited Information Maximum Likelihood. Better finite-sample properties under weak instruments; approximately median-unbiased.
  • 'fuller' — Fuller (1977) modified LIML with finite-sample bias correction. fuller_alpha=1 removes first-order bias; fuller_alpha=4 minimises MSE under normality.
  • 'gmm' — Efficient two-step GMM. More efficient than 2SLS under heteroskedasticity when over-identified.
  • 'jive' — Jackknife IV (Angrist, Imbens & Krueger 1999). Reduces many-instrument bias by using leave-one-out fitted values.

For DeepIV (neural network IV) use sp.deepiv(). For Bartik shift-share IV use sp.bartik().

Parameters:

  • formula (str, default None): IV formula of the form "y ~ (endog ~ z1 + z2) + exog1 + exog2". Variables in parentheses before ~ are endogenous regressors, variables in parentheses after ~ are excluded instruments, and variables outside the parentheses are exogenous controls.
  • data (DataFrame, default None): Data containing all variables.
  • method (str, default '2sls'): Estimation method: '2sls', 'liml', 'fuller', 'gmm', 'jive'.
  • robust (str, default 'nonrobust'): Standard-error type ('nonrobust', 'hc0', 'hc1', 'hc2', 'hc3').
  • cluster (str, default None): Variable name for clustered standard errors.
  • fuller_alpha (float, default 1.0): Fuller modification constant (only used when method='fuller').
  • absorb (str or list of str, default None): Column name(s) of high-dimensional fixed effects to partial out before fitting (e.g. absorb="firm" or absorb=["firm", "year"]). Routes y, exogenous controls, endogenous regressors, and instruments through sp.fast.demean (Rust HDFE backend) and drops singletons, then runs 2SLS in residualised space. The intercept is dropped because the absorbed FEs span the constant. The residual DOF is adjusted by sum(G_k - 1), mirroring sp.fast.feols(absorb=...). Currently only wired for method='2sls'; LIML / Fuller / GMM / JIVE raise NotImplementedError (Phase 3b).
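The absorb route described above (demean everything by the fixed effect, then run IV on the residuals) can be illustrated for a single fixed effect with the standard library alone. This is a sketch under simplifying assumptions: one absorbed factor, one instrument, no other controls, and no singleton handling. The helpers demean_by and iv_slope are hypothetical and do not call sp.fast.demean.

```python
import random
from collections import defaultdict


def demean_by(values, groups):
    """Subtract each group's mean (the one-way within transformation)."""
    total, count = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        total[g] += v
        count[g] += 1
    return [v - total[g] / count[g] for v, g in zip(values, groups)]


def iv_slope(y, d, z):
    """Just-identified IV slope on already-demeaned data (no intercept needed)."""
    return (sum(zi * yi for zi, yi in zip(z, y))
            / sum(zi * di for zi, di in zip(z, d)))


# 100 firms x 20 observations; the firm effect moves both z and y,
# so z is only a valid instrument once the firm effect is absorbed.
rng = random.Random(0)
firm, y, d, z = [], [], [], []
for g in range(100):
    a = rng.gauss(0, 2)
    for _ in range(20):
        zi = a + rng.gauss(0, 1)
        ui = rng.gauss(0, 1)
        di = 0.8 * zi + ui + rng.gauss(0, 1)
        firm.append(g)
        z.append(zi)
        d.append(di)
        y.append(2.0 * di + 3.0 * a + ui + rng.gauss(0, 1))

n = len(y)
one_group = [0] * n                      # demeaning by a constant group = pooled IV
beta_pooled = iv_slope(*(demean_by(v, one_group) for v in (y, d, z)))
beta_fe = iv_slope(*(demean_by(v, firm) for v in (y, d, z)))
absorbed_dof = len(set(firm)) - 1        # G - 1 for a single absorbed factor
print(round(beta_pooled, 2), round(beta_fe, 2), absorbed_dof)
```

The pooled estimate is pulled far from the true value of 2.0 because the firm effect violates exclusion; the within estimate is not.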

Returns:

Type Description
EconometricResults

Fitted model results with integrated IV diagnostics:

  • First-stage F-statistics and partial R²
  • Sargan/Hansen J overidentification test (when over-identified)
  • Durbin-Wu-Hausman endogeneity test
  • Weak instrument warnings

Examples:

>>> # Standard 2SLS
>>> result = sp.iv("wage ~ (education ~ parent_edu + distance) + experience",
...               data=df)
>>> print(result.summary())
>>> # LIML (better with weak instruments)
>>> result = sp.iv("wage ~ (education ~ parent_edu + distance) + experience",
...               data=df, method='liml')
>>> # Fuller with bias correction
>>> result = sp.iv("wage ~ (education ~ parent_edu) + experience",
...               data=df, method='fuller', fuller_alpha=1)
>>> # Efficient GMM with robust SEs
>>> result = sp.iv("wage ~ (education ~ parent_edu + distance) + experience",
...               data=df, method='gmm', robust='hc1')
>>> # JIVE (many instruments)
>>> result = sp.iv("wage ~ (education ~ z1 + z2 + z3 + z4 + z5) + experience",
...               data=df, method='jive')
Notes

Which method to choose?

  • Start with '2sls'. If first-stage F < 10, switch to 'liml' or 'fuller'.
  • If you have many instruments (m >> k₂) and worry about bias, use 'jive' or 'liml'.
  • If over-identified and you suspect heteroskedasticity, use 'gmm' for efficiency.
  • For nonparametric / ML-based IV, see sp.deepiv().
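The rules of thumb above can be written as a small decision function. This is only a sketch of the guidance in this list: suggest_iv_method is a hypothetical helper, not part of statspai, and the many_ratio cutoff used to decide what counts as "many instruments" is an arbitrary assumption, as is the order in which the rules are checked.

```python
def suggest_iv_method(first_stage_f, n_instruments, n_endog=1,
                      heteroskedastic=False, many_ratio=5):
    """Map the rules of thumb to a value for the `method` argument of sp.iv."""
    if first_stage_f < 10:
        return "liml"                      # the text allows 'liml' or 'fuller'
    if n_instruments >= many_ratio * n_endog:
        return "jive"                      # the text allows 'jive' or 'liml'
    if n_instruments > n_endog and heteroskedastic:
        return "gmm"                       # over-identified and heteroskedastic
    return "2sls"


print(suggest_iv_method(first_stage_f=4.0, n_instruments=2))
print(suggest_iv_method(first_stage_f=50.0, n_instruments=10))
print(suggest_iv_method(first_stage_f=50.0, n_instruments=3, heteroskedastic=True))
print(suggest_iv_method(first_stage_f=50.0, n_instruments=1))
```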

Diagnostics included automatically:

  • First-stage F < 10 triggers a weak-instrument warning.
  • Sargan test (2SLS/LIML/Fuller/JIVE) or Hansen J (GMM) for overidentification.
  • Durbin-Wu-Hausman test for endogeneity.
References
  • Wooldridge (2010), Ch. 5-8.
  • Stock & Yogo (2005), for weak-instrument critical values.
  • Fuller (1977), for the finite-sample correction.
  • Hansen (1982), for GMM.
  • Angrist, Imbens & Krueger (1999), for JIVE.

sp.ivreg(...)

ivreg

ivreg(formula: str, data: DataFrame, robust: str = 'nonrobust', cluster: Optional[str] = None, **kwargs) -> EconometricResults

Instrumental variables regression (2SLS).

Deprecated: use sp.iv(formula, data, method='2sls') instead. ivreg is kept for backward compatibility.

Parameters:

  • formula (str, required): IV formula of the form "y ~ (endog ~ z1 + z2) + exog1 + exog2".
  • data (DataFrame, required)
  • robust (str, default 'nonrobust')
  • cluster (str, default None)

Returns:

Type Description
EconometricResults

sp.kernel_iv(...)

kernel_iv

Kernel IV regression with uniform inference (Lob et al. 2025, arXiv 2511.21603).

Estimates the structural function h*(D) = E[Y | do(D)] nonparametrically using kernel ridge regression in a reproducing kernel Hilbert space, with a uniform confidence band (vs. pointwise CIs).

KernelIVResult dataclass

Output of kernel IV regression.

kernel_iv

kernel_iv(data: DataFrame, y: str, treat: str, instrument: str, grid: Optional[ndarray] = None, bandwidth: Optional[float] = None, ridge: float = 0.001, alpha: float = 0.05, n_boot: int = 100, seed: int = 0) -> KernelIVResult

Kernel IV regression of Y on D instrumented by Z.

Parameters:

  • data (DataFrame, required)
  • y (str, required)
  • treat (str, required)
  • instrument (str, required)
  • grid (array, default None): Grid of treatment values at which to evaluate h(d); defaults to the empirical 5–95th percentile range with 30 points.
  • bandwidth (float, default None)
  • ridge (float, default 1e-3): Tikhonov regularisation.
  • alpha (float, default 0.05)
  • n_boot (int, default 100): Bootstrap reps for the uniform CI.
  • seed (int, default 0)

Returns:

Type Description
KernelIVResult
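The default evaluation grid described for the grid parameter (30 points spanning the empirical 5th to 95th percentile of the treatment) can be reproduced approximately as follows. This is a sketch: default_grid is a hypothetical helper, and the linear-interpolation percentile rule is an assumption, since the source does not say which percentile definition is used.

```python
def default_grid(values, lo_pct=5.0, hi_pct=95.0, n_points=30):
    """Evenly spaced grid between two empirical percentiles of `values`."""
    xs = sorted(values)

    def percentile(p):
        # Linear interpolation between the two nearest order statistics.
        pos = (len(xs) - 1) * p / 100.0
        i = int(pos)
        j = min(i + 1, len(xs) - 1)
        return xs[i] + (pos - i) * (xs[j] - xs[i])

    lo, hi = percentile(lo_pct), percentile(hi_pct)
    step = (hi - lo) / (n_points - 1)
    return [lo + k * step for k in range(n_points)]


grid = default_grid(range(101))   # treatment values 0, 1, ..., 100
print(len(grid), grid[0], round(grid[-1], 6))
```

Trimming the tails keeps the estimated structural function away from regions where the treatment has little support.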

sp.continuous_iv_late(...)

continuous_iv_late

continuous_iv_late(data: DataFrame, y: str, treat: str, instrument: str, n_quantiles: int = 4, alpha: float = 0.05, n_boot: int = 200, seed: int = 0) -> ContinuousLATEResult

LATE with a continuous instrument via quantile-bin Wald estimator.

Parameters:

  • data (DataFrame, required)
  • y (str, required)
  • treat (str, required)
  • instrument (str, required)
  • n_quantiles (int, default 4): Number of quantile bins of the instrument; LATE is averaged across bins weighted by complier share.
  • alpha (float, default 0.05)
  • n_boot (int, default 200)
  • seed (int, default 0)

Returns:

Type Description
ContinuousLATEResult

sp.iv_diag(...)

iv_diag

Modern IV reporting bundle — sp.iv.iv_diag.

This is the StatsPAI port of the post-2022 reporting standard for linear IV in applied work, mirroring R ivDiag (Lal et al. 2024) and going a step further by integrating the weak-IV-robust confidence sets (statspai.iv.weak_iv_ci) and the Conley–Hansen–Rossi sensitivity toolkit (statspai.iv.plausibly_exogenous) into a single object.

What it returns

IVDiagResult, a structured bundle with:

  • 2SLS (and optional OLS) point estimate with analytic + bootstrap standard errors (pairs and / or wild bootstrap; cluster-aware).
  • Effective F (Olea–Pflueger 2013) and tF-corrected critical value (Lee–McCrary–Moreira–Porter 2022, AER 112, 3260–3290).
  • Anderson–Rubin (1949) F at H0 and the AR confidence set; optional CLR / K confidence sets via statspai.iv.weak_iv_ci.
  • Kleibergen–Paap (2006) rk LM and Wald F.
  • Conley–Hansen–Rossi (2012) plausibly-exogenous LTZ sensitivity.
  • A reading of the "TSLS-as-LATE" caveat (Blandhol–Bonney–Mogstad–Torgovitsky 2022/2025; Słoczyński 2024) when covariates are present and the endogenous regressor is binary.

The result object exposes .summary(), .to_frame(), .to_dict(), .to_latex(), .to_excel(), .to_word(), and .plot() for one-call report output.

References

Anderson, T.W. and Rubin, H. (1949).

Olea, J.L.M. and Pflueger, C. (2013).

Conley, T.G., Hansen, C.B. and Rossi, P.E. (2012).

Lee, D.S., McCrary, J., Moreira, M.J. and Porter, J. (2022). AER 112, 3260–3290.

Young, A. (2022). EER 147, 104112.

Keane, M.P. and Neal, T. (2024). Annual Review of Economics 16, 185–212.

Lal, A., Lockhart, M., Xu, Y. and Zu, Z. (2024). Political Analysis 32(4), 521–540.

Blandhol, C., Bonney, J., Mogstad, M. and Torgovitsky, A. (2022/2025). NBER WP 29709.

Słoczyński, T. (2024). arXiv:2011.06695.

IVDiagResult dataclass

Container for iv_diag output.

Attributes:

  • n (int): Sample size after listwise deletion.
  • n_endog, n_instruments, n_exog (int): Counts of endogenous regressors, excluded instruments, included exogenous controls (excluding intercept).
  • endog, instruments, exog (list[str]): Variable names.
  • beta_2sls, se_2sls, t_2sls, p_2sls (float): 2SLS point estimate, analytic SE, t-ratio, p-value (single endogenous regressor).
  • ci_analytic_2sls (tuple[float, float]): Analytic Wald CI based on se_2sls (level = 1 - alpha).
  • beta_ols, se_ols, ci_ols (float, float, tuple[float, float]): OLS counterpart (informative comparator; not causal).
  • first_stage_F (float): Classical first-stage F.
  • effective_F (float): Olea–Pflueger (2013) robust effective F.
  • tF_critical_value (float): Lee et al. (2022, AER) tF adjusted 5% critical value at the observed first-stage F. inf if F < 3.84.
  • ar_stat, ar_pvalue (float): Anderson–Rubin (1949) F-statistic and p-value at h0.
  • ar_ci (tuple[float, float]): AR confidence set (grid-inverted). ±inf flags a one- or two-sided unbounded set.
  • clr_ci, k_ci (tuple[float, float] | None): Moreira (2003) CLR and Kleibergen (2002) K confidence sets. None if not requested.
  • kp_rk_lm, kp_rk_lm_pvalue, kp_rk_f (float | None): Kleibergen–Paap (2006) rk LM, p-value, Wald F.
  • bootstrap_ci_analytic, bootstrap_ci_pairs, bootstrap_ci_wild: Pair-/wild-bootstrap CI tuples (or None).
  • bootstrap_se_pairs, bootstrap_se_wild (float | None): Bootstrap standard errors (matching CI sources).
  • bootstrap_n (int): Number of bootstrap replications actually used.
  • ltz_ci, ltz_warning (tuple[float, float] | None, str | None): Conley–Hansen–Rossi LTZ sensitivity CI under gamma_var = (gamma_sd) ** 2.
  • tF_adjusted_ci (tuple[float, float] | None): beta ± tF_critical_value × se. Falls back to ±inf when F < 3.84 (per LMMP 2022).
  • tsls_late_caveat (str | None): BBMT (2022/2025) / Słoczyński (2024) caveat text whenever the specification is at risk of negative-weight LATE pathologies.
  • diagnostics (dict): All numeric outputs in a flat dict (also returned by to_dict). Useful for downstream agent workflows.
  • raw (dict): Internal scratchpad with arrays (residuals, fitted values).
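The arithmetic behind tF_adjusted_ci is simple enough to spell out. The sketch below takes the critical value as an input rather than computing it, because the mapping from the first-stage F to the tF critical value comes from Lee et al. (2022) and is not reproduced here. The function tf_adjusted_ci and the numbers in the example (including c_f = 2.2) are made up for illustration.

```python
import math


def tf_adjusted_ci(beta, se, first_stage_f, c_f):
    """beta +/- c_f * se, or an unbounded interval when F < 3.84."""
    if first_stage_f < 3.84:
        return (-math.inf, math.inf)
    return (beta - c_f * se, beta + c_f * se)


beta, se = 0.13, 0.05        # illustrative 2SLS estimate and analytic SE
c_f = 2.2                    # placeholder critical value, not a tabulated one

wald = (beta - 1.96 * se, beta + 1.96 * se)
tf_ci = tf_adjusted_ci(beta, se, first_stage_f=50.0, c_f=c_f)
widening = (tf_ci[1] - tf_ci[0]) / (wald[1] - wald[0])   # equals c_f / 1.96

print(tf_ci, round(widening, 4))
print(tf_adjusted_ci(beta, se, first_stage_f=2.0, c_f=c_f))
```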

to_frame

to_frame() -> DataFrame

Return a tidy summary table — one row per estimator/metric.

to_dict

to_dict() -> Dict[str, Any]

Return all numeric diagnostics as a flat dict (jsonable).

to_latex

to_latex(caption: Optional[str] = None, label: Optional[str] = None, float_format: str = '%.4f') -> str

Render the summary table as a LaTeX tabular string.

to_excel

to_excel(path: str) -> None

Write the summary table to path (one sheet).

to_word

to_word(path: str, title: Optional[str] = None) -> None

Write the summary table to a .docx file (requires python-docx).

plot

plot(kind: str = 'diagnostic', **kwargs)

Dispatch to the statspai.iv.plot plotting helpers.

Parameters:

  • kind (one of 'diagnostic', 'forest', 'weak_iv', 'first_stage'; default 'diagnostic'): Which plot to render. 'diagnostic' returns the 2x2 panel.

iv_diag

iv_diag(data: DataFrame, y: str, endog: str, instruments: Union[str, Sequence[str]], exog: Optional[Union[str, Sequence[str]]] = None, *, cluster: Optional[Union[str, ndarray]] = None, h0: float = 0.0, alpha: float = 0.05, vcov: str = 'HC1', n_boot: int = 1000, boot_methods: Sequence[str] = ('pairs',), include_clr_ci: bool = False, include_k_ci: bool = False, grid_size: int = 401, ltz_gamma_sd: Optional[float] = None, random_state: Optional[int] = None) -> IVDiagResult

Modern IV reporting bundle (single-endogenous, post-2022 standard).

Returns a single IVDiagResult containing:

  • 2SLS point estimate, analytic + bootstrap SEs, and Wald CI;
  • Olea–Pflueger effective F + Lee–McCrary–Moreira–Porter (2022) tF adjusted critical value and tF-corrected CI;
  • Anderson–Rubin (1949) F + AR confidence set; optional CLR / K confidence sets (Moreira 2003; Kleibergen 2002, 2005);
  • Kleibergen–Paap (2006) rk LM and Wald F;
  • Conley–Hansen–Rossi (2012) plausibly-exogenous LTZ sensitivity CI when ltz_gamma_sd is supplied;
  • the BBMT (2022/2025) / Słoczyński (2024) "TSLS-vs-LATE" caveat when covariates are present and the endogenous regressor is binary;
  • OLS comparator (informative; not causal).

Parameters:

  • data (DataFrame, required)
  • y, endog (str, required): Outcome and (single) endogenous regressor column names.
  • instruments (str or list[str], required): Excluded instruments.
  • exog (str or list[str], default None): Included exogenous controls (intercept added automatically).
  • cluster (str or array-like, default None): Cluster variable for cluster-robust SEs and cluster bootstrap.
  • h0 (float, default 0.0): Null hypothesis value used by the AR / CLR / K tests.
  • alpha (float, default 0.05): Significance level for all CIs.
  • vcov (one of 'HC0', 'HC1', 'classic'; default 'HC1', as in the signature above): Heteroskedasticity-robust covariance type for the analytic SE and the Olea–Pflueger effective F.
  • n_boot (int, default 1000): Bootstrap replications. Set to 0 to skip the bootstrap.
  • boot_methods (tuple of {'pairs', 'wild'}, default ('pairs',)): Which bootstraps to run. Both can be requested.
  • include_clr_ci (bool, default False): Invert the CLR test on a grid to obtain the matching confidence set (slower; CLR uses Monte-Carlo critical values).
  • include_k_ci (bool, default False): Invert the K test on a grid to obtain the matching confidence set (slower).
  • grid_size (int, default 401): Resolution of the AR / CLR / K grid inversion.
  • ltz_gamma_sd (float, default None): If supplied, run Conley–Hansen–Rossi LTZ sensitivity with prior γ ~ N(0, ltz_gamma_sd**2). Otherwise, LTZ is skipped.
  • random_state (int, default None): Seed for bootstrap and CLR Monte-Carlo.

Returns:

Type Description
IVDiagResult

Examples:

>>> import statspai as sp
>>> r = sp.iv.iv_diag(df, y='wage', endog='educ',
...                   instruments=['nearc4', 'nearc2'],
...                   exog=['exper', 'south'],
...                   n_boot=500, ltz_gamma_sd=0.05,
...                   random_state=42)
>>> print(r.summary())
>>> r.to_frame()                 # tidy table
>>> r.plot('diagnostic')         # 2x2 diagnostic panel
>>> r.to_latex(caption='IV diagnostic bundle')
Notes
  • The bundle is single-endogenous by design; for multiple endogenous regressors use sp.iv.weakrobust plus sp.iv.sanderson_windmeijer per regressor.
  • The tF-corrected CI follows LMMP 2022: it widens (vs. the Wald CI) by exactly the multiplicative ratio c(F) / 1.96 and equals ±∞ when F < 3.84.
  • The bootstrap is implemented as a pairs (or wild Rademacher) bootstrap; see Young (2022, EER 147, 104112) for why analytic SEs can be unreliable. Cluster-bootstrap is used automatically when cluster is supplied.
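A pairs bootstrap for an IV estimate resamples whole observations (y, d, z) with replacement and re-estimates on each resample. The sketch below does this for the just-identified case with the standard library. It is an illustration rather than the statspai routine: pairs_bootstrap and iv_ratio are hypothetical names, the interval is a plain percentile interval, and clustering is ignored.

```python
import random
import statistics


def iv_ratio(y, d, z):
    """Just-identified IV estimate: cov(z, y) / cov(z, d)."""
    n = len(y)
    z_bar, d_bar, y_bar = sum(z) / n, sum(d) / n, sum(y) / n
    return (sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
            / sum((zi - z_bar) * (di - d_bar) for zi, di in zip(z, d)))


def pairs_bootstrap(y, d, z, n_boot=200, seed=0):
    """Bootstrap SE and 95% percentile CI from resampled (y, d, z) rows."""
    rng = random.Random(seed)
    n = len(y)
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]     # rows drawn with replacement
        draws.append(iv_ratio([y[i] for i in idx],
                              [d[i] for i in idx],
                              [z[i] for i in idx]))
    draws.sort()
    k = n_boot * 25 // 1000                            # 2.5% of draws in each tail
    return statistics.stdev(draws), (draws[k], draws[n_boot - 1 - k])


# Simulated data: true effect 2.0, u confounds d and y.
rng = random.Random(0)
n = 400
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
d = [0.8 * zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
y = [2.0 * di + ui + rng.gauss(0, 1) for di, ui in zip(d, u)]

boot_se, boot_ci = pairs_bootstrap(y, d, z)
print(round(iv_ratio(y, d, z), 3), round(boot_se, 3), boot_ci)
```

A cluster bootstrap follows the same pattern but resamples whole clusters instead of individual rows.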

sp.iv_compare(...)

iv_compare

iv_compare(formula: Optional[str] = None, data: Optional[DataFrame] = None, *, methods: Sequence[str] = ('2sls', 'liml', 'fuller', 'jive'), alpha: float = 0.05, endog_name: Optional[str] = None, **kwargs) -> DataFrame

Run several k-class / JIVE estimators and return a one-row-per-method comparison table — useful for quick sensitivity checks across estimators.

Parameters:

  • formula (str, default None): IV formula ("y ~ (d ~ z) + x").
  • data (DataFrame, default None)
  • methods (sequence of str, default ('2sls', 'liml', 'fuller', 'jive')): Methods to dispatch through sp.iv, the unified IV dispatcher.
  • alpha (float, default 0.05): Significance level for the Wald-CI columns.

Returns:

Type Description
DataFrame

columns method, estimate, SE, CI lower, CI upper, first_stage_F, effective_F.