Instrumental variables¶
statspai.iv — the unified IV namespace: a fixest-style formula front end
(sp.iv / sp.ivreg), a modern single-endogenous reporting bundle
(sp.iv_diag), a k-class / JIVE estimator panel (sp.iv_compare), and two
frontier estimators (sp.kernel_iv, sp.continuous_iv_late).
See also the decision guide: Choosing an IV estimator, and the exhaustive auto-generated listing under Full API reference → iv.
The formula front end — sp.iv / sp.ivreg¶
IV models use the fixest convention: endogenous regressors and their
instruments go in parentheses, (endog ~ instruments). Everything outside the
parentheses is exogenous.
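To show how this convention assigns each variable a role, here is a minimal sketch. The helper `split_iv_formula` is hypothetical and is not StatsPAI's parser. It assumes a single parenthesised `(endog ~ instruments)` block with `+`-separated terms.

```python
import re


def split_iv_formula(formula: str) -> dict:
    """Hypothetical helper: split 'y ~ (d ~ z1 + z2) + x1' into variable roles.

    Illustrates the fixest-style convention only; it is not StatsPAI's parser.
    """

    def terms(s: str) -> list:
        # Split a '+'-separated right-hand side into stripped variable names
        return [t.strip() for t in s.split("+") if t.strip()]

    outcome, rhs = (part.strip() for part in formula.split("~", 1))
    block = re.search(r"\(([^()]*)\)", rhs)
    if block is None:
        raise ValueError("formula has no '(endog ~ instruments)' block")
    endog, instruments = block.group(1).split("~")
    # Everything outside the parentheses is treated as exogenous
    outside = rhs[: block.start()] + rhs[block.end():]
    return {
        "outcome": outcome,
        "endog": terms(endog),
        "instruments": terms(instruments),
        "exog": terms(outside),
    }


print(split_iv_formula("lwage ~ (educ ~ nearc4) + exper + expersq + black + south + smsa"))
```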
import statspai as sp
data = sp.datasets.card_1995()
# Card (1995) returns to schooling: educ instrumented by college proximity.
r = sp.iv(
"lwage ~ (educ ~ nearc4) + exper + expersq + black + south + smsa",
data=data,
)
print(r.summary())
# Multiple excluded instruments + an exogenous control
# (df is your own DataFrame with columns y, d, z1, z2, x1, firm, year):
r = sp.iv("y ~ (d ~ z1 + z2) + x1", data=df)
# IV with high-dimensional fixed effects absorbed (reghdfe / ivreghdfe style):
r = sp.iv("y ~ (d ~ z1 + z2) + x1", data=df, absorb="firm + year")
sp.ivreg is a deprecated, 2SLS-only alias of sp.iv. It is kept for Stata-flavoured
code and backward compatibility. New code should call sp.iv, whose default method is two-stage least squares.
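The estimator behind the default `method='2sls'` fits in two lines of NumPy. The sketch below is not StatsPAI's implementation. It reproduces the textbook point estimate on simulated data, where the data-generating process is an assumption chosen for illustration. The output shows OLS drifting away from the true slope of 2 while 2SLS stays close to it. The sketch returns coefficients only, because standard errors from a literal second-stage regression are not valid.

```python
import numpy as np


def two_stage_least_squares(y, X, Z):
    """Textbook 2SLS: regress X on Z, then y on the first-stage fitted values.

    X holds the regressors (exogenous + endogenous); Z holds the exogenous
    regressors plus the excluded instruments. Both include the constant.
    """
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]  # first stage: P_Z X
    return np.linalg.lstsq(X_hat, y, rcond=None)[0]   # (X'P_Z X)^{-1} X'P_Z y


# Simulated data: d is endogenous because it shares the shock u with y.
rng = np.random.default_rng(0)
n = 20_000
z = rng.normal(size=n)
u = rng.normal(size=n)
d = 0.5 * z + 0.8 * u + rng.normal(size=n)
y = 1.0 + 2.0 * d + u  # true effect of d is 2

const = np.ones(n)
X = np.column_stack([const, d])
Z = np.column_stack([const, z])
beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
beta_iv = two_stage_least_squares(y, X, Z)
print(f"OLS slope: {beta_ols[1]:.3f}   2SLS slope: {beta_iv[1]:.3f}")
```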
Modern reporting bundle — sp.iv_diag¶
For the common single-endogenous case, sp.iv_diag returns the full
post-2022 reporting standard in one object:

- the point estimate;
- first-stage strength (effective F);
- weak-instrument-robust confidence intervals: Anderson–Rubin and, optionally, conditional-likelihood-ratio (CLR) and Kleibergen K-test inversions;
- bootstrap alternatives.
res = sp.iv_diag(
data, y="lwage", endog="educ", instruments="nearc4",
exog=["exper", "expersq", "black", "south", "smsa"],
cluster="region",
include_clr_ci=True, # conditional likelihood-ratio CI
include_k_ci=True, # Kleibergen K-test CI
)
print(res.summary())
Always check the first stage
A weak first stage biases 2SLS toward OLS and invalidates conventional
standard errors. iv_diag reports the effective F-statistic and
weak-IV-robust intervals so that you never have to interpret a 2SLS
point estimate in the dark.

As a rule of thumb, if the first stage falls below the usual thresholds,
report the Anderson–Rubin interval rather than the 2SLS CI. Two reference
points: sp.iv warns when the first-stage F is below 10, and Lee et al. (2022)
show that the standard 1.96 t-test needs F above about 104.7 to have correct
5% size.
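Here is a minimal sketch of what "inverting the Anderson–Rubin test" means. It assumes one endogenous regressor, one excluded instrument, no controls, and homoskedastic errors; iv_diag itself handles controls, clustering, and several instruments. Each candidate value β₀ is tested by asking whether the instrument still predicts y − β₀·d. The values that are not rejected form the AR set. This set remains valid under a weak first stage because it never divides by the first-stage coefficient. With a weak instrument, the same loop can return an unbounded set.

```python
import numpy as np

CHI2_1_95 = 3.841  # 95% quantile of chi-squared with 1 degree of freedom


def anderson_rubin_set(y, d, z, grid, crit=CHI2_1_95):
    """AR confidence set for one endogenous regressor and one instrument.

    For each candidate beta0, regress y - beta0 * d on [1, z] and keep beta0
    when the homoskedastic Wald statistic on z is below the critical value.
    """
    Z = np.column_stack([np.ones_like(z), z])
    zz_inv_slope = np.linalg.inv(Z.T @ Z)[1, 1]
    kept = []
    for beta0 in grid:
        e = y - beta0 * d
        coef = np.linalg.lstsq(Z, e, rcond=None)[0]
        resid = e - Z @ coef
        sigma2 = resid @ resid / (len(e) - Z.shape[1])
        if coef[1] ** 2 / (sigma2 * zz_inv_slope) < crit:
            kept.append(beta0)
    return np.array(kept)


rng = np.random.default_rng(1)
n = 5_000
z = rng.normal(size=n)
u = rng.normal(size=n)
d = 0.5 * z + 0.8 * u + rng.normal(size=n)
y = 1.0 + 2.0 * d + u

grid = np.linspace(0.0, 4.0, 4001)
ar = anderson_rubin_set(y, d, z, grid)
beta_iv = np.cov(z, y)[0, 1] / np.cov(z, d)[0, 1]  # just-identified IV estimate
print(f"IV estimate {beta_iv:.3f}; AR set spans [{ar.min():.3f}, {ar.max():.3f}]")
```

In the just-identified case the AR statistic is exactly zero at the IV estimate, so the AR set is always centred on it.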
Comparing estimators — sp.iv_compare¶
With several (possibly many) instruments, 2SLS is biased toward OLS in finite
samples, and the bias grows with the number of instruments. sp.iv_compare runs
a panel of k-class and jackknife estimators side by side so the many-instrument
bias is visible:
tbl = sp.iv_compare(
"y ~ (d ~ z1 + z2 + z3) + x1", data=df,
methods=("2sls", "liml", "fuller", "jive"),
)
print(tbl) # one row per estimator: coefficient, SE, CI
- 2SLS — the workhorse; reliable only with strong instruments, and biased toward OLS when instruments are weak or numerous.
- LIML — limited-information ML; better-centred under many instruments.
- Fuller — a finite-sample-adjusted LIML with finite moments.
- JIVE — jackknife IV; removes the own-observation bias of 2SLS.
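2SLS, LIML, and Fuller all belong to the k-class family, which is why iv_compare can line them up: they differ only in a scalar k. The NumPy sketch below is not StatsPAI's code, and it runs on simulated data. It uses the Fuller convention k = κ_LIML − α/(n − L), where L is the total number of instrument columns; that convention is also an assumption. The sketch recovers OLS at k = 0 and 2SLS at k = 1.

```python
import numpy as np


def residualize(A, B):
    """Residuals from regressing each column of A on B (i.e. M_B A)."""
    return A - B @ np.linalg.lstsq(B, A, rcond=None)[0]


def k_class(y, X, Z, k):
    """k-class estimator: [X'(I - k M_Z) X]^{-1} X'(I - k M_Z) y."""
    MX, My = residualize(X, Z), residualize(y, Z)
    return np.linalg.solve(X.T @ X - k * X.T @ MX, X.T @ y - k * X.T @ My)


def liml_kappa(y, d, X1, Z):
    """Smallest root of det(W'M_X1 W - kappa W'M_Z W) = 0 with W = [y, d]."""
    W = np.column_stack([y, d])
    A = W.T @ residualize(W, X1)
    B = W.T @ residualize(W, Z)
    return np.linalg.eigvals(np.linalg.solve(B, A)).real.min()


rng = np.random.default_rng(2)
n = 5_000
z = rng.normal(size=(n, 3))               # three excluded instruments
u = rng.normal(size=n)
d = z @ np.array([0.5, 0.3, 0.2]) + 0.8 * u + rng.normal(size=n)
y = 1.0 + 2.0 * d + u

X1 = np.ones((n, 1))                      # included exogenous: constant only
X = np.column_stack([X1, d])
Z = np.column_stack([X1, z])
kappa = liml_kappa(y, d, X1, Z)
fuller_alpha = 1.0
estimates = {
    "ols (k=0)": k_class(y, X, Z, 0.0),
    "2sls (k=1)": k_class(y, X, Z, 1.0),
    "liml": k_class(y, X, Z, kappa),
    "fuller": k_class(y, X, Z, kappa - fuller_alpha / (n - Z.shape[1])),
}
for name, beta in estimates.items():
    print(f"{name:>11}: slope = {beta[1]:.3f}")
```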
Frontier estimators¶
# Nonparametric IV via kernel ridge in RKHS (Y on D instrumented by Z):
k = sp.kernel_iv(data, y="y", treat="d", instrument="z")
# LATE with a continuous instrument (quantile-bin Wald estimator):
late = sp.continuous_iv_late(data, y="y", treat="d", instrument="z")
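This page describes continuous_iv_late only as a quantile-bin Wald estimator whose bin-level LATEs are averaged with complier-share weights. The sketch below is one plausible reading of that description, not StatsPAI's implementation. It forms Wald ratios between adjacent instrument quantile bins and weights each ratio by the first-stage difference, which is the share of compliers moved by that step. The simulated design, with a constant effect of 1.5 and monotone take-up, is an assumption.

```python
import numpy as np


def binned_wald_late(y, d, z, n_quantiles=4):
    """Illustrative quantile-bin Wald LATE (an assumed form, not StatsPAI's code)."""
    edges = np.quantile(z, np.linspace(0.0, 1.0, n_quantiles + 1))
    bins = np.digitize(z, edges[1:-1])          # bin labels 0 .. n_quantiles-1
    y_bar = np.array([y[bins == b].mean() for b in range(n_quantiles)])
    d_bar = np.array([d[bins == b].mean() for b in range(n_quantiles)])
    dy, dd = np.diff(y_bar), np.diff(d_bar)
    wald = dy / dd                              # LATE between adjacent bins
    weights = dd / dd.sum()                     # complier share shifted by each step
    return float(weights @ wald), wald, weights


rng = np.random.default_rng(3)
n = 20_000
z = rng.normal(size=n)                          # continuous instrument
v = rng.normal(size=n)
d = (z + v > 0).astype(float)                   # binary take-up, monotone in z
y = 1.5 * d + 0.5 * v + rng.normal(size=n)      # constant effect 1.5, confounded via v
late, wald, weights = binned_wald_late(y, d, z)
print(f"LATE = {late:.3f}; adjacent-bin Wald ratios = {np.round(wald, 2)}")
```

With these particular weights, the average telescopes to the Wald ratio between the top and bottom bins. The bin-by-bin ratios are still worth inspecting: a negative first-stage step (a negative weight) would point to a monotonicity problem.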
Identifying assumptions (agent-native)¶
Every IV estimator ships an agent card you can inspect before estimating:
sp.agent_card("kernel_iv")["assumptions"]
# instrument relevance · exclusion restriction · exogeneity · (LATE) monotonicity
The core requirements are relevance (a non-zero first stage), exclusion (the instrument affects the outcome only through the treatment), exogeneity / independence of the instrument, and — for a LATE interpretation — monotonicity (no defiers).
Method-level API¶
The public top-level entry points used in examples and agent workflows are documented here explicitly; the exhaustive namespace listing remains available under Full API reference → iv.
sp.iv(...)¶
iv ¶
iv(formula: Optional[str] = None, data: Optional[DataFrame] = None, method: str = '2sls', robust: str = 'nonrobust', cluster: Optional[str] = None, fuller_alpha: float = 1.0, absorb: Optional[Union[str, List[str]]] = None, **kwargs) -> EconometricResults
Unified instrumental variables estimation.
Supports multiple methods through the method parameter:
- '2sls': Two-Stage Least Squares (default).
- 'liml': Limited Information Maximum Likelihood. Better finite-sample properties under weak instruments; approximately median-unbiased.
- 'fuller': Fuller (1977) modified LIML with finite-sample bias correction. fuller_alpha=1 removes first-order bias; fuller_alpha=4 minimises MSE under normality.
- 'gmm': Efficient two-step GMM. More efficient than 2SLS under heteroskedasticity when over-identified.
- 'jive': Jackknife IV (Angrist, Imbens & Krueger 1999). Reduces many-instrument bias by using leave-one-out fitted values.
For DeepIV (neural network IV) use sp.deepiv().
For Bartik shift-share IV use sp.bartik().
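The leave-one-out idea behind `method='jive'` can be sketched in NumPy as follows. The sketch follows the JIVE1 variant of Angrist, Imbens & Krueger (1999) on simulated data with ten instruments. It is an illustration of JIVE1, not necessarily the variant sp.iv implements. The leave-one-out fitted values come from the standard leverage identity rather than from n separate regressions.

```python
import numpy as np


def jive1(y, X, Z):
    """JIVE1 in the spirit of Angrist, Imbens & Krueger (1999).

    Each row of X_tilde is the first-stage prediction of X_i from a regression
    that leaves observation i out, so X_i's own first-stage error cannot leak
    into its fitted value.
    """
    ZZ_inv = np.linalg.inv(Z.T @ Z)
    h = np.einsum("ij,jk,ik->i", Z, ZZ_inv, Z)            # leverages h_i
    X_fit = Z @ (ZZ_inv @ (Z.T @ X))                      # full-sample fitted values
    # Leave-one-out fitted values: (X_fit_i - h_i X_i) / (1 - h_i)
    X_tilde = (X_fit - h[:, None] * X) / (1.0 - h)[:, None]
    return np.linalg.solve(X_tilde.T @ X, X_tilde.T @ y)


rng = np.random.default_rng(7)
n, m = 2_000, 10                                          # m excluded instruments
z = rng.normal(size=(n, m))
u = rng.normal(size=n)
d = z @ np.full(m, 0.15) + 0.8 * u + rng.normal(size=n)
y = 1.0 + 2.0 * d + u
X = np.column_stack([np.ones(n), d])
Z = np.column_stack([np.ones(n), z])
print(f"JIVE1 slope: {jive1(y, X, Z)[1]:.3f}")
```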
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| formula | str | IV formula, e.g. "y ~ (d ~ z1 + z2) + x1". | None |
| data | DataFrame | Data containing all variables. | None |
| method | str | Estimation method: '2sls', 'liml', 'fuller', 'gmm', 'jive'. | '2sls' |
| robust | str | Standard-error type ('nonrobust', 'hc0', 'hc1', 'hc2', 'hc3'). | 'nonrobust' |
| cluster | str | Variable name for clustered standard errors. | None |
| fuller_alpha | float | Fuller modification constant (only used when method='fuller'). | 1.0 |
| absorb | str or list of str | Column name(s) of high-dimensional fixed effects to partial out before fitting (e.g. "firm + year"). | None |
Returns:
| Type | Description |
|---|---|
| EconometricResults | Fitted model results with integrated IV diagnostics (see "Diagnostics included automatically" under Notes). |
Examples:
>>> # Standard 2SLS
>>> result = sp.iv("wage ~ (education ~ parent_edu + distance) + experience",
... data=df)
>>> print(result.summary())
>>> # LIML (better with weak instruments)
>>> result = sp.iv("wage ~ (education ~ parent_edu + distance) + experience",
... data=df, method='liml')
>>> # Fuller with bias correction
>>> result = sp.iv("wage ~ (education ~ parent_edu) + experience",
... data=df, method='fuller', fuller_alpha=1)
>>> # Efficient GMM with robust SEs
>>> result = sp.iv("wage ~ (education ~ parent_edu + distance) + experience",
... data=df, method='gmm', robust='hc1')
>>> # JIVE (many instruments)
>>> result = sp.iv("wage ~ (education ~ z1 + z2 + z3 + z4 + z5) + experience",
... data=df, method='jive')
Notes
Which method to choose?
- Start with '2sls'. If the first-stage F < 10, switch to 'liml' or 'fuller'.
- If you have many more excluded instruments than endogenous regressors (m >> k₂, with m instruments and k₂ endogenous regressors) and worry about bias, use 'jive' or 'liml'.
- If the model is over-identified and you suspect heteroskedasticity, use 'gmm' for efficiency.
- For nonparametric / ML-based IV, see sp.deepiv() or sp.kernel_iv().
Diagnostics included automatically:
- First-stage F < 10 triggers a weak-instrument warning.
- Sargan test (2SLS/LIML/Fuller/JIVE) or Hansen J (GMM) for overidentification.
- Durbin-Wu-Hausman test for endogeneity.
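The first-stage F and the Sargan over-identification test can be reproduced in a few lines. This NumPy sketch assumes homoskedastic errors and simulated data. It includes a deliberately invalid instrument so that the Sargan test visibly fires. It is illustrative, not StatsPAI's implementation, and it omits p-values, which would need a chi-squared CDF (for example from SciPy).

```python
import numpy as np


def ols_resid(y, X):
    """Residuals from an OLS regression of y on X."""
    return y - X @ np.linalg.lstsq(X, y, rcond=None)[0]


def first_stage_F(d, X_exog, Z_excl):
    """Classical F-test that the excluded instruments are jointly zero in the first stage."""
    n, q = len(d), Z_excl.shape[1]
    Z = np.column_stack([X_exog, Z_excl])
    rss_r = np.sum(ols_resid(d, X_exog) ** 2)   # restricted: controls only
    rss_u = np.sum(ols_resid(d, Z) ** 2)        # unrestricted: controls + instruments
    return ((rss_r - rss_u) / q) / (rss_u / (n - Z.shape[1]))


def sargan(y, X, Z):
    """Sargan statistic n * R^2 from regressing 2SLS residuals on all instruments.

    Under valid over-identifying restrictions it is approximately chi-squared
    with (instrument columns - regressor columns) degrees of freedom.
    """
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
    beta = np.linalg.lstsq(X_hat, y, rcond=None)[0]
    e = y - X @ beta                            # structural residuals use X, not X_hat
    r2 = 1.0 - np.sum(ols_resid(e, Z) ** 2) / np.sum((e - e.mean()) ** 2)
    return len(y) * r2


rng = np.random.default_rng(5)
n = 5_000
z = rng.normal(size=(n, 3))
u = rng.normal(size=n)
d = z @ np.array([0.5, 0.3, 0.2]) + 0.8 * u + rng.normal(size=n)
const = np.ones((n, 1))
X = np.column_stack([const, d])
Z = np.column_stack([const, z])
y_valid = 1.0 + 2.0 * d + u
y_invalid = y_valid + 0.5 * z[:, 0]             # z1 now violates the exclusion restriction
print(f"first-stage F = {first_stage_F(d, const, z):.1f}")
print(f"Sargan (valid) = {sargan(y_valid, X, Z):.2f}, (invalid) = {sargan(y_invalid, X, Z):.1f}")
```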
References
- Wooldridge (2010), Ch. 5-8.
- Stock & Yogo (2005), for weak-instrument critical values.
- Fuller (1977), for the finite-sample correction.
- Hansen (1982), for GMM.
- Angrist, Imbens & Krueger (1999), for JIVE.
sp.ivreg(...)¶
ivreg ¶
ivreg(formula: str, data: DataFrame, robust: str = 'nonrobust', cluster: Optional[str] = None, **kwargs) -> EconometricResults
Instrumental variables regression (2SLS).
Deprecated: use sp.iv(formula, data, method='2sls') instead.
ivreg is kept for backward compatibility.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| formula | str | IV formula, e.g. "y ~ (d ~ z1 + z2) + x1". | required |
| data | DataFrame | As in sp.iv. | required |
| robust | str | As in sp.iv. | 'nonrobust' |
| cluster | str | As in sp.iv. | None |
Returns:
| Type | Description |
|---|---|
| EconometricResults | 2SLS results, as returned by sp.iv. |
sp.kernel_iv(...)¶
kernel_iv ¶
Kernel IV regression with uniform inference (Lob et al. 2025, arXiv 2511.21603).
Estimates the structural function h*(D) = E[Y | do(D)] nonparametrically using kernel ridge regression in a reproducing kernel Hilbert space, with a uniform confidence band (rather than pointwise CIs).
KernelIVResult (dataclass): output of kernel IV regression.
kernel_iv ¶
kernel_iv(data: DataFrame, y: str, treat: str, instrument: str, grid: Optional[ndarray] = None, bandwidth: Optional[float] = None, ridge: float = 0.001, alpha: float = 0.05, n_boot: int = 100, seed: int = 0) -> KernelIVResult
Kernel IV regression of Y on D instrumented by Z.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Input data. | required |
| y | str | Outcome column (Y). | required |
| treat | str | Treatment column (D). | required |
| instrument | str | Instrument column (Z). | required |
| grid | array | Grid of treatment values at which h(d) is evaluated; defaults to 30 points spanning the empirical 5th–95th percentile range. | None |
| bandwidth | float | Kernel bandwidth. | None |
| ridge | float | Tikhonov regularisation. | 1e-3 |
| alpha | float | Significance level for the uniform confidence band. | 0.05 |
| n_boot | int | Bootstrap replications for the uniform confidence band. | 100 |
| seed | int | Random seed. | 0 |
Returns:
| Type | Description |
|---|---|
| KernelIVResult | Output of kernel IV regression. |
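To build intuition for what kernel_iv estimates, here is a two-stage kernel ridge IV sketch in the style of Singh, Sahani & Gretton (2019, "Kernel Instrumental Variable Regression"). It is not the Lob et al. (2025) procedure cited above, and it produces no uniform band. The Gaussian kernel, the median-heuristic bandwidths, the half/half sample split, and the regularisation values `lam` and `xi` are all assumptions.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth):
    """Gaussian (RBF) kernel matrix between 1-D samples a and b."""
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2.0 * bandwidth ** 2))


def median_bandwidth(x):
    """Median-heuristic bandwidth: median nonzero pairwise distance."""
    dist = np.abs(x[:, None] - x[None, :])
    return float(np.median(dist[dist > 0]))


def kernel_iv_sketch(y, d, z, grid, lam=1e-3, xi=1e-3):
    """Two-stage kernel ridge IV evaluated on `grid` (an illustrative sketch).

    Stage 1 (first half of the sample) embeds D given Z; stage 2 (second half)
    ridge-regresses Y on those conditional mean embeddings.
    """
    n = len(y) // 2
    d1, z1 = d[:n], z[:n]                 # stage-1 sample
    y2, z2 = y[n:], z[n:]                 # stage-2 sample
    m = len(y2)
    bw_d, bw_z = median_bandwidth(d1), median_bandwidth(z1)
    K_dd = gaussian_kernel(d1, d1, bw_d)
    K_zz = gaussian_kernel(z1, z1, bw_z)
    K_z2 = gaussian_kernel(z1, z2, bw_z)
    # Column j of W is K_dd times the stage-1 ridge weights for instrument value z2[j]
    W = K_dd @ np.linalg.solve(K_zz + n * lam * np.eye(n), K_z2)
    alpha = np.linalg.lstsq(W @ W.T + m * xi * K_dd, W @ y2, rcond=None)[0]
    return gaussian_kernel(grid, d1, bw_d) @ alpha


rng = np.random.default_rng(4)
n = 800
z = rng.normal(size=n)
u = rng.normal(size=n)
d = z + 0.5 * u + 0.5 * rng.normal(size=n)
y = 2.0 * d + u                           # true structural function h(d) = 2d
grid = np.linspace(-1.0, 1.0, 5)
h_hat = kernel_iv_sketch(y, d, z, grid)
print(np.round(h_hat, 2))
```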
sp.continuous_iv_late(...)¶
continuous_iv_late ¶
continuous_iv_late(data: DataFrame, y: str, treat: str, instrument: str, n_quantiles: int = 4, alpha: float = 0.05, n_boot: int = 200, seed: int = 0) -> ContinuousLATEResult
LATE with a continuous instrument via quantile-bin Wald estimator.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Input data. | required |
| y | str | Outcome column. | required |
| treat | str | Treatment column. | required |
| instrument | str | Continuous instrument column. | required |
| n_quantiles | int | Number of quantile bins of the instrument; LATE is averaged across bins, weighted by complier share. | 4 |
| alpha | float | Significance level. | 0.05 |
| n_boot | int | Bootstrap replications. | 200 |
| seed | int | Random seed. | 0 |
Returns:
| Type | Description |
|---|---|
| ContinuousLATEResult | |
sp.iv_diag(...)¶
iv_diag ¶
Modern IV reporting bundle — sp.iv.iv_diag.
This is the StatsPAI port of the post-2022 reporting standard for linear
IV in applied work. It mirrors R ivDiag (Lal et al. 2024) and goes a
step further by integrating the weak-IV-robust confidence sets
(statspai.iv.weak_iv_ci) and the Conley–Hansen–Rossi sensitivity
toolkit (statspai.iv.plausibly_exogenous) into a single object.
What it returns
IVDiagResult, a structured bundle with:
- 2SLS (and optional OLS) point estimate with analytic and bootstrap standard errors (pairs and/or wild bootstrap; cluster-aware).
- Effective F (Olea–Pflueger 2013) and the tF-corrected critical value (Lee–McCrary–Moreira–Porter 2022, AER 112, 3260–3290).
- Anderson–Rubin (1949) F at H0 and the AR confidence set; optional CLR / K confidence sets via statspai.iv.weak_iv_ci.
- Kleibergen–Paap (2006) rk LM and Wald F.
- Conley–Hansen–Rossi (2012) plausibly-exogenous LTZ sensitivity.
- A reading of the "TSLS-as-LATE" caveat (Blandhol–Bonney–Mogstad–Torgovitsky 2022/2025; Słoczyński 2024) when covariates are present and the endogenous regressor is binary.
The result object exposes .summary(), .to_frame(),
.to_dict(), .to_latex(), .to_excel(), .to_word(), and
.plot() for one-call report output.
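The effective F replaces the classical first-stage F when errors are heteroskedastic or clustered. Its formula is F_eff = π̂′Z′Zπ̂ / tr(V̂·Z′Z), with the instruments Z partialled on the controls and V̂ the robust covariance of π̂. The NumPy sketch below assumes an HC0 covariance; iv_diag defaults to HC1 and also supports clustering. With a single instrument the formula reduces to the squared robust first-stage t-statistic, and under homoskedasticity it reduces to the classical F.

```python
import numpy as np


def effective_F(d, X_exog, Z_excl):
    """Olea-Pflueger effective F with an HC0 covariance for the first stage.

    F_eff = pi' Z'Z pi / trace(V Z'Z), where Z are the excluded instruments
    after partialling out the controls and V is the HC0 covariance of pi.
    """
    def partial_out(A):
        return A - X_exog @ np.linalg.lstsq(X_exog, A, rcond=None)[0]

    Zt, dt = partial_out(Z_excl), partial_out(d)
    ZZ = Zt.T @ Zt
    pi = np.linalg.solve(ZZ, Zt.T @ dt)
    v = dt - Zt @ pi
    ZZ_inv = np.linalg.inv(ZZ)
    meat = (Zt * v[:, None] ** 2).T @ Zt          # sum_i v_i^2 z_i z_i'
    V = ZZ_inv @ meat @ ZZ_inv                    # HC0 sandwich for pi
    return float(pi @ ZZ @ pi / np.trace(V @ ZZ))


rng = np.random.default_rng(6)
n = 4_000
const = np.ones((n, 1))
z = rng.normal(size=(n, 3))
scale = 1.0 + np.abs(z[:, 0])                     # heteroskedastic first-stage errors
d_strong = z @ np.array([0.5, 0.3, 0.2]) + rng.normal(size=n) * scale
d_irrelevant = rng.normal(size=n) * scale         # instruments have no effect on d
print(f"strong: {effective_F(d_strong, const, z):.1f}  irrelevant: {effective_F(d_irrelevant, const, z):.2f}")
```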
References
Anderson, T.W. and Rubin, H. (1949). [@anderson1949estimation]
Olea, J.L.M. and Pflueger, C. (2013). [@olea2013robust]
Conley, T.G., Hansen, C.B. and Rossi, P.E. (2012). [@conley2012plausibly]
Lee, D.S., McCrary, J., Moreira, M.J. and Porter, J. (2022). AER 112, 3260–3290. [@lee2022valid]
Young, A. (2022). EER 147, 104112. [@young2022consistency]
Keane, M.P. and Neal, T. (2024). Annual Review of Economics 16, 185–212. [@keane2024practical]
Lal, A., Lockhart, M., Xu, Y. and Zu, Z. (2024). Political Analysis 32(4), 521–540. [@lal2024much]
Blandhol, C., Bonney, J., Mogstad, M. and Torgovitsky, A. (2022/2025). NBER WP 29709. [@blandhol2025tsls]
Słoczyński, T. (2024). arXiv:2011.06695. [@sloczynski2024should]
IVDiagResult (dataclass): container for iv_diag output.
Attributes:
| Name | Type | Description |
|---|---|---|
| n | int | Sample size after listwise deletion. |
| n_endog, n_instruments, n_exog | int | Counts of endogenous regressors, excluded instruments, and included exogenous controls (excluding the intercept). |
| endog, instruments, exog | list[str] | Variable names. |
| beta_2sls, se_2sls, t_2sls, p_2sls | float | 2SLS point estimate, analytic SE, t-ratio, and p-value (single endogenous regressor). |
| ci_analytic_2sls | tuple[float, float] | Analytic Wald CI based on se_2sls. |
| beta_ols, se_ols, ci_ols | (float, float, tuple[float, float]) | OLS counterpart (informative comparator; not causal). |
| first_stage_F | float | Classical first-stage F. |
| effective_F | float | Olea–Pflueger (2013) robust effective F. |
| tF_critical_value | float | Lee et al. (2022, AER) tF-adjusted 5% critical value at the observed first-stage F. |
| ar_stat, ar_pvalue | float | Anderson–Rubin (1949) F-statistic and p-value at h0. |
| ar_ci | tuple[float, float] | AR confidence set (grid-inverted). |
| clr_ci, k_ci | tuple[float, float] or None | Moreira (2003) CLR and Kleibergen (2002) K confidence sets. |
| kp_rk_lm, kp_rk_lm_pvalue, kp_rk_f | float or None | Kleibergen–Paap (2006) rk LM statistic, its p-value, and Wald F. |
| bootstrap_ci_analytic, bootstrap_ci_pairs, bootstrap_ci_wild | tuple[float, float] or None | Pair-/wild-bootstrap CI tuples (or None). |
| bootstrap_se_pairs, bootstrap_se_wild | float or None | Bootstrap standard errors (matching the CI sources). |
| bootstrap_n | int | Number of bootstrap replications actually used. |
| ltz_ci, ltz_warning | (tuple[float, float] or None, str or None) | Conley–Hansen–Rossi LTZ sensitivity CI under the ltz_gamma_sd prior, plus any warning text. |
| tF_adjusted_ci | tuple[float, float] or None | Lee et al. (2022) tF-corrected CI. |
| tsls_late_caveat | str or None | BBMT (2022/2025) / Słoczyński (2024) caveat text whenever the specification is at risk of negative-weight LATE pathologies. |
| diagnostics | dict | All numeric outputs in a flat dict (also returned by .to_dict()). |
| raw | dict | Internal scratchpad with arrays (residuals, fitted values). |
to_latex ¶
to_latex(caption: Optional[str] = None, label: Optional[str] = None, float_format: str = '%.4f') -> str
Render the summary table as a LaTeX tabular string.
to_word ¶
Write the summary table to a .docx file (requires python-docx).
plot ¶
Dispatch to the statspai.iv.plot plotting helpers.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| kind | {'diagnostic', 'forest', 'weak_iv', 'first_stage'} | Which plot to render. | 'diagnostic' |
iv_diag ¶
iv_diag(data: DataFrame, y: str, endog: str, instruments: Union[str, Sequence[str]], exog: Optional[Union[str, Sequence[str]]] = None, *, cluster: Optional[Union[str, ndarray]] = None, h0: float = 0.0, alpha: float = 0.05, vcov: str = 'HC1', n_boot: int = 1000, boot_methods: Sequence[str] = ('pairs',), include_clr_ci: bool = False, include_k_ci: bool = False, grid_size: int = 401, ltz_gamma_sd: Optional[float] = None, random_state: Optional[int] = None) -> IVDiagResult
Modern IV reporting bundle (single-endogenous, post-2022 standard).
Returns a single IVDiagResult containing:
- 2SLS point estimate, analytic + bootstrap SEs, and Wald CI;
- Olea–Pflueger effective F + Lee–McCrary–Moreira–Porter (2022) tF-adjusted critical value and tF-corrected CI;
- Anderson–Rubin (1949) F + AR confidence set; optional CLR / K confidence sets (Moreira 2003; Kleibergen 2002, 2005);
- Kleibergen–Paap (2006) rk LM and Wald F;
- Conley–Hansen–Rossi (2012) plausibly-exogenous LTZ sensitivity CI when ltz_gamma_sd is supplied;
- the BBMT (2022/2025) / Słoczyński (2024) "TSLS-vs-LATE" caveat when covariates are present and the endogenous regressor is binary;
- OLS comparator (informative; not causal).
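The LTZ sensitivity interval has a closed form when the instruments' direct effects γ get a normal prior. In that case the 2SLS estimate is approximately normal with variance V + A·Var(γ)·A′, where A = (X′P_Z X)^{-1} X′Z_excl maps γ into bias. The NumPy sketch below assumes homoskedastic errors and a prior N(0, ltz_gamma_sd² I). The zero prior mean is an assumption, and `ltz_ci` is an illustrative name, not a StatsPAI function.

```python
import numpy as np

Z_975 = 1.959964  # standard normal 97.5% quantile (two-sided 95% interval)


def ltz_ci(y, X, Z, n_excl, gamma_sd, j=-1):
    """Conley-Hansen-Rossi local-to-zero 95% CI for coefficient j (a sketch).

    Assumes the last n_excl columns of Z are the excluded instruments, their
    direct effects gamma have prior N(0, gamma_sd**2 * I), and errors are
    homoskedastic.
    """
    X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]      # P_Z X
    XPX_inv = np.linalg.inv(X_hat.T @ X_hat)             # (X'P_Z X)^{-1}
    beta = XPX_inv @ (X_hat.T @ y)                       # 2SLS
    e = y - X @ beta
    V = (e @ e / (len(y) - X.shape[1])) * XPX_inv        # conventional 2SLS covariance
    A = XPX_inv @ (X.T @ Z[:, -n_excl:])                 # bias per unit of gamma
    var_j = V[j, j] + gamma_sd ** 2 * (A @ A.T)[j, j]
    half_width = Z_975 * np.sqrt(var_j)
    return beta[j] - half_width, beta[j] + half_width


rng = np.random.default_rng(8)
n = 3_000
z = rng.normal(size=n)
u = rng.normal(size=n)
d = 0.5 * z + 0.8 * u + rng.normal(size=n)
y = 1.0 + 2.0 * d + u
X = np.column_stack([np.ones(n), d])
Z = np.column_stack([np.ones(n), z])
for sd in (0.0, 0.05, 0.1):
    lo, hi = ltz_ci(y, X, Z, n_excl=1, gamma_sd=sd)
    print(f"gamma_sd={sd:.2f}: [{lo:.3f}, {hi:.3f}]")
```

Setting the prior SD to zero recovers the usual Wald interval. A larger SD widens the interval symmetrically around the same 2SLS estimate.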
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Input data. | required |
| y | str | Outcome column name. | required |
| endog | str | (Single) endogenous regressor column name. | required |
| instruments | str or list[str] | Excluded instruments. | required |
| exog | str or list[str] | Included exogenous controls (intercept added automatically). | None |
| cluster | str or array-like | Cluster variable for cluster-robust SEs and the cluster bootstrap. | None |
| h0 | float | Null hypothesis value used by the AR / CLR / K tests. | 0.0 |
| alpha | float | Significance level for all CIs. | 0.05 |
| vcov | {'HC0', 'HC1', 'classic'} | Covariance type for the analytic SE and the Olea–Pflueger effective F. | 'HC1' |
| n_boot | int | Bootstrap replications. Set to … | 1000 |
| boot_methods | tuple of {'pairs', 'wild'} | Which bootstraps to run; both can be requested. | ('pairs',) |
| include_clr_ci | bool | Invert the Moreira (2003) CLR test on a grid for a CLR confidence set (slower; uses Monte-Carlo critical values). | False |
| include_k_ci | bool | Invert the Kleibergen (2002) K test on a grid for a K confidence set (slower). | False |
| grid_size | int | Resolution of the AR / CLR / K grid inversion. | 401 |
| ltz_gamma_sd | float | If supplied, run the Conley–Hansen–Rossi LTZ sensitivity analysis with this prior standard deviation for the instruments' direct effect. | None |
| random_state | int | Seed for the bootstrap and the CLR Monte-Carlo. | None |
Returns:
| Type | Description |
|---|---|
| IVDiagResult | See IVDiagResult above. |
Examples:
>>> import statspai as sp
>>> r = sp.iv.iv_diag(df, y='wage', endog='educ',
... instruments=['nearc4', 'nearc2'],
... exog=['exper', 'south'],
... n_boot=500, ltz_gamma_sd=0.05,
... random_state=42)
>>> print(r.summary())
>>> r.to_frame() # tidy table
>>> r.plot('diagnostic') # 2x2 diagnostic panel
>>> r.to_latex(caption='IV diagnostic bundle')
Notes
- The bundle is single-endogenous by design; for multiple endogenous regressors use sp.iv.weakrobust plus sp.iv.sanderson_windmeijer per regressor.
- The tF-corrected CI follows LMMP 2022: it widens (vs. the Wald CI) by exactly the multiplicative ratio c(F) / 1.96 and equals ±∞ when F < 3.84.
- The bootstrap is implemented as a pairs (or wild Rademacher) bootstrap; see Young (2022, EER 147, 104112) for why analytic SEs can be unreliable. The cluster bootstrap is used automatically when cluster is supplied.
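For reference, here is a pairs bootstrap for the just-identified case with no controls. The function names are illustrative, not StatsPAI's. A cluster bootstrap would resample cluster identifiers rather than rows, and a wild bootstrap would perturb residuals instead. With homoskedastic simulated data, as here, the bootstrap and analytic SEs should roughly agree; Young (2022) concerns the cases where they do not.

```python
import numpy as np


def iv_slope(y, d, z):
    """Just-identified IV slope with an intercept: cov(z, y) / cov(z, d)."""
    zc = z - z.mean()
    return (zc @ y) / (zc @ d)


def pairs_bootstrap_se(y, d, z, n_boot=500, seed=0):
    """Resample (y, d, z) rows with replacement and re-estimate the IV slope."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)
        draws[b] = iv_slope(y[idx], d[idx], z[idx])
    return draws.std(ddof=1)


rng = np.random.default_rng(10)
n = 2_000
z = rng.normal(size=n)
u = rng.normal(size=n)
d = 0.5 * z + 0.8 * u + rng.normal(size=n)
y = 1.0 + 2.0 * d + u

beta = iv_slope(y, d, z)
resid = y - (y.mean() - beta * d.mean()) - beta * d   # residuals incl. fitted intercept
zc = z - z.mean()
se_analytic = np.sqrt(resid.var(ddof=2) * (zc @ zc) / (zc @ d) ** 2)
se_boot = pairs_bootstrap_se(y, d, z)
print(f"analytic SE {se_analytic:.4f}   pairs-bootstrap SE {se_boot:.4f}")
```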
sp.iv_compare(...)¶
iv_compare ¶
iv_compare(formula: Optional[str] = None, data: Optional[DataFrame] = None, *, methods: Sequence[str] = ('2sls', 'liml', 'fuller', 'jive'), alpha: float = 0.05, endog_name: Optional[str] = None, **kwargs) -> DataFrame
Run several k-class / JIVE estimators and return a one-row-per-method comparison table — useful for quick sensitivity checks across estimators.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| formula | str | IV formula (same syntax as sp.iv). | None |
| data | DataFrame | Input data. | None |
| methods | sequence of str | Methods to dispatch through sp.iv. | ('2sls', 'liml', 'fuller', 'jive') |
| alpha | float | Significance level for the Wald-CI columns. | 0.05 |
Returns:
| Type | Description |
|---|---|
| DataFrame | One row per method (coefficient, SE, Wald CI). |