statspai.deepiv
deepiv
DeepIV: Deep Learning Instrumental Variables (Hartford et al. 2017).
Uses a two-stage neural network approach:
- Stage 1: a Mixture Density Network estimates P(T | Z, X).
- Stage 2: a response network minimises a counterfactual loss using Monte-Carlo samples from the learned treatment distribution.
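The two stages can be sketched in plain Python. This is a minimal illustration of the idea, not the statspai implementation: the mixture parameters, the response function `h`, and the helper names are all hypothetical, and a real Stage 1 would produce the mixture parameters from a network conditioned on Z and X.

```python
import random

def sample_mixture(weights, means, sds, rng):
    """Draw one treatment value from a 1-D Gaussian mixture."""
    k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    return rng.gauss(means[k], sds[k])

def mc_prediction(h, x, weights, means, sds, n_draws, rng):
    """Monte-Carlo estimate of E[h(T, x)] with T drawn from the mixture."""
    draws = [sample_mixture(weights, means, sds, rng) for _ in range(n_draws)]
    return sum(h(t, x) for t in draws) / n_draws

# Hypothetical Stage-1 output for one observation: a 2-component mixture.
weights, means, sds = [0.3, 0.7], [-1.0, 2.0], [0.5, 0.5]

def h(t, x):
    # Hypothetical response function, linear in t so the exact answer is known.
    return 2.0 * t + x

rng = random.Random(0)
estimate = mc_prediction(h, 1.0, weights, means, sds, n_draws=20000, rng=rng)

# For a linear h, E[h(T, x)] = 2 * E[T] + x, and E[T] is the mixture mean.
exact = 2.0 * sum(w * m for w, m in zip(weights, means)) + 1.0

# Stage-2 loss contribution for one observation with outcome y_obs.
y_obs = 3.0
loss = (y_obs - estimate) ** 2
```

With a nonlinear `h` the expectation has no closed form, which is why Stage 2 relies on sampled treatments.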
References
Hartford, J., Lewis, G., Leyton-Brown, K., & Taddy, M. (2017). "Deep IV: A Flexible Approach for Counterfactual Prediction." Proceedings of the 34th International Conference on Machine Learning (ICML). [@hartford2017deep]
DeepIV
Deep Instrumental Variables estimator (Hartford et al. 2017).
Follows the same defaults as Microsoft EconML's DeepIVEstimator.
See the module docstring for a discussion of the biased vs unbiased
gradient trade-off and when to prefer modern alternatives.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `DataFrame` | Input data. | required |
| `y` | `str` | Outcome variable. | required |
| `treat` | `str` | Endogenous treatment variable (continuous). | required |
| `instruments` | `list of str` | Excluded instruments. | required |
| `covariates` | `list of str` | Exogenous controls. | required |
| `n_components` | `int` | Gaussian mixture components in the MDN. | `10` |
| `hidden_layers` | `tuple of int` | Hidden layer sizes. | `(128, 64)` |
| `first_stage_epochs` | `int` | Stage 1 training epochs. | `100` |
| `second_stage_epochs` | `int` | Stage 2 training epochs. | `100` |
| `n_samples` | `int` | MC samples per observation used to form the Stage-2 residual. | `1` |
| `n_gradient_samples` | `int` | Independent additional samples for the gradient path. | `0` |
| `batch_size` | `int` | Mini-batch size. | `256` |
| `learning_rate` | `float` | Adam learning rate. | `0.001` |
| `alpha` | `float` | Significance level. | `0.05` |
| `random_state` | `int` | Random seed for reproducibility. | `42` |
| `verbose` | `bool` | Print training progress. | `False` |
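The roles of `y`, `treat`, `instruments`, and `covariates` can be illustrated with a small linear simulation. This sketch uses the classical ratio (Wald) IV estimator rather than DeepIV, and the data-generating process and column names are invented for illustration. It shows why an endogenous treatment needs an excluded instrument: the unobserved confounder `u` biases the naive regression slope, while the instrument recovers the true coefficient.

```python
import random

def cov(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)

rng = random.Random(1)
n, beta = 20000, 1.5  # beta is the true causal effect of t on y

rows = []
for _ in range(n):
    z = rng.gauss(0, 1)  # excluded instrument: affects y only through t
    x = rng.gauss(0, 1)  # exogenous covariate
    u = rng.gauss(0, 1)  # unobserved confounder: drives both t and y
    t = z + 0.5 * x + u + rng.gauss(0, 1)
    y = beta * t + x + 2.0 * u + rng.gauss(0, 1)
    rows.append({"y": y, "t": t, "z": z, "x": x})  # u is not recorded

ys = [r["y"] for r in rows]
ts = [r["t"] for r in rows]
zs = [r["z"] for r in rows]

ols_slope = cov(ts, ys) / cov(ts, ts)  # biased upward by the confounder
iv_slope = cov(zs, ys) / cov(zs, ts)   # consistent for beta in this linear setup
```

In this setup `rows` could be turned into a DataFrame with `y='y'`, `treat='t'`, `instruments=['z']`, and `covariates=['x']`.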
fit
Fit the DeepIV model and return causal effect estimates.
Returns:
| Type | Description |
|---|---|
| `CausalResult` | Causal effect estimates from the fitted model. |
effect
Estimate E[Y(t1) - Y(t0) | X] for given treatment levels.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `t0` | `float` | Baseline treatment value (original scale). | required |
| `t1` | `float` | Counterfactual treatment value (original scale). | required |
| `X` | `ndarray` | Covariates. If None, uses training data. | `None` |
Returns:
| Type | Description |
|---|---|
| `ndarray` | Individual-level treatment effects. |
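Conceptually, E[Y(t1) - Y(t0) | X] is the difference between the fitted response function evaluated at the two treatment levels for each row of X. The sketch below assumes a hypothetical fitted response function `h` that works on a standardised treatment, which is one plausible reading of "original scale". The helper `make_effect` and the scaling constants are illustrative, not statspai internals.

```python
def make_effect(h, t_mean, t_sd):
    """Build an effect function from a response function h(s, x).

    h takes a standardised treatment s and a covariate row x.
    t_mean and t_sd are the (hypothetical) training mean and standard
    deviation used to move treatments from the original scale.
    """
    def effect(t0, t1, X):
        s0 = (t0 - t_mean) / t_sd
        s1 = (t1 - t_mean) / t_sd
        # One effect per covariate row: h at t1 minus h at t0.
        return [h(s1, x) - h(s0, x) for x in X]
    return effect

def h(s, x):
    # Hypothetical response: the treatment slope depends on the first covariate.
    return (1.0 + 0.5 * x[0]) * s

effect = make_effect(h, t_mean=12.0, t_sd=2.0)
effects = effect(12.0, 16.0, [[0.0], [2.0]])
```

Because the slope of this toy `h` depends on the covariate, the two rows receive different effects, which is what "individual-level" refers to.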
deepiv
deepiv(data: DataFrame, y: str, treat: str, instruments: List[str], covariates: List[str], n_components: int = 10, hidden_layers: Tuple[int, ...] = (128, 64), first_stage_epochs: int = 100, second_stage_epochs: int = 100, n_samples: int = 1, n_gradient_samples: int = 0, batch_size: int = 256, learning_rate: float = 0.001, alpha: float = 0.05, random_state: int = 42, verbose: bool = False) -> CausalResult
Estimate causal effects using Deep Instrumental Variables.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `DataFrame` | Input data. | required |
| `y` | `str` | Outcome variable name. | required |
| `treat` | `str` | Endogenous treatment variable name (continuous). | required |
| `instruments` | `list of str` | Excluded instrument variable names. | required |
| `covariates` | `list of str` | Exogenous control variable names. | required |
| `n_components` | `int` | Number of Gaussian mixture components in the MDN (stage 1). | `10` |
| `hidden_layers` | `tuple of int` | Hidden layer sizes for both networks. | `(128, 64)` |
| `first_stage_epochs` | `int` | Training epochs for the treatment model (MDN). | `100` |
| `second_stage_epochs` | `int` | Training epochs for the response model. | `100` |
| `n_samples` | `int` | Number of MC samples per observation used to form the Stage-2 residual. | `1` |
| `n_gradient_samples` | `int` | Number of independent additional samples used for the gradient path. The default `0` is the single-sample setting; values `>= 1` give the unbiased gradient (see Notes). | `0` |
| `batch_size` | `int` | Mini-batch size. | `256` |
| `learning_rate` | `float` | Adam learning rate. | `1e-3` |
| `alpha` | `float` | Significance level for confidence interval. | `0.05` |
| `random_state` | `int` | Random seed for reproducibility. | `42` |
| `verbose` | `bool` | Print training progress. | `False` |
Returns:
| Type | Description |
|---|---|
| `CausalResult` | Estimated causal effects. |
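The `alpha` parameter sets the significance level of the reported confidence interval, so `alpha=0.05` corresponds to 95% coverage. How statspai constructs its interval is not stated here. The sketch below assumes a standard normal-approximation (Wald) interval purely to show how `alpha` translates into an interval width, and the estimate and standard error are made-up numbers.

```python
from statistics import NormalDist

def wald_interval(estimate, std_error, alpha=0.05):
    """Two-sided (1 - alpha) normal-approximation confidence interval."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for alpha = 0.05
    return estimate - z * std_error, estimate + z * std_error

# Hypothetical point estimate and standard error.
lower, upper = wald_interval(0.10, 0.02, alpha=0.05)
lower_90, upper_90 = wald_interval(0.10, 0.02, alpha=0.10)
```

A larger `alpha` gives a narrower interval with lower coverage.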
Notes
Use the single-sample default (n_gradient_samples=0) for
most applications — it matches EconML, trains roughly 2x faster, and
the implicit variance regularization often helps in small samples.
Use n_gradient_samples >= 1 when you need the unbiased gradient
(e.g. for theoretical guarantees or when training with very large
batches where the bias dominates).
For high-dimensional covariates or weak instruments, consider DeepGMM / DFIV / DualIV instead — see the module docstring for references.
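The bias discussed above can be seen in a one-parameter toy model. Take a response function h(t) = theta * t with T drawn from N(mu, sd^2). The per-observation loss is (y - E[h(T)])^2, whose gradient is -2 (y - theta * mu) * mu. Reusing one draw for both the residual and the derivative gives a biased estimate of that gradient, while two independent draws give an unbiased one. This is a sketch of the general idea only: the function names are hypothetical, and how statspai maps `n_samples` and `n_gradient_samples` onto these draws is an assumption.

```python
import random

def single_sample_grad(theta, y, t):
    # The same draw t feeds both the residual and dh/dtheta = t.
    return -2.0 * (y - theta * t) * t

def paired_sample_grad(theta, y, t_resid, t_grad):
    # Independent draws for the residual and for dh/dtheta.
    return -2.0 * (y - theta * t_resid) * t_grad

theta, y, mu, sd = 1.0, 3.0, 2.0, 1.0
true_grad = -2.0 * (y - theta * mu) * mu  # exactly -4.0

rng = random.Random(0)
n = 100000
single = sum(
    single_sample_grad(theta, y, rng.gauss(mu, sd)) for _ in range(n)
) / n
paired = sum(
    paired_sample_grad(theta, y, rng.gauss(mu, sd), rng.gauss(mu, sd))
    for _ in range(n)
) / n

# In this toy model the single-sample average is offset from true_grad
# by 2 * theta * sd**2, the gradient of a penalty theta**2 * sd**2.
expected_single = true_grad + 2.0 * theta * sd ** 2  # exactly -2.0
```

In this toy model the offset is the gradient of a variance penalty on the prediction, which is one way to read the "implicit variance regularization" mentioned above. The paired estimator removes it at the cost of extra treatment draws per step.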
Examples:
>>> import statspai as sp
>>> # df is a pandas DataFrame containing the columns named below
>>> result = sp.deepiv(
... df, y='lwage', treat='educ',
... instruments=['nearc4'],
... covariates=['exper', 'expersq'],
... )
>>> print(result.summary())
>>> # Custom architecture with unbiased gradient
>>> result = sp.deepiv(
... df, y='sales', treat='price',
... instruments=['cost_shifter'],
... covariates=['demand_controls'],
... n_components=20,
... hidden_layers=(256, 128, 64),
... first_stage_epochs=200,
... n_gradient_samples=1, # paired-sample unbiased gradient
... )