Case study: reproducing the DoubleML / hdm 401(k) result with sp.dml
The effect of 401(k) eligibility on net financial assets is the
canonical applied example for double machine learning. It appears in the
DoubleML "Getting Started" vignette and traces back to
Chernozhukov & Hansen (2004) and the hdm package (Chernozhukov, Hansen
& Spindler, The R Journal 8(2), 2016). This page shows that sp.dml
reproduces that result, side by side with doubleml-for-py on the same
data.
Getting the data (no bundling)
StatsPAI ships no copy of the 401(k) data. The dataset is pulled from DoubleML's own public distribution, so there is a single, canonical, license-clear source and nothing to keep in sync:
from doubleml.datasets import fetch_401K
df = fetch_401K(return_type="DataFrame") # 9,915 households, 14 columns
# outcome: net_tfa (net financial assets)
# treatment: e401 (eligible for a 401(k))
# controls: age, inc, educ, fsize, marr, twoearn, db, pira, hown, nifa, tw
fetch_401K requires doubleml (pip install doubleml); it is not
a runtime dependency of StatsPAI. The extract originates from the 1991
Survey of Income and Program Participation (SIPP), a U.S. Census Bureau
survey, via Chernozhukov & Hansen's replication data.
Estimating the effect with sp.dml
import statspai as sp
y, d = "net_tfa", "e401"
covs = ["age", "inc", "educ", "fsize", "marr", "twoearn",
        "db", "pira", "hown", "nifa", "tw"]
# Partially linear model (lasso nuisances)
plr = sp.dml(data=df, y=y, treat=d, covariates=covs,
             model="plr", ml_g="lasso", ml_m="lasso",
             n_folds=5, random_state=42)
print(plr.summary())  # estimate ~ $8,200, se ~ $570
# Interactive (AIPW) model: ATE
irm = sp.dml(data=df, y=y, treat=d, covariates=covs,
             model="irm", ml_g="lasso", ml_m="logistic",
             n_folds=5, random_state=42)
# Rigorous-Lasso (hdm) nuisances: the theory-driven plug-in penalty
plr_hdm = sp.dml(data=df, y=y, treat=d, covariates=covs,
                 model="plr", ml_g="rlasso", ml_m="rlasso",
                 n_folds=5, random_state=42)
Results: sp.dml vs doubleml-for-py
Run on the fetched data (StatsPAI 1.20.0, doubleml-for-py 0.11.3,
scikit-learn nuisances, n_folds=5, seed 42):
| Model | sp.dml estimate (se), $ | doubleml-for-py estimate (se), $ |
|---|---|---|
| PLR (lasso) | 8,227.7 (566.3) | 8,166.9 (574.6) |
| IRM ATE (lasso + logistic) | 8,644.5 (1,742.6) | 8,156.5 (1,488.4) |
| PLR, rigorous-Lasso (hdm) nuisances | 8,302.8 (469.3) | n/a |
The doubleml-for-py numbers come from running its own
DoubleMLPLR / DoubleMLIRM on the identical DataFrame:
import doubleml as dml
from sklearn.linear_model import LassoCV, LogisticRegressionCV
dd = dml.DoubleMLData(df, y_col=y, d_cols=d, x_cols=covs)
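# fit() returns the fitted model; .summary holds the coefficient table
dml_plr = dml.DoubleMLPLR(dd, ml_l=LassoCV(), ml_m=LassoCV(), n_folds=5).fit()
dml_irm = dml.DoubleMLIRM(dd, ml_g=LassoCV(), ml_m=LogisticRegressionCV(), n_folds=5).fit()
print(dml_plr.summary)
print(dml_irm.summary)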
Reading the table. All paths estimate a positive effect of roughly
$8,200–$8,600 of net financial assets attributable to 401(k)
eligibility, the well-known magnitude reported throughout the DoubleML
and hdm literature. The PLR estimates differ by about $61 (≈ 0.1 of a
standard error). The IRM estimates differ by ≈ $490, about 0.3 of a
standard error and therefore statistically indistinguishable. The IRM
gap is larger because the two engines draw independent cross-fitting
partitions here and the AIPW score does not pin down every detail of
how it is assembled across folds. The rigorous-Lasso path
(ml_g='rlasso'), which uses hdm's theory-driven plug-in penalty instead
of cross-validation, lands in the same place.
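The gaps quoted above can be checked with a few lines of arithmetic. This sketch copies the estimates and standard errors from the results table; the helper name gap_in_se is hypothetical, and dividing by the larger of the two standard errors is one conservative convention among several.

```python
# (estimate, standard error) pairs copied from the results table.
sp_plr, dml_plr = (8227.7, 566.3), (8166.9, 574.6)
sp_irm, dml_irm = (8644.5, 1742.6), (8156.5, 1488.4)


def gap_in_se(a, b):
    """Return the absolute gap and that gap divided by the larger of the two SEs."""
    gap = abs(a[0] - b[0])
    return gap, gap / max(a[1], b[1])


plr_gap, plr_ratio = gap_in_se(sp_plr, dml_plr)
irm_gap, irm_ratio = gap_in_se(sp_irm, dml_irm)
print(f"PLR gap: {plr_gap:.1f} ({plr_ratio:.2f} SE)")  # PLR gap: 60.8 (0.11 SE)
print(f"IRM gap: {irm_gap:.1f} ({irm_ratio:.2f} SE)")  # IRM gap: 488.0 (0.28 SE)
```

Both ratios sit far below any conventional significance threshold, which is the sense in which the two engines agree on this dataset.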
Why the small differences, and where parity is exact
These residual differences come from fold randomization, not from a
methodological discrepancy. When sp.dml and doubleml-for-py are given
the same scikit-learn learners on the same fold partition under a fixed
seed, the partialling-out models (PLR, PLIV) agree to machine precision
(|Δ| at the last float64 unit), and the AIPW models (IRM, IIVM) agree up
to the small score-construction term. That equivalence is
pinned offline (no network) in
tests/external_parity/test_dml_python_parity.py
and explained in detail in
sp.dml and the DoubleML reference implementation.
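To illustrate why a shared fold partition matters, here is a small standard-library sketch of seeded K-fold assignment. The function kfold_partition is hypothetical and is not how either package draws its folds. It only shows that the same seed reproduces the same partition, while a different seed gives every row a different set of fold-mates and so changes the nuisance fits.

```python
import random


def kfold_partition(n, n_folds, seed):
    """Assign each of n rows to one of n_folds folds, reproducibly for a given seed."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    fold_of = [0] * n
    for position, row in enumerate(idx):
        fold_of[row] = position % n_folds
    return fold_of


n = 9915  # sample size quoted for the 401(k) extract
shared_a = kfold_partition(n, 5, seed=42)
shared_b = kfold_partition(n, 5, seed=42)
independent = kfold_partition(n, 5, seed=7)

identical = shared_a == shared_b
# Fraction of rows that happen to receive the same fold label under a different seed.
overlap = sum(a == b for a, b in zip(shared_a, independent)) / n
print(f"same seed gives identical partition: {identical}")
print(f"different seed, share of rows with matching fold label: {overlap:.2f}")
```

With five folds, an independent draw matches the fold label for only about one row in five, so the two engines train their nuisance models on different subsamples unless the partition is passed in explicitly.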
For the rigorous-Lasso (hdm) side of the same workflow — sp.rlasso,
rlasso_effect, rlasso_iv, rlassologit, all pinned to hdm to
machine precision — see
Rigorous (data-driven) Lasso — sp.rlasso and the hdm port.
References
- Chernozhukov, V. & Hansen, C. (2004). The Effects of 401(k) Participation on the Wealth Distribution: An Instrumental Quantile Regression Analysis. Review of Economics and Statistics, 86(3), 735–751. doi:10.1162/0034653041811734.
- Chernozhukov, V., Hansen, C. & Spindler, M. (2016). hdm: High-Dimensional Metrics. The R Journal, 8(2), 185–199. doi:10.32614/RJ-2016-040.
- Bach, P., Chernozhukov, V., Kurz, M.S. & Spindler, M. (2022). DoubleML - An Object-Oriented Implementation of Double Machine Learning in Python. Journal of Machine Learning Research, 23(53), 1–6.
- Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W. & Robins, J. (2018). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, 21(1), C1–C68. doi:10.1111/ectj.12097.