Skip to content

statspai.dml

dml

Double/Debiased Machine Learning module for StatsPAI.

Implements the Chernozhukov et al. (2018) framework with separate per-model estimator classes (PLR / IRM / PLIV / IIVM) sharing a common cross-fitting infrastructure.

Public entry points:

  • :func:dml — dispatcher, selects the model via model= string.
  • :class:DoubleML — legacy façade, delegates to a per-model class.
  • :class:DoubleMLPLR, :class:DoubleMLIRM, :class:DoubleMLPLIV, :class:DoubleMLIIVM — direct per-model entry points.
References

Chernozhukov, V., Chetverikov, D., Demirer, M., Duflo, E., Hansen, C., Newey, W., and Robins, J. (2018). "Double/Debiased Machine Learning for Treatment and Structural Parameters." Econometrics Journal, 21(1), C1-C68. [@chernozhukov2018double]

DoubleML

Legacy façade. Prefer the per-model classes directly for new code: :class:DoubleMLPLR, :class:DoubleMLIRM, :class:DoubleMLPLIV, :class:DoubleMLIIVM. Kept for backward compatibility.

DoubleMLPLR

Bases: _DoubleMLBase

Partially linear regression DML (continuous or binary D, no IV).

DoubleMLIRM

Bases: _DoubleMLBase

Interactive regression DML — binary D, ATE via AIPW.

DoubleMLPLIV

Bases: _DoubleMLBase

Partially linear IV DML — endogenous D with continuous/binary Z.

DoubleMLIIVM

Bases: _DoubleMLBase

Interactive IV DML — binary D, binary Z, LATE via Wald.

DMLAveragingResult

Bases: CausalResult

CausalResult extended with per-candidate and weight details.

Attributes stored in model_info:

  • candidates — list of candidate labels.
  • theta_k — per-candidate :math:\hat\theta (only meaningful for the non-stacking weight rules).
  • se_k — per-candidate SE.
  • mse_k — per-candidate nuisance risk (g + m).
  • weights — averaging or stacking weights.
  • weights_g / weights_m — separate stacking weights per nuisance under weight_rule="short_stacking".
  • weight_rule — how the weights were computed.

DMLPanelResult dataclass

Output of :func:dml_panel.

Attributes:

Name Type Description
estimate float

Debiased treatment effect β̂.

se float

Cluster-robust SE at the unit level.

ci_lower, ci_upper float
p_value float
t_stat float
n_units int
n_obs int
n_folds int
include_time_fe bool
ml_g_name str

Short name of the outcome nuisance learner.

ml_m_name str

Short name of the treatment nuisance learner.

method str

Always "dml_panel".

diagnostics dict

Populated with {'y_resid_std', 'd_resid_std', 'corr_yd_resid', 'within_r2', 'omega_cluster'}.

DMLSensitivityResult dataclass

Output of :func:dml_sensitivity.

Attributes:

Name Type Description
estimate float

Original DML point estimate.

se float

Original DML standard error.

rv_q float

Robustness value at the user's q-threshold (default q=1 ⇒ strength of confounder needed to shrink estimate to zero).

rv_qa float

Confounder strength needed to push the (1-α)·100% lower CI across zero. Strictly less than or equal to rv_q.

bias_bound float

Maximum |bias| under the user-specified cf_y, cf_d.

adjusted_estimate_low float
adjusted_estimate_high float

Bias-adjusted estimate range under the (cf_y, cf_d) scenario.

benchmarks DataFrame

For each benchmark covariate X_k: cf_y_bench, cf_d_bench, and the implied bias / adjusted estimate when a confounder is assumed to be k_y, k_d times as strong as X_k in the residualised regression. Empty if benchmark not provided.

s float

Scaling factor :math:S = \sigma_{Y\text{resid}} / \sigma_{D\text{resid}} used in the bias formula.

q float
alpha float
method str

Always "DML-OVB (Chernozhukov-Cinelli-Newey 2022)".

plot

plot(ax=None, levels: Sequence[float] = (0.0, 0.5, 1.0), figsize=(6.0, 5.0))

Plot bias-contour grid for hypothetical (cf_y, cf_d) pairs.

Plots the |bias|/|θ̂| contour as a function of the confounder strength on Y and D. The user can read off the RV directly. Mirrors the sensemakr plotting convention.

DMLDiagnostics dataclass

Bundled DML diagnostics returned by :func:dml_diagnostics.

plot

plot(figsize=(10.0, 8.0), bins: int = 30)

Render a 2×2 publication-style diagnostic panel.

Top-left : overlap histogram (propensity for IRM, |D-resid| for PLR). Top-right : score-density histogram with N(0,σ̂²) overlay. Bottom-left: balance bar chart (corr with each residualised covariate). Bottom-right: Q-Q plot of standardised score vs N(0,1).

dml

dml(data: DataFrame, y: str, treat: str, covariates: List[str], model: str = 'plr', instrument: Optional[Union[str, List[str]]] = None, ml_g: Optional[Any] = None, ml_m: Optional[Any] = None, ml_r: Optional[Any] = None, n_folds: int = 5, n_rep: int = 1, alpha: float = 0.05, random_state: int = 42, sample_weight: Optional[Any] = None) -> CausalResult

Estimate causal effect using Double/Debiased Machine Learning.

Parameters:

Name Type Description Default
data DataFrame
required
y str

Outcome variable.

required
treat str

Treatment variable.

required
covariates list of str

High-dimensional controls X.

required
model str

DML model:

  • 'plr' : partially linear (continuous or binary D)
  • 'irm' : interactive regression (binary D; ATE via AIPW)
  • 'pliv' : partially linear IV (endogenous D with instrument Z)
  • 'iivm' : interactive IV (binary D, binary Z; LATE via Wald ratio)
'plr'
instrument str

Scalar instrument Z. Required for model in {'pliv', 'iivm'}.

None
ml_g sklearn estimator or str

Learners for outcome / treatment / instrument nuisance. Pass any scikit-learn-compatible estimator, or one of the convenience string aliases (case-insensitive):

  • 'rf' — random forest (200 trees)
  • 'gbm' — gradient boosting (100 estimators, depth 3)
  • 'lasso' / 'ridge' — penalised linear (CV-tuned)
  • 'linear' / 'ols' — plain linear regression
  • 'logistic' — logistic regression (classifier roles only)
  • 'xgb' / 'lgbm' — XGBoost / LightGBM (optional install)

For binary treatment (model='irm') ml_m is auto-coerced to the classifier variant; same for ml_r under 'iivm'. None falls back to the per-model gradient-boosting default.

None
ml_m sklearn estimator or str

Learners for outcome / treatment / instrument nuisance. Pass any scikit-learn-compatible estimator, or one of the convenience string aliases (case-insensitive):

  • 'rf' — random forest (200 trees)
  • 'gbm' — gradient boosting (100 estimators, depth 3)
  • 'lasso' / 'ridge' — penalised linear (CV-tuned)
  • 'linear' / 'ols' — plain linear regression
  • 'logistic' — logistic regression (classifier roles only)
  • 'xgb' / 'lgbm' — XGBoost / LightGBM (optional install)

For binary treatment (model='irm') ml_m is auto-coerced to the classifier variant; same for ml_r under 'iivm'. None falls back to the per-model gradient-boosting default.

None
ml_r sklearn estimator or str

Learners for outcome / treatment / instrument nuisance. Pass any scikit-learn-compatible estimator, or one of the convenience string aliases (case-insensitive):

  • 'rf' — random forest (200 trees)
  • 'gbm' — gradient boosting (100 estimators, depth 3)
  • 'lasso' / 'ridge' — penalised linear (CV-tuned)
  • 'linear' / 'ols' — plain linear regression
  • 'logistic' — logistic regression (classifier roles only)
  • 'xgb' / 'lgbm' — XGBoost / LightGBM (optional install)

For binary treatment (model='irm') ml_m is auto-coerced to the classifier variant; same for ml_r under 'iivm'. None falls back to the per-model gradient-boosting default.

None
n_folds int
5
n_rep int

Repeated cross-fitting splits (median aggregation).

1
alpha float
0.05
random_state int

Seed for the cross-fitting fold assignment. Repeat splits use random_state + rep so a single random_state fully determines the result.

42
sample_weight ndarray | Series | str

Per-observation weights (e.g., survey/probability weights). Pass either a 1-D array of length len(data) or a column name present in data. Supported for model in {'plr', 'irm', 'pliv', 'iivm'}.

None

Returns:

Type Description
CausalResult

Examples:

>>> # Partially Linear Regression
>>> result = sp.dml(df, y='wage', treat='training',
...                 covariates=['age', 'edu', 'exp'])
>>> # Interactive Regression (binary treatment, ATE)
>>> result = sp.dml(df, y='wage', treat='D', covariates=X_cols,
...                 model='irm')
>>> # Partially Linear IV — endogenous D, instrument Z
>>> result = sp.dml(df, y='earnings', treat='schooling',
...                 covariates=['age', 'father_edu'],
...                 model='pliv', instrument='quarter_of_birth')
>>> # Interactive IV — binary D, binary Z (LATE)
>>> result = sp.dml(df, y='earnings', treat='college',
...                 covariates=['age', 'ability'],
...                 model='iivm', instrument='lottery_win')

dml_model_averaging

dml_model_averaging(data: DataFrame, y: str, treat: str, covariates: Sequence[str], candidates: Optional[List[Tuple[Any, Any, str]]] = None, n_folds: int = 5, seed: int = 0, weight_rule: str = 'short_stacking', alpha: float = 0.05, sample_weight: Optional[Any] = None) -> DMLAveragingResult

Model-averaging / stacking DML-PLR estimator.

Parameters:

Name Type Description Default
data DataFrame
required
y str

Outcome column.

required
treat str

Continuous-or-binary treatment column.

required
covariates list of str

Covariate columns X.

required
candidates list of (ml_g, ml_m, label)

Candidate nuisance learners. ml_g regresses y on X; ml_m regresses treat on X. Defaults to a Lasso/Ridge/ RandomForest/GradientBoosting roster.

None
n_folds int

Cross-fitting folds per candidate.

5
seed int
0
weight_rule ('short_stacking', 'single_best', 'inverse_risk', 'equal')

How to combine candidate nuisance predictions or estimates.

  • "short_stacking" (default; Ahrens et al. 2025 eq. 7) — solve constrained least squares on cross-fitted predictions for each nuisance separately (ŷ and ), produce stacked nuisances, plug into the PLR moment equation.
  • "single_best" — Ahrens et al. (2025, fn. 8): pick the candidate with lowest joint nuisance MSE.
  • "inverse_risk" — :math:w_k \propto 1/(\text{MSE}_g + \text{MSE}_m). Convenience baseline; not in the paper.
  • "equal" — :math:w_k = 1/K. Convenience baseline; not in the paper.

For the non-stacking rules (inverse_risk / equal / single_best) the function computes per-candidate :math:\hat\theta_k and reports the weighted average with a between-candidate-covariance-corrected SE; for "short_stacking" it reports the standard PLR sandwich SE on the stacked-nuisance score (Neyman orthogonality is preserved).

"short_stacking"
alpha float

Two-sided CI level.

0.05
sample_weight ndarray | Series | str

Per-observation weights. If supplied, every nuisance fit uses sample_weight= (with a graceful fallback warning if the learner does not accept it), the CLS stacking objective becomes weighted least squares, and the PLR moment + sandwich variance use weighted sums. The MSE used for inverse_risk / single_best weighting is also the weighted MSE.

None

Returns:

Type Description
DMLAveragingResult

With the weighted :math:\hat\theta, SE, CI, and per-candidate diagnostics under result.model_info.

Examples:

>>> import statspai as sp
>>> r = sp.dml_model_averaging(df, y="y", treat="d",
...                             covariates=[f"x{j}" for j in range(10)])
>>> r.summary()
>>> r.model_info["weights_g"]   # CLS stacking weights for ℓ̂(X) = E[Y|X]
{"lasso": 0.42, "ridge": 0.0, "rf": 0.0, "gbm": 0.58}
>>> r.model_info["weights_m"]   # CLS stacking weights for m̂(X) = E[D|X]
{"lasso": 0.0, "ridge": 0.31, "rf": 0.0, "gbm": 0.69}

dml_panel

dml_panel(data: DataFrame, y: str, treat: str, covariates: List[str], *, unit: str, time: Optional[str] = None, ml_g: Optional[Any] = None, ml_m: Optional[Any] = None, n_folds: int = 5, alpha: float = 0.05, include_time_fe: bool = False, binary_treatment: bool = False, seed: int = 0, sample_weight: Optional[Any] = None) -> DMLPanelResult

Long-panel Double/Debiased ML with unit FE and cluster-robust SE.

Estimates β in

.. math::

Y_{it} = \alpha_i + \lambda_t + \beta D_{it} + g(X_{it}) + \varepsilon_{it}

by (1) within-transforming y, treat and covariates to absorb unit (and optionally time) fixed effects; (2) running cross-fit PLR on the demeaned data with folds that split units; (3) computing the Neyman-orthogonal score and cluster-robust SE at the unit level.

Parameters:

Name Type Description Default
data DataFrame

Long-format panel; must contain y, treat, all covariates, unit, and time (if given).

required
y str

Column names.

required
treat str

Column names.

required
covariates list of str

High-dimensional controls X_it. Pass [] to fit a pure FE model with ML-free residuals.

required
unit str

Unit identifier column. Used both for FE absorption and cross-fit fold assignment.

required
time str

Time identifier column; required when include_time_fe=True.

None
ml_g sklearn-style estimators

Nuisance learners. Default: GradientBoosting with 200 trees, depth 3, lr 0.05 — same convention as :func:sp.dml PLR.

None
ml_m sklearn-style estimators

Nuisance learners. Default: GradientBoosting with 200 trees, depth 3, lr 0.05 — same convention as :func:sp.dml PLR.

None
n_folds int

Cross-fit folds over units. Must be >= 2 and <= n_units.

5
alpha float
0.05
include_time_fe bool

If True, also subtract time means (two-way within transform).

False
binary_treatment bool

Deprecated. Previously routed binary D through a classifier path that mixed within-demeaned features with raw {0,1} labels — the resulting propensity had no clean interpretation as :math:E[\tilde D_{it} \mid \tilde X_{it}]. The flag is now ignored: :func:dml_panel always residualises the within-transformed D̃ with a regressor (PLR is agnostic to D's type, and within-transformed binary D is no longer binary). For DR-style ATE on binary D in panels, use :func:sp.dml(..., model='irm') with unit dummies in the covariate set, :func:sp.etwfe, or :func:sp.callaway_santanna. A :class:DeprecationWarning is emitted when this argument is passed; passing D ∈ {0,1} while binary_treatment=True is validated to catch incidental misuse.

False
seed int

RNG seed for fold assignment.

0
sample_weight ndarray | Series | str

Per-observation weights (e.g., survey/probability weights). Pass either a 1-D array of length len(data) or a column name in data. The within transform becomes a weighted within transform (subtract weighted unit / time means), and the PLR moment + cluster-robust SE use weighted sums. Required if your survey design carries informative sampling probabilities.

None

Returns:

Type Description
class:`DMLPanelResult`
Notes

The cluster-robust SE follows Liang-Zeger 1986 at the unit level (the coarser of the two dimensions): stacked scores are summed within unit before squaring. This is the appropriate clustering level when shocks at higher frequencies than the unit are plausibly correlated (cf. Cameron-Miller 2015 §3.2).

Identification requires no unobserved time-varying confounders — only time-invariant unit heterogeneity + high-dim observed X_it. Violations of strict exogeneity of D (e.g. dynamic-feedback) are not handled here; use :func:sp.msm or :func:sp.gformula_ice.

Examples:

>>> import statspai as sp
>>> res = sp.dml_panel(
...     df, y='log_wage', treat='union',
...     covariates=['exper', 'educ', 'married', 'south'],
...     unit='pid', time='year', include_time_fe=True,
... )
>>> print(res.summary())

dml_sensitivity

dml_sensitivity(result, q: float = 1.0, cf_y: Optional[float] = None, cf_d: Optional[float] = None, benchmark_covariates: Optional[Sequence[str]] = None, k_y: float = 1.0, k_d: float = 1.0) -> DMLSensitivityResult

Compute DML-OVB sensitivity for a fitted DML CausalResult.

Parameters:

Name Type Description Default
result CausalResult

From :func:statspai.dml.dml (PLR or IRM). Must carry the post-fit residuals via model_info['_y_resid'], model_info['_d_resid'], and the design matrix for benchmarks.

required
q float

Bias threshold as a fraction of |θ̂|. q=1 ⇒ confounder needed to shrink estimate to zero; q=0.5 ⇒ half the estimate.

1.0
cf_y float

Hypothesized partial-R² of an unobserved confounder with the residualised outcome and treatment. If both are given, the report includes a bias bound and adjusted-estimate range.

None
cf_d float

Hypothesized partial-R² of an unobserved confounder with the residualised outcome and treatment. If both are given, the report includes a bias bound and adjusted-estimate range.

None
benchmark_covariates list of str

Subset of the original covariates to benchmark against. For each X_k, the benchmark sets cf_y_bench, cf_d_bench to the partial R² that X_k itself contributes (multiplied by k_y, k_d to express "what if a confounder were k× as strong as X_k?").

None
k_y float

Multipliers for the benchmark strengths.

1.0
k_d float

Multipliers for the benchmark strengths.

1.0

Returns:

Type Description
DMLSensitivityResult

dml_diagnostics

dml_diagnostics(result, clip: float = 0.02) -> DMLDiagnostics

Build a :class:DMLDiagnostics report from a DML CausalResult.

Parameters:

Name Type Description Default
result CausalResult

Result returned by :func:statspai.dml.dml. Must include the post-fit residuals (model_info['_y_resid'], model_info['_d_resid']); for IRM, additionally the propensity model_info['diagnostics']['pscore_min'] etc. are surfaced.

required
clip float

For IRM-style overlap: count units with propensity within [0, clip] ∪ [1-clip, 1] as overlap-violating.

0.02

Returns:

Type Description
DMLDiagnostics