statspai.rlasso

rlasso

Rigorous (data-driven) Lasso — a faithful port of R's hdm package.

Public surface
  • :func:rlasso — rigorous (post-)Lasso with a data-driven, theory-justified penalty (hdm::rlasso).
  • :func:rlasso_effect / :func:rlasso_effects — treatment-effect inference after Lasso-selecting controls (hdm::rlassoEffect(s)).
  • :func:rlasso_iv — instrumental-variables estimation with Lasso selection of instruments and/or controls (hdm::rlassoIV).
  • :class:RlassoRegressor / :class:RlassoClassifier — scikit-learn adapters so the rigorous Lasso can serve as a Double-ML nuisance learner (sp.dml(..., ml_g='rlasso')).

Every estimator is validated to agree numerically with hdm (see tests/reference_parity/test_rlasso_parity.py): the core rlasso matches to machine precision; the IV/effect estimators to ~1e-6.

References

Chernozhukov, V., Hansen, C. and Spindler, M. (2016). "hdm: High-Dimensional Metrics." The R Journal, 8(2), 185-199. [@chernozhukov2016hdm]

RLassoFit dataclass

Bases: ResultProtocolMixin

Result of :func:rlasso — mirrors the fields of an hdm rlasso.

predict

predict(newX: ndarray) -> ndarray

Predict on the original (uncentered) scale — hdm::predict.rlasso.

RLassoLogitFit dataclass

Bases: ResultProtocolMixin

Result of :func:rlassologit — mirrors hdm rlassologit.

predict

predict(newX: ndarray, type: str = 'response') -> ndarray

Predict on newX. With type='response' the result is probabilities; with type='link' it is log-odds. Mirrors hdm::predict.rlassologit.
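The two prediction scales are related by the inverse logit. The sketch below illustrates that relationship with a hypothetical helper, `predict_logit`, and made-up coefficients; it is not the library's implementation and does not use RLassoLogitFit's actual attributes.

```python
import numpy as np

def predict_logit(new_X, intercept, beta, type="response"):
    """Hypothetical helper: log-odds or probability for a fitted logit model."""
    new_X = np.asarray(new_X, dtype=float)
    link = intercept + new_X @ np.asarray(beta, dtype=float)  # log-odds scale
    if type == "link":
        return link
    if type == "response":
        return 1.0 / (1.0 + np.exp(-link))  # inverse logit maps to (0, 1)
    raise ValueError("type must be 'response' or 'link'")

# Illustrative coefficients only (assumed, not estimated from data).
X_new = np.array([[0.0, 0.0], [1.0, -1.0]])
log_odds = predict_logit(X_new, intercept=0.0, beta=[2.0, 1.0], type="link")
probs = predict_logit(X_new, intercept=0.0, beta=[2.0, 1.0])
```

Applying the logit transform log(p / (1 - p)) to `probs` recovers `log_odds`, which is a quick way to check which scale a prediction is on.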

RLassoEffectResult dataclass

Bases: ResultProtocolMixin

Return of :func:rlasso_effect.

RLassoIVResult dataclass

Bases: ResultProtocolMixin

Return of :func:rlasso_iv.

RlassoClassifier

Bases: _SklearnCompatEstimator

Linear-probability classifier backed by the rigorous Lasso.

Fits rlasso to the 0/1 label and exposes clipped predict_proba. Use only when a linear-probability propensity is acceptable; for calibrated propensities use a genuine classifier.

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> X = rng.standard_normal((200, 20))
>>> lin = X[:, 0] - 0.5 * X[:, 1]
>>> d = (rng.uniform(size=200) < 1.0 / (1.0 + np.exp(-lin))).astype(float)
>>> clf = sp.RlassoClassifier().fit(X, d)
>>> clf.predict_proba(X).shape  # columns: P(0), P(1)
(200, 2)
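To make "clipped linear-probability" concrete, here is a minimal NumPy sketch. It assumes plain OLS in place of the rigorous Lasso and an arbitrary clipping bound `eps`; neither is taken from the library's source.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
d = (rng.uniform(size=200) < 1.0 / (1.0 + np.exp(-2.0 * X[:, 0]))).astype(float)

# Plain OLS on the 0/1 label stands in for the rigorous Lasso here.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, d, rcond=None)
raw = A @ coef  # linear fitted values are not confined to [0, 1]

eps = 0.01  # assumed clipping bound, chosen for illustration only
p1 = np.clip(raw, eps, 1.0 - eps)
proba = np.column_stack([1.0 - p1, p1])  # columns: P(0), P(1)
```

Clipping keeps the output usable as a probability, but it does not calibrate it, which is why the logistic variant is preferable when calibration matters.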

RlassologitClassifier

Bases: _SklearnCompatEstimator

Logistic rigorous-Lasso classifier — a genuine (calibrated) propensity.

Unlike :class:RlassoClassifier (a clipped linear-probability model), this fits rlassologit (the logistic rigorous Lasso, glmnet-aligned) and exposes proper logistic predict_proba. It is the principled sparse nuisance learner for binary treatments / instruments in sp.dml(model='irm'/'iivm', ml_m='rlassologit').

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> X = rng.standard_normal((300, 20))
>>> p = 1 / (1 + np.exp(-(X[:, 0] - X[:, 1])))
>>> d = (rng.uniform(size=300) < p).astype(float)
>>> clf = sp.RlassologitClassifier().fit(X, d)
>>> clf.predict_proba(X).shape  # columns: P(0), P(1)
(300, 2)

RlassoRegressor

Bases: _SklearnCompatEstimator

Rigorous (post-)Lasso as a scikit-learn regressor.

Parameters mirror :func:statspai.rlasso.rlasso. Suitable as ml_g / ml_m in sp.dml(model='plr', ...).

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> X = rng.standard_normal((100, 20))
>>> beta = np.zeros(20); beta[:3] = [1.0, -1.0, 0.5]
>>> y = X @ beta + 0.5 * rng.standard_normal(100)
>>> est = sp.RlassoRegressor(post=True).fit(X, y)
>>> est.predict(X).shape
(100,)
>>> est.coef_.shape
(20,)
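The role of a nuisance learner in a partially linear model can be sketched as follows. This is an assumption-laden illustration of cross-fitting, not sp.dml itself: `OLSLearner` and `cross_fit_plr` are hypothetical names, and OLS stands in for any estimator with fit/predict methods.

```python
import numpy as np

class OLSLearner:
    """Tiny stand-in exposing the fit/predict interface a nuisance learner needs."""

    def fit(self, X, y):
        A = np.column_stack([np.ones(len(X)), X])
        self.coef_, *_ = np.linalg.lstsq(A, y, rcond=None)
        return self

    def predict(self, X):
        return np.column_stack([np.ones(len(X)), X]) @ self.coef_

def cross_fit_plr(X, y, d, make_learner, n_folds=5, seed=0):
    """Cross-fitted estimate of theta in y = theta * d + g(X) + e."""
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    y_res = np.empty(n)
    d_res = np.empty(n)
    for k in range(n_folds):
        test = folds == k
        train = ~test
        # Fit each nuisance on the training folds, residualize the held-out fold.
        y_res[test] = y[test] - make_learner().fit(X[train], y[train]).predict(X[test])
        d_res[test] = d[test] - make_learner().fit(X[train], d[train]).predict(X[test])
    theta = (d_res @ y_res) / (d_res @ d_res)
    psi = (y_res - theta * d_res) * d_res  # score contributions
    se = np.sqrt(np.mean(psi**2) / n) / np.mean(d_res**2)
    return theta, se

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 5))
d = X[:, 0] + rng.standard_normal(500)
y = 1.5 * d + X[:, 1] + rng.standard_normal(500)
theta, se = cross_fit_plr(X, y, d, OLSLearner)
```

Any object with the same fit/predict interface could be passed as `make_learner`, which is the property the scikit-learn adapters provide.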

RLassoLogitEffectResult dataclass

Bases: ResultProtocolMixin

Return of :func:rlassologit_effect.

rlasso

rlasso(X: ndarray, y: ndarray, post: bool = True, intercept: bool = True, penalty: Optional[Dict[str, Any]] = None, control: Optional[Dict[str, Any]] = None, colnames: Optional[List[str]] = None, rng: Optional[Generator] = None) -> RLassoFit

Rigorous Lasso / post-Lasso — a faithful port of hdm::rlasso.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| X | (n, p) array | Design matrix of candidate covariates (p may exceed n). | required |
| y | (n,) array | Response. | required |
| post | bool | If True, re-estimate the selected support by OLS (post-Lasso). | True |
| intercept | bool | Center X and y and report an intercept on the original scale. | True |
| penalty | dict | Overrides for homoscedastic (True / False / "none"), X.dependent.lambda (bool), c (slack, default 1.1), gamma (default 0.1/log(n)), lambda.start and numSim. Defaults reproduce hdm exactly. | None |
| control | dict | Overrides for numIter (default 15), tol (default 1e-5) and threshold (default None). | None |
| colnames | list of str | Names for the columns of X (default V1..Vp). | None |
| rng | numpy Generator | Only used when X.dependent.lambda simulation is requested. | None |

Returns:

Type Description
RLassoFit

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> X = rng.standard_normal((100, 20))
>>> beta = np.zeros(20); beta[:3] = [1.0, -1.0, 0.5]
>>> y = X @ beta + 0.5 * rng.standard_normal(100)
>>> fit = sp.rlasso(X, y, post=True)  # rigorous post-Lasso
>>> int(fit.index.sum()) >= 1  # at least one control kept
True
>>> fit.beta.shape
(20,)
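The defaults c = 1.1 and gamma = 0.1/log(n) enter the data-driven penalty level. As described in the hdm paper for the X-independent case, the base level is 2c·sqrt(n)·Φ⁻¹(1 − gamma/(2p)), which is then scaled by a residual standard deviation or by penalty loadings. The sketch below computes only that base level; `rigorous_lambda0` is a hypothetical helper, not part of statspai.

```python
import math
from statistics import NormalDist

def rigorous_lambda0(n, p, c=1.1, gamma=None):
    """Base penalty level 2 * c * sqrt(n) * Phi^{-1}(1 - gamma / (2 p)).

    Hypothetical helper for illustration; the scaling by a residual
    standard deviation or by loadings is deliberately left out.
    """
    if gamma is None:
        gamma = 0.1 / math.log(n)  # default quoted in the parameter table
    quantile = NormalDist().inv_cdf(1.0 - gamma / (2.0 * p))
    return 2.0 * c * math.sqrt(n) * quantile

lam_20 = rigorous_lambda0(n=100, p=20)
lam_200 = rigorous_lambda0(n=100, p=200)  # more candidates, larger penalty
```

The level grows only through a normal quantile in p, so moving from 20 to 200 candidate covariates raises the penalty modestly rather than tenfold.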

rlassologit

rlassologit(X: ndarray, y: ndarray, post: bool = True, intercept: bool = True, penalty: Optional[Dict[str, Any]] = None, control: Optional[Dict[str, Any]] = None, colnames: Optional[List[str]] = None) -> RLassoLogitFit

Logistic rigorous (post-)Lasso — a faithful port of hdm::rlassologit.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| X | (n, p) array | Candidate covariates. | required |
| y | (n,) array | Binary outcome in {0, 1}. | required |
| post | bool | If True, refit the selected support by unpenalized logistic regression (post-Lasso); else keep the glmnet-penalized fit. | True |
| intercept | bool | Include an intercept. | True |
| penalty | dict | Overrides for c (slack; default 1.1 for post=True, else 0.5), gamma (default 0.1/log(n)) and lambda (raw penalty; bypasses the data-driven level). | None |
| control | dict | Override for threshold: coefficients below it are zeroed (default None). | None |
| colnames | list of str | Column names (default V1..Vp). | None |

Returns:

Type Description
RLassoLogitFit

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> X = rng.standard_normal((300, 20))
>>> p = 1 / (1 + np.exp(-(X[:, 0] - X[:, 1])))
>>> y = (rng.uniform(size=300) < p).astype(float)
>>> fit = sp.rlassologit(X, y)
>>> fit.n_selected >= 1
True

rlasso_effect

rlasso_effect(x: Union[ndarray, DataFrame, Sequence[str]], y: Union[ndarray, Series, str], d: Union[ndarray, Series, str], method: str = 'partialling out', post: bool = True, I3: Optional[ndarray] = None, data: Optional[DataFrame] = None, penalty: Optional[Dict[str, Any]] = None, control: Optional[Dict[str, Any]] = None) -> RLassoEffectResult

Effect of d on y after Lasso-selecting controls x.

Faithful port of hdm::rlassoEffect.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| x | array, DataFrame or column names | (n, p) controls. | required |
| y | array, Series or column name | Outcome. | required |
| d | array, Series or column name | The single target regressor. | required |
| method | 'partialling out' or 'double selection' | See the module docstring. | "partialling out" |
| post | bool | Post-Lasso inside the selection steps. | True |
| I3 | bool array | Amelioration set forced into the control set (double-selection only); hdm's I3 argument. | None |
| data | DataFrame | Backs string/column-name inputs. | None |
| penalty | dict | Forwarded to :func:statspai.rlasso.rlasso. | None |
| control | dict | Forwarded to :func:statspai.rlasso.rlasso. | None |

Returns:

Type Description
RLassoEffectResult

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> X = rng.standard_normal((200, 12))  # candidate controls
>>> d = X[:, 0] + rng.standard_normal(200)  # treatment
>>> y = 1.5 * d + X[:, 1] + rng.standard_normal(200)
>>> res = sp.rlasso_effect(X[:, 1:], y, d, method="partialling out")
>>> float(res.se) > 0
True
>>> bool(np.isfinite(res.alpha))  # ~1.5
True
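The partialling-out idea can be illustrated in a few lines: residualize both y and d on the controls, then regress one residual on the other. This sketch assumes plain OLS for the residualizing step where the library uses rigorous Lasso, and a conventional OLS standard error; `residualize` is a hypothetical helper.

```python
import numpy as np

def residualize(X, v):
    """Residual of v after an OLS projection on [1, X] (stand-in for Lasso)."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, v, rcond=None)
    return v - A @ coef

rng = np.random.default_rng(0)
n = 400
X = rng.standard_normal((n, 6))
d = X[:, 0] + rng.standard_normal(n)
y = 1.5 * d + X[:, 1] + rng.standard_normal(n)

y_res = residualize(X, y)
d_res = residualize(X, d)

# Slope of the residual-on-residual regression is the effect estimate.
alpha = (d_res @ y_res) / (d_res @ d_res)
eps = y_res - alpha * d_res
se = np.sqrt((eps @ eps) / (n - 1) / (d_res @ d_res))  # conventional OLS formula
```

With p close to or above n the OLS projection breaks down, which is the situation the Lasso-based selection is meant to handle.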

rlasso_effects

rlasso_effects(X: Union[ndarray, DataFrame], y: Union[ndarray, Series], index: Optional[Sequence[int]] = None, method: str = 'partialling out', post: bool = True, data: Optional[DataFrame] = None, penalty: Optional[Dict[str, Any]] = None, control: Optional[Dict[str, Any]] = None) -> Dict[str, RLassoEffectResult]

Estimate the effect of each targeted column of X on y.

Faithful port of hdm::rlassoEffects: for every target column j in index, treat column j as d and the remaining columns as controls.

Returns:

Type Description
dict

Mapping column name -> RLassoEffectResult.

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> X = rng.standard_normal((200, 8))
>>> y = X[:, 0] - 0.8 * X[:, 1] + rng.standard_normal(200)
>>> out = sp.rlasso_effects(X, y, index=[0, 1], method="partialling out")
>>> len(out)  # one result per targeted column
2
>>> all(r.se > 0 for r in out.values())
True
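The per-column loop amounts to splitting X into one target column and the remaining controls for each requested index. A minimal sketch of that split follows; `split_target` is a hypothetical helper, and the V1..Vp key names are an assumption borrowed from the default column names documented for rlasso.

```python
import numpy as np

def split_target(X, j):
    """Return (d, controls): column j as the target, all other columns as controls."""
    d = X[:, j]
    controls = np.delete(X, j, axis=1)  # drops column j, keeps the column order
    return d, controls

X = np.arange(12.0).reshape(3, 4)

# One (target, controls) pair per targeted column; key names are assumed.
pairs = {f"V{j + 1}": split_target(X, j) for j in [0, 2]}
```

Each pair would then be passed to a single-target effect estimator, giving one result per key.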

rlasso_iv

rlasso_iv(y: Union[ndarray, Series, str], d: Union[ndarray, Series, str], z: Union[ndarray, DataFrame, Sequence[str]], x: Optional[Union[ndarray, DataFrame, Sequence[str]]] = None, data: Optional[DataFrame] = None, select_Z: bool = True, select_X: bool = True, post: bool = True, intercept: bool = True, penalty: Optional[Dict[str, Any]] = None, control: Optional[Dict[str, Any]] = None) -> RLassoIVResult

Instrumental-variables estimation with rigorous-Lasso selection.

A faithful port of hdm::rlassoIV. Estimates the causal effect of an endogenous d on y using (potentially many) instruments z and optional high-dimensional controls x.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| y | array, Series or column name | Outcome. | required |
| d | array, Series or column name | Endogenous regressor. | required |
| z | array, DataFrame or column names | Candidate instruments; p_z may exceed n. | required |
| x | array, DataFrame or column names | Exogenous controls (optional). | None |
| data | DataFrame | Backs any string/column-name inputs. | None |
| select_Z | bool | Lasso-select among the instruments. | True |
| select_X | bool | Lasso-select among the controls (partialling-out). | True |
| post | bool | Post-Lasso (OLS refit) inside every selection step. | True |
| intercept | bool | Passed to the underlying rlasso first stages. | True |
| penalty | dict | Forwarded to :func:statspai.rlasso.rlasso (penalty level, loadings). | None |
| control | dict | Forwarded to :func:statspai.rlasso.rlasso (iteration controls). | None |

Returns:

Type Description
RLassoIVResult

Notes

With select_Z=True, select_X=False this is the Belloni-Chen-Chernozhukov-Hansen (2012) optimal-instrument estimator that the eminent-domain application made famous; numbers agree with hdm::rlassoIV(..., select.X=FALSE, select.Z=TRUE) to ~1e-6.

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> Z = rng.standard_normal((300, 5))  # candidate instruments
>>> X = rng.standard_normal((300, 10))  # candidate controls
>>> d = Z[:, 0] + 0.5 * X[:, 0] + rng.standard_normal(300)
>>> y = 1.0 * d + X[:, 0] + rng.standard_normal(300)
>>> res = sp.rlasso_iv(y=y, d=d, z=Z, x=X, select_Z=True, select_X=False)
>>> float(res.se[0]) > 0  # res.coef[0] ~ 1.0 (BCH optimal-IV)
True
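The optimal-instrument idea can be sketched without the library: predict d from the instruments, then use the fitted value as a single instrument. The example below assumes an OLS first stage on all instruments in place of Lasso selection, and simulated data with an unobserved confounder `u`; it is an illustration, not the rlasso_iv algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
Z = rng.standard_normal((n, 5))             # candidate instruments
u = rng.standard_normal(n)                  # unobserved confounder
d = Z[:, 0] + u + rng.standard_normal(n)    # endogenous regressor
y = 1.0 * d + 2.0 * u + rng.standard_normal(n)

# First stage: OLS of d on all instruments stands in for Lasso selection.
A = np.column_stack([np.ones(n), Z])
pi, *_ = np.linalg.lstsq(A, d, rcond=None)
d_hat = A @ pi

# Second stage: centered fitted values act as the single instrument.
d_c, y_c, dh_c = d - d.mean(), y - y.mean(), d_hat - d_hat.mean()
alpha_iv = (dh_c @ y_c) / (dh_c @ d_c)
alpha_ols = (d_c @ y_c) / (d_c @ d_c)       # biased upward by the confounder
```

In this simulation the OLS slope absorbs part of the confounder's effect, while the instrumented estimate stays close to the true coefficient of 1.0.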

rlassologit_effect

rlassologit_effect(x: Union[ndarray, DataFrame, Sequence[str]], y: Union[ndarray, Series, str], d: Union[ndarray, Series, str], post: bool = True, I3: Optional[ndarray] = None, data: Optional[DataFrame] = None) -> RLassoLogitEffectResult

Effect of d on a binary y after Lasso-selecting controls x.

Faithful port of hdm::rlassologitEffect.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| x | array, DataFrame or column names | (n, p) controls. | required |
| y | array, Series or column name | Binary outcome. | required |
| d | array, Series or column name | The single treatment / target regressor. | required |
| post | bool | Post-Lasso inside the two selection steps. | True |
| I3 | bool array | Amelioration set forced into the control set (hdm's I3). | None |
| data | DataFrame | Backs string/column-name inputs. | None |

Returns:

Type Description
RLassoLogitEffectResult

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> X = rng.standard_normal((300, 10))
>>> d = (rng.standard_normal(300) + X[:, 0] > 0).astype(float)
>>> eta = 0.7 * d + X[:, 1] - 0.5 * X[:, 2]
>>> y = (rng.uniform(size=300) < 1 / (1 + np.exp(-eta))).astype(float)
>>> res = sp.rlassologit_effect(X, y, d)
>>> bool(res.se > 0) and bool(np.isfinite(res.alpha))
True

rlassologit_effects

rlassologit_effects(X: Union[ndarray, DataFrame], y: Union[ndarray, Series], index: Optional[Sequence[int]] = None, post: bool = True, I3: Optional[ndarray] = None, data: Optional[DataFrame] = None) -> Dict[str, RLassoLogitEffectResult]

Logistic high-dimensional effect of each targeted column of X.

Faithful port of hdm::rlassologitEffects: for every target column j in index, treat column j as d and the remaining columns as controls, with a binary outcome y.

Returns:

Type Description
dict

Mapping column name -> RLassoLogitEffectResult.

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> rng = np.random.default_rng(1)
>>> X = rng.standard_normal((300, 8))
>>> eta = 0.8 * X[:, 0] - 0.6 * X[:, 1]
>>> y = (rng.uniform(size=300) < 1 / (1 + np.exp(-eta))).astype(float)
>>> out = sp.rlassologit_effects(X, y, index=[0, 1])
>>> len(out) == 2 and all(r.se > 0 for r in out.values())
True