`statspai.rlasso`¶

rlasso ¶

Rigorous (data-driven) Lasso — a faithful port of R's hdm package.

Public surface

:func:rlasso — rigorous (post-)Lasso with a data-driven, theory-justified penalty (hdm::rlasso).
:func:rlasso_effect / :func:rlasso_effects — treatment-effect inference after Lasso-selecting controls (hdm::rlassoEffect(s)).
:func:rlasso_iv — instrumental-variables estimation with Lasso selection of instruments and/or controls (hdm::rlassoIV).
:class:RlassoRegressor / :class:RlassoClassifier — scikit-learn adapters so the rigorous Lasso can serve as a Double-ML nuisance learner (sp.dml(..., ml_g='rlasso')).

Every estimator is validated to agree numerically with hdm (see tests/reference_parity/test_rlasso_parity.py): the core rlasso matches to machine precision; the IV/effect estimators to ~1e-6.

References

Chernozhukov, V., Hansen, C. and Spindler, M. (2016). "hdm: High-Dimensional Metrics." The R Journal, 8(2), 185-199. [@chernozhukov2016hdm]

RLassoFit `dataclass` ¶

Bases: ResultProtocolMixin

Result of :func:rlasso — mirrors the fields of an hdm rlasso object.

predict ¶

predict(newX: ndarray) -> ndarray

Predict on the original (uncentered) scale — hdm::predict.rlasso.

RLassoLogitFit `dataclass` ¶

Bases: ResultProtocolMixin

Result of :func:rlassologit — mirrors hdm rlassologit.

predict ¶

predict(newX: ndarray, type: str = 'response') -> ndarray

Predict on newX — type='response' (probabilities) or 'link' (log-odds). Mirrors hdm::predict.rlassologit.
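The two prediction scales are related by the logistic function. The snippet below is a minimal standard-library sketch of that relationship only; `link_to_response` and `response_to_link` are hypothetical helper names for illustration and are not part of statspai.

```python
import math

def link_to_response(link):
    """Map log-odds (the 'link' scale) to probabilities (the 'response' scale)."""
    return [1.0 / (1.0 + math.exp(-v)) for v in link]

def response_to_link(prob):
    """Inverse map: probabilities back to log-odds."""
    return [math.log(p / (1.0 - p)) for p in prob]

log_odds = [-2.0, 0.0, 1.5]          # illustrative values on the link scale
probs = link_to_response(log_odds)   # the same predictions as probabilities
round_trip = response_to_link(probs)
```

A log-odds value of 0 corresponds to a probability of 0.5, and the two maps are inverses of each other.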

RLassoEffectResult `dataclass` ¶

Bases: ResultProtocolMixin

Return of :func:rlasso_effect.

RLassoIVResult `dataclass` ¶

Bases: ResultProtocolMixin

Return of :func:rlasso_iv.

RlassoClassifier ¶

Bases: _SklearnCompatEstimator

Linear-probability classifier backed by the rigorous Lasso.

Fits rlasso to the 0/1 label and exposes clipped predict_proba. Use only when a linear-probability propensity is acceptable; for calibrated propensities use a genuine classifier.
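To make "clipped linear-probability" concrete, here is a minimal sketch of how linear fitted values can be turned into two probability columns. It assumes a symmetric clipping margin `eps`; the bounds statspai actually uses are not stated here, and `clipped_proba` is a hypothetical helper, not the library's code.

```python
def clipped_proba(scores, eps=0.01):
    """Turn linear fitted values into (P(0), P(1)) rows.

    `eps` is an illustrative clipping margin (an assumption), keeping
    every probability inside [eps, 1 - eps].
    """
    rows = []
    for s in scores:
        p1 = min(max(s, eps), 1.0 - eps)  # clip the linear prediction
        rows.append((1.0 - p1, p1))       # columns: P(0), P(1)
    return rows

fitted = [-0.2, 0.35, 1.4]   # linear predictions can leave [0, 1]
proba = clipped_proba(fitted)
```

Values already inside the bounds pass through unchanged, which is why the result is only as well calibrated as the underlying linear fit.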

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> X = rng.standard_normal((200, 20))
>>> lin = X[:, 0] - 0.5 * X[:, 1]
>>> d = (rng.uniform(size=200) < 1.0 / (1.0 + np.exp(-lin))).astype(float)
>>> clf = sp.RlassoClassifier().fit(X, d)
>>> clf.predict_proba(X).shape  # columns: P(0), P(1)
(200, 2)

RlassologitClassifier ¶

Bases: _SklearnCompatEstimator

Logistic rigorous-Lasso classifier — a genuine (calibrated) propensity.

Unlike :class:RlassoClassifier (a clipped linear-probability model), this fits rlassologit (the logistic rigorous Lasso, glmnet-aligned) and exposes proper logistic predict_proba. It is the principled sparse nuisance learner for binary treatments / instruments in sp.dml(model='irm'/'iivm', ml_m='rlassologit').

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> X = rng.standard_normal((300, 20))
>>> p = 1 / (1 + np.exp(-(X[:, 0] - X[:, 1])))
>>> d = (rng.uniform(size=300) < p).astype(float)
>>> clf = sp.RlassologitClassifier().fit(X, d)
>>> clf.predict_proba(X).shape  # columns: P(0), P(1)
(300, 2)

RlassoRegressor ¶

Bases: _SklearnCompatEstimator

Rigorous (post-)Lasso as a scikit-learn regressor.

Parameters mirror :func:statspai.rlasso.rlasso. Suitable as ml_g / ml_m in sp.dml(model='plr', ...).

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> X = rng.standard_normal((100, 20))
>>> beta = np.zeros(20); beta[:3] = [1.0, -1.0, 0.5]
>>> y = X @ beta + 0.5 * rng.standard_normal(100)
>>> est = sp.RlassoRegressor(post=True).fit(X, y)
>>> est.predict(X).shape
(100,)
>>> est.coef_.shape
(20,)

RLassoLogitEffectResult `dataclass` ¶

Bases: ResultProtocolMixin

Return of :func:rlassologit_effect.

rlasso ¶

rlasso(X: ndarray, y: ndarray, post: bool = True, intercept: bool = True, penalty: Optional[Dict[str, Any]] = None, control: Optional[Dict[str, Any]] = None, colnames: Optional[List[str]] = None, rng: Optional[Generator] = None) -> RLassoFit

Rigorous Lasso / post-Lasso — a faithful port of hdm::rlasso.

Parameters:

Name	Type	Description	Default
`X`	`(n, p) array`	Design matrix of candidate covariates (`p` may exceed `n`).	required
`y`	`(n,) array`	Response.	required
`post`	`bool`	If `True`, re-estimate the selected support by OLS (post-Lasso).	`True`
`intercept`	`bool`	Center `X` and `y` and report an intercept on the original scale.	`True`
`penalty`	`dict`	Overrides for `homoscedastic` (`True` / `False` / `"none"`), `X.dependent.lambda` (bool), `c` (slack, default 1.1), `gamma` (default `0.1/log(n)`), `lambda.start` and `numSim`. Defaults reproduce hdm exactly.	`None`
`control`	`dict`	Overrides for `numIter` (default 15), `tol` (default 1e-5) and `threshold` (default `None`).	`None`
`colnames`	`list of str`	Names for the columns of `X` (default `V1..Vp`).	`None`
`rng`	`numpy Generator`	Only used when `X.dependent.lambda` simulation is requested.	`None`
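For intuition about the `c` and `gamma` overrides, the sketch below computes the X-independent penalty level described in the hdm paper, `lambda0 = 2 * c * sqrt(n) * Phi^{-1}(1 - gamma / (2 * p))`. It is an illustration under that assumption, not statspai's internal code: the final penalty also involves a residual scale or data-driven loadings, and `rigorous_lambda0` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def rigorous_lambda0(n, p, c=1.1, gamma=None):
    """X-independent penalty level: 2 * c * sqrt(n) * Phi^{-1}(1 - gamma / (2p)).

    Defaults follow the table above: c = 1.1 and gamma = 0.1 / log(n).
    """
    if gamma is None:
        gamma = 0.1 / math.log(n)
    quantile = NormalDist().inv_cdf(1.0 - gamma / (2.0 * p))
    return 2.0 * c * math.sqrt(n) * quantile

lam_small = rigorous_lambda0(n=100, p=20)    # few candidate covariates
lam_large = rigorous_lambda0(n=100, p=200)   # many candidate covariates
```

The level grows only slowly with `p` (through the normal quantile), which is what allows `p` to exceed `n`.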

Returns:

Type	Description
`RLassoFit`

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> X = rng.standard_normal((100, 20))
>>> beta = np.zeros(20); beta[:3] = [1.0, -1.0, 0.5]
>>> y = X @ beta + 0.5 * rng.standard_normal(100)
>>> fit = sp.rlasso(X, y, post=True)  # rigorous post-Lasso
>>> int(fit.index.sum()) >= 1  # at least one control kept
True
>>> fit.beta.shape
(20,)

rlassologit ¶

rlassologit(X: ndarray, y: ndarray, post: bool = True, intercept: bool = True, penalty: Optional[Dict[str, Any]] = None, control: Optional[Dict[str, Any]] = None, colnames: Optional[List[str]] = None) -> RLassoLogitFit

Logistic rigorous (post-)Lasso — a faithful port of hdm::rlassologit.

Parameters:

Name	Type	Description	Default
`X`	`(n, p) array`	Candidate covariates.	required
`y`	`(n,) array`	Binary outcome in {0, 1}.	required
`post`	`bool`	If `True`, refit the selected support by unpenalized logistic regression (post-Lasso); else keep the glmnet-penalized fit.	`True`
`intercept`	`bool`	Include an intercept.	`True`
`penalty`	`dict`	Overrides for `c` (slack; default 1.1 for `post=True`, else 0.5), `gamma` (default `0.1/log(n)`) and `lambda` (raw penalty; bypasses the data-driven level).	`None`
`control`	`dict`	`threshold` — coefficients below it are zeroed (default None).	`None`
`colnames`	`list of str`	Column names (default `V1..Vp`).	`None`

Returns:

Type	Description
`RLassoLogitFit`

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> X = rng.standard_normal((300, 20))
>>> p = 1 / (1 + np.exp(-(X[:, 0] - X[:, 1])))
>>> y = (rng.uniform(size=300) < p).astype(float)
>>> fit = sp.rlassologit(X, y)
>>> fit.n_selected >= 1
True

rlasso_effect ¶

rlasso_effect(x: Union[ndarray, DataFrame, Sequence[str]], y: Union[ndarray, Series, str], d: Union[ndarray, Series, str], method: str = 'partialling out', post: bool = True, I3: Optional[ndarray] = None, data: Optional[DataFrame] = None, penalty: Optional[Dict[str, Any]] = None, control: Optional[Dict[str, Any]] = None) -> RLassoEffectResult

Effect of d on y after Lasso-selecting controls x.

Faithful port of hdm::rlassoEffect.
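The idea behind the default `'partialling out'` method can be sketched as a residual-on-residual regression. In the sketch below a single control and plain OLS stand in for the Lasso selection steps (an explicit simplification), and `residualize` is a hypothetical helper rather than a statspai function.

```python
def residualize(v, x):
    """Residuals of v after an OLS fit on an intercept and one control x."""
    n = len(v)
    v_bar = sum(v) / n
    x_bar = sum(x) / n
    slope = (sum((a - x_bar) * (b - v_bar) for a, b in zip(x, v))
             / sum((a - x_bar) ** 2 for a in x))
    return [(b - v_bar) - slope * (a - x_bar) for a, b in zip(x, v)]

# Noise-free toy data so the true effect (1.5) is recovered exactly.
x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
shock = [0.5, -1.0, 0.25, 1.0, -0.75, 0.0]
d = [xi + ei for xi, ei in zip(x, shock)]          # treatment depends on x
y = [1.5 * di + 2.0 * xi for di, xi in zip(d, x)]  # outcome depends on d and x

r_d = residualize(d, x)   # part of d not explained by the control
r_y = residualize(y, x)   # part of y not explained by the control
alpha = sum(a * b for a, b in zip(r_d, r_y)) / sum(a * a for a in r_d)
```

Regressing the outcome residual on the treatment residual gives the same coefficient as the full regression of `y` on `d` and `x` (the Frisch-Waugh-Lovell result), which is why the control can be removed first.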

Parameters:

Name	Type	Description	Default
`x`	`array, DataFrame or column names`	`(n, p)` controls.	required
`y`	`array, Series or column name`	Outcome.	required
`d`	`array, Series or column name`	The single target regressor.	required
`method`	`('partialling out', 'double selection')`	See the module docstring.	`"partialling out"`
`post`	`bool`	Post-Lasso inside the selection steps.	`True`
`I3`	`bool array`	Amelioration set forced into the control set (double-selection only) — hdm's `I3` argument.	`None`
`data`	`DataFrame`	Backs string/column-name inputs.	`None`
`penalty`	`dict`	Forwarded to :func:`statspai.rlasso.rlasso`.	`None`
`control`	`dict`	Forwarded to :func:`statspai.rlasso.rlasso`.	`None`

Returns:

Type	Description
`RLassoEffectResult`

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> X = rng.standard_normal((200, 12))  # candidate controls
>>> d = X[:, 0] + rng.standard_normal(200)  # treatment
>>> y = 1.5 * d + X[:, 1] + rng.standard_normal(200)
>>> res = sp.rlasso_effect(X[:, 1:], y, d, method="partialling out")
>>> float(res.se) > 0
True
>>> bool(np.isfinite(res.alpha))  # ~1.5
True

rlasso_effects ¶

rlasso_effects(X: Union[ndarray, DataFrame], y: Union[ndarray, Series], index: Optional[Sequence[int]] = None, method: str = 'partialling out', post: bool = True, data: Optional[DataFrame] = None, penalty: Optional[Dict[str, Any]] = None, control: Optional[Dict[str, Any]] = None) -> Dict[str, RLassoEffectResult]

Estimate the effect of each targeted column of X on y.

Faithful port of hdm::rlassoEffects: for every target column j in index, treat column j as d and the remaining columns as controls.
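The per-column loop described above amounts to splitting the design matrix once per target. The following is a minimal sketch of that split only, using nested lists; `target_splits` is a hypothetical helper, and the effect estimation on each split is omitted.

```python
def target_splits(rows, index):
    """Yield (j, d, controls): column j as the target, the rest as controls."""
    for j in index:
        d = [row[j] for row in rows]                        # target column
        controls = [row[:j] + row[j + 1:] for row in rows]  # all other columns
        yield j, d, controls

rows = [[1.0, 2.0, 3.0],
        [4.0, 5.0, 6.0]]
splits = list(target_splits(rows, index=[0, 2]))
```

Each split would then be passed to a single-target estimator, one call per entry of `index`.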

Returns:

Type	Description
`dict`	Mapping `column name -> RLassoEffectResult`.

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> X = rng.standard_normal((200, 8))
>>> y = X[:, 0] - 0.8 * X[:, 1] + rng.standard_normal(200)
>>> out = sp.rlasso_effects(X, y, index=[0, 1], method="partialling out")
>>> len(out)  # one result per targeted column
2
>>> all(r.se > 0 for r in out.values())
True

rlasso_iv ¶

rlasso_iv(y: Union[ndarray, Series, str], d: Union[ndarray, Series, str], z: Union[ndarray, DataFrame, Sequence[str]], x: Optional[Union[ndarray, DataFrame, Sequence[str]]] = None, data: Optional[DataFrame] = None, select_Z: bool = True, select_X: bool = True, post: bool = True, intercept: bool = True, penalty: Optional[Dict[str, Any]] = None, control: Optional[Dict[str, Any]] = None) -> RLassoIVResult

Instrumental-variables estimation with rigorous-Lasso selection.

A faithful port of hdm::rlassoIV. Estimates the causal effect of an endogenous d on y using (potentially many) instruments z and optional high-dimensional controls x.
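To see why an instrument is needed when `d` is endogenous, consider the simplest just-identified case. The sketch below uses one instrument and the ratio estimator `cov(z, y) / cov(z, d)` in place of the selection steps (an explicit simplification); the toy data and the helper `cov` are illustrative assumptions, not statspai code.

```python
def cov(a, b):
    """Centered cross-product of two equal-length sequences (unnormalized)."""
    a_bar = sum(a) / len(a)
    b_bar = sum(b) / len(b)
    return sum((x - a_bar) * (y - b_bar) for x, y in zip(a, b))

z = [1.0, -1.0, 1.0, -1.0]           # instrument
u = [1.0, 1.0, -1.0, -1.0]           # unobserved error, orthogonal to z
d = [zi + ui for zi, ui in zip(z, u)]        # endogenous: d depends on u
y = [1.0 * di + ui for di, ui in zip(d, u)]  # true effect of d is 1.0

ols = cov(d, y) / cov(d, d)   # biased, because d is correlated with u
iv = cov(z, y) / cov(z, d)    # consistent, because z is uncorrelated with u
```

Here OLS overstates the effect, while the instrument recovers the true coefficient of 1.0.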

Parameters:

Name	Type	Description	Default
`y`	`array, Series or column name`	Outcome.	required
`d`	`array, Series or column name`	Endogenous regressor.	required
`z`	`array, DataFrame or column names`	Candidate instruments; `p_z` may exceed `n`.	required
`x`	`array, DataFrame or column names`	Exogenous controls (optional).	`None`
`data`	`DataFrame`	Backs any string/column-name inputs.	`None`
`select_Z`	`bool`	Lasso-select among the instruments.	`True`
`select_X`	`bool`	Lasso-select among the controls (partialling-out).	`True`
`post`	`bool`	Post-Lasso (OLS refit) inside every selection step.	`True`
`intercept`	`bool`	Passed to the underlying `rlasso` first stages.	`True`
`penalty`	`dict`	Forwarded to :func:`statspai.rlasso.rlasso` (penalty level, loadings).	`None`
`control`	`dict`	Forwarded to :func:`statspai.rlasso.rlasso` (iteration controls).	`None`

Returns:

Type	Description
`RLassoIVResult`

Notes

With select_Z=True, select_X=False this is the Belloni-Chen-Chernozhukov-Hansen (2012) optimal-instrument estimator that the eminent-domain application made famous; numbers agree with hdm::rlassoIV(..., select.X=FALSE, select.Z=TRUE) to ~1e-6.

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> Z = rng.standard_normal((300, 5))  # candidate instruments
>>> X = rng.standard_normal((300, 10))  # candidate controls
>>> d = Z[:, 0] + 0.5 * X[:, 0] + rng.standard_normal(300)
>>> y = 1.0 * d + X[:, 0] + rng.standard_normal(300)
>>> res = sp.rlasso_iv(y=y, d=d, z=Z, x=X, select_Z=True, select_X=False)
>>> float(res.se[0]) > 0  # res.coef[0] ~ 1.0 (BCH optimal-IV)
True

rlassologit_effect ¶

rlassologit_effect(x: Union[ndarray, DataFrame, Sequence[str]], y: Union[ndarray, Series, str], d: Union[ndarray, Series, str], post: bool = True, I3: Optional[ndarray] = None, data: Optional[DataFrame] = None) -> RLassoLogitEffectResult

Effect of d on a binary y after Lasso-selecting controls x.

Faithful port of hdm::rlassologitEffect.

Parameters:

Name	Type	Description	Default
`x`	`array, DataFrame or column names`	`(n, p)` controls.	required
`y`	`array, Series or column name`	Binary outcome.	required
`d`	`array, Series or column name`	The single treatment / target regressor.	required
`post`	`bool`	Post-Lasso inside the two selection steps.	`True`
`I3`	`bool array`	Amelioration set forced into the control set (hdm's `I3`).	`None`
`data`	`DataFrame`	Backs string/column-name inputs.	`None`

Returns:

Type	Description
`RLassoLogitEffectResult`

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> X = rng.standard_normal((300, 10))
>>> d = (rng.standard_normal(300) + X[:, 0] > 0).astype(float)
>>> eta = 0.7 * d + X[:, 1] - 0.5 * X[:, 2]
>>> y = (rng.uniform(size=300) < 1 / (1 + np.exp(-eta))).astype(float)
>>> res = sp.rlassologit_effect(X, y, d)
>>> bool(res.se > 0) and bool(np.isfinite(res.alpha))
True

rlassologit_effects ¶

rlassologit_effects(X: Union[ndarray, DataFrame], y: Union[ndarray, Series], index: Optional[Sequence[int]] = None, post: bool = True, I3: Optional[ndarray] = None, data: Optional[DataFrame] = None) -> Dict[str, RLassoLogitEffectResult]

Logistic high-dimensional effect of each targeted column of X.

Faithful port of hdm::rlassologitEffects: for every target column j in index, treat column j as d and the remaining columns as controls, with a binary outcome y.

Returns:

Type	Description
`dict`	Mapping `column name -> RLassoLogitEffectResult`.

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> rng = np.random.default_rng(1)
>>> X = rng.standard_normal((300, 8))
>>> eta = 0.8 * X[:, 0] - 0.6 * X[:, 1]
>>> y = (rng.uniform(size=300) < 1 / (1 + np.exp(-eta))).astype(float)
>>> out = sp.rlassologit_effects(X, y, index=[0, 1])
>>> len(out) == 2 and all(r.se > 0 for r in out.values())
True
