statspai.rlasso¶
rlasso ¶
Rigorous (data-driven) Lasso — a faithful port of R's hdm package.
Public surface
- :func:rlasso (hdm::rlasso): rigorous (post-)Lasso with a data-driven, theory-justified penalty.
- :func:rlasso_effect / :func:rlasso_effects (hdm::rlassoEffect(s)): treatment-effect inference after Lasso-selecting controls.
- :func:rlasso_iv (hdm::rlassoIV): instrumental-variables estimation with Lasso selection of instruments and/or controls.
- :class:RlassoRegressor / :class:RlassoClassifier: scikit-learn adapters so the rigorous Lasso can serve as a Double-ML nuisance learner (sp.dml(..., ml_g='rlasso')).
Every estimator is validated to agree numerically with hdm (see
tests/reference_parity/test_rlasso_parity.py): the core rlasso
matches to machine precision; the IV/effect estimators to ~1e-6.
References
Chernozhukov, V., Hansen, C. and Spindler, M. (2016). "hdm: High-Dimensional Metrics." The R Journal, 8(2), 185-199. [@chernozhukov2016hdm]
RLassoFit dataclass ¶
Bases: ResultProtocolMixin
Result of :func:rlasso; mirrors the fields of an hdm rlasso object.
predict ¶
Predict on the original (uncentered) scale — hdm::predict.rlasso.
RLassoLogitFit dataclass ¶
Bases: ResultProtocolMixin
Result of :func:rlassologit — mirrors hdm rlassologit.
predict ¶
Predict on newX — type='response' (probabilities) or
'link' (log-odds). Mirrors hdm::predict.rlassologit.
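The difference between the two prediction types can be sketched in a few lines. This is an illustration only, not the library's code: `predict_logit` and its `intercept` / `beta` arguments are hypothetical names, and the argument name `type` simply mirrors the wording above.

```python
import math


def predict_logit(intercept, beta, rows, type="response"):
    """Hypothetical helper: log-odds or probabilities for a fitted logit model."""
    if type not in ("response", "link"):
        raise ValueError("type must be 'response' or 'link'")
    out = []
    for row in rows:
        # Linear predictor (log-odds) on the original scale.
        eta = intercept + sum(b * x for b, x in zip(beta, row))
        # 'link' returns eta itself; 'response' applies the logistic function.
        out.append(eta if type == "link" else 1.0 / (1.0 + math.exp(-eta)))
    return out


rows = [[0.0, 0.0], [1.0, -1.0]]
link = predict_logit(0.0, [1.0, 0.5], rows, type="link")
prob = predict_logit(0.0, [1.0, 0.5], rows, type="response")
```

Applying the logit transform `log(p / (1 - p))` to the `'response'` values recovers the `'link'` values.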
RLassoEffectResult dataclass ¶
Bases: ResultProtocolMixin
Return of :func:rlasso_effect.
RLassoIVResult dataclass ¶
Bases: ResultProtocolMixin
Return of :func:rlasso_iv.
RlassoClassifier ¶
Bases: _SklearnCompatEstimator
Linear-probability classifier backed by the rigorous Lasso.
Fits rlasso to the 0/1 label and exposes clipped
predict_proba. Use only when a linear-probability propensity is
acceptable; for calibrated propensities use a genuine classifier.
Examples:
>>> import statspai as sp
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> X = rng.standard_normal((200, 20))
>>> lin = X[:, 0] - 0.5 * X[:, 1]
>>> d = (rng.uniform(size=200) < 1.0 / (1.0 + np.exp(-lin))).astype(float)
>>> clf = sp.RlassoClassifier().fit(X, d)
>>> clf.predict_proba(X).shape # columns: P(0), P(1)
(200, 2)
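What "clipped predict_proba" means for a linear-probability model can be sketched as follows. The helper name `clipped_proba` and the clipping bound `eps` are assumptions made for illustration; the bounds the library actually uses are not stated here.

```python
def clipped_proba(scores, eps=1e-6):
    """Turn linear-probability scores into [P(0), P(1)] rows.

    A linear fit can produce scores outside [0, 1], so each score is
    clipped to [eps, 1 - eps] before being used as P(1).
    """
    rows = []
    for s in scores:
        p1 = min(max(s, eps), 1.0 - eps)
        rows.append([1.0 - p1, p1])
    return rows


# Scores below 0 or above 1 are pulled back inside the unit interval.
proba = clipped_proba([-0.2, 0.35, 1.4])
```

Because the clipping is applied after a linear fit, the resulting probabilities are not calibrated, which is why the text above recommends a genuine classifier when calibration matters.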
RlassologitClassifier ¶
Bases: _SklearnCompatEstimator
Logistic rigorous-Lasso classifier — a genuine (calibrated) propensity.
Unlike :class:RlassoClassifier (a clipped linear-probability model),
this fits rlassologit (the logistic rigorous Lasso, glmnet-aligned)
and exposes proper logistic predict_proba. It is the principled
sparse nuisance learner for binary treatments / instruments in
sp.dml(model='irm'/'iivm', ml_m='rlassologit').
Examples:
>>> import statspai as sp
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> X = rng.standard_normal((300, 20))
>>> p = 1 / (1 + np.exp(-(X[:, 0] - X[:, 1])))
>>> d = (rng.uniform(size=300) < p).astype(float)
>>> clf = sp.RlassologitClassifier().fit(X, d)
>>> clf.predict_proba(X).shape # columns: P(0), P(1)
(300, 2)
RlassoRegressor ¶
Bases: _SklearnCompatEstimator
Rigorous (post-)Lasso as a scikit-learn regressor.
Parameters mirror :func:statspai.rlasso.rlasso. Suitable as
ml_g / ml_m in sp.dml(model='plr', ...).
Examples:
>>> import statspai as sp
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> X = rng.standard_normal((100, 20))
>>> beta = np.zeros(20); beta[:3] = [1.0, -1.0, 0.5]
>>> y = X @ beta + 0.5 * rng.standard_normal(100)
>>> est = sp.RlassoRegressor(post=True).fit(X, y)
>>> est.predict(X).shape
(100,)
>>> est.coef_.shape
(20,)
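The post=True option refers to post-Lasso: after selection, the coefficients are refit by OLS on the selected columns only. A minimal NumPy sketch of that refit step follows, assuming the support is already known; `post_lasso_refit` is a hypothetical helper and the selection step itself is not shown.

```python
import numpy as np


def post_lasso_refit(X, y, support):
    """OLS refit of y on the selected columns of X, with an intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    support = np.asarray(support, dtype=bool)
    coef = np.zeros(X.shape[1])          # unselected columns stay at zero
    if not support.any():
        return float(y.mean()), coef     # intercept-only model
    Xs = X[:, support]
    # Centering absorbs the intercept; then solve least squares on the support.
    x_mean, y_mean = Xs.mean(axis=0), y.mean()
    b, *_ = np.linalg.lstsq(Xs - x_mean, y - y_mean, rcond=None)
    coef[support] = b
    return float(y_mean - x_mean @ b), coef


rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
beta = np.zeros(20)
beta[:3] = [1.0, -1.0, 0.5]
y = X @ beta + 0.5 * rng.standard_normal(100)
support = np.zeros(20, dtype=bool)
support[:3] = True                       # pretend the Lasso kept these columns
intercept, coef = post_lasso_refit(X, y, support)
```

The returned vector has one entry per original column, matching the (20,) shape of coef_ in the example above.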
RLassoLogitEffectResult dataclass ¶
Bases: ResultProtocolMixin
Return of :func:rlassologit_effect.
rlasso ¶
rlasso(X: ndarray, y: ndarray, post: bool = True, intercept: bool = True, penalty: Optional[Dict[str, Any]] = None, control: Optional[Dict[str, Any]] = None, colnames: Optional[List[str]] = None, rng: Optional[Generator] = None) -> RLassoFit
Rigorous Lasso / post-Lasso — a faithful port of hdm::rlasso.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | (n, p) array | Design matrix of candidate covariates ( | required |
| y | (n,) array | Response. | required |
| post | bool | If | True |
| intercept | bool | Center | True |
| penalty | dict | Overrides for | None |
| control | dict | Overrides for | None |
| colnames | list of str | Names for the columns of | None |
| rng | numpy Generator | Only used when | None |

Returns:
| Type | Description |
|---|---|
| RLassoFit | |
Examples:
>>> import statspai as sp
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> X = rng.standard_normal((100, 20))
>>> beta = np.zeros(20); beta[:3] = [1.0, -1.0, 0.5]
>>> y = X @ beta + 0.5 * rng.standard_normal(100)
>>> fit = sp.rlasso(X, y, post=True) # rigorous post-Lasso
>>> int(fit.index.sum()) >= 1 # at least one control kept
True
>>> fit.beta.shape
(20,)
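For orientation on what a "data-driven, theory-justified" penalty looks like, the sketch below computes the X-independent penalty level as the hdm package documents it, to the best of my understanding: lambda0 = 2 c sqrt(n) Phi^{-1}(1 - gamma / (2p)), with c = 1.1 and gamma = 0.1 / log(n) as defaults. Whether this port's penalty dict uses the same key names and defaults is an assumption, and `rigorous_lambda0` is a hypothetical helper, not part of the package.

```python
import math
from statistics import NormalDist


def rigorous_lambda0(n, p, c=1.1, gamma=None):
    """X-independent rigorous-Lasso penalty level (hdm-style formula).

    c and gamma defaults follow the hdm documentation as I understand it;
    treat them as assumptions when comparing against this package.
    """
    if gamma is None:
        gamma = 0.1 / math.log(n)
    # Standard normal quantile at 1 - gamma / (2p).
    quantile = NormalDist().inv_cdf(1.0 - gamma / (2.0 * p))
    return 2.0 * c * math.sqrt(n) * quantile


lam = rigorous_lambda0(n=100, p=20)       # same n and p as the example above
lam_wide = rigorous_lambda0(n=100, p=200)
```

The level grows only slowly with p (through the normal quantile), which is what lets the rigorous Lasso handle many candidate covariates without cross-validation.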
rlassologit ¶
rlassologit(X: ndarray, y: ndarray, post: bool = True, intercept: bool = True, penalty: Optional[Dict[str, Any]] = None, control: Optional[Dict[str, Any]] = None, colnames: Optional[List[str]] = None) -> RLassoLogitFit
Logistic rigorous (post-)Lasso — a faithful port of hdm::rlassologit.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| X | (n, p) array | Candidate covariates. | required |
| y | (n,) array | Binary outcome in {0, 1}. | required |
| post | bool | If | True |
| intercept | bool | Include an intercept. | True |
| penalty | dict | Overrides for | None |
| control | dict | | None |
| colnames | list of str | Column names (default | None |

Returns:
| Type | Description |
|---|---|
| RLassoLogitFit | |
Examples:
rlasso_effect ¶
rlasso_effect(x: Union[ndarray, DataFrame, Sequence[str]], y: Union[ndarray, Series, str], d: Union[ndarray, Series, str], method: str = 'partialling out', post: bool = True, I3: Optional[ndarray] = None, data: Optional[DataFrame] = None, penalty: Optional[Dict[str, Any]] = None, control: Optional[Dict[str, Any]] = None) -> RLassoEffectResult
Effect of d on y after Lasso-selecting controls x.
Faithful port of hdm::rlassoEffect.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | array, DataFrame or column names | (n, p) controls. | required |
| y | array, Series or column name | Outcome. | required |
| d | array, Series or column name | The single target regressor. | required |
| method | str | Either 'partialling out' or 'double selection'. See the module docstring. | "partialling out" |
| post | bool | Post-Lasso inside the selection steps. | True |
| I3 | bool array | Amelioration set forced into the control set (double-selection only); hdm's | None |
| data | DataFrame | Backing frame for string/column-name inputs. | None |
| penalty | dict | Forwarded to :func: | None |
| control | dict | Forwarded to :func: | None |

Returns:
| Type | Description |
|---|---|
| RLassoEffectResult | |
Examples:
>>> import statspai as sp
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> X = rng.standard_normal((200, 12)) # candidate controls
>>> d = X[:, 0] + rng.standard_normal(200) # treatment
>>> y = 1.5 * d + X[:, 1] + rng.standard_normal(200)
>>> res = sp.rlasso_effect(X[:, 1:], y, d, method="partialling out")
>>> float(res.se) > 0
True
>>> bool(np.isfinite(res.alpha)) # ~1.5
True
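The "partialling out" idea can be sketched with plain NumPy: residualize both y and d on the controls, then regress one residual on the other. To keep the sketch self-contained, ordinary least squares stands in for the rigorous Lasso in the two residualizing steps, so this is not what rlasso_effect does internally; `residualize` and `partialling_out` are hypothetical helpers, and the robust standard-error formula is an assumption about the usual choice.

```python
import numpy as np


def residualize(v, X):
    """Residuals of v after an OLS fit on X plus an intercept.

    OLS is a stand-in here; the real estimator uses Lasso selection.
    """
    X1 = np.column_stack([np.ones(len(v)), X])
    coef, *_ = np.linalg.lstsq(X1, v, rcond=None)
    return v - X1 @ coef


def partialling_out(X, y, d):
    y_res = residualize(y, X)
    d_res = residualize(d, X)
    # Slope of the residual-on-residual regression is the effect estimate.
    alpha = float(d_res @ y_res / (d_res @ d_res))
    eps = y_res - alpha * d_res
    # Heteroskedasticity-robust (sandwich) standard error.
    se = float(np.sqrt(np.sum(d_res**2 * eps**2)) / (d_res @ d_res))
    return alpha, se


rng = np.random.default_rng(0)
X = rng.standard_normal((200, 12))
d = X[:, 0] + rng.standard_normal(200)
y = 1.5 * d + X[:, 1] + rng.standard_normal(200)
alpha, se = partialling_out(X, y, d)
```

With only 12 controls and 200 observations OLS is adequate; the Lasso-based version matters when the number of candidate controls is large relative to n.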
rlasso_effects ¶
rlasso_effects(X: Union[ndarray, DataFrame], y: Union[ndarray, Series], index: Optional[Sequence[int]] = None, method: str = 'partialling out', post: bool = True, data: Optional[DataFrame] = None, penalty: Optional[Dict[str, Any]] = None, control: Optional[Dict[str, Any]] = None) -> Dict[str, RLassoEffectResult]
Estimate the effect of each targeted column of X on y.
Faithful port of hdm::rlassoEffects: for every target column
j in index, treat column j as d and the remaining
columns as controls.
Returns:
| Type | Description |
|---|---|
| dict | Mapping |
Examples:
>>> import statspai as sp
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> X = rng.standard_normal((200, 8))
>>> y = X[:, 0] - 0.8 * X[:, 1] + rng.standard_normal(200)
>>> out = sp.rlasso_effects(X, y, index=[0, 1], method="partialling out")
>>> len(out) # one result per targeted column
2
>>> all(r.se > 0 for r in out.values())
True
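The loop described above (each targeted column in turn becomes d, the rest become controls) can be sketched as a small generator. `split_targets` is a hypothetical helper; treating index=None as "all columns" follows hdm's default as I understand it and is an assumption about this port. The integer keys are for illustration only, since the documented return type uses string keys.

```python
import numpy as np


def split_targets(X, index=None):
    """Yield (j, d, controls) for every targeted column j of X."""
    X = np.asarray(X)
    if index is None:
        index = range(X.shape[1])        # assumed default: target every column
    for j in index:
        # Column j plays the role of d; all other columns are the controls.
        yield j, X[:, j], np.delete(X, j, axis=1)


X = np.arange(12.0).reshape(3, 4)
splits = {j: (d.shape, controls.shape) for j, d, controls in split_targets(X, index=[0, 2])}
```

Each (d, controls) pair would then be passed, together with y, to a single-target estimator such as rlasso_effect.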
rlasso_iv ¶
rlasso_iv(y: Union[ndarray, Series, str], d: Union[ndarray, Series, str], z: Union[ndarray, DataFrame, Sequence[str]], x: Optional[Union[ndarray, DataFrame, Sequence[str]]] = None, data: Optional[DataFrame] = None, select_Z: bool = True, select_X: bool = True, post: bool = True, intercept: bool = True, penalty: Optional[Dict[str, Any]] = None, control: Optional[Dict[str, Any]] = None) -> RLassoIVResult
Instrumental-variables estimation with rigorous-Lasso selection.
A faithful port of hdm::rlassoIV. Estimates the causal effect of
an endogenous d on y using (potentially many) instruments
z and optional high-dimensional controls x.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| y | array, Series or column name | Outcome. | required |
| d | array, Series or column name | Endogenous regressor. | required |
| z | array, DataFrame or column names | Candidate instruments; p_z may exceed n. | required |
| x | array, DataFrame or column names | Exogenous controls (optional). | None |
| data | DataFrame | Backing frame for any string/column-name inputs. | None |
| select_Z | bool | Lasso-select among the instruments. | True |
| select_X | bool | Lasso-select among the controls (partialling-out). | True |
| post | bool | Post-Lasso (OLS refit) inside every selection step. | True |
| intercept | bool | Passed to the underlying | True |
| penalty | dict | Forwarded to :func: | None |
| control | dict | Forwarded to :func: | None |

Returns:
| Type | Description |
|---|---|
| RLassoIVResult | |
Notes
With select_Z=True, select_X=False this is the Belloni-Chen-Chernozhukov-Hansen (2012) optimal-instrument estimator, best known from the eminent-domain application; estimates agree with hdm::rlassoIV(..., select.X=FALSE, select.Z=TRUE) to ~1e-6.
Examples:
>>> import statspai as sp
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> Z = rng.standard_normal((300, 5)) # candidate instruments
>>> X = rng.standard_normal((300, 10)) # candidate controls
>>> d = Z[:, 0] + 0.5 * X[:, 0] + rng.standard_normal(300)
>>> y = 1.0 * d + X[:, 0] + rng.standard_normal(300)
>>> res = sp.rlasso_iv(y=y, d=d, z=Z, x=X, select_Z=True, select_X=False)
>>> float(res.se[0]) > 0 # res.coef[0] ~ 1.0 (BCH optimal-IV)
True
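The optimal-instrument idea behind the select_Z=True, select_X=False case can be sketched as follows: predict d from the instruments, then use the fitted value as the single instrument. In this sketch an OLS first stage replaces the Lasso selection so the code stays self-contained, which makes it ordinary two-stage least squares rather than the BCH estimator; `iv_with_fitted_instrument` is a hypothetical helper and the standard-error formula is an assumption.

```python
import numpy as np


def iv_with_fitted_instrument(y, d, Z):
    """IV estimate of the effect of d on y using the first-stage fit as instrument."""
    Z1 = np.column_stack([np.ones(len(d)), Z])
    gamma, *_ = np.linalg.lstsq(Z1, d, rcond=None)   # first stage (OLS stand-in)
    d_hat = Z1 @ gamma
    # Centering removes the intercept from the second stage.
    y_c, d_c, dh_c = y - y.mean(), d - d.mean(), d_hat - d_hat.mean()
    alpha = float(dh_c @ y_c / (dh_c @ d_c))
    eps = y_c - alpha * d_c
    # Heteroskedasticity-robust standard error for the just-identified IV.
    se = float(np.sqrt(np.sum(dh_c**2 * eps**2)) / abs(dh_c @ d_c))
    return alpha, se


rng = np.random.default_rng(0)
n = 500
Z = rng.standard_normal((n, 5))
u = rng.standard_normal(n)                   # unobserved confounder
d = Z[:, 0] + u + rng.standard_normal(n)     # endogenous regressor
y = 1.0 * d + u + rng.standard_normal(n)     # true effect is 1.0
alpha_iv, se_iv = iv_with_fitted_instrument(y, d, Z)

# Naive OLS slope for comparison; it absorbs the confounder u.
d_c, y_c = d - d.mean(), y - y.mean()
alpha_ols = float(d_c @ y_c / (d_c @ d_c))
```

In this simulated design the naive OLS slope is pushed upward by the confounder, while the IV estimate targets the true effect. Lasso selection in the first stage becomes important when the number of candidate instruments is large.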
rlassologit_effect ¶
rlassologit_effect(x: Union[ndarray, DataFrame, Sequence[str]], y: Union[ndarray, Series, str], d: Union[ndarray, Series, str], post: bool = True, I3: Optional[ndarray] = None, data: Optional[DataFrame] = None) -> RLassoLogitEffectResult
Effect of d on a binary y after Lasso-selecting controls x.
Faithful port of hdm::rlassologitEffect.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | array, DataFrame or column names | (n, p) controls. | required |
| y | array, Series or column name | Binary outcome. | required |
| d | array, Series or column name | The single treatment / target regressor. | required |
| post | bool | Post-Lasso inside the two selection steps. | True |
| I3 | bool array | Amelioration set forced into the control set (hdm's | None |
| data | DataFrame | Backing frame for string/column-name inputs. | None |

Returns:
| Type | Description |
|---|---|
| RLassoLogitEffectResult | |
Examples:
>>> import statspai as sp
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> X = rng.standard_normal((300, 10))
>>> d = (rng.standard_normal(300) + X[:, 0] > 0).astype(float)
>>> eta = 0.7 * d + X[:, 1] - 0.5 * X[:, 2]
>>> y = (rng.uniform(size=300) < 1 / (1 + np.exp(-eta))).astype(float)
>>> res = sp.rlassologit_effect(X, y, d)
>>> bool(res.se > 0) and bool(np.isfinite(res.alpha))
True
rlassologit_effects ¶
rlassologit_effects(X: Union[ndarray, DataFrame], y: Union[ndarray, Series], index: Optional[Sequence[int]] = None, post: bool = True, I3: Optional[ndarray] = None, data: Optional[DataFrame] = None) -> Dict[str, RLassoLogitEffectResult]
Logistic high-dimensional effect of each targeted column of X.
Faithful port of hdm::rlassologitEffects: for every target column
j in index, treat column j as d and the remaining
columns as controls, with a binary outcome y.
Returns:
| Type | Description |
|---|---|
| dict | Mapping |
Examples:
>>> import statspai as sp
>>> import numpy as np
>>> rng = np.random.default_rng(1)
>>> X = rng.standard_normal((300, 8))
>>> eta = 0.8 * X[:, 0] - 0.6 * X[:, 1]
>>> y = (rng.uniform(size=300) < 1 / (1 + np.exp(-eta))).astype(float)
>>> out = sp.rlassologit_effects(X, y, index=[0, 1])
>>> len(out) == 2 and all(r.se > 0 for r in out.values())
True