`statspai.tmle`¶

tmle ¶

Targeted Maximum Likelihood Estimation (TMLE) with Super Learner.

TMLE is a doubly robust, semiparametrically efficient estimator for causal effects that combines initial outcome regression with a targeted bias-correction step using the propensity score.
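To make the two-stage idea concrete, here is a minimal NumPy sketch of a TMLE for the ATE with a continuous outcome. It is an illustration under stated assumptions, not the `statspai` implementation: the initial outcome regression is plain OLS rather than a Super Learner, the propensity model is a hand-written logistic regression, and the targeting step uses a linear fluctuation. The helper names `fit_logistic` and `tmle_ate_sketch` are hypothetical.

```python
import numpy as np

def fit_logistic(X, a, n_iter=25):
    """Newton-Raphson logistic regression; returns fitted probabilities."""
    Xd = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-Xd @ beta))
        grad = Xd.T @ (a - p)
        hess = (Xd * (p * (1 - p))[:, None]).T @ Xd
        beta += np.linalg.solve(hess, grad)
    return 1 / (1 + np.exp(-Xd @ beta))

def tmle_ate_sketch(y, a, X, bounds=(0.025, 0.975)):
    n = len(y)
    # Step 1: initial outcome regression Q(A, X), here OLS on [1, A, X].
    D = np.column_stack([np.ones(n), a, X])
    coef, *_ = np.linalg.lstsq(D, y, rcond=None)
    q1 = np.column_stack([np.ones(n), np.ones(n), X]) @ coef
    q0 = np.column_stack([np.ones(n), np.zeros(n), X]) @ coef
    qa = np.where(a == 1, q1, q0)
    # Step 2: propensity score, bounded away from 0 and 1 (assumed role of bounds).
    g = np.clip(fit_logistic(X, a), *bounds)
    # Step 3: targeting. Regress the residual on the "clever covariate".
    h1, h0 = 1 / g, -1 / (1 - g)
    ha = np.where(a == 1, h1, h0)
    eps = np.sum(ha * (y - qa)) / np.sum(ha ** 2)
    q1_star, q0_star = q1 + eps * h1, q0 + eps * h0
    qa_star = np.where(a == 1, q1_star, q0_star)
    # Step 4: plug-in estimate and influence-curve standard error.
    ate = np.mean(q1_star - q0_star)
    ic = ha * (y - qa_star) + (q1_star - q0_star) - ate
    se = np.sqrt(np.var(ic, ddof=1) / n)
    return ate, se

# Simulated data with a true ATE of 1.0.
rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 2))
ps = 1 / (1 + np.exp(-(0.5 * X[:, 0] - 0.3 * X[:, 1])))
a = rng.binomial(1, ps)
y = 1.0 * a + 0.8 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=n)
ate, se = tmle_ate_sketch(y, a, X)
print(f"ATE = {ate:.3f}, SE = {se:.3f}")
```

The fluctuation coefficient `eps` is chosen so that the weighted residuals sum to zero after the update, which is what removes the first-order bias of the initial fit.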

Components

TMLE : Full TMLE estimator for ATE/ATT with targeting step
SuperLearner : Ensemble learner for nuisance parameter estimation

References

van der Laan, M. J. & Rose, S. (2011). Targeted Learning: Causal Inference for Observational and Experimental Data. Springer Series in Statistics. [@vanderlaan2011targeted]

van der Laan, M. J., Polley, E. C., & Hubbard, A. E. (2007). Super Learner. Statistical Applications in Genetics and Molecular Biology, 6(1). [@vanderlaan2007super]

TMLE ¶

Targeted Maximum Likelihood Estimation.

Parameters:

Name	Type	Default
`data`	`DataFrame`	required
`y`	`str`	required
`treat`	`str`	required
`covariates`	`list of str`	required
`outcome_library`	`list of sklearn estimators`	`None`
`propensity_library`	`list of sklearn estimators`	`None`
`n_folds`	`int`	`5`
`estimand`	`str`	`'ATE'`
`alpha`	`float`	`0.05`
`propensity_bounds`	`tuple`	`(0.025, 0.975)`
`random_state`	`int`	`42`

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> n = 400
>>> x1 = rng.normal(size=n)
>>> x2 = rng.normal(size=n)
>>> ps = 1 / (1 + np.exp(-(0.5 * x1 - 0.3 * x2)))
>>> treat = rng.binomial(1, ps)
>>> y = 1.0 * treat + 0.8 * x1 - 0.5 * x2 + rng.normal(size=n)
>>> df = pd.DataFrame({'y': y, 'treat': treat, 'x1': x1, 'x2': x2})
>>> est = sp.TMLE(df, y='y', treat='treat', covariates=['x1', 'x2'],
...               n_folds=2, random_state=0)
>>> res = est.fit()
>>> bool(hasattr(res, 'estimate'))
True
>>> bool(res.se > 0)
True

fit ¶

fit() -> CausalResult

Run TMLE and return causal effect estimates.

SuperLearner ¶

Super Learner ensemble (van der Laan et al. 2007).

Parameters:

Name	Type	Description	Default
`library`	`list of sklearn estimators`	Candidate learners. If None, uses a default diverse library.	`None`
`n_folds`	`int`	Number of cross-validation folds.	`5`
`task`	`str`	'regression' or 'classification'.	`'regression'`
`random_state`	`int`		`42`

Examples:

>>> import numpy as np
>>> import statspai as sp
>>> from sklearn.linear_model import LinearRegression, Ridge
>>> rng = np.random.default_rng(42)
>>> n = 400
>>> X = rng.normal(size=(n, 3))
>>> y = X @ np.array([1.0, -0.5, 0.3]) + rng.normal(scale=0.5, size=n)
>>> sl = sp.SuperLearner(
...     library=[LinearRegression(), Ridge(alpha=1.0)],
...     n_folds=3,
... ).fit(X, y)
>>> round(float(sl.weights_.sum()), 2)  # simplex weights
1.0
>>> sl.predict(X[:5]).shape
(5,)

References

van der Laan, Polley, & Hubbard (2007). [@vanderlaan2007super]

fit ¶

fit(X: Any, y: Any) -> SuperLearner

Fit the Super Learner.

1. Get cross-validated predictions from each base learner.
2. Find optimal weights via simplex-constrained least squares.
3. Refit all base learners on the full data.
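The three steps above can be sketched in NumPy as follows. This is a hedged illustration, not the library's code: the base learners are closed-form ridge regressions with different penalties, and the simplex-constrained least squares problem is solved by projected gradient descent, which is only one of several possible solvers. The names `fit_predict_ridge`, `project_simplex`, and `super_learner_weights` are hypothetical.

```python
import numpy as np

def fit_predict_ridge(X_tr, y_tr, X_te, lam):
    """Closed-form ridge with an unpenalised intercept; lam=0 gives OLS."""
    A = np.column_stack([np.ones(len(X_tr)), X_tr])
    penalty = lam * np.eye(A.shape[1])
    penalty[0, 0] = 0.0
    beta = np.linalg.solve(A.T @ A + penalty, A.T @ y_tr)
    return np.column_stack([np.ones(len(X_te)), X_te]) @ beta

def project_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, len(v) + 1)
    rho = np.nonzero(u * k > css - 1)[0][-1]
    return np.maximum(v - (css[rho] - 1) / (rho + 1), 0.0)

def super_learner_weights(X, y, lams, n_folds=3, seed=0):
    n = len(y)
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    # Step 1: out-of-fold predictions, one column per base learner.
    Z = np.empty((n, len(lams)))
    for f in range(n_folds):
        te = folds == f
        for j, lam in enumerate(lams):
            Z[te, j] = fit_predict_ridge(X[~te], y[~te], X[te], lam)
    # Step 2: minimise mean squared error over the simplex.
    w = np.full(len(lams), 1 / len(lams))
    step = n / (2 * np.linalg.norm(Z, 2) ** 2)  # 1 / Lipschitz constant
    for _ in range(2000):  # fixed iteration budget, not tuned
        w = project_simplex(w - step * 2 * Z.T @ (Z @ w - y) / n)
    return w

rng = np.random.default_rng(42)
X = rng.normal(size=(400, 3))
y = X @ np.array([1.0, -0.5, 0.3]) + rng.normal(scale=0.5, size=400)
lams = [0.0, 1.0, 100.0]
w = super_learner_weights(X, y, lams)
# Step 3: refit every learner on the full data and combine with the weights.
pred = sum(wj * fit_predict_ridge(X, y, X[:5], lam) for wj, lam in zip(w, lams))
print(np.round(w, 3), pred.shape)
```

Because the weights are fitted on out-of-fold predictions, a learner that overfits in-sample gets no credit for it when the ensemble is formed.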

predict ¶

predict(X: Any) -> ndarray

Predict using the weighted ensemble.

For classification (task='classification'), the returned values are a convex combination of the base learners' predicted probabilities, clipped to (1e-6, 1 - 1e-6) so that callers can take logit(.) without producing infinite values. For regression, no clipping is applied.
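A small NumPy sketch shows why the clipping matters. The probabilities and weights are made-up illustrative values, and the variable names are not part of the `statspai` API.

```python
import numpy as np

# Hypothetical base-learner probabilities (rows: samples, columns: learners).
base_probs = np.array([[0.0, 0.0],
                       [0.2, 0.4],
                       [1.0, 1.0]])
weights = np.array([0.5, 0.5])  # simplex weights

# Convex combination of the base-learner probabilities.
ensemble = base_probs @ weights

# Without clipping, logit(0) and logit(1) are infinite.
eps = 1e-6
clipped = np.clip(ensemble, eps, 1 - eps)
logit = np.log(clipped / (1 - clipped))
print(logit)
```

Any downstream step that works on the logit scale therefore always receives finite inputs.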

Parameters:

Name	Type	Description	Default
`X`	`ndarray(n, p)`		required

Returns:

Type	Description
`ndarray(n)`

predict_proba ¶

predict_proba(X: Any) -> ndarray

Predict probabilities (for classification task).

Identical to `predict` under task='classification'; kept as a separate method for sklearn-style API parity.

summary ¶

summary() -> str

Print Super Learner summary.

LTMLEResult `dataclass` ¶

Bases: ResultProtocolMixin

Structured output of `ltmle`.

Holds the treated/control marginal means, their ATE contrast, and the associated inference (se, ci, pvalue).
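As a rough guide to how these fields relate, the sketch below derives a contrast, confidence interval, and p-value from two marginal means and a standard error. It assumes Wald-type normal inference, which the source does not confirm, and `wald_inference` is a hypothetical helper rather than a `statspai` function.

```python
from statistics import NormalDist

def wald_inference(mean_treated, mean_control, se, alpha=0.05):
    """ATE contrast with a normal-approximation CI and two-sided p-value."""
    ate = mean_treated - mean_control
    z = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (ate - z * se, ate + z * se)
    pvalue = 2 * (1 - NormalDist().cdf(abs(ate / se)))
    return ate, ci, pvalue

# Illustrative numbers only, not output from sp.ltmle.
ate_demo, ci_demo, p_demo = wald_inference(1.9, 1.2, se=0.1)
print(ate_demo, ci_demo, p_demo)
```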

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> import pandas as pd
>>> rng = np.random.default_rng(0)
>>> n = 80
>>> l0 = rng.normal(size=n)
>>> a0 = rng.binomial(1, 1 / (1 + np.exp(-0.4 * l0)))
>>> l1 = 0.3 * l0 + 0.2 * a0 + rng.normal(size=n)
>>> a1 = rng.binomial(1, 1 / (1 + np.exp(-0.4 * l1)))
>>> y = 1.0 + 0.4 * a0 + 0.3 * a1 + 0.2 * l1 + rng.normal(scale=0.2, size=n)
>>> df = pd.DataFrame({"L0": l0, "A0": a0, "L1": l1, "A1": a1, "Y": y})
>>> res = sp.ltmle(
...     df, y="Y", treatments=["A0", "A1"],
...     covariates_time=[["L0"], ["L1"]],
... )
>>> isinstance(res, sp.LTMLEResult)
True
>>> float(res.ate)
0.71

LTMLESurvivalResult `dataclass` ¶

Bases: ResultProtocolMixin

Counterfactual survival curves and contrasts.

Produced by `ltmle_survival`. Holds the treated/control counterfactual survival curves, the restricted-mean survival time (RMST) contrast with its standard error and confidence interval, and the terminal risk difference. Use `to_frame` for a tidy per-interval survival table.
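The sketch below shows one common way to turn two discrete-time survival curves into an RMST contrast and a terminal risk difference. It assumes a step-function survival curve that starts at 1 and drops at the end of each interval; the exact convention used by `ltmle_survival` is not stated here, and `rmst_step` and the curve values are hypothetical.

```python
import numpy as np

def rmst_step(surv, times):
    """Area under a step survival curve from 0 to the last time point."""
    surv = np.asarray(surv, dtype=float)
    times = np.asarray(times, dtype=float)
    widths = np.diff(np.concatenate([[0.0], times]))
    # Survival holds its previous value over each interval, starting at 1.
    s_left = np.concatenate([[1.0], surv[:-1]])
    return float(np.sum(s_left * widths))

times = [1, 2, 3]
surv_treated = [0.9, 0.8, 0.7]  # illustrative values
surv_control = [0.8, 0.6, 0.5]

rmst_diff = rmst_step(surv_treated, times) - rmst_step(surv_control, times)
# Terminal risk difference: treated risk minus control risk at the last time.
risk_diff = (1 - surv_treated[-1]) - (1 - surv_control[-1])
print(rmst_diff, risk_diff)
```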

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> n, K = 200, 3
>>> A = rng.integers(0, 2, size=(n, K))
>>> L = rng.normal(0, 1, size=(n, K))
>>> T = np.zeros((n, K), dtype=int)
>>> for k in range(K):
...     haz = 1 / (1 + np.exp(-(-1.5 - 0.5 * A[:, k] + 0.3 * L[:, k])))
...     T[:, k] = (rng.uniform(size=n) < haz).astype(int)
>>> df = pd.DataFrame({
...     **{f"T{k+1}": T[:, k] for k in range(K)},
...     **{f"A{k+1}": A[:, k] for k in range(K)},
...     **{f"L{k+1}": L[:, k] for k in range(K)},
... })
>>> res = sp.ltmle_survival(
...     df,
...     event_indicators=["T1", "T2", "T3"],
...     treatments=["A1", "A2", "A3"],
...     covariates_time=[["L1"], ["L2"], ["L3"]],
... )
>>> isinstance(res, sp.LTMLESurvivalResult)
True
>>> len(res.times)
3

HALRegressor ¶

Bases: _BaseHAL

L1-penalised HAL regressor (sklearn-compatible duck-typed API).

Fits an L1-penalised regression on a main-effects HAL basis of per-feature step functions. Exposes the .fit / .predict interface so it can be dropped into cross-fitting pipelines such as `sp.tmle` and `sp.hal_tmle`.
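To illustrate what a main-effects step-function basis looks like, here is a NumPy sketch that builds indicator columns of the form 1{x_j >= a} and fits an L1-penalised regression on them with a basic proximal-gradient (ISTA) loop. Placing anchors at empirical quantiles and using ISTA as the solver are assumptions made for this example; `choose_anchors`, `hal_basis`, and `lasso_ista` are hypothetical names, not the internals of `HALRegressor`.

```python
import numpy as np

def choose_anchors(X, max_anchors_per_col=40):
    """Per-feature anchors: up to max_anchors_per_col empirical quantiles."""
    qs = np.linspace(0, 1, max_anchors_per_col)
    return [np.unique(np.quantile(X[:, j], qs)) for j in range(X.shape[1])]

def hal_basis(X, anchors):
    """One indicator column 1{x_j >= a} per feature j and anchor a."""
    cols = [(X[:, [j]] >= a[None, :]).astype(float)
            for j, a in enumerate(anchors)]
    return np.hstack(cols)

def lasso_ista(B, y, lam, n_iter=500):
    """Minimise ||y - b0 - B @ beta||^2 / (2n) + lam * ||beta||_1 by ISTA."""
    n = len(y)
    mu, ybar = B.mean(axis=0), y.mean()
    Bc, yc = B - mu, y - ybar  # centring handles the intercept
    step = n / np.linalg.norm(Bc, 2) ** 2  # 1 / Lipschitz constant
    beta = np.zeros(B.shape[1])
    for _ in range(n_iter):
        z = beta - step * Bc.T @ (Bc @ beta - yc) / n
        beta = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # soft-threshold
    return ybar - mu @ beta, beta

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=200)

anchors = choose_anchors(X, max_anchors_per_col=10)
B = hal_basis(X, anchors)
intercept, beta = lasso_ista(B, y, lam=0.05)
pred = intercept + B @ beta
print(B.shape, int(np.count_nonzero(beta)))
```

The anchors are computed once on the training data and reused at prediction time, so new rows are mapped into the same basis columns.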

Parameters:

Name	Type	Description	Default
`lambda_`	`float`	L1 penalty. `None` selects it via cross-validation.	`None`
`max_anchors_per_col`	`int`	Cap on step-function anchor points per feature.	`40`
`cv`	`int`	Folds for the CV penalty search (used only when `lambda_` is None).	`5`
`random_state`	`int`		`0`

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> X = rng.normal(size=(200, 3))
>>> y = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=200)
>>> reg = sp.HALRegressor(max_anchors_per_col=10).fit(X, y)
>>> reg.predict(X).shape
(200,)

HALClassifier ¶

Bases: _BaseHAL

L1-penalised HAL logistic classifier (sklearn-compatible duck-typed API).

Fits an L1-penalised logistic regression on a main-effects HAL basis of per-feature step functions. Exposes .fit / .predict / .predict_proba so it can serve as the propensity learner in `sp.tmle` and `sp.hal_tmle`.

Parameters:

Name	Type	Description	Default
`C`	`float`	Inverse L1 penalty (larger = less shrinkage), as in scikit-learn.	`1.0`
`max_anchors_per_col`	`int`	Cap on step-function anchor points per feature.	`40`
`random_state`	`int`		`0`

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> X = rng.normal(size=(200, 3))
>>> y = (X[:, 0] + rng.normal(size=200) > 0).astype(int)
>>> clf = sp.HALClassifier(max_anchors_per_col=10).fit(X, y)
>>> clf.predict_proba(X).shape
(200, 2)
>>> [int(c) for c in clf.classes_]
[0, 1]
