`statspai.gformula`¶

gformula ¶

Parametric g-formula via Iterative Conditional Expectation (ICE).

Sequential g-computation for longitudinal data with time-varying treatments and time-varying confounding -- the estimator pioneered by Robins (1986) and made tractable in its ICE form by Bang & Robins (2005).
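The ICE recursion described above can be illustrated outside the library. The following is a minimal sketch, assuming linear working models, two treatment periods, and a static strategy. The helpers `fit_ols`, `predict_ols`, and `ice_two_period` are hypothetical names introduced for illustration and are not part of statspai.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients for y ~ 1 + X (hypothetical helper)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

def predict_ols(beta, X):
    """Fitted values for the model returned by fit_ols."""
    design = np.column_stack([np.ones(X.shape[0]), X])
    return design @ beta

def ice_two_period(l0, a0, l1, a1, y, strategy=(1.0, 1.0)):
    """Two-period ICE g-formula with linear working models (sketch only)."""
    n = len(y)
    # Step 1: regress Y on the full history, then predict with A1 set
    # to the strategy value while keeping everything else as observed.
    beta2 = fit_ols(np.column_stack([l0, a0, l1, a1]), y)
    q2 = predict_ols(
        beta2, np.column_stack([l0, a0, l1, np.full(n, strategy[1])])
    )
    # Step 2: regress that prediction on the period-0 history, then
    # predict with A0 set to the strategy value.
    beta1 = fit_ols(np.column_stack([l0, a0]), q2)
    q1 = predict_ols(beta1, np.column_stack([l0, np.full(n, strategy[0])]))
    # Step 3: average over subjects to estimate E[Y(g)].
    return float(q1.mean())

# Simulated data with the same structure as the ICEResult example.
rng = np.random.default_rng(0)
n = 20_000
l0 = rng.normal(size=n)
a0 = rng.binomial(1, 0.5, size=n).astype(float)
l1 = 0.5 * l0 + 0.3 * a0 + rng.normal(size=n)
a1 = rng.binomial(1, 0.5, size=n).astype(float)
y = 1.0 + 0.8 * a0 + 1.2 * a1 + 0.5 * l0 + 0.4 * l1 + rng.normal(size=n)

# Under this data-generating process the true always-treat mean is
# 1.0 + 0.8 + 1.2 + 0.4 * 0.3 = 3.12.
estimate = ice_two_period(l0, a0, l1, a1, y)
print(round(estimate, 2))
```

Each regression conditions only on the history available at that period, which is what lets the recursion adjust for `l1` even though `l1` is itself affected by `a0`.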

Public API

>>> import statspai as sp
>>> result = sp.gformula.ice(
...     data=df,
...     id_col="id", time_col="t",
...     treatment_cols=["A0", "A1", "A2"],
...     confounder_cols=["L0", "L1", "L2"],
...     outcome_col="Y",
...     treatment_strategy=[1, 1, 1],  # always-treat
... )
>>> result.summary()

ICEResult `dataclass` ¶

Bases: ResultProtocolMixin

Result of the iterative conditional expectation (ICE) g-formula.

Returned by `sp.gformula_ice_fn`. Holds the estimated mean outcome under the intervention strategy together with its standard error and confidence interval.

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> import pandas as pd
>>> rng = np.random.default_rng(0)
>>> n = 200
>>> l0 = rng.normal(size=n)
>>> a0 = rng.binomial(1, 0.5, size=n)
>>> l1 = 0.5 * l0 + 0.3 * a0 + rng.normal(size=n)
>>> a1 = rng.binomial(1, 0.5, size=n)
>>> y = (1.0 + 0.8 * a0 + 1.2 * a1 + 0.5 * l0 + 0.4 * l1
...      + rng.normal(size=n))
>>> df = pd.DataFrame({"id": range(n), "L0": l0, "A0": a0,
...                    "L1": l1, "A1": a1, "Y": y})
>>> res = sp.gformula_ice_fn(
...     df, id_col="id", time_col="id",
...     treatment_cols=["A0", "A1"],
...     confounder_cols=[["L0"], ["L1"]],
...     outcome_col="Y", treatment_strategy=[1, 1])
>>> isinstance(res, sp.ICEResult)
True
>>> res.strategy
[1, 1]

MCGFormulaResult `dataclass` ¶

Bases: ResultProtocolMixin

Result of one or two Monte-Carlo g-formula arms.

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(6)
>>> L0 = rng.normal(size=300)
>>> A0 = (rng.uniform(size=300) < 0.5).astype(float)
>>> L1 = 0.5 * L0 + 0.3 * A0 + rng.normal(size=300)
>>> A1 = (rng.uniform(size=300) < 0.5).astype(float)
>>> Y = 1.0 + 0.5 * A0 + 0.5 * A1 + 0.3 * L1 + rng.normal(size=300)
>>> df = pd.DataFrame({"L0": L0, "A0": A0, "L1": L1, "A1": A1, "Y": Y})
>>> res = sp.gformula_mc(
...     df, treatment_cols=["A0", "A1"],
...     confounder_cols=[["L0"], ["L1"]],
...     outcome_col="Y", strategy=[1, 1], control_strategy=[0, 0],
...     n_simulations=500, bootstrap=20, seed=0)
>>> type(res).__name__
'MCGFormulaResult'
>>> bool(res.contrast_value is not None)
True

gformula_ice ¶

gformula_ice(*args: Any, **kwargs: Any) -> ICEResult

Alias for `ice`, named to match Stata's `gformula` command.

gformula_mc ¶

gformula_mc(data: DataFrame, treatment_cols: Sequence[str], confounder_cols: Union[Sequence[str], Sequence[Sequence[str]]], outcome_col: str, *, strategy: Union[Sequence[float], Callable] = (1, 1, 1), control_strategy: Union[Sequence[float], Callable, None] = None, id_col: Optional[str] = None, time_col: Optional[str] = None, n_simulations: int = 10000, bootstrap: int = 200, alpha: float = 0.05, return_trajectories: bool = False, seed: Optional[int] = None) -> MCGFormulaResult

Monte-Carlo parametric g-formula estimate of $E[Y(g)]$, the mean outcome under treatment strategy $g$.

Fits a conditional density model for each time-varying confounder given the observed history, a regression for the outcome given the full history, and then simulates n_simulations counterfactual trajectories under the user-supplied treatment strategy.
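The three steps above (confounder models, outcome model, forward simulation) can be sketched for two periods as follows. This is an illustration under stated assumptions, not the library's implementation: it assumes a Gaussian linear model for the time-varying confounder, a linear outcome regression, and baseline confounders resampled from the observed data. The names `fit_linear`, `predict_mean`, and `mc_gformula_two_period` are hypothetical.

```python
import numpy as np

def fit_linear(X, y):
    """OLS fit of y ~ 1 + X; returns coefficients and residual SD."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma = float(np.sqrt(resid @ resid / (len(y) - design.shape[1])))
    return beta, sigma

def predict_mean(beta, X):
    """Conditional mean implied by a fit_linear coefficient vector."""
    return np.column_stack([np.ones(X.shape[0]), X]) @ beta

def mc_gformula_two_period(L0, A0, L1, A1, Y, strategy, n_sim, rng):
    """Monte-Carlo g-formula for a static two-period strategy (sketch)."""
    # Confounder model (assumed): L1 | L0, A0 is Normal with linear mean.
    beta_l1, sigma_l1 = fit_linear(np.column_stack([L0, A0]), L1)
    # Outcome regression on the full observed history.
    beta_y, _ = fit_linear(np.column_stack([L0, A0, L1, A1]), Y)
    # Simulate forward: draw L0 from its empirical distribution, set
    # treatment by the strategy, then draw L1 from the fitted model.
    l0 = rng.choice(L0, size=n_sim, replace=True)
    a0 = np.full(n_sim, float(strategy[0]))
    l1 = predict_mean(beta_l1, np.column_stack([l0, a0]))
    l1 = l1 + rng.normal(0.0, sigma_l1, size=n_sim)
    a1 = np.full(n_sim, float(strategy[1]))
    # Average the predicted outcome over the simulated trajectories.
    y_mean = predict_mean(beta_y, np.column_stack([l0, a0, l1, a1]))
    return float(y_mean.mean())

# Simulated data with the same structure as the MCGFormulaResult example.
rng = np.random.default_rng(6)
n = 5000
L0 = rng.normal(size=n)
A0 = (rng.uniform(size=n) < 0.5).astype(float)
L1 = 0.5 * L0 + 0.3 * A0 + rng.normal(size=n)
A1 = (rng.uniform(size=n) < 0.5).astype(float)
Y = 1.0 + 0.5 * A0 + 0.5 * A1 + 0.3 * L1 + rng.normal(size=n)

treated = mc_gformula_two_period(L0, A0, L1, A1, Y, (1, 1), 10_000, rng)
control = mc_gformula_two_period(L0, A0, L1, A1, Y, (0, 0), 10_000, rng)
# True values under this process: 2.09 (treated), 1.00 (control).
contrast = treated - control
print(round(treated, 2), round(control, 2), round(contrast, 2))
```

For a dynamic regime, the two `np.full` assignments would be replaced by a rule evaluated on the simulated history, which is why simulation (rather than a closed-form plug-in) is convenient here.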

Parameters:

Name	Type	Description	Default
`data`	`DataFrame (wide format)`	One row per subject. Must contain all treatment, confounder and outcome columns.	required
`treatment_cols`	`list[str]`	Treatment column names, chronologically ordered.	required
`confounder_cols`	`list[str] \| list[list[str]]`	Either a flat list (same confounders at every timepoint) or a list-of-lists with per-timepoint confounder sets. A confounder whose name ends in an index / time tag can be supplied as a flat list; for true time-varying confounders supply the nested form so the g-formula respects the correct temporal ordering.	required
`outcome_col`	`str`	Terminal outcome column.	required
`strategy`	`Sequence[float] or Callable`	Either a static sequence of treatment values (e.g. `[1]*K` = always-treat), or a callable with signature `strategy(t: int, history: dict) -> np.ndarray of length n_sim` for dynamic regimes that depend on simulated state.	`(1, 1, 1)`
`control_strategy`	optional, same type as `strategy`	If provided, a second arm is simulated under the control strategy and a contrast (treat − control) is reported.	`None`
`id_col`	`Optional[str]`	Unused when `data` is already wide; kept for API symmetry with `ice`.	`None`
`time_col`	`Optional[str]`	Unused when `data` is already wide; kept for API symmetry with `ice`.	`None`
`n_simulations`	`int`	Monte-Carlo trajectories per arm.	`10000`
`bootstrap`	`int`	Non-parametric bootstrap replicates for SE / CI. Set to 0 to skip inference (point estimate only).	`200`
`alpha`	`float`	Significance level; confidence intervals have nominal coverage `1 - alpha`.	`0.05`
`return_trajectories`	`bool`	Attach the simulated trajectories DataFrame to the result.	`False`
`seed`	`Optional[int]`	Random seed for reproducible results.	`None`
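The `bootstrap` and `alpha` parameters control non-parametric bootstrap inference. The sketch below shows the general technique: resample rows with replacement, re-run the estimator, and summarize the replicates. How `gformula_mc` forms its interval is not stated here, so the percentile interval is an assumption, and `bootstrap_se_ci` is a hypothetical helper rather than a statspai function.

```python
import numpy as np

def bootstrap_se_ci(data, estimator, n_boot=200, alpha=0.05, seed=None):
    """Non-parametric bootstrap SE and percentile CI (sketch only).

    data      : 2-D array with one row per subject
    estimator : callable mapping such an array to a float
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    reps = np.empty(n_boot)
    for b in range(n_boot):
        # Resample whole subjects (rows) with replacement.
        idx = rng.integers(0, n, size=n)
        reps[b] = estimator(data[idx])
    se = float(reps.std(ddof=1))
    # Percentile interval with nominal coverage 1 - alpha (assumed form).
    lo, hi = np.quantile(reps, [alpha / 2, 1 - alpha / 2])
    return se, (float(lo), float(hi))

# Toy usage: the estimator is a plain column mean.
rng = np.random.default_rng(0)
sample = rng.normal(loc=2.0, scale=1.0, size=(500, 1))
se, (lo, hi) = bootstrap_se_ci(
    sample, lambda d: float(d[:, 0].mean()), n_boot=200, alpha=0.05, seed=1
)
print(se, lo, hi)
```

In a g-formula setting the estimator would refit every model on each resample, so run time grows roughly linearly in the number of replicates.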

Returns:

Type	Description
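`MCGFormulaResult`	Result for the `strategy` arm and, when `control_strategy` is supplied, the control arm and their contrast.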

Examples:

Static always-treat vs. never-treat strategy on a wide three-period panel with one time-varying confounder per period:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> n = 300
>>> L0 = rng.normal(0, 1, n)
>>> A0 = (rng.uniform(size=n) < 1 / (1 + np.exp(-L0))).astype(float)
>>> L1 = 0.5 * L0 + 0.5 * A0 + rng.normal(0, 1, n)
>>> A1 = (rng.uniform(size=n) < 1 / (1 + np.exp(-L1))).astype(float)
>>> L2 = 0.5 * L1 + 0.5 * A1 + rng.normal(0, 1, n)
>>> A2 = (rng.uniform(size=n) < 1 / (1 + np.exp(-L2))).astype(float)
>>> Y = 1.0 + 0.8 * (A0 + A1 + A2) + 0.5 * L2 + rng.normal(0, 1, n)
>>> df = pd.DataFrame({'L0': L0, 'A0': A0, 'L1': L1, 'A1': A1,
...                    'L2': L2, 'A2': A2, 'Y': Y})
>>> res = sp.gformula_mc(
...     df,
...     treatment_cols=['A0', 'A1', 'A2'],
...     confounder_cols=[['L0'], ['L1'], ['L2']],
...     outcome_col='Y',
...     strategy=[1, 1, 1],
...     control_strategy=[0, 0, 0],
...     n_simulations=2000, bootstrap=50, seed=0,
... )
>>> bool(np.isfinite(res.value))
True

Dynamic regime: treat at each period only when the current time-varying confounder is positive.

>>> def dynamic(t, hist):
...     return (hist[f'L{t}'] > 0).astype(float)
>>> res2 = sp.gformula_mc(
...     df,
...     treatment_cols=['A0', 'A1', 'A2'],
...     confounder_cols=[['L0'], ['L1'], ['L2']],
...     outcome_col='Y',
...     strategy=dynamic,
...     n_simulations=2000, bootstrap=0, seed=0,
... )
>>> bool(np.isfinite(res2.value))
True
