
statspai.gformula

gformula

Parametric g-formula via Iterative Conditional Expectation (ICE).

Sequential g-computation for longitudinal data with time-varying treatments and time-varying confounding. This is the estimator pioneered by Robins (1986) and made tractable in its ICE form by Bang & Robins (2005).
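The ICE idea can be sketched in a few lines of NumPy for two time points. This is a minimal illustration under stated assumptions, not the statspai implementation: the helpers ols_fit, ice_estimate and simulate are hypothetical, the working models are linear, and the toy data-generating process is chosen so the true values are known (E[Y(1,1)] = 2.8, E[Y(0,0)] = 0). Working backwards in time, each step regresses the current pseudo-outcome on the observed history and predicts with that step's treatment set to the strategy value.

```python
import numpy as np


def ols_fit(X, y):
    """Least-squares coefficients for y ~ X (X already contains an intercept)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def ice_estimate(L0, A0, L1, A1, Y, a0, a1):
    """Hypothetical ICE g-formula for two time points and static strategy (a0, a1).

    Linear working models are an assumption of this sketch.
    """
    n = len(Y)
    ones = np.ones(n)
    # Last step: regress Y on (L0, A0, L1, A1), then predict with A1 := a1.
    b1 = ols_fit(np.column_stack([ones, L0, A0, L1, A1]), Y)
    Q1 = np.column_stack([ones, L0, A0, L1, np.full(n, a1)]) @ b1
    # First step: regress the pseudo-outcome Q1 on (L0, A0), predict with A0 := a0.
    b0 = ols_fit(np.column_stack([ones, L0, A0]), Q1)
    Q0 = np.column_stack([ones, L0, np.full(n, a0)]) @ b0
    # The g-formula estimate is the sample mean of the final pseudo-outcome.
    return float(Q0.mean())


def simulate(n, rng):
    """Toy data with time-varying confounding: L1 depends on A0 and drives A1."""
    L0 = rng.normal(size=n)
    A0 = rng.binomial(1, 1 / (1 + np.exp(-L0)))
    L1 = 0.5 * L0 + 0.8 * A0 + rng.normal(size=n)
    A1 = rng.binomial(1, 1 / (1 + np.exp(-L1)))
    Y = L0 + L1 + A0 + A1 + rng.normal(size=n)
    return L0, A0, L1, A1, Y


data = simulate(50_000, np.random.default_rng(0))
always = ice_estimate(*data, a0=1, a1=1)  # true value in this toy model: 2.8
never = ice_estimate(*data, a0=0, a1=0)   # true value in this toy model: 0.0
print(f"E[Y(1,1)] ~ {always:.2f}, E[Y(0,0)] ~ {never:.2f}")
```

Because L1 is affected by A0 and also drives A1, adjusting for L1 in a single outcome regression would block part of A0's effect; the backward recursion avoids that.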

Public API

>>> import statspai as sp
>>> result = sp.gformula.ice(
...     data=df,
...     id_col="id", time_col="t",
...     treatment_cols=["A0", "A1", "A2"],
...     confounder_cols=["L0", "L1", "L2"],
...     outcome_col="Y",
...     treatment_strategy=[1, 1, 1],  # always-treat
... )
>>> result.summary()

MCGFormulaResult dataclass

Result of one or two Monte-Carlo g-formula arms.

gformula_ice

gformula_ice(*args, **kwargs) -> ICEResult

Alias for ice(), named to match Stata's gformula command.

gformula_mc

gformula_mc(data: DataFrame, treatment_cols: Sequence[str], confounder_cols: Union[Sequence[str], Sequence[Sequence[str]]], outcome_col: str, *, strategy: Union[Sequence[float], Callable] = (1, 1, 1), control_strategy: Union[Sequence[float], Callable, None] = None, id_col: Optional[str] = None, time_col: Optional[str] = None, n_simulations: int = 10000, bootstrap: int = 200, alpha: float = 0.05, return_trajectories: bool = False, seed: Optional[int] = None) -> MCGFormulaResult

Monte-Carlo parametric g-formula estimate of $E[Y(g)]$, the expected outcome had every subject followed treatment strategy $g$.

Fits a conditional density model for each time-varying confounder given the observed history and a regression for the outcome given the full history, then simulates n_simulations counterfactual trajectories under the user-supplied treatment strategy and averages the outcome over them.
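A minimal Monte-Carlo counterpart for the same kind of two-time-point toy model is sketched below. It assumes a linear-Gaussian model for the confounder L1 and a linear outcome regression; these model choices and the names fit_linear, mc_gformula and simulate are assumptions of the sketch, not statspai internals. The strategy argument mirrors the documented form: a static sequence or a callable strategy(t, history).

```python
import numpy as np


def fit_linear(X, y):
    """OLS coefficients plus residual SD (used as a Gaussian noise scale)."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, float(np.std(y - X @ coef))


def mc_gformula(L0, A0, L1, A1, Y, strategy, n_sim=20_000, seed=0):
    """Hypothetical MC g-formula for two time points (linear-Gaussian assumption)."""
    rng = np.random.default_rng(seed)
    ones = np.ones(len(Y))
    # Conditional model for the time-varying confounder: L1 | L0, A0 ~ Normal.
    b_L1, sd_L1 = fit_linear(np.column_stack([ones, L0, A0]), L1)
    # Outcome regression given the full observed history.
    b_Y, _ = fit_linear(np.column_stack([ones, L0, A0, L1, A1]), Y)

    def treat(t, hist):
        # Callable -> dynamic regime; sequence -> constant treatment at time t.
        if callable(strategy):
            return np.asarray(strategy(t, hist), dtype=float)
        return np.full(n_sim, float(strategy[t]))

    sim_ones = np.ones(n_sim)
    hist = {"L0": rng.choice(L0, size=n_sim, replace=True)}  # empirical baseline draw
    hist["A0"] = treat(0, hist)
    mu_L1 = np.column_stack([sim_ones, hist["L0"], hist["A0"]]) @ b_L1
    hist["L1"] = mu_L1 + rng.normal(scale=sd_L1, size=n_sim)
    hist["A1"] = treat(1, hist)
    X_Y = np.column_stack([sim_ones, hist["L0"], hist["A0"], hist["L1"], hist["A1"]])
    # E[Y(g)] is estimated by averaging the predicted outcome over trajectories.
    return float(np.mean(X_Y @ b_Y))


def simulate(n, rng):
    """Toy data with time-varying confounding (illustrative assumption)."""
    L0 = rng.normal(size=n)
    A0 = rng.binomial(1, 1 / (1 + np.exp(-L0)))
    L1 = 0.5 * L0 + 0.8 * A0 + rng.normal(size=n)
    A1 = rng.binomial(1, 1 / (1 + np.exp(-L1)))
    Y = L0 + L1 + A0 + A1 + rng.normal(size=n)
    return L0, A0, L1, A1, Y


data = simulate(50_000, np.random.default_rng(1))
treat_arm = mc_gformula(*data, strategy=(1, 1))
control_arm = mc_gformula(*data, strategy=(0, 0))
dynamic_arm = mc_gformula(*data, strategy=lambda t, h: (h[f"L{t}"] > 0).astype(float))
# The true treat - control contrast in this toy model is 2.8.
print(f"contrast (treat - control) ~ {treat_arm - control_arm:.2f}")
```

Unlike ICE, this approach needs a model for the full confounder distribution, which is what allows dynamic regimes to react to simulated confounder values.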

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame (wide format) | One row per subject. Must contain all treatment, confounder and outcome columns. | required |
| treatment_cols | list[str] | Treatment column names in chronological order. | required |
| confounder_cols | list[str] or list[list[str]] | Either a flat list (the same confounders at every timepoint) or a list of lists giving the confounder set for each timepoint. A flat list of names that end in an index or time tag is accepted, but for truly time-varying confounders supply the nested form so the g-formula respects the correct temporal ordering. | required |
| outcome_col | str | Terminal outcome column. | required |
| strategy | Sequence[float] or Callable | Either a static sequence of treatment values (e.g. [1] * K for always-treat over K timepoints) or, for dynamic regimes that depend on the simulated state, a callable strategy(t: int, history: dict) -> np.ndarray returning one treatment value per simulated trajectory (length n_simulations). | (1, 1, 1) |
| control_strategy | same type as strategy, optional | If provided, a second arm is simulated under this strategy and the contrast (treat − control) is reported. | None |
| id_col | str, optional | Unused when data is already wide; kept for API symmetry with ice(). | None |
| time_col | str, optional | Unused when data is already wide; kept for API symmetry with ice(). | None |
| n_simulations | int | Number of Monte-Carlo trajectories simulated per arm. | 10000 |
| bootstrap | int | Number of non-parametric bootstrap replicates for the SE and CI. Set to 0 to skip inference and return the point estimate only. | 200 |
| alpha | float | Significance level; confidence intervals have nominal coverage 1 − alpha. | 0.05 |
| return_trajectories | bool | If True, attach the simulated trajectories as a DataFrame to the result. | False |
| seed | int, optional | Random seed for reproducibility. | None |
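To see why the nested form of confounder_cols matters, the hypothetical helpers below (not part of statspai) build the covariate history available before each treatment. The sketch takes a flat list literally, as the same confounders at every timepoint, and does not parse time tags from column names; that reading is an assumption.

```python
from typing import List, Sequence, Union

Confounders = Union[Sequence[str], Sequence[Sequence[str]]]


def per_timepoint_sets(confounder_cols: Confounders, n_times: int) -> List[List[str]]:
    """Hypothetical helper: one list of confounder names per timepoint.

    A flat list is read as 'the same confounders at every timepoint';
    a nested list must supply one set per treatment column.
    """
    if all(isinstance(c, str) for c in confounder_cols):
        return [list(confounder_cols) for _ in range(n_times)]
    nested = [list(c) for c in confounder_cols]
    if len(nested) != n_times:
        raise ValueError("nested confounder_cols needs one set per treatment column")
    return nested


def history_before(treatment_cols: Sequence[str], confounder_cols: Confounders, k: int) -> List[str]:
    """Hypothetical helper: columns observed before treatment k, ordered L0, A0, ..., Lk."""
    sets = per_timepoint_sets(confounder_cols, len(treatment_cols))
    cols: List[str] = []
    for j in range(k + 1):
        for name in sets[j]:
            if name not in cols:
                cols.append(name)
        if j < k:
            cols.append(treatment_cols[j])
    return cols


A = ["A0", "A1", "A2"]
print(history_before(A, [["L0"], ["L1"], ["L2"]], 1))  # ['L0', 'A0', 'L1']
print(history_before(A, ["L0", "L1", "L2"], 0))        # ['L0', 'L1', 'L2']
```

Read literally, the flat list places L1 and L2 in the history of the first treatment even though they are measured after it; the nested form keeps each confounder at its own timepoint.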

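Returns:

| Type | Description |
|---|---|
| MCGFormulaResult | Result for the strategy arm and, when control_strategy is given, the control arm and the treat − control contrast. |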
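When bootstrap > 0, the SE and confidence interval come from non-parametric bootstrap replicates. Because the wide data has one row per subject, resampling rows resamples subjects. The sketch below is generic and hypothetical: bootstrap_ci is not a statspai function, and its percentile interval at level 1 − alpha is an assumption, since the documentation does not say which interval type statspai reports.

```python
import numpy as np


def bootstrap_ci(estimator, data, n_boot=200, alpha=0.05, seed=None):
    """Hypothetical non-parametric bootstrap over subjects (rows of wide data).

    estimator maps an (n, p) array to a scalar estimate.
    Returns (point estimate, bootstrap SE, (lower, upper)).
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    point = estimator(data)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample subjects with replacement
        reps[b] = estimator(data[idx])
    se = reps.std(ddof=1)
    # Percentile interval: an assumption; statspai may use another CI type.
    lower, upper = np.quantile(reps, [alpha / 2, 1 - alpha / 2])
    return point, se, (lower, upper)


sample = np.random.default_rng(1).normal(loc=1.0, size=(1000, 1))
point, se, (lo, hi) = bootstrap_ci(lambda d: d[:, 0].mean(), sample, n_boot=200, seed=2)
print(f"{point:.3f} (SE {se:.3f}), 95% CI [{lo:.3f}, {hi:.3f}]")
```

For the g-formula, the estimator passed in would refit all models and rerun the simulation on each resampled dataset, which is why inference dominates the run time and bootstrap=0 is offered as a shortcut.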

Examples:

Static always-treat strategy:

>>> res = sp.gformula_mc(
...     df,
...     treatment_cols=['A0', 'A1', 'A2'],
...     confounder_cols=[['L0'], ['L1'], ['L2']],
...     outcome_col='Y',
...     strategy=[1, 1, 1],
...     control_strategy=[0, 0, 0],
... )
>>> print(res.summary())
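The examples assume a wide DataFrame df already exists. For experimenting, a synthetic one with the expected column names can be generated with NumPy and pandas; the helper make_wide_data and its data-generating process are illustrative assumptions, not data shipped with statspai.

```python
import numpy as np
import pandas as pd


def make_wide_data(n=2000, seed=0):
    """Hypothetical toy data, one row per subject: id, L0, A0, L1, A1, L2, A2, Y."""
    rng = np.random.default_rng(seed)
    cols = {"id": np.arange(n)}
    prev_L = np.zeros(n)
    prev_A = np.zeros(n)
    for t in range(3):
        # L_t depends on the previous confounder and treatment (time-varying confounding).
        L = 0.5 * prev_L + 0.8 * prev_A + rng.normal(size=n)
        # A_t depends on the current confounder (illustrative logistic assignment).
        A = rng.binomial(1, 1 / (1 + np.exp(-L))).astype(float)
        cols[f"L{t}"], cols[f"A{t}"] = L, A
        prev_L, prev_A = L, A
    cols["Y"] = sum(cols[f"L{t}"] + cols[f"A{t}"] for t in range(3)) + rng.normal(size=n)
    return pd.DataFrame(cols)


df = make_wide_data()
print(df.head())
```

This frame has the columns the calls above reference via treatment_cols=['A0', 'A1', 'A2'], confounder_cols=[['L0'], ['L1'], ['L2']] and outcome_col='Y'.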

Dynamic regime: treat at time t only when the simulated biomarker L_t is positive.

>>> def dynamic(t, hist):
...     return (hist[f'L{t}'] > 0).astype(float)
>>> res = sp.gformula_mc(..., strategy=dynamic)
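The rule above hard-codes a threshold of 0. A small factory makes the threshold and confounder prefix configurable. threshold_strategy is a hypothetical helper; it relies only on the documented callable signature strategy(t, history) and assumes history maps column names to NumPy arrays of simulated values.

```python
import numpy as np


def threshold_strategy(threshold=0.0, prefix="L"):
    """Hypothetical helper: treat at time t when confounder {prefix}{t} exceeds threshold.

    Returns a callable strategy(t, history) -> np.ndarray of 0/1 floats,
    one value per simulated trajectory.
    """
    def strategy(t, history):
        return (np.asarray(history[f"{prefix}{t}"]) > threshold).astype(float)
    return strategy


# Quick check on a hand-made history with three simulated trajectories.
hist = {"L0": np.array([-1.0, 0.5, 2.0])}
rule = threshold_strategy(threshold=1.0)
print(rule(0, hist))  # [0. 0. 1.]
```

It would then be passed as strategy=threshold_strategy(1.0), keeping the regime's parameters out of the function body.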