statspai.gformula
Parametric g-formula via Iterative Conditional Expectation (ICE).
Sequential g-computation for longitudinal data with time-varying treatments and time-varying confounding. This is the estimator pioneered by Robins (1986) and made tractable in its ICE form by Bang & Robins (2005).
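The ICE recursion works backwards from the outcome `Y`: at each time k it regresses the current pseudo-outcome on the observed history up to and including `A_k`, then predicts with `A_k` set to the strategy value; the estimate is the average of the final predictions. A minimal NumPy sketch of this (non-stratified) variant, not statspai's implementation: it assumes linear regressions and a static strategy, and `simulate`, `ols_fit_predict` and `ice` are hypothetical names operating on a made-up data-generating process. Under that process, direct calculation gives E[Y(1,1,1)] = 3.75 and E[Y(0,0,0)] = 0.

```python
import numpy as np


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


def simulate(n, rng):
    """Hypothetical 3-period data with treatment-confounder feedback."""
    L0 = rng.normal(size=n)
    A0 = rng.binomial(1, expit(L0))
    L1 = 0.5 * L0 + 0.5 * A0 + rng.normal(size=n)
    A1 = rng.binomial(1, expit(L1))
    L2 = 0.5 * L1 + 0.5 * A1 + rng.normal(size=n)
    A2 = rng.binomial(1, expit(L2))
    Y = L2 + A0 + A1 + A2 + rng.normal(size=n)
    return [L0, L1, L2], [A0, A1, A2], Y


def ols_fit_predict(X_fit, y, X_pred):
    """OLS with intercept on (X_fit, y); return predictions at X_pred."""
    design = np.column_stack([np.ones(len(y)), X_fit])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[0] + X_pred @ beta[1:]


def ice(L, A, Y, strategy):
    """ICE estimate of E[Y(g)] for a static strategy g = (g_0, ..., g_{K-1})."""
    Q = np.asarray(Y, dtype=float)            # recursion starts from the outcome
    for k in reversed(range(len(A))):
        # Observed history L_0, A_0, ..., L_k, A_k (A_k is the last column)
        X_obs = np.column_stack([v for j in range(k + 1) for v in (L[j], A[j])])
        X_set = X_obs.copy()
        X_set[:, -1] = strategy[k]            # intervene: set A_k to g_k
        Q = ols_fit_predict(X_obs, Q, X_set)  # E[Q | history] evaluated at A_k = g_k
    return Q.mean()                           # average over baseline covariates


rng = np.random.default_rng(0)
L, A, Y = simulate(200_000, rng)
print(ice(L, A, Y, [1, 1, 1]), ice(L, A, Y, [0, 0, 0]))
```

Each regression here is correctly specified for the simulated process, which is what makes plain OLS adequate; with real data the model for each step is a modelling choice.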
Public API
>>> import statspai as sp
>>> result = sp.gformula.ice(
...     data=df,
...     id_col="id", time_col="t",
...     treatment_cols=["A0", "A1", "A2"],
...     confounder_cols=["L0", "L1", "L2"],
...     outcome_col="Y",
...     treatment_strategy=[1, 1, 1],  # always-treat
... )
>>> result.summary()
MCGFormulaResult (dataclass)
Result of a Monte-Carlo g-formula fit with one arm (`strategy`) or two arms (`strategy` and `control_strategy`).
gformula_ice
Alias for `ice`, named to match Stata's `gformula` command.
gformula_mc
gformula_mc(data: DataFrame, treatment_cols: Sequence[str], confounder_cols: Union[Sequence[str], Sequence[Sequence[str]]], outcome_col: str, *, strategy: Union[Sequence[float], Callable] = (1, 1, 1), control_strategy: Union[Sequence[float], Callable, None] = None, id_col: Optional[str] = None, time_col: Optional[str] = None, n_simulations: int = 10000, bootstrap: int = 200, alpha: float = 0.05, return_trajectories: bool = False, seed: Optional[int] = None) -> MCGFormulaResult
Monte-Carlo parametric g-formula estimate of $E[Y(g)]$, the mean counterfactual outcome under treatment strategy $g$.
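For reference, with time points $k = 0, \dots, K$ and under consistency, sequential exchangeability and positivity, the standard g-formula (Robins, 1986) identifies this estimand as

$$
E[Y(g)] = \int E\!\left[Y \mid \bar{A}_K = \bar{a}^{g}_K,\ \bar{L}_K = \bar{l}_K\right] \prod_{k=0}^{K} f\!\left(l_k \mid \bar{a}^{g}_{k-1},\ \bar{l}_{k-1}\right) \, d\bar{l}_K
$$

where $\bar{l}_k = (l_0, \dots, l_k)$ is the confounder history, $\bar{a}^{g}_k$ is the treatment history assigned by $g$ (for a dynamic regime it may depend on $\bar{l}_k$), $f$ is the conditional density of $L_k$, and $\bar{a}^{g}_{-1}$, $\bar{l}_{-1}$ are empty. The Monte-Carlo approach approximates this integral by simulating confounder histories from fitted models of $f$.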
Fits a conditional density model for each time-varying confounder
given the observed history, a regression for the outcome given the
full history, and then simulates n_simulations counterfactual
trajectories under the user-supplied treatment strategy.
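The three steps (fit confounder models, fit an outcome model, simulate under the strategy) can be sketched in plain NumPy. This is an illustration under stated assumptions, not statspai's implementation: each confounder model is assumed Gaussian-linear in the past history, the outcome model is linear, the data-generating process is made up, and the callable form `strategy(k, L_sim)` is a hypothetical interface, not necessarily the one `gformula_mc` expects.

```python
import numpy as np


def expit(x):
    return 1.0 / (1.0 + np.exp(-x))


def simulate_observed(n, rng):
    """Hypothetical 3-period data with treatment-confounder feedback."""
    L0 = rng.normal(size=n)
    A0 = rng.binomial(1, expit(L0))
    L1 = 0.5 * L0 + 0.5 * A0 + rng.normal(size=n)
    A1 = rng.binomial(1, expit(L1))
    L2 = 0.5 * L1 + 0.5 * A1 + rng.normal(size=n)
    A2 = rng.binomial(1, expit(L2))
    Y = L2 + A0 + A1 + A2 + rng.normal(size=n)
    return [L0, L1, L2], [A0, A1, A2], Y


def history(L, A, k):
    """Stack L_0, A_0, ..., L_{k-1}, A_{k-1} as columns."""
    return np.column_stack([v for j in range(k) for v in (L[j], A[j])])


def fit_ols(X, y):
    """OLS with intercept; returns (coefficients, residual SD)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta, (y - design @ beta).std(ddof=design.shape[1])


def mc_gformula(L, A, Y, strategy, n_sim, rng):
    """Monte-Carlo g-formula estimate of E[Y(g)].

    `strategy` is a sequence of treatment values, or a callable
    strategy(k, L_sim) -> treatments at time k (hypothetical interface).
    """
    K = len(A)
    # 1. Gaussian linear model for each L_k (k >= 1) given its past history,
    #    and a linear regression for Y given the full history.
    conf_models = [fit_ols(history(L, A, k), L[k]) for k in range(1, K)]
    y_beta, _ = fit_ols(history(L, A, K), Y)

    # 2. Simulate n_sim trajectories under the strategy.
    L_sim = [rng.choice(L[0], size=n_sim, replace=True)]  # empirical baseline L_0
    A_sim = []
    for k in range(K):
        if k >= 1:
            beta, sd = conf_models[k - 1]
            mean = beta[0] + history(L_sim, A_sim, k) @ beta[1:]
            L_sim.append(mean + rng.normal(scale=sd, size=n_sim))
        if callable(strategy):
            A_sim.append(np.asarray(strategy(k, L_sim), dtype=float))
        else:
            A_sim.append(np.full(n_sim, float(strategy[k])))

    # 3. Average the predicted outcome mean over the simulated trajectories.
    return float(np.mean(y_beta[0] + history(L_sim, A_sim, K) @ y_beta[1:]))


rng = np.random.default_rng(1)
L, A, Y = simulate_observed(100_000, rng)
always = mc_gformula(L, A, Y, [1, 1, 1], 200_000, rng)
dynamic = mc_gformula(L, A, Y, lambda k, L_sim: L_sim[k] > 0, 200_000, rng)
```

Step 3 averages the fitted outcome mean rather than drawing `Y` with noise; both have the same expectation, and averaging removes one source of Monte-Carlo error. The `dynamic` call treats at time k only when the simulated `L_k` is positive.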
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `data` | DataFrame (wide format) | One row per subject. Must contain all treatment, confounder and outcome columns. | required |
| `treatment_cols` | `list[str]` | Treatment column names, in chronological order. | required |
| `confounder_cols` | `list[str] \| list[list[str]]` | Either a flat list (same confounders at every timepoint) or a list of lists with one confounder set per timepoint. A confounder whose name ends in an index / time tag can be supplied as a flat list; for true time-varying confounders, supply the nested form so the g-formula respects the correct temporal ordering. | required |
| `outcome_col` | `str` | Terminal outcome column. | required |
| `strategy` | `Sequence[float]` or `Callable` | Either a static sequence of treatment values (e.g. `(1, 1, 1)`) or a callable defining a dynamic regime. | `(1, 1, 1)` |
| `control_strategy` | optional, same type as `strategy` | If provided, a second arm is simulated under the control strategy and the contrast (treat − control) is reported. | `None` |
| `id_col` | `str` | Unused when … | `None` |
| `time_col` | `str` | Unused when … | `None` |
| `n_simulations` | `int` | Monte-Carlo trajectories per arm. | `10000` |
| `bootstrap` | `int` | Nonparametric bootstrap replicates for SE / CI. Set to 0 to skip inference (point estimate only). | `200` |
| `alpha` | `float` | Significance level; confidence intervals have nominal coverage 1 − `alpha`. | `0.05` |
| `return_trajectories` | `bool` | Attach the simulated trajectories DataFrame to the result. | `False` |
| `seed` | `int` | Random seed for reproducibility. | `None` |
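The two accepted shapes of `confounder_cols` can be made concrete with a small normalization helper. `normalize_confounders` is hypothetical, not part of statspai, and its length check (exactly one confounder set per treatment column) is an assumption.

```python
from typing import List, Sequence, Union


def normalize_confounders(
    confounder_cols: Union[Sequence[str], Sequence[Sequence[str]]],
    n_times: int,
) -> List[List[str]]:
    """Return one confounder list per timepoint (hypothetical helper)."""
    if len(confounder_cols) > 0 and all(isinstance(c, str) for c in confounder_cols):
        # Flat form: the same confounders are reused at every timepoint
        return [list(confounder_cols) for _ in range(n_times)]
    # Nested form: one confounder set per timepoint
    nested = [list(c) for c in confounder_cols]
    if len(nested) != n_times:
        raise ValueError(f"expected {n_times} confounder sets, got {len(nested)}")
    return nested


print(normalize_confounders(["L0", "L1", "L2"], 3)[0])        # all three at time 0
print(normalize_confounders([["L0"], ["L1"], ["L2"]], 3)[0])  # only L0 at time 0
```

This shows why the nested form matters for time-varying confounders: the flat list `["L0", "L1", "L2"]` puts `L1` and `L2` in the time-0 set, conditioning on the future, while `[["L0"], ["L1"], ["L2"]]` keeps each confounder at its own timepoint.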
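Returns:

| Type | Description |
|---|---|
| `MCGFormulaResult` | Estimate for each simulated arm (with bootstrap SE / CI when `bootstrap > 0`) and, when `control_strategy` is given, the treat − control contrast. |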
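The `bootstrap` and `alpha` parameters describe nonparametric bootstrap inference. Because `data` is wide (one row per subject), resampling rows resamples whole subjects. A minimal sketch, assuming a percentile interval (this page does not state which interval type `gformula_mc` uses); `bootstrap_ci` and `estimator` are hypothetical names.

```python
import numpy as np


def bootstrap_ci(data, estimator, n_boot=200, alpha=0.05, seed=None):
    """Subject-level nonparametric bootstrap: returns (point, SE, (lo, hi))."""
    data = np.asarray(data)
    point = estimator(data)
    if n_boot == 0:                          # mirrors `bootstrap=0`: point estimate only
        return point, None, None
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    reps = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)     # resample subjects (rows) with replacement
        reps[b] = estimator(data[idx])       # re-run the whole estimator on the replicate
    se = reps.std(ddof=1)
    lo, hi = np.quantile(reps, [alpha / 2, 1 - alpha / 2])  # percentile interval
    return point, se, (lo, hi)


X = np.random.default_rng(3).normal(loc=2.0, size=(2000, 2))
point, se, ci = bootstrap_ci(X, lambda d: d[:, 0].mean(), n_boot=500, seed=4)
```

For the g-formula, `estimator` would be the entire pipeline (refit every model and re-simulate), so each replicate costs about as much as the point estimate.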
Examples:
Static always-treat strategy:
>>> res = sp.gformula_mc(
... df,
... treatment_cols=['A0', 'A1', 'A2'],
... confounder_cols=[['L0'], ['L1'], ['L2']],
... outcome_col='Y',
... strategy=[1, 1, 1],
... control_strategy=[0, 0, 0],
... )
>>> print(res.summary())
Dynamic regime: treat only when a time-varying biomarker is high.