`statspai.target_trial`¶

target_trial ¶

Target Trial Emulation (sp.target_trial).

JAMA 2022 framework: the unifying language for causal inference from observational data. Use it to formalize the target trial before analysis, then delegate estimation to `sp.msm` / `sp.tmle` / `sp.ltmle`.

Quick start

>>> import statspai as sp
>>> proto = sp.target_trial.protocol(
...     eligibility="age >= 50 and diabetic == 1",
...     treatment_strategies=["statin at t0", "no statin"],
...     assignment="observational emulation",
...     time_zero="date of diabetes diagnosis",
...     followup_end="min(death, loss, 5y)",
...     outcome="incident MI",
...     causal_contrast="per-protocol",
...     analysis_plan="clone-censor-weight + pooled logistic + IPCW",
...     baseline_covariates=["age", "sex", "bmi", "ldl"],
... )
>>> print(proto.summary())

TargetTrialProtocol `dataclass` ¶

Formal 7-component target trial protocol.

Parameters:

Name	Type	Description	Default
`eligibility`	`str | list[str] | Callable`	Entry criteria at time zero. May be a SQL-like string filter, a list of human-readable conditions, or a predicate that takes a DataFrame row and returns bool.	required
`treatment_strategies`	`list[str]`	Named arms being contrasted, e.g. `["initiate statin at t0", "no statin"]`. At least two strategies required.	required
`assignment`	`str`	"randomization" (for RCT) or "observational emulation" (conditional exchangeability assumed given baseline covariates).	required
`time_zero`	`str`	Explicit rule defining time zero — usually the moment of eligibility + treatment assignment alignment. This is the single most important field for preventing immortal time bias.	required
`followup_end`	`str`	Administrative censoring rule (e.g. `"min(event, loss, 2026-01-01)"`).	required
`outcome`	`str`	Primary outcome definition.	required
`causal_contrast`	`str`	One of `"ITT"`, `"per-protocol"`, `"as-treated"`, `"observational-analogue"`.	required
`analysis_plan`	`str`	Which estimator will recover the target estimand (e.g. `"IPW + Cox"`, `"g-formula + pooled logistic"`).	required
`baseline_covariates`	`list[str]`	Covariates measured at or before time zero that render treatment assignment conditionally exchangeable.	`list()`
`time_varying_covariates`	`list[str]`	Post-baseline covariates affected by prior treatment; trigger g-methods (MSM / parametric g-formula / LTMLE).	`list()`
`notes`	`str`	Free-form notes (e.g. known sources of confounding).	`''`
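
The callable form of `eligibility` can be pictured as a per-row predicate. The sketch below is only an illustration of filtering a cohort at time zero: it uses plain dicts in place of DataFrame rows, and `is_eligible` and `apply_eligibility` are hypothetical helpers, not part of StatsPAI.

```python
from typing import Callable, Mapping

Row = Mapping[str, float]


def is_eligible(row: Row) -> bool:
    """Hypothetical predicate mirroring the string filter 'age >= 40 and ldl > 130'."""
    return row["age"] >= 40 and row["ldl"] > 130


def apply_eligibility(rows: list, predicate: Callable[[Row], bool]) -> list:
    """Keep only the rows that satisfy the entry criteria at time zero."""
    return [row for row in rows if predicate(row)]


cohort = [
    {"id": 1, "age": 55, "ldl": 160.0},
    {"id": 2, "age": 38, "ldl": 170.0},  # fails the age criterion
    {"id": 3, "age": 61, "ldl": 120.0},  # fails the LDL criterion
    {"id": 4, "age": 40, "ldl": 130.5},
]
eligible_ids = [row["id"] for row in apply_eligibility(cohort, is_eligible)]
print(eligible_ids)
```

Writing the criteria as a predicate keeps the entry rule testable on its own, before any estimation is run.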

Examples:

>>> import statspai as sp
>>> proto = sp.target_trial_protocol(
...     eligibility="age >= 40 and ldl > 130",
...     treatment_strategies=["initiate statin at t0", "no statin"],
...     assignment="observational emulation",
...     time_zero="first eligible visit",
...     followup_end="min(event, loss, 5 years)",
...     outcome="incident MI within 5y",
...     causal_contrast="ITT",
...     baseline_covariates=["age", "ldl", "diabetes"],
... )
>>> type(proto).__name__
'TargetTrialProtocol'
>>> proto.causal_contrast
'ITT'
>>> print(proto.summary().splitlines()[0])
Target Trial Protocol

summary ¶

summary() -> str

Return a human-readable protocol summary.

TargetTrialResult `dataclass` ¶

Bases: ResultProtocolMixin

Result of target trial emulation.

Attributes:

Name	Type	Description
`protocol`	`TargetTrialProtocol`	The protocol that was emulated.
`estimate`	`float`	Point estimate of the causal contrast under the protocol.
`se`	`float`	Standard error (analytic IPW sandwich, or bootstrap).
`ci`	`tuple[float, float]`	95% confidence interval.
`n_eligible`	`int`	Subjects passing the eligibility criterion at time zero.
`n_excluded_immortal`	`int`	Subjects excluded to prevent immortal time bias.
`weights`	`ndarray`	IP weights used (baseline + censoring combined).
`method`	`str`	Which analysis plan was executed.
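
The table does not say how `ci` is derived from `estimate` and `se`. A common choice is a normal-approximation (Wald) interval, sketched below under that assumption; `wald_ci` is a hypothetical helper and the numbers are made up for illustration.

```python
from statistics import NormalDist


def wald_ci(estimate: float, se: float, level: float = 0.95) -> tuple:
    """Normal-approximation interval: estimate +/- z * se (assumed, not confirmed by the docs)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # about 1.96 for a 95% interval
    return (estimate - z * se, estimate + z * se)


# Illustrative values only: a risk difference of -0.04 with standard error 0.02.
lower, upper = wald_ci(estimate=-0.04, se=0.02)
print(round(lower, 4), round(upper, 4))
```

If the standard error comes from a bootstrap, a percentile interval is a frequent alternative to the Wald form.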

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> proto = sp.target_trial_protocol(
...     eligibility="age >= 40 and ldl > 130",
...     treatment_strategies=["statin", "no statin"],
...     assignment="observational emulation",
...     time_zero="first eligible visit",
...     followup_end="5 years",
...     outcome="incident MI",
... )
>>> rng = np.random.default_rng(0)
>>> n = 400
>>> df = pd.DataFrame({
...     "age": rng.integers(40, 70, n),
...     "ldl": rng.normal(150, 20, n),
...     "statin": rng.integers(0, 2, n),
... })
>>> df["mi"] = (rng.random(n) < 0.2).astype(int)
>>> res = sp.target_trial_emulate(
...     proto, df, outcome_col="mi", treatment_col="statin")
>>> type(res).__name__
'TargetTrialResult'
>>> res.n_eligible
335
>>> res.n_excluded_immortal
65

to_paper ¶

to_paper(fmt: str = 'markdown', title: str | None = None) -> str

Render a structured Methods/Results block.

See :func:statspai.target_trial.to_paper for details.

CloneCensorWeightResult `dataclass` ¶

Bases: ResultProtocolMixin

Cloned, censored and IP-of-censoring-weighted target-trial data.

Produced by :func:`clone_censor_weight`. `cloned_data` holds one row per (id, time, strategy) surviving artificial censoring, with an `_ipcw` weight column; `weights_summary` reports the mean / min / max weight.

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> rows = []
>>> for sid in range(50):
...     for t in range(4):
...         rows.append({"id": sid, "time": t,
...                      "treat": int(rng.random() < 0.6),
...                      "age": 40 + rng.normal()})
>>> df = pd.DataFrame(rows)
>>> strategies = {
...     "always_treat": lambda g: g["treat"].to_numpy() == 1,
...     "never_treat": lambda g: g["treat"].to_numpy() == 0,
... }
>>> res = sp.clone_censor_weight(df, id_col="id", time_col="time",
...                              treatment_col="treat", strategies=strategies)
>>> type(res).__name__
'CloneCensorWeightResult'
>>> res.n_originals
50
>>> res.strategies
['always_treat', 'never_treat']

clone_censor_weight ¶

clone_censor_weight(data: DataFrame, id_col: str, time_col: str, treatment_col: str, strategies: dict[str, Callable[[DataFrame], ndarray]], censor_covariates: Sequence[str] | None = None, stabilize: bool = True) -> CloneCensorWeightResult

Clone-censor-weight each subject across target-trial strategies.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Long-format (one row per subject-time).	required
`id_col`	`str`	Column identifying the subject.	required
`time_col`	`str`	Column identifying the time index of each row.	required
`treatment_col`	`str`	Column identifying treatment exposure at that time.	required
`strategies`	`dict[str, Callable]`	Map strategy name → predicate taking a subject's DataFrame and returning a boolean np.ndarray (True where the observed treatment is consistent with the strategy at that time).	required
`censor_covariates`	`list[str]`	Covariates used to estimate IP-of-censoring weights after cloning. Defaults to all non-key columns.	`None`
`stabilize`	`bool`	Use stabilized IPC weights.	`True`

Returns:

Type	Description
`CloneCensorWeightResult`	`cloned_data` holds one row per (id, time, strategy) surviving artificial censoring, with an `_ipcw` weight column.
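
The clone-and-censor step can be sketched in plain Python. This is an illustration of the general technique, not StatsPAI's implementation: it assumes a per-row consistency predicate (the library's predicates take a subject's DataFrame and return an array), and it assumes a clone is dropped from the first deviating row onward. `clone_and_censor` is a hypothetical helper.

```python
def clone_and_censor(rows, strategies):
    """Copy each subject into every strategy; censor a clone at its first deviation."""
    by_subject = {}
    for row in sorted(rows, key=lambda r: (r["id"], r["time"])):
        by_subject.setdefault(row["id"], []).append(row)

    cloned = []
    for history in by_subject.values():
        for name, is_consistent in strategies.items():
            for row in history:
                if not is_consistent(row):
                    break  # artificial censoring: drop this row and all later ones
                cloned.append({**row, "strategy": name})
    return cloned


rows = [
    {"id": 1, "time": 0, "treat": 1},
    {"id": 1, "time": 1, "treat": 1},
    {"id": 1, "time": 2, "treat": 0},  # subject 1 stops treatment here
    {"id": 2, "time": 0, "treat": 0},
    {"id": 2, "time": 1, "treat": 0},
]
strategies = {
    "always_treat": lambda r: r["treat"] == 1,
    "never_treat": lambda r: r["treat"] == 0,
}
cloned = clone_and_censor(rows, strategies)
kept = [(r["id"], r["time"], r["strategy"]) for r in cloned]
print(kept)
```

Because censoring here depends on observed treatment, the surviving rows are a selected sample, which is why the censoring weights are needed afterwards.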

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> rows = []
>>> for sid in range(50):
...     for t in range(4):
...         rows.append({"id": sid, "time": t,
...                      "treat": int(rng.random() < 0.6),
...                      "age": 40 + rng.normal()})
>>> df = pd.DataFrame(rows)
>>> strategies = {
...     "always_treat": lambda g: g["treat"].to_numpy() == 1,
...     "never_treat": lambda g: g["treat"].to_numpy() == 0,
... }
>>> res = sp.clone_censor_weight(df, id_col="id", time_col="time",
...                              treatment_col="treat", strategies=strategies)
>>> res.n_originals
50
>>> res.strategies
['always_treat', 'never_treat']
>>> "_ipcw" in res.cloned_data.columns
True
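
To make the `stabilize` flag concrete, the sketch below computes cumulative inverse-probability-of-censoring weights for a single clone, with and without stabilization. It assumes the standard textbook form (a running product of numerator over denominator probabilities of remaining uncensored); the probabilities are invented, and `ipc_weights` is a hypothetical helper rather than a StatsPAI function.

```python
from itertools import accumulate
from operator import mul


def ipc_weights(p_denominator, p_numerator=None):
    """Cumulative IPC weights over follow-up intervals for one clone.

    p_denominator[k]: modelled probability of staying uncensored in interval k
                      given the censoring covariates.
    p_numerator[k]:   the same probability from a simpler model (stabilized form);
                      None means a numerator of 1 (unstabilized form).
    """
    if p_numerator is None:
        p_numerator = [1.0] * len(p_denominator)
    ratios = [num / den for num, den in zip(p_numerator, p_denominator)]
    return list(accumulate(ratios, mul))


p_den = [0.9, 0.8, 0.5]   # illustrative fitted probabilities
p_num = [0.9, 0.85, 0.7]  # illustrative numerator probabilities

unstabilized = ipc_weights(p_den)
stabilized = ipc_weights(p_den, p_num)
print([round(w, 4) for w in unstabilized])
print([round(w, 4) for w in stabilized])
```

Stabilized weights stay closer to 1, which usually reduces the variance of the weighted estimator.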

immortal_time_check ¶

immortal_time_check(data: DataFrame, id_col: str, time_col: str, treatment_start_col: str, eligibility_time_col: str) -> ImmortalTimeDiagnostic

Flag subjects whose follow-up begins before treatment initiation — the textbook recipe for immortal time bias.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`		required
`id_col`	`str`		required
`time_col`	`str`	Follow-up time column.	required
`treatment_start_col`	`str`	Time of treatment initiation (NaN / `-inf` if never treated).	required
`eligibility_time_col`	`str`	The protocol's defined time zero.	required

Returns:

Type	Description
`ImmortalTimeDiagnostic`

Examples:

>>> import pandas as pd
>>> import statspai as sp
>>> data = pd.DataFrame({
...     "id": [1, 2, 3, 4],
...     "fu_time": [12.0, 8.0, 24.0, 6.0],
...     "tx_start": [3.0, 0.0, 5.0, 1.0],
...     "elig_time": [0.0, 0.0, 0.0, 2.0],  # id 4: treated before eligible
... })
>>> diag = sp.immortal_time_check(
...     data, id_col="id", time_col="fu_time",
...     treatment_start_col="tx_start", eligibility_time_col="elig_time")
>>> diag.n_flagged
1
>>> diag.flagged_ids
[4]
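
The example above flags id 4, the only subject whose treatment starts before the eligibility time. One rule consistent with that output is sketched below. It is an inferred reading of the example, not the documented algorithm, and `flag_treated_before_eligible` is a hypothetical helper working on plain dicts.

```python
import math


def flag_treated_before_eligible(records):
    """Return ids whose treatment start precedes the protocol's time zero."""
    flagged = []
    for rec in records:
        tx_start = rec["tx_start"]
        # NaN or infinite start times are treated as "never treated" and skipped.
        if tx_start is None or math.isnan(tx_start) or math.isinf(tx_start):
            continue
        if tx_start < rec["elig_time"]:
            flagged.append(rec["id"])
    return flagged


records = [
    {"id": 1, "tx_start": 3.0, "elig_time": 0.0},
    {"id": 2, "tx_start": 0.0, "elig_time": 0.0},
    {"id": 3, "tx_start": 5.0, "elig_time": 0.0},
    {"id": 4, "tx_start": 1.0, "elig_time": 2.0},           # treated before eligible
    {"id": 5, "tx_start": float("nan"), "elig_time": 0.0},  # never treated
]
flagged_ids = flag_treated_before_eligible(records)
print(flagged_ids)
```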

to_paper ¶

to_paper(result: TargetTrialResult, *, fmt: Literal['markdown', 'latex', 'text', 'target', 'jama', 'bmj'] = 'markdown', title: Optional[str] = None, journal: Optional[str] = None, authors: Optional[str] = None, funding: Optional[str] = None, registration: Optional[str] = None, data_availability: Optional[str] = None, background: Optional[str] = None, limitations: Optional[str] = None) -> str

Render a target trial emulation result as a structured Methods/Results block.

Parameters:

Name	Type	Description	Default
`result`	`TargetTrialResult`	Output of :func:`sp.target_trial.emulate`.	required
`fmt`	`('markdown', 'latex', 'text', 'target', 'jama', 'bmj')`	`'target'` renders the JAMA/BMJ 2025 TARGET 21-item checklist as Markdown; `'jama'` / `'bmj'` renders a structured JAMA/BMJ-style manuscript that fills in all 21 TARGET items that can be auto-derived from the protocol + result, flagging the remaining items for author attention; other formats render the shorter STROBE-style Methods & Results block.	`'markdown'`
`title`	`Optional[str]`		`None`
`journal`	`Optional[str]`		`None`
`authors`	`Optional[str]`		`None`
`funding`	`Optional[str]`		`None`
`registration`	`Optional[str]`		`None`
`data_availability`	`Optional[str]`		`None`
`background`	`Optional[str]`	Background narrative. Together with the title, funding, registration, and data statement, it is used by the `'jama'` / `'bmj'` renderer to populate TARGET items that cannot be derived automatically. Missing values are rendered as `(supply text)` placeholders.	`None`
`limitations`	`Optional[str]`	Limitations text, used by the `'jama'` / `'bmj'` renderer in the same way as `background`. Missing values are rendered as `(supply text)` placeholders.	`None`

Returns:

Type	Description
`str`

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> proto = sp.target_trial_protocol(
...     eligibility="age >= 40 and ldl > 130",
...     treatment_strategies=["statin", "no statin"],
...     assignment="observational emulation",
...     time_zero="first eligible visit",
...     followup_end="5 years",
...     outcome="incident MI",
... )
>>> rng = np.random.default_rng(0)
>>> n = 400
>>> df = pd.DataFrame({
...     "age": rng.integers(40, 70, n),
...     "ldl": rng.normal(150, 20, n),
...     "statin": rng.integers(0, 2, n),
... })
>>> df["mi"] = (rng.random(n) < 0.2).astype(int)
>>> res = sp.target_trial_emulate(
...     proto, df, outcome_col="mi", treatment_col="statin")
>>> report = sp.target_trial_report(res, fmt="markdown")
>>> bool("Methods" in report)
True

target_checklist ¶

target_checklist(result: TargetTrialResult, *, fmt: Literal['markdown', 'text'] = 'markdown') -> str

Render the TARGET-Statement 21-item checklist as a completed table.

Each item is tagged `[AUTO]` if it can be filled from the :class:`TargetTrialProtocol` + :class:`TargetTrialResult` pair, or `[TODO]` if the author still needs to supply text (e.g. discussion and funding). Intended for use as manuscript supplementary material; the paper itself still needs hand-written narrative.

Parameters:

Name	Type	Description	Default
`result`	`TargetTrialResult`		required
`fmt`	`('markdown', 'text')`		`'markdown'`

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> proto = sp.target_trial_protocol(
...     eligibility="age >= 40 and ldl > 130",
...     treatment_strategies=["statin", "no statin"],
...     assignment="observational emulation",
...     time_zero="first eligible visit",
...     followup_end="5 years",
...     outcome="incident MI",
... )
>>> rng = np.random.default_rng(0)
>>> n = 400
>>> df = pd.DataFrame({
...     "age": rng.integers(40, 70, n),
...     "ldl": rng.normal(150, 20, n),
...     "statin": rng.integers(0, 2, n),
... })
>>> df["mi"] = (rng.random(n) < 0.2).astype(int)
>>> res = sp.target_trial_emulate(
...     proto, df, outcome_col="mi", treatment_col="statin")
>>> chk = sp.target_trial_checklist(res)
>>> print(chk.splitlines()[0])
# TARGET Statement — 21-item Reporting Checklist
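
The `[AUTO]` / `[TODO]` tagging can be pictured with a small sketch. The item labels below are placeholders chosen for illustration, not the wording of the TARGET checklist, and `tag_items` is a hypothetical helper rather than the library's renderer.

```python
def tag_items(items):
    """Tag each (label, value) pair: [AUTO] when a value is available, else [TODO]."""
    lines = []
    for label, value in items:
        tag = "[AUTO]" if value else "[TODO]"
        lines.append(f"{tag} {label}")
    return lines


# Values that a protocol/result pair could supply are filled in;
# author-supplied narrative is left as None.
items = [
    ("Eligibility criteria", "age >= 40 and ldl > 130"),
    ("Outcome", "incident MI"),
    ("Funding", None),
    ("Discussion", None),
]
tagged = tag_items(items)
print("\n".join(tagged))
```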

References

Hernán et al. (JAMA 2025; BMJ 2025). TARGET Statement: Transparent Reporting of Observational Studies Emulating a Target Trial.
