`statspai.survival`¶

survival ¶

Survival and duration analysis models.

CoxResult ¶

Bases: EconometricResults

Result from Cox proportional hazards estimation.

Extends `EconometricResults` with survival-specific methods (`.plot()`, `.ph_test()`, `.baseline_hazard()`) and the `.concordance` property.

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> n = 120
>>> x = rng.normal(size=n)
>>> time = rng.exponential(scale=np.exp(-0.5 * x))
>>> event = (rng.random(n) < 0.8).astype(int)
>>> df = pd.DataFrame({"time": time, "status": event, "x": x})
>>> res = sp.cox(data=df, duration="time", event="status", x=["x"])
>>> type(res).__name__
'CoxResult'
>>> list(res.params.index)
['x']
>>> bool(0.0 <= res.concordance <= 1.0)
True
>>> list(res.baseline_hazard().columns)
['time', 'baseline_cumhaz', 'baseline_survival']
>>> isinstance(res.summary(), str)
True

concordance `property` ¶

concordance: float

Harrell's C-statistic (concordance index).
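Harrell's C can be computed by brute force over pairs, which makes the definition concrete. The sketch below is a plain O(n²) illustration, not the library's implementation. `harrell_c` is a hypothetical helper, `risk` is assumed to be the linear predictor (higher means shorter expected survival), and pairs with tied times are simply skipped.

```python
import numpy as np


def harrell_c(time, event, risk):
    """Brute-force Harrell's C (hypothetical helper, O(n^2))."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    risk = np.asarray(risk, dtype=float)
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if event[i] != 1:
            continue  # a pair is usable only if the earlier time is an observed event
        later = time > time[i]  # subjects still event-free after subject i failed
        comparable += int(later.sum())
        concordant += float(np.sum(risk[i] > risk[later]))         # higher risk failed first
        concordant += 0.5 * float(np.sum(risk[i] == risk[later]))  # tied risk counts 1/2
    return concordant / comparable


time = [2.0, 4.0, 6.0, 8.0]
event = [1, 1, 0, 1]
print(harrell_c(time, event, risk=[4.0, 3.0, 2.0, 1.0]))  # 1.0: perfectly ordered
print(harrell_c(time, event, risk=[1.0, 2.0, 3.0, 4.0]))  # 0.0: perfectly reversed
```

A model with no discriminating power (constant risk) scores 0.5.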

baseline_hazard ¶

baseline_hazard() -> DataFrame

Baseline cumulative hazard (Breslow estimator).

Returns:

Type	Description
`DataFrame`	Columns `time`, `baseline_cumhaz`, `baseline_survival`.
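A minimal NumPy sketch of the Breslow estimator, given the linear predictor of an already-fitted model. `breslow_baseline` is a hypothetical helper. It groups tied events at each time and takes baseline survival as exp(-H0(t)), which is an assumption about how the `baseline_survival` column is derived. With an all-zero linear predictor it reduces to the Nelson-Aalen estimator.

```python
import numpy as np


def breslow_baseline(time, event, lin_pred):
    """Breslow baseline cumulative hazard at each distinct event time.

    lin_pred holds the fitted linear predictor x'beta for every subject.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    risk = np.exp(np.asarray(lin_pred, dtype=float))
    event_times = np.unique(time[event == 1])
    increments = np.array([
        np.sum((time == t) & (event == 1)) / np.sum(risk[time >= t])  # d(t) / risk-set total
        for t in event_times
    ])
    cumhaz = np.cumsum(increments)
    return event_times, cumhaz, np.exp(-cumhaz)  # survival taken as exp(-H0)


t, H0, S0 = breslow_baseline([1, 2, 3, 4], [1, 1, 0, 1], lin_pred=[0.0, 0.0, 0.0, 0.0])
print(t)   # event times 1, 2, 4
print(H0)  # 1/4, 1/4 + 1/3, 1/4 + 1/3 + 1/1
```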

ph_test ¶

ph_test() -> DataFrame

Test the proportional hazards assumption via Schoenfeld residuals.

Computes the correlation of scaled Schoenfeld residuals with time for each covariate and reports a chi-squared test.

Returns:

Type	Description
`DataFrame`	Columns: `variable`, `rho`, `chi2`, `p_value`.
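The idea can be illustrated for a single covariate with a known coefficient. This sketch is not the library's code: it uses unscaled residuals, correlates them with raw event time, and uses the simple approximation d * rho**2 (d = number of events, 1 degree of freedom) as the chi-squared statistic, so its numbers will generally differ from `ph_test`. `schoenfeld_time_test` is a hypothetical name, and the sketch assumes no tied event times.

```python
import math

import numpy as np


def schoenfeld_time_test(time, event, x, beta):
    """Correlation of unscaled Schoenfeld residuals with event time for one
    covariate, plus the approximate statistic d * rho**2 on 1 df."""
    time, event, x = (np.asarray(a, dtype=float) for a in (time, event, x))
    weights = np.exp(beta * x)
    failed = np.flatnonzero(event == 1)
    resid = np.empty(len(failed))
    for k, i in enumerate(failed):
        at_risk = time >= time[i]
        xbar = np.sum(weights[at_risk] * x[at_risk]) / np.sum(weights[at_risk])
        resid[k] = x[i] - xbar  # observed minus risk-weighted mean covariate
    rho = float(np.corrcoef(time[failed], resid)[0, 1])
    chi2 = len(failed) * rho ** 2
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-squared with 1 df
    return rho, chi2, p_value


rng = np.random.default_rng(1)
x = rng.normal(size=200)
time = rng.exponential(scale=np.exp(-0.5 * x))  # proportional hazards hold by construction
event = np.ones(200, dtype=int)
rho, chi2, p_value = schoenfeld_time_test(time, event, x, beta=0.5)
```

A small `rho` (large p-value) is what one expects here, since the simulated hazards are proportional.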

plot ¶

plot(kind: str = 'survival', ax: Any = None, **kwargs: Any) -> Any

Plot survival-related curves.

Parameters:

Name	Type	Description	Default
`kind`	`str`	`'survival'` — baseline survival curve, `'hazard'` — baseline cumulative hazard, `'hr'` — hazard ratio forest plot.	`'survival'`
`ax`	`Axes`	Matplotlib axes to draw on.	`None`

KMResult ¶

Bases: ResultProtocolMixin

Kaplan-Meier survival analysis result.

Attributes:

Name	Type	Description
`survival_table`	`DataFrame`	Life table with `time`, `n_risk`, `n_event`, `n_censor`, `survival`, `std_err`, `ci_lower`, `ci_upper`.
`median_survival`	`float or dict`	Median survival time (per group if groups present).

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> n = 120
>>> df = pd.DataFrame({
...     "time": rng.exponential(10, n).round(2),
...     "status": (rng.random(n) < 0.7).astype(int),
... })
>>> km = sp.kaplan_meier(data=df, duration="time", event="status")
>>> type(km).__name__
'KMResult'
>>> "survival" in km.survival_table.columns
True
>>> bool(0.0 <= km.survival_table["survival"].iloc[-1] <= 1.0)
True
>>> isinstance(km.summary(), str)
True

survival_table `property` ¶

survival_table: DataFrame

Life table (combined if multiple groups).

median_survival `property` ¶

median_survival: Union[float, Dict[str, float]]

Median survival time (scalar or dict by group).

summary ¶

summary() -> str

Formatted summary of the Kaplan-Meier analysis.

plot ¶

plot(ax: Any = None, **kwargs: Any) -> Any

Plot Kaplan-Meier survival curves with confidence bands.

Parameters:

Name	Type	Description	Default
`ax`	`Axes`	Matplotlib axes to draw on.	`None`

CumIncResult `dataclass` ¶

Bases: ResultProtocolMixin

Cumulative incidence functions for competing risks.

Attributes:

Name	Type	Description
`cif_table`	`DataFrame`	Long table with columns `group` (`"all"` when ungrouped), `cause`, `time`, `cif`, `se`, `ci_lower`, `ci_upper`.
`causes`	`list`	The competing-cause labels (excluding the `0` censoring code).
`gray_test`	`dict or None`	`cause -> {"statistic", "df", "p_value"}` from Gray's K-sample test, or `None` when `group` was not supplied.
`alpha`	`float`	Significance level used for the confidence bands.

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> n = 200
>>> t1 = rng.exponential(scale=1.0, size=n)
>>> t2 = rng.exponential(scale=1.5, size=n)
>>> cens = rng.exponential(scale=2.0, size=n)
>>> time = np.minimum(np.minimum(t1, t2), cens)
>>> status = np.where((t1 <= t2) & (t1 <= cens), 1,
...                   np.where((t2 < t1) & (t2 <= cens), 2, 0))
>>> df = pd.DataFrame({"time": time, "status": status})
>>> ci = sp.cuminc(df, duration="time", event="status")
>>> isinstance(ci, sp.CumIncResult)
True
>>> ci.causes
[1, 2]
>>> at1 = ci.cif_at(1.0, cause=1)  # CIF for cause 1 at time t=1
>>> bool((at1["cif"] >= 0).all())
True

cif_at ¶

cif_at(time: float, cause: Optional[int] = None) -> DataFrame

Cumulative incidence at `time`, i.e. the last estimated value at or before `time`.
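The "last value at or before `time`" rule is an ordinary right-continuous step-function lookup. A sketch with `np.searchsorted`; `step_value_at` is a hypothetical helper, and returning 0 before the first jump is an assumption that suits a CIF, which starts at zero.

```python
import numpy as np


def step_value_at(times, values, t, before_first=0.0):
    """Right-continuous step-function lookup: the last value whose time is <= t."""
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    idx = np.searchsorted(times, t, side="right") - 1  # index of last jump at or before t
    return before_first if idx < 0 else float(values[idx])


jump_times = [0.5, 1.2, 2.0]
cif = [0.10, 0.25, 0.40]
print(step_value_at(jump_times, cif, 1.0))  # 0.1 (last jump at 0.5)
print(step_value_at(jump_times, cif, 1.2))  # 0.25 (a jump exactly at t counts)
```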

plot ¶

plot(cause: Optional[int] = None, ax: Any = None, **kwargs: Any) -> Any

Step-plot the cumulative incidence function(s).

FineGrayResult `dataclass` ¶

Bases: ResultProtocolMixin

Fine-Gray proportional subdistribution hazards model result.

Attributes:

Name	Type	Description
`params`	`ndarray`	Estimated coefficients (log subdistribution hazard ratios).
`bse`	`ndarray`	Standard errors (model-based, from the inverse information).
`covariates`	`list of str`	Covariate names aligned with `params`.
`cause`	`int`	The cause of interest whose subdistribution was modelled.
`n_obs, n_events`	`int`	Sample size and number of cause-of-interest events.

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> n = 300
>>> x = rng.normal(size=n)
>>> t1 = rng.exponential(scale=np.exp(-0.5 * x))
>>> t2 = rng.exponential(scale=1.5, size=n)
>>> cens = rng.exponential(scale=2.0, size=n)
>>> time = np.minimum(np.minimum(t1, t2), cens)
>>> status = np.where((t1 <= t2) & (t1 <= cens), 1,
...                   np.where((t2 < t1) & (t2 <= cens), 2, 0))
>>> df = pd.DataFrame({"time": time, "status": status, "x": x})
>>> res = sp.finegray(df, duration="time", event="status", x=["x"])
>>> isinstance(res, sp.FineGrayResult)
True
>>> res.covariates
['x']
>>> res.shr.shape   # one subdistribution hazard ratio per covariate
(1,)

shr `property` ¶

shr: ndarray

Subdistribution hazard ratios, exp(coef).
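Because `params` are log subdistribution hazard ratios, `shr` is their element-wise exponential. The sketch below pairs that with a Wald interval built on the log scale from the `params` and `bse` arrays described above. `shr_wald_ci` is hypothetical, and that the library builds its intervals this way is an assumption.

```python
from statistics import NormalDist

import numpy as np


def shr_wald_ci(params, bse, alpha=0.05):
    """exp(coef) with a Wald interval built on the log scale (hypothetical helper)."""
    params = np.asarray(params, dtype=float)
    bse = np.asarray(bse, dtype=float)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return np.exp(params), np.exp(params - z * bse), np.exp(params + z * bse)


shr, lower, upper = shr_wald_ci(params=[0.0, np.log(2.0)], bse=[0.1, 0.2])
```

The resulting interval is symmetric around the estimate on the log scale, not on the ratio scale.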

cox ¶

cox(formula: Optional[str] = None, data: DataFrame = None, duration: Optional[str] = None, event: Optional[str] = None, x: Optional[List[str]] = None, ties: str = 'efron', strata: Optional[str] = None, robust: str = 'nonrobust', cluster: Optional[str] = None, hazard_ratio: bool = True, alpha: float = 0.05) -> CoxResult

Cox Proportional Hazards model via partial likelihood.

Parameters:

Name	Type	Description	Default
`formula`	`str`	Formula of the form `'duration ~ x1 + x2'`. If given, `duration` is inferred from the LHS.	`None`
`data`	`DataFrame`	Input data.	`None`
`duration`	`str`	Column name for follow-up time (overrides formula LHS).	`None`
`event`	`str`	Column name for event indicator (1 = event, 0 = censored).	`None`
`x`	`list of str`	Covariate column names (overrides formula RHS).	`None`
`ties`	`str`	Tie-handling method: `'efron'` or `'breslow'`.	`'efron'`
`strata`	`str`	Column name for stratification variable.	`None`
`robust`	`str`	Standard-error type: `'nonrobust'` (model-based) or `'hc0'` (sandwich).	`'nonrobust'`
`cluster`	`str`	Column name for cluster-robust SE.	`None`
`hazard_ratio`	`bool`	If True, report hazard ratios in the summary alongside coefficients.	`True`
`alpha`	`float`	Significance level for confidence intervals.	`0.05`

Returns:

Type	Description
`CoxResult`	Result object extending `EconometricResults` with `.concordance`, `.baseline_hazard()`, `.ph_test()`, `.plot()`.

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> n = 150
>>> age = rng.normal(50, 10, n)
>>> treatment = rng.integers(0, 2, n).astype(float)
>>> time = rng.exponential(
...     scale=np.exp(-0.03 * (age - 50) - 0.5 * treatment))
>>> status = (rng.random(n) < 0.8).astype(int)
>>> df = pd.DataFrame({"time": time, "status": status,
...                    "age": age, "treatment": treatment})
>>> res = sp.cox(formula="time ~ age + treatment", data=df,
...              event="status")
>>> bool(list(res.params.index) == ["age", "treatment"])
True
>>> bool(isinstance(res.summary(), str))
True
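To illustrate what fitting "via partial likelihood" involves, the sketch below fits a single covariate by Newton-Raphson. It is a simplification, not the library's implementation: it uses the Breslow tie approximation (each tied event sees the full risk set) rather than the default Efron correction, and it has no strata, robust variance, or step-halving safeguards. `cox_newton_1d` is a hypothetical name.

```python
import numpy as np


def cox_newton_1d(time, event, x, max_iter=25, tol=1e-10):
    """Newton-Raphson for a one-covariate Cox model on the Breslow partial
    log-likelihood (hypothetical helper)."""
    time, event, x = (np.asarray(a, dtype=float) for a in (time, event, x))
    beta = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for i in np.flatnonzero(event == 1):
            xr = x[time >= time[i]]  # covariates in the risk set at this event
            w = np.exp(beta * xr)
            mean = np.sum(w * xr) / np.sum(w)
            score += x[i] - mean                                 # first derivative
            info += np.sum(w * xr ** 2) / np.sum(w) - mean ** 2  # minus second derivative
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta, 1.0 / np.sqrt(info)  # estimate and model-based SE


rng = np.random.default_rng(0)
x = rng.normal(size=500)
time = rng.exponential(scale=np.exp(-0.5 * x))  # true log hazard ratio 0.5
beta_hat, se = cox_newton_1d(time, np.ones(500, dtype=int), x)
```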

kaplan_meier ¶

kaplan_meier(data: DataFrame, duration: str, event: str, group: Optional[str] = None, alpha: float = 0.05) -> KMResult

Kaplan-Meier non-parametric survival function estimator.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Input data.	required
`duration`	`str`	Column name for duration / follow-up time.	required
`event`	`str`	Column name for event indicator (1 = event, 0 = censored).	required
`group`	`str`	Column name for group variable (stratification).	`None`
`alpha`	`float`	Significance level for confidence intervals (Greenwood formula).	`0.05`

Returns:

Type	Description
`KMResult`	Object with `.survival_table`, `.median_survival`, `.plot()`, `.summary()`.

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> n = 150
>>> df = pd.DataFrame({
...     "time": rng.exponential(10, n).round(2),
...     "status": (rng.random(n) < 0.7).astype(int),
... })
>>> km = sp.kaplan_meier(data=df, duration="time", event="status")
>>> bool("survival" in km.survival_table.columns)
True
>>> bool(isinstance(km.median_survival, float))
True
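The product-limit computation can be sketched in a few lines. `km_table` and `median_from_table` are hypothetical helpers that mirror some of the `survival_table` columns (there is no `n_censor`). The plain linear confidence band is an assumption, since the library may apply a transformation such as log-log. The median follows the usual convention: the first time at which S(t) falls to 0.5 or below.

```python
from statistics import NormalDist

import numpy as np
import pandas as pd


def km_table(time, event, alpha=0.05):
    """Product-limit estimate with Greenwood SEs and linear (untransformed) bands."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    surv, greenwood, rows = 1.0, 0.0, []
    for t in np.unique(time):
        n_risk = int(np.sum(time >= t))
        n_event = int(np.sum((time == t) & (event == 1)))
        surv *= 1.0 - n_event / n_risk
        if n_event < n_risk:  # Greenwood term is undefined once S drops to 0
            greenwood += n_event / (n_risk * (n_risk - n_event))
        se = surv * np.sqrt(greenwood)
        rows.append((t, n_risk, n_event, surv, se,
                     max(0.0, surv - z * se), min(1.0, surv + z * se)))
    return pd.DataFrame(rows, columns=["time", "n_risk", "n_event", "survival",
                                       "std_err", "ci_lower", "ci_upper"])


def median_from_table(table):
    """Smallest time with S(t) <= 0.5; infinity if the curve never gets there."""
    below = table.loc[table["survival"] <= 0.5, "time"]
    return float(below.iloc[0]) if len(below) else float("inf")


tab = km_table([1, 2, 3, 4, 5], [1, 1, 0, 1, 0])
print(median_from_table(tab))  # 4.0
```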

survreg ¶

survreg(formula: Optional[str] = None, data: DataFrame = None, duration: Optional[str] = None, event: Optional[str] = None, x: Optional[List[str]] = None, dist: str = 'weibull', robust: str = 'nonrobust', cluster: Optional[str] = None, alpha: float = 0.05) -> EconometricResults

Parametric survival model in the accelerated failure time (AFT) parameterization.

Parameters:

Name	Type	Description	Default
`formula`	`str`	Formula `'duration ~ x1 + x2'`.	`None`
`data`	`DataFrame`	Input data.	`None`
`duration`	`str`	Follow-up time column (or formula LHS).	`None`
`event`	`str`	Event indicator column.	`None`
`x`	`list of str`	Covariate columns (or formula RHS).	`None`
`dist`	`str`	Distribution: `'weibull'`, `'exponential'`, `'lognormal'`, `'loglogistic'`.	`'weibull'`
`robust`	`str`	Standard-error type, as in `cox`.	`'nonrobust'`
`cluster`	`str`	Column name for cluster-robust SE.	`None`
`alpha`	`float`	Significance level for confidence intervals.	`0.05`

Returns:

Type	Description
`EconometricResults`	Fitted parametric survival model. Parameters include covariates and `log(sigma)` (scale).

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> n = 150
>>> age = rng.normal(50, 10, n)
>>> treatment = rng.integers(0, 2, n).astype(float)
>>> time = rng.exponential(
...     scale=np.exp(0.02 * (age - 50) + 0.5 * treatment))
>>> status = (rng.random(n) < 0.8).astype(int)
>>> df = pd.DataFrame({"time": time, "status": status,
...                    "age": age, "treatment": treatment})
>>> res = sp.survreg("time ~ age + treatment", data=df,
...                  event="status", dist="weibull")
>>> bool("log(sigma)" in list(res.params.index))
True
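For the Weibull case, the AFT model writes log T = x'β + σW, with W following the standard (minimum) extreme-value distribution; `log(sigma)` above is log σ. The sketch writes that censored log-likelihood out and maximises it with SciPy's `minimize`. It illustrates the model under these assumptions and is not the code `survreg` runs; `weibull_aft_nll` is a hypothetical name.

```python
import numpy as np
from scipy.optimize import minimize


def weibull_aft_nll(theta, log_t, event, X):
    """Negative log-likelihood of log T = X @ beta + sigma * W, where W has the
    standard minimum extreme-value distribution; theta = (beta..., log_sigma)."""
    beta, log_sigma = theta[:-1], theta[-1]
    z = (log_t - X @ beta) / np.exp(log_sigma)
    # events add log f(t) = z - exp(z) - log(sigma) - log(t); censored add log S(t) = -exp(z)
    loglik = event * (z - log_sigma - log_t) - np.exp(z)
    return -np.sum(loglik)


rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])                    # intercept + covariate
t_event = rng.exponential(scale=np.exp(0.5 + 0.3 * x))  # Weibull with sigma = 1
t_cens = rng.exponential(scale=5.0, size=n)
time = np.minimum(t_event, t_cens)
event = (t_event <= t_cens).astype(float)
fit = minimize(weibull_aft_nll, x0=np.zeros(3), args=(np.log(time), event, X), method="BFGS")
beta_hat, log_sigma_hat = fit.x[:2], fit.x[2]
```

Exponential data are the Weibull special case σ = 1, so `log_sigma_hat` should land near zero.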

logrank_test ¶

logrank_test(data: DataFrame, duration: str, event: str, group: str) -> dict

Log-rank test for equality of survival distributions across groups.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Input data.	required
`duration`	`str`	Column name for follow-up time.	required
`event`	`str`	Column name for event indicator (1 = event, 0 = censored).	required
`group`	`str`	Column name for the group variable.	required

Returns:

Type	Description
`dict`	Keys: `test_statistic`, `p_value`, `df`, `n_groups`, `expected_events` (per group), `observed_events` (per group).

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> n = 150
>>> treatment = rng.integers(0, 2, n)
>>> time = rng.exponential(scale=np.exp(-0.5 * treatment))
>>> status = (rng.random(n) < 0.8).astype(int)
>>> df = pd.DataFrame({"time": time, "status": status,
...                    "treatment": treatment})
>>> res = sp.logrank_test(data=df, duration="time", event="status",
...                       group="treatment")
>>> bool("p_value" in res and "test_statistic" in res)
True
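At each distinct event time, the test compares the events observed in each group with the number expected if all groups shared one hazard. A NumPy/SciPy sketch of the K-group version; `logrank_sketch` is hypothetical, returns only a subset of the documented keys, and has not been checked against the library's handling of ties.

```python
import numpy as np
from scipy.stats import chi2


def logrank_sketch(time, event, group):
    """K-sample log-rank test: observed minus expected events per group with
    the hypergeometric covariance, summed over distinct event times."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    group = np.asarray(group)
    labels = np.unique(group)
    k = len(labels)
    o_minus_e = np.zeros(k)
    cov = np.zeros((k, k))
    for t in np.unique(time[event == 1]):
        at_risk = time >= t
        failed = (time == t) & (event == 1)
        n, d = at_risk.sum(), failed.sum()
        n_g = np.array([np.sum(at_risk & (group == g)) for g in labels], dtype=float)
        d_g = np.array([np.sum(failed & (group == g)) for g in labels], dtype=float)
        o_minus_e += d_g - d * n_g / n  # observed minus expected at t
        if n > 1:
            cov += d * (n - d) / ((n - 1) * n) * (np.diag(n_g) - np.outer(n_g, n_g) / n)
    u, v = o_minus_e[:-1], cov[:-1, :-1]  # full covariance is singular; drop one group
    stat = float(u @ np.linalg.solve(v, u))
    return {"test_statistic": stat, "df": k - 1, "p_value": float(chi2.sf(stat, k - 1))}


rng = np.random.default_rng(0)
treatment = rng.integers(0, 2, 400)
time = rng.exponential(scale=np.exp(-1.0 * treatment))  # treated arm fails faster
res = logrank_sketch(time, np.ones(400, dtype=int), treatment)
```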

References

mantel1959statistical

cox_frailty ¶

cox_frailty(formula: str, data: DataFrame, cluster: str, alpha: float = 0.05, maxiter: int = 50, tol: float = 1e-06) -> FrailtyResult

Cox proportional hazards with shared gamma frailty.

Parameters:

Name	Type	Description	Default
`formula`	`str`	`"duration + event ~ x1 + x2"` (like R's `Surv(time, event) ~ x`).	required
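`data`	`DataFrame`	Input data.	required
`cluster`	`str`	Column identifying clusters (e.g. hospital, site).	required
`alpha`	`float`	Significance level for confidence intervals.	`0.05`
`maxiter`	`int`	Maximum number of iterations.	`50`
`tol`	`float`	Convergence tolerance.	`1e-06`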

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> cluster = np.repeat(np.arange(20), 15)
>>> n = cluster.size
>>> frailty = rng.gamma(5.0, 1 / 5.0, 20)[cluster]
>>> x = rng.normal(size=n)
>>> rate = frailty * np.exp(0.7 * x)
>>> T = rng.exponential(1.0 / rate)
>>> C = rng.exponential(2.0, n)
>>> df = pd.DataFrame({"time": np.minimum(T, C),
...                    "event": (T <= C).astype(int),
...                    "x": x, "hospital": cluster})
>>> res = sp.cox_frailty("time + event ~ x", df, cluster="hospital")
>>> res.var_names
['x']
>>> bool(res.theta > 0)  # gamma frailty precision
True

References

therneau2000modeling

causal_survival_forest ¶

causal_survival_forest(data: DataFrame, time: str, event: str, treat: str, covariates: Sequence[str] | str, horizon: Optional[float] = None, n_trees: int = 200, min_leaf: int = 5, max_depth: Optional[int] = None, propensity_bounds: tuple[float, float] = (0.05, 0.95), random_state: int = 42, alpha: float = 0.05) -> CausalSurvivalForestResult

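Fit a causal survival forest and return the average treatment effect on the restricted mean survival time (RMST), together with individual-level conditional effects (CATE).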

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Input data.	required
`time`	`str`	Observed time-to-event column (min of true event time and censoring time).	required
`event`	`str`	Event indicator (1 = event observed, 0 = censored).	required
`treat`	`str`	Binary treatment indicator.	required
`covariates`	`sequence of str`	Covariate column names (a single name is also accepted).	required
`horizon`	`float`	RMST horizon tau. If `None`, the 80th percentile of observed times is used.	`None`
`n_trees`	`int`	Number of trees in the forest.	`200`
`min_leaf`	`int`	Minimum samples per leaf.	`5`
`max_depth`	`int`	Maximum tree depth.	`None`
`propensity_bounds`	`tuple`	Lower and upper bounds at which estimated propensity scores are clipped, for numerical stability.	`(0.05, 0.95)`
`random_state`	`int`	Random seed.	`42`
`alpha`	`float`	Significance level for confidence intervals.	`0.05`

Returns:

Type	Description
`CausalSurvivalForestResult`	Result whose `cate` attribute contains the individual-level predicted RMST effects.
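The RMST up to the horizon tau is the area under the survival curve on [0, tau], so an RMST effect is a difference of such areas between treatment arms. As one ingredient, the sketch below integrates a Kaplan-Meier-style step function. It is not the forest's estimator, which must also adjust for censoring and confounding; `rmst_from_steps` is a hypothetical helper.

```python
import numpy as np


def rmst_from_steps(jump_times, surv, horizon):
    """Area under a right-continuous survival step function on [0, horizon];
    S(t) = 1 before the first jump (hypothetical helper)."""
    jump_times = np.asarray(jump_times, dtype=float)
    surv = np.asarray(surv, dtype=float)
    keep = jump_times < horizon
    knots = np.concatenate(([0.0], jump_times[keep], [horizon]))
    levels = np.concatenate(([1.0], surv[keep]))  # S on each interval between knots
    return float(np.sum(levels * np.diff(knots)))


area = rmst_from_steps([1.0, 2.0], [0.8, 0.5], horizon=3.0)  # 1*1 + 0.8*1 + 0.5*1
```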

cuminc ¶

cuminc(data: DataFrame, duration: str, event: str, group: Optional[str] = None, alpha: float = 0.05) -> CumIncResult

Cumulative incidence functions for competing risks (Aalen-Johansen).

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Input data.	required
`duration`	`str`	Column name for the follow-up time.	required
`event`	`str`	Column name for the event indicator. `0` = censored; `1, 2, ...` = competing causes.	required
`group`	`str`	Column name for a grouping variable. When supplied, CIFs are estimated per group and Gray's K-sample test is reported per cause.	`None`
`alpha`	`float`	Significance level for the confidence bands.	`0.05`

Returns:

Type	Description
`CumIncResult`	With `.cif_table`, `.gray_test`, `.summary()`, `.plot()`.

Notes

The cumulative incidence for a single cause is not 1 - KM applied to that cause; treating competing events as censoring over-states the risk. The Aalen-Johansen estimator weights each cause-specific increment by the overall (all-cause) survival probability, so the CIFs of all causes plus the overall survival sum to one at every time.
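The weighting described above can be written out directly. A minimal sketch that assumes ungrouped data and omits confidence bands; `aalen_johansen` is a hypothetical helper, not part of the library.

```python
import numpy as np


def aalen_johansen(time, event):
    """CIF for each cause (codes 1, 2, ...; 0 = censored) plus all-cause KM survival,
    evaluated at every distinct event time (hypothetical helper)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    causes = sorted(set(event[event > 0].tolist()))
    grid = np.unique(time[event > 0])
    surv_before = 1.0  # all-cause S(t-)
    totals = dict.fromkeys(causes, 0.0)
    cif = {c: [] for c in causes}
    surv = []
    for t in grid:
        n_risk = np.sum(time >= t)
        at_t = time == t
        for c in causes:
            # cause-specific hazard increment weighted by overall survival just before t
            totals[c] += surv_before * np.sum(at_t & (event == c)) / n_risk
            cif[c].append(totals[c])
        surv_before *= 1.0 - np.sum(at_t & (event > 0)) / n_risk
        surv.append(surv_before)
    return grid, {c: np.array(v) for c, v in cif.items()}, np.array(surv)


rng = np.random.default_rng(0)
t1, t2, cens = (rng.exponential(s, 200) for s in (1.0, 1.5, 2.0))
time = np.minimum(np.minimum(t1, t2), cens)
status = np.where((t1 <= t2) & (t1 <= cens), 1, np.where((t2 < t1) & (t2 <= cens), 2, 0))
grid, cif, surv = aalen_johansen(time, status)
```

At every grid time, `cif[1] + cif[2] + surv` equals one up to floating-point error, which is the identity stated in the note.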

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> n = 200
>>> arm = rng.integers(0, 2, n)
>>> t1 = rng.exponential(scale=1.0 + 0.5 * arm, size=n)   # cause 1
>>> t2 = rng.exponential(scale=1.5, size=n)               # cause 2
>>> cens = rng.exponential(scale=2.0, size=n)
>>> time = np.minimum(np.minimum(t1, t2), cens)
>>> status = np.where((t1 <= t2) & (t1 <= cens), 1,
...                   np.where((t2 < t1) & (t2 <= cens), 2, 0))
>>> df = pd.DataFrame({"time": time, "status": status, "arm": arm})
>>> ci = sp.cuminc(df, duration="time", event="status", group="arm")
>>> isinstance(ci, sp.CumIncResult)
True
>>> ci.causes
[1, 2]
>>> _ = ci.summary()
>>> ax = ci.plot(cause=1)  # requires matplotlib

finegray ¶

finegray(data: DataFrame, duration: str, event: str, x: Sequence[str], cause: int = 1, alpha: float = 0.05, max_iter: int = 50, tol: float = 1e-07) -> FineGrayResult

Fine & Gray (1999) proportional subdistribution hazards model.

Models the effect of covariates on the cumulative incidence of the cause of interest (`cause`) through its subdistribution hazard. Coefficients therefore exponentiate to subdistribution hazard ratios that map monotonically to the CIF, unlike cause-specific Cox coefficients.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Input data.	required
`duration`	`str`	Follow-up-time column.	required
`event`	`str`	Event indicator: `0` = censored, `1, 2, ...` = causes.	required
`x`	`sequence of str`	Covariate column names.	required
`cause`	`int`	Cause of interest (default `1`).	`1`
`alpha`	`float`	Significance level for confidence intervals.	`0.05`
`max_iter`	`int`	Maximum number of Newton-Raphson iterations.	`50`
`tol`	`float`	Convergence tolerance for Newton-Raphson.	`1e-07`

Returns:

Type	Description
`FineGrayResult`	With `.shr`, `.tidy()`, `.summary()`.

Notes

Subjects who fail from a competing cause are retained in the risk set with time-decaying inverse-probability-of-censoring weights w_i(t) = Ĝ(t) / Ĝ(T_i) for t ≥ T_i, where Ĝ is the Kaplan-Meier estimate of the censoring survival function. The weighted partial likelihood is maximised by Newton-Raphson with the Breslow tie approximation. Standard errors are model-based (inverse information); a fully robust sandwich variance that accounts for estimating Ĝ is not yet implemented.
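The weights only need the censoring distribution Ĝ, which is itself a Kaplan-Meier curve with the roles of events and censorings swapped. A sketch with hypothetical helper names. It evaluates Ĝ at T_i itself, as in the formula, whereas implementations differ in whether they use left limits and how they order censorings tied with events.

```python
import numpy as np


def censoring_survival(time, event):
    """KM estimate of the censoring survival G, treating censorings (event == 0)
    as the 'events'; returns a right-continuous step-function evaluator."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    jumps = np.unique(time[event == 0])
    values, g = [], 1.0
    for t in jumps:
        n_risk = np.sum(time >= t)
        n_cens = np.sum((time == t) & (event == 0))
        g *= 1.0 - n_cens / n_risk
        values.append(g)

    def G(t):
        idx = np.searchsorted(jumps, t, side="right") - 1
        return 1.0 if idx < 0 else values[idx]

    return G


def competing_event_weight(G, t, T_i):
    """Weight at time t >= T_i for a subject who failed from a competing cause at T_i."""
    return G(t) / G(T_i)


G = censoring_survival([1, 2, 3, 4, 5], [2, 0, 1, 0, 1])
print(competing_event_weight(G, 5.0, 1.0))  # 0.375 = 0.75 * 0.5
```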

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> n = 300
>>> x = rng.normal(size=n)
>>> t1 = rng.exponential(scale=np.exp(-0.5 * x))   # cause of interest
>>> t2 = rng.exponential(scale=1.5, size=n)        # competing cause
>>> cens = rng.exponential(scale=2.0, size=n)
>>> time = np.minimum(np.minimum(t1, t2), cens)
>>> status = np.where((t1 <= t2) & (t1 <= cens), 1,
...                   np.where((t2 < t1) & (t2 <= cens), 2, 0))
>>> df = pd.DataFrame({"time": time, "status": status, "x": x})
>>> res = sp.finegray(
...     df, duration="time", event="status", x=["x"], cause=1
... )
>>> res.cause
1
>>> res.tidy()["term"].tolist()
['x']
>>> bool(res.shr[0] > 0)  # subdistribution hazard ratio
True

References

fine1999proportional
