
statspai.survival

survival

Survival and duration analysis models.

CoxResult

Bases: EconometricResults

Result from Cox proportional hazards estimation.

Extends EconometricResults with survival-specific members: the .concordance property and the .plot(), .ph_test(), and .baseline_hazard() methods.

concordance property

concordance: float

Harrell's C-statistic (concordance index).
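Harrell's C is the share of comparable pairs in which the subject who failed first was assigned the higher predicted risk. The following is a minimal pure-Python sketch of that definition. harrell_c is a hypothetical helper, not part of statspai. Skipping pairs with tied times is an assumption, since implementations differ on how they treat ties.

```python
def harrell_c(time, event, risk):
    """Harrell's C: share of comparable pairs ordered correctly by risk.

    A pair (i, j) is comparable when time[i] < time[j] and i had an event;
    it is concordant when risk[i] > risk[j]. Tied risks count as 0.5.
    Pairs with tied times are skipped (an assumption; conventions vary).
    """
    concordant = 0.0
    comparable = 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            if time[i] < time[j] and event[i] == 1:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable

# Higher predicted risk for earlier failures gives perfect concordance.
print(harrell_c([1, 2, 3, 4], [1, 1, 0, 1], [4.0, 3.0, 2.0, 1.0]))  # 1.0
```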

baseline_hazard

baseline_hazard() -> DataFrame

Baseline cumulative hazard (Breslow estimator).

Returns:

Type Description
DataFrame

Columns time, baseline_cumhaz, baseline_survival.
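At each distinct event time, the Breslow estimator adds the number of events divided by the sum of exp(x'β) over the risk set. The sketch below shows that recursion and assumes the fitted linear predictors are already available. breslow_baseline is a hypothetical name. statspai may treat tied event times differently in the denominator (for example under Efron ties). With every linear predictor equal to zero, the estimate reduces to Nelson-Aalen.

```python
import math

def breslow_baseline(time, event, eta):
    """Breslow baseline cumulative hazard.

    time, event, eta are equal-length lists; eta holds fitted x_i'beta.
    Returns (t, baseline_cumhaz, baseline_survival) at each distinct event time.
    """
    rows = []
    cumhaz = 0.0
    for t in sorted({t for t, d in zip(time, event) if d == 1}):
        # Number of events at t and the exp(eta)-weighted risk set size.
        d_k = sum(1 for ti, di in zip(time, event) if ti == t and di == 1)
        risk_sum = sum(math.exp(e) for ti, e in zip(time, eta) if ti >= t)
        cumhaz += d_k / risk_sum
        rows.append((t, cumhaz, math.exp(-cumhaz)))
    return rows
```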

ph_test

ph_test() -> DataFrame

Test the proportional hazards assumption via Schoenfeld residuals.

Computes the correlation of scaled Schoenfeld residuals with time for each covariate and reports a chi-squared test.
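For one covariate, the Schoenfeld residual at an event time is the failing subject's covariate value minus the exp(βx)-weighted mean of that covariate over the risk set. A time trend in these residuals signals non-proportional hazards. The sketch computes unscaled residuals and their correlation with raw event time. For a single covariate, the usual scaling adds β̂ and multiplies by a positive constant, so it leaves the correlation unchanged. The sketch omits the chi-squared statistic, and statspai may correlate against a transformed time scale rather than raw time. All helper names are hypothetical.

```python
import math

def schoenfeld_residuals(time, event, x, beta):
    """Unscaled Schoenfeld residuals for a single covariate.

    r_i = x_i - (risk-set mean of x weighted by exp(beta * x)).
    Tied events at the same time share one risk set.
    """
    out = []
    for ti, di, xi in zip(time, event, x):
        if di != 1:
            continue
        risk = [(xj, math.exp(beta * xj)) for tj, xj in zip(time, x) if tj >= ti]
        wsum = sum(w for _, w in risk)
        xbar = sum(xj * w for xj, w in risk) / wsum
        out.append((ti, xi - xbar))
    return out

def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

res = schoenfeld_residuals([1, 2, 3, 4], [1, 1, 1, 0], [1, 0, 1, 0], beta=0.0)
rho = pearson([t for t, _ in res], [r for _, r in res])
```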

Returns:

Type Description
DataFrame

Columns: variable, rho, chi2, p_value.

plot

plot(kind: str = 'survival', ax=None, **kwargs)

Plot survival-related curves.

Parameters:

Name Type Description Default
kind str

'survival': baseline survival curve; 'hazard': baseline cumulative hazard; 'hr': hazard-ratio forest plot.

'survival'
ax Axes

Matplotlib axes to draw on.

None

KMResult

Kaplan-Meier survival analysis result.

Attributes:

Name Type Description
survival_table DataFrame

Life table with time, n_risk, n_event, n_censor, survival, std_err, ci_lower, ci_upper.

median_survival float or dict

Median survival time (per group if groups present).

survival_table property

survival_table: DataFrame

Life table (combined if multiple groups).

median_survival property

median_survival

Median survival time (scalar or dict by group).

summary

summary() -> str

Formatted summary of the Kaplan-Meier analysis.

plot

plot(ax=None, **kwargs)

Plot Kaplan-Meier survival curves with confidence bands.

Parameters:

Name Type Description Default
ax Axes

Matplotlib axes to draw on.

None

CumIncResult dataclass

Cumulative incidence functions for competing risks.

Attributes:

Name Type Description
cif_table DataFrame

Long table with columns group ("all" when ungrouped), cause, time, cif, se, ci_lower, ci_upper.

causes list

The competing-cause labels (excluding the 0 censoring code).

gray_test dict or None

Maps each cause to a dict with keys "statistic", "df", and "p_value" from Gray's K-sample test; None when group was not supplied.

alpha float

Significance level used for the confidence bands.

cif_at

cif_at(time: float, cause: Optional[int] = None) -> DataFrame

Cumulative incidence (last value at or before time).
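Finding "the last value at or before time" is a lookup on a right-continuous step function. The sketch below does this with the standard-library bisect module and assumes the times are sorted. value_at is a hypothetical helper. Returning 0.0 before the first time suits a CIF, which starts at zero.

```python
from bisect import bisect_right

def value_at(times, values, t, before_first=0.0):
    """Return the last value whose time is <= t (step-function lookup).

    times must be sorted ascending; before_first is returned when t
    precedes the first time.
    """
    i = bisect_right(times, t)
    return values[i - 1] if i > 0 else before_first
```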

plot

plot(cause: Optional[int] = None, ax: Any = None, **kwargs: Any) -> Any

Step-plot the cumulative incidence function(s).

FineGrayResult dataclass

Fine-Gray proportional subdistribution hazards model result.

Attributes:

Name Type Description
params ndarray

Estimated coefficients (log subdistribution hazard ratios).

bse ndarray

Standard errors (model-based, from the inverse information).

covariates list of str

Covariate names aligned with params.

cause int

The cause of interest whose subdistribution was modelled.

n_obs, n_events int

Sample size and number of cause-of-interest events.

shr property

shr: ndarray

Subdistribution hazard ratios, exp(coef).
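exp(coef) converts a log-ratio coefficient into a ratio. The usual Wald interval exponentiates coef ± z·se. The sketch below assumes a normal-theory interval, because how statspai builds its intervals is not stated here. ratio_with_ci is a hypothetical helper, and it applies equally to Cox hazard ratios.

```python
import math
from statistics import NormalDist

def ratio_with_ci(coef, se, alpha=0.05):
    """Exponentiate a log-ratio coefficient and its Wald interval."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    return math.exp(coef), math.exp(coef - z * se), math.exp(coef + z * se)
```

The interval is symmetric on the log scale, so the ratio is the geometric mean of the two bounds.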

cox

cox(formula: str = None, data: DataFrame = None, duration: str = None, event: str = None, x: list = None, ties: str = 'efron', strata: str = None, robust: str = 'nonrobust', cluster: str = None, hazard_ratio: bool = True, alpha: float = 0.05) -> CoxResult

Cox Proportional Hazards model via partial likelihood.

Parameters:

Name Type Description Default
formula str

Formula of the form 'duration ~ x1 + x2'. If given, duration is inferred from the LHS.

None
data DataFrame

Input data.

None
duration str

Column name for follow-up time (overrides formula LHS).

None
event str

Column name for event indicator (1 = event, 0 = censored).

None
x list of str

Covariate column names (overrides formula RHS).

None
ties str

Tie-handling method: 'efron' or 'breslow'.

'efron'
strata str

Column name for stratification variable.

None
robust str

Variance estimator: 'nonrobust' for model-based SE, 'hc0' for sandwich (robust) SE.

'nonrobust'
cluster str

Column name for cluster-robust SE.

None
hazard_ratio bool

If True, report hazard ratios in the summary alongside coefficients.

True
alpha float

Significance level for confidence intervals.

0.05

Returns:

Type Description
CoxResult

Result object extending EconometricResults with .concordance, .baseline_hazard(), .ph_test(), .plot().

Examples:

>>> import statspai as sp
>>> res = sp.cox(formula="time ~ age + treatment", data=df, event="status")
>>> print(res.summary())
>>> res.ph_test()
>>> res.plot(kind="survival")
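The ties argument only changes how tied event times enter the partial likelihood. Breslow uses the full risk-set sum for every tied event. Efron removes the tied subjects' own risk in fractions 0, 1/d, ..., (d-1)/d. The sketch below implements both for one covariate. It is a hypothetical helper, not statspai's code. When no event times are tied, the two methods give the same value.

```python
import math

def cox_log_partial_likelihood(time, event, x, beta, ties="efron"):
    """Log partial likelihood of a one-covariate Cox model.

    ties="breslow": each of the d tied events uses the full risk-set sum.
    ties="efron": the k-th tied event (k = 0..d-1) subtracts k/d of the
    tied events' combined risk from the denominator.
    """
    ll = 0.0
    for t in sorted({t for t, d in zip(time, event) if d == 1}):
        at_risk = [xj for tj, xj in zip(time, x) if tj >= t]
        dead = [xi for ti, di, xi in zip(time, event, x) if ti == t and di == 1]
        risk_sum = sum(math.exp(beta * xj) for xj in at_risk)
        dead_sum = sum(math.exp(beta * xi) for xi in dead)
        d = len(dead)
        ll += beta * sum(dead)  # numerator: sum of linear predictors of failures
        for k in range(d):
            frac = k / d if ties == "efron" else 0.0
            ll -= math.log(risk_sum - frac * dead_sum)
    return ll
```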

kaplan_meier

kaplan_meier(data: DataFrame, duration: str, event: str, group: str = None, alpha: float = 0.05) -> KMResult

Kaplan-Meier non-parametric survival function estimator.

Parameters:

Name Type Description Default
data DataFrame

Input data.

required
duration str

Column name for duration / follow-up time.

required
event str

Column name for event indicator (1 = event, 0 = censored).

required
group str

Column name for group variable (stratification).

None
alpha float

Significance level for confidence intervals (Greenwood formula).

0.05

Returns:

Type Description
KMResult

Object with .survival_table, .median_survival, .plot(), .summary().

Examples:

>>> import statspai as sp
>>> km = sp.kaplan_meier(data=df, duration="time", event="status")
>>> km.plot()
>>> km.median_survival
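The Kaplan-Meier estimator multiplies conditional survival probabilities 1 - d/n across event times. Greenwood's formula gives Var Ŝ(t) ≈ Ŝ(t)² Σ d/(n(n - d)). The pure-Python sketch below computes both, plus the median, defined by the usual convention as the smallest time with Ŝ(t) ≤ 0.5. The helpers are hypothetical. The sketch does not show how statspai turns the standard error into confidence bounds.

```python
import math

def kaplan_meier(time, event):
    """Kaplan-Meier survival with Greenwood standard errors.

    Returns rows (t, n_risk, n_event, survival, std_err) at each
    distinct event time.
    """
    rows, surv, gw = [], 1.0, 0.0
    for t in sorted({t for t, d in zip(time, event) if d == 1}):
        n = sum(1 for ti in time if ti >= t)  # at risk just before t
        d = sum(1 for ti, di in zip(time, event) if ti == t and di == 1)
        surv *= 1.0 - d / n
        if n > d:  # Greenwood term is undefined once the curve hits zero
            gw += d / (n * (n - d))
        rows.append((t, n, d, surv, surv * math.sqrt(gw)))
    return rows

def median_survival(rows):
    """Smallest event time at which S(t) <= 0.5 (None if never reached)."""
    for t, _, _, s, _ in rows:
        if s <= 0.5:
            return t
    return None

rows = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
```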

survreg

survreg(formula: str = None, data: DataFrame = None, duration: str = None, event: str = None, x: list = None, dist: str = 'weibull', robust: str = 'nonrobust', cluster: str = None, alpha: float = 0.05) -> EconometricResults

Parametric survival model in the accelerated failure time (AFT) parameterization.

Parameters:

Name Type Description Default
formula str

Formula 'duration ~ x1 + x2'.

None
data DataFrame

Input data.

None
duration str

Follow-up time column (or formula LHS).

None
event str

Event indicator column.

None
x list of str

Covariate columns (or formula RHS).

None
dist str

Distribution: 'weibull', 'exponential', 'lognormal', 'loglogistic'.

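'weibull'
robust str

Standard-error type; the default 'nonrobust' gives model-based SE.

'nonrobust'
cluster str

Column name for cluster-robust SE.

None
alpha float

Significance level for confidence intervals.

0.05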

Returns:

Type Description
EconometricResults

Fitted parametric survival model. The estimated parameters are the covariate coefficients plus log(sigma), the log of the scale parameter.

Examples:

>>> res = sp.survreg("time ~ age + treatment", data=df, event="status", dist="weibull")
>>> print(res.summary())
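In the AFT form, log T = x'β + σW. For the Weibull, W follows the standard minimum extreme-value distribution, and the exponential model is the special case σ = 1. The sketch below evaluates the resulting log-likelihood, parameterized by log(sigma) as in the Returns description. It is a hypothetical helper for illustration, not statspai's fitting code, and it covers only the Weibull case.

```python
import math

def weibull_aft_loglik(time, event, eta, log_sigma):
    """Log-likelihood of a Weibull AFT model: log T = eta + sigma * W.

    With z = (log t - eta) / sigma, an event contributes the log density
    z - exp(z) - log(sigma * t); a censored time contributes log S = -exp(z).
    """
    sigma = math.exp(log_sigma)
    ll = 0.0
    for t, d, e in zip(time, event, eta):
        z = (math.log(t) - e) / sigma
        ll += (z - math.exp(z) - math.log(sigma * t)) if d == 1 else -math.exp(z)
    return ll
```

With log_sigma = 0, each term equals the exponential log-likelihood with rate exp(-eta), which is a convenient check.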

logrank_test

logrank_test(data: DataFrame, duration: str, event: str, group: str) -> dict

Log-rank test for equality of survival distributions across groups.

Parameters:

Name Type Description Default
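data DataFrame

Input data.

required
duration str

Column name for follow-up time.

required
event str

Column name for event indicator (1 = event, 0 = censored).

required
group str

Column name for the grouping variable whose survival distributions are compared.

required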

Returns:

Type Description
dict

Keys: test_statistic, p_value, df, n_groups, expected_events (per group), observed_events (per group).

Examples:

>>> sp.logrank_test(data=df, duration="time", event="status", group="treatment")
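For two groups, the log-rank test sums observed (O) and expected (E) events in one group over the event times. It then refers (O - E)²/V to a chi-squared distribution with 1 degree of freedom. The sketch below handles only the two-group case. statspai's logrank_test handles more groups and returns a dict, whereas this hypothetical helper returns only (statistic, p-value).

```python
import math

def logrank_two_group(time, event, group):
    """Two-sample log-rank test; group holds 0/1 labels. Returns (chi2, p)."""
    obs = exp_ = var = 0.0
    for t in sorted({t for t, d in zip(time, event) if d == 1}):
        n = sum(1 for ti in time if ti >= t)
        n1 = sum(1 for ti, g in zip(time, group) if ti >= t and g == 1)
        d = sum(1 for ti, di in zip(time, event) if ti == t and di == 1)
        d1 = sum(1 for ti, di, g in zip(time, event, group)
                 if ti == t and di == 1 and g == 1)
        obs += d1
        exp_ += d * n1 / n
        if n > 1:  # hypergeometric variance of d1
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    chi2 = (obs - exp_) ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-squared(1)
```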

cox_frailty

cox_frailty(formula: str, data: DataFrame, cluster: str, alpha: float = 0.05, maxiter: int = 50, tol: float = 1e-06) -> FrailtyResult

Cox proportional hazards with shared gamma frailty.

Parameters:

Name Type Description Default
formula str

Formula of the form "duration + event ~ x1 + x2", analogous to R's Surv(time, event) ~ x1 + x2.

required
data DataFrame

Input data.

required
cluster str

Column identifying clusters (e.g. hospital, site).

required
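Under shared gamma frailty, every hazard in cluster j is multiplied by an unobserved u_j with mean 1 and variance θ. A common way to fit such models is EM. Its E-step has a closed form because the gamma prior is conjugate to the cluster's Poisson-type likelihood. The sketch below shows that step. The source does not state whether cox_frailty uses EM, and the helper name and arguments are hypothetical.

```python
def gamma_frailty_posterior_mean(d_j, h_j, theta):
    """Posterior mean of a cluster's gamma frailty u_j.

    Prior: u_j ~ Gamma(shape=1/theta, rate=1/theta), mean 1, variance theta.
    Given d_j events and h_j = sum_i H0(t_i) * exp(x_i'beta) in the cluster,
    the posterior is Gamma(1/theta + d_j, 1/theta + h_j).
    """
    k = 1.0 / theta
    return (k + d_j) / (k + h_j)
```

Clusters with more events than their cumulative hazard predicts get a frailty above 1. Larger θ lets the data pull the estimate further from 1.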

causal_survival_forest

causal_survival_forest(data: DataFrame, time: str, event: str, treat: str, covariates: Sequence[str], horizon: Optional[float] = None, n_trees: int = 200, min_leaf: int = 5, max_depth: Optional[int] = None, propensity_bounds: tuple = (0.05, 0.95), random_state: int = 42, alpha: float = 0.05) -> CausalSurvivalForestResult

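Fit a causal survival forest and return the average treatment effect (ATE) on restricted mean survival time (RMST), plus conditional average treatment effects (CATE).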

Parameters:

Name Type Description Default
data DataFrame

Input data.

required
time str

Observed time-to-event column (min of true event time and censoring time).

required
event str

Event indicator (1 = event observed, 0 = censored).

required
treat str

Binary treatment indicator.

required
covariates sequence of str

Covariate column names.

required
horizon float

RMST horizon tau. Defaults to the 80th percentile of observed times.

None
n_trees int

Number of trees in the forest.

200
min_leaf int

Minimum samples per leaf.

5
max_depth int

Maximum tree depth.

None
propensity_bounds tuple

Lower and upper bounds at which estimated propensity scores are clipped, for numerical stability.

(0.05, 0.95)
random_state int

Random seed for reproducibility.

42
alpha float

Significance level for confidence intervals.

0.05

Returns:

Type Description
CausalSurvivalForestResult

cate contains the individual RMST effect prediction.
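RMST up to the horizon tau is the area under the survival curve on [0, tau]. The RMST ATE is the treated-minus-control difference in that area. The sketch below integrates a step survival curve. It illustrates the estimand only, without adjustment, and is not the causal survival forest, which adjusts for covariates and censoring. rmst is a hypothetical helper.

```python
def rmst(times, surv, tau):
    """Restricted mean survival time: area under a step S(t) on [0, tau].

    times are the sorted jump times of a right-continuous step function
    with S = 1 before times[0]; surv[k] is S just after times[k].
    """
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in zip(times, surv):
        if t >= tau:
            break
        area += prev_s * (t - prev_t)  # rectangle up to the next jump
        prev_t, prev_s = t, s
    return area + prev_s * (tau - prev_t)
```

Applied to one survival curve per arm, rmst(treated) - rmst(control) gives an unadjusted RMST difference on the same scale as the reported ATE.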

cuminc

cuminc(data: DataFrame, duration: str, event: str, group: Optional[str] = None, alpha: float = 0.05) -> CumIncResult

Cumulative incidence functions for competing risks (Aalen-Johansen).

Parameters:

Name Type Description Default
data DataFrame

Input data.

required
duration str

Column name for the follow-up time.

required
event str

Column name for the event indicator. 0 = censored; 1, 2, ... = competing causes.

required
group str

Column name for a grouping variable. When supplied, CIFs are estimated per group and Gray's K-sample test is reported per cause.

None
alpha float

Significance level for the confidence bands.

0.05

Returns:

Type Description
CumIncResult

Result object with .cif_table, .gray_test, .summary(), .plot().

Notes

The cumulative incidence for a single cause is not 1 - KM applied to that cause; treating competing events as censoring over-states the risk. The Aalen-Johansen estimator weights each cause-specific increment by the overall (all-cause) survival probability, so the CIFs of all causes plus the overall survival sum to one at every time.
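The Aalen-Johansen recursion can be written directly. At each event time t, the CIF of each cause increases by S(t-) · d_c(t)/n(t), where S(t-) is the all-cause Kaplan-Meier survival just before t. The sketch below uses a hypothetical aalen_johansen helper that returns point estimates only, with no groups and no confidence bands. It makes the sum-to-one property easy to check.

```python
def aalen_johansen(time, event):
    """Aalen-Johansen CIFs for competing risks (point estimates only).

    event: 0 = censored, 1, 2, ... = causes. Returns (times, surv, cif),
    where cif[c][k] is the CIF of cause c at times[k].
    """
    causes = sorted({e for e in event if e != 0})
    cif = {c: [] for c in causes}
    totals = {c: 0.0 for c in causes}
    surv_prev, times, surv = 1.0, [], []
    for t in sorted({t for t, e in zip(time, event) if e != 0}):
        n = sum(1 for ti in time if ti >= t)
        d = {c: sum(1 for ti, e in zip(time, event) if ti == t and e == c)
             for c in causes}
        for c in causes:
            # Cause-specific hazard increment weighted by all-cause S(t-).
            totals[c] += surv_prev * d[c] / n
            cif[c].append(totals[c])
        surv_prev *= 1.0 - sum(d.values()) / n  # all-cause KM step
        times.append(t)
        surv.append(surv_prev)
    return times, surv, cif

times, surv, cif = aalen_johansen([1, 2, 3, 4, 5, 6], [1, 2, 0, 1, 2, 1])
```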

Examples:

>>> import statspai as sp
>>> ci = sp.cuminc(df, duration="time", event="status", group="arm")
>>> ci.summary()
>>> ci.plot(cause=1)

finegray

finegray(data: DataFrame, duration: str, event: str, x: Sequence[str], cause: int = 1, alpha: float = 0.05, max_iter: int = 50, tol: float = 1e-07) -> FineGrayResult

Fine & Gray (1999) proportional subdistribution hazards model.

Models the effect of covariates on the cumulative incidence of the cause of interest (cause) through its subdistribution hazard. The exponentiated coefficients are therefore subdistribution hazard ratios, which map monotonically to the CIF (unlike cause-specific Cox coefficients).

Parameters:

Name Type Description Default
data DataFrame

Input data.

required
duration str

Follow-up-time column.

required
event str

Event indicator: 0 = censored, 1, 2, ... = causes.

required
x sequence of str

Covariate column names.

required
cause int

Cause of interest (default 1).

1
alpha float

Significance level for confidence intervals.

0.05
max_iter int

Maximum number of Newton-Raphson iterations.

50
tol float

Convergence tolerance for Newton-Raphson.

1e-07

Returns:

Type Description
FineGrayResult

Result object with .shr, .tidy(), .summary().

Notes

Subjects who fail from a competing cause are retained in the risk set with time-decaying inverse-probability-of-censoring weights w_i(t) = Ĝ(t) / Ĝ(T_i) (Ĝ = KM estimate of the censoring survival). The weighted partial likelihood is maximised by Newton-Raphson with the Breslow tie approximation. Standard errors are model-based (inverse information); a fully robust sandwich variance that accounts for estimating Ĝ is not yet implemented.
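The weights described above need only Ĝ, the Kaplan-Meier estimate of the censoring distribution. The sketch below computes Ĝ by treating code 0 as the "event" and then evaluates w_i(t) = Ĝ(t)/Ĝ(T_i). The helper names are hypothetical. Evaluating Ĝ at t rather than just before t is an assumption, since conventions vary.

```python
def km_censoring_survival(time, event):
    """KM estimate G(t) of the censoring distribution (censoring = code 0)."""
    g, steps = 1.0, []
    for t in sorted({t for t, e in zip(time, event) if e == 0}):
        n = sum(1 for ti in time if ti >= t)
        c = sum(1 for ti, e in zip(time, event) if ti == t and e == 0)
        g *= 1.0 - c / n
        steps.append((t, g))
    return steps

def g_at(steps, t):
    """Right-continuous lookup of G at t (1.0 before the first drop)."""
    value = 1.0
    for s, g in steps:
        if s > t:
            break
        value = g
    return value

def fg_weight(steps, t, t_i):
    """w_i(t) = G(t) / G(T_i) for a competing-cause failure at T_i <= t."""
    return g_at(steps, t) / g_at(steps, t_i)

steps = km_censoring_survival([1, 2, 3, 4, 5], [2, 0, 1, 0, 1])
```

The weight equals 1 at the subject's own failure time and decays as censoring accumulates afterwards.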