Competing risks: cumulative incidence & Fine-Gray
When a subject can fail from more than one cause, Kaplan-Meier lies. Treating "death from other causes" as ordinary censoring makes the naïve 1 - KM curve overstate the risk of the cause you care about. Censoring implicitly assumes that those subjects could still go on to have the event, but a subject who has died of something else never can. StatsPAI ships the two standard competing-risks tools: the Aalen-Johansen estimator of the cumulative incidence function (sp.cuminc), for description and Gray's test, and the Fine-Gray subdistribution hazards model (sp.finegray) for regression [@aalen1978nonparametric; @gray1988class; @fine1999proportional].
Event coding. Throughout, the event column is an integer: 0 = right-censored, and 1, 2, ... = the competing causes. This matches R's cmprsk and survival conventions.
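If outcomes arrive as text labels, they need recoding into this integer convention before they reach sp.cuminc or sp.finegray. Below is a minimal pandas sketch. The column names, labels and data are hypothetical.

```python
import pandas as pd

# Hypothetical raw data with text outcome labels
raw = pd.DataFrame({
    "time": [3.1, 5.0, 7.4, 8.2, 10.0],
    "outcome": ["relapse", "censored", "death", "relapse", "censored"],
})

# 0 = right-censored; 1, 2, ... = competing causes (here 1 = relapse, 2 = death)
codes = {"censored": 0, "relapse": 1, "death": 2}
raw["status"] = raw["outcome"].map(codes)

# An unmapped label would show up as NaN: fail loudly instead of silently miscoding
if raw["status"].isna().any():
    bad = sorted(raw.loc[raw["status"].isna(), "outcome"].unique())
    raise ValueError(f"unrecognised outcome labels: {bad}")
raw["status"] = raw["status"].astype(int)
print(raw)
```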
Scope note: these estimators are validated internally (CIF self-consistency, analytic-vs-bootstrap variance agreement, Fine-Gray recovery on simulated data, Gray's-test calibration) but are not yet parity-certified against R's cmprsk / survival. Validate your headline number before publication and read sp.describe_function("cuminc")["limitations"].
1. Why not Kaplan-Meier?
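In a competing-risks setting the quantity you almost always want is the cumulative incidence function (CIF). This is the probability of failing from cause k by time t, accounting for the fact that a competing event removes a subject from ever experiencing cause k. The Aalen-Johansen estimator weights each cause-specific increment by the overall (all-cause) survival just before that time:

$$
\hat F_k(t) = \sum_{t_j \le t} \hat S(t_{j-1}) \, \frac{d_{kj}}{n_j}, \qquad \hat S(t_0) = 1,
$$

The symbols are defined as follows:

- $t_j$ are the distinct event times (any cause).
- $d_{kj}$ is the number of cause-$k$ events at $t_j$.
- $n_j$ is the number at risk at $t_j$.
- $\hat S$ is the all-cause Kaplan-Meier survival.

As a result, the CIFs of all causes plus the overall survival sum to one at every time, $\hat S(t) + \sum_k \hat F_k(t) = 1$. The naïve 1 - KM_k curves lack this property. Each one overstates its own cause's incidence, so together with the overall survival they can add up to more than one.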
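The Aalen-Johansen estimator is short enough to write out directly. Below is a minimal NumPy sketch that computes point estimates only: there is no variance and no grouping. It assumes the event coding above. `aalen_johansen` and `naive_one_minus_km` are hypothetical helper names, not StatsPAI functions. The sketch also computes the naïve 1 - KM curve on the same time grid so the two can be compared on a toy data set.

```python
import numpy as np

def aalen_johansen(time, event):
    """Aalen-Johansen CIFs for every cause (point estimates only).

    Returns (times, cif, surv): cif[j, k - 1] is the CIF of cause k at times[j],
    surv[j] is the all-cause Kaplan-Meier survival at times[j].
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    causes = np.arange(1, event.max() + 1)
    times = np.unique(time[event > 0])           # distinct event times, any cause
    cif = np.zeros((len(times), len(causes)))
    surv = np.zeros(len(times))
    s_prev = 1.0                                 # all-cause survival just before t_j
    f = np.zeros(len(causes))                    # running CIF per cause
    for j, t in enumerate(times):
        n_j = np.sum(time >= t)                  # number at risk at t_j
        d = np.array([np.sum((time == t) & (event == k)) for k in causes])
        f = f + s_prev * d / n_j                 # cause-specific increment x S(t_{j-1})
        s_prev = s_prev * (1.0 - d.sum() / n_j)  # all-cause Kaplan-Meier step
        cif[j], surv[j] = f, s_prev
    return times, cif, surv

def naive_one_minus_km(time, event, cause):
    """Naive 1 - KM for one cause, treating competing events as censored (biased)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    s, out = 1.0, []
    for t in np.unique(time[event > 0]):         # same time grid as aalen_johansen
        n_j = np.sum(time >= t)
        d = np.sum((time == t) & (event == cause))
        s *= 1.0 - d / n_j
        out.append(1.0 - s)
    return np.array(out)

# Toy data (hypothetical): 0 = censored, 1 = cause of interest, 2 = competing cause
time = [1, 2, 3, 4, 5, 6]
event = [1, 2, 1, 0, 2, 1]
times, cif, surv = aalen_johansen(time, event)
print(cif[-1])                                   # CIFs at t = 6: 7/12 and 5/12
print(naive_one_minus_km(time, event, 1)[-1])    # naive estimate for cause 1 at t = 6: 1.0
```

On this toy sample the naïve curve reaches 1 for cause 1, even though 5/12 of the probability mass belongs to cause 2. By contrast, the Aalen-Johansen CIFs plus the overall survival add up to one at every event time.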
2. Cumulative incidence: sp.cuminc
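```python
import statspai as sp

# df: a pandas DataFrame with a follow-up time column "time" and an integer
# event column "status" (0 = censored, 1, 2, ... = competing causes)
ci = sp.cuminc(df, duration="time", event="status")
print(ci.summary())
print(ci.cif_table.head())  # group / cause / time / cif / se / ci_lower / ci_upper
ci.plot(cause=1)            # step CIF curve with the other causes overlaid
```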
Each cause's CIF comes with a delta-method standard error and a confidence band (Marubini-Valsecchi / Klein-Moeschberger variance). To read off the cumulative incidence at a specific horizon:
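The CIF is a right-continuous step function, so its value at a horizon is the estimate at the last tabulated time at or before that horizon. Below is one way to read it off, assuming only the cif_table columns listed above. `cif_at` is a hypothetical helper, not a StatsPAI function. The toy table is made-up data shaped like cif_table, not real output, and the sketch assumes the group column is populated.

```python
import pandas as pd

def cif_at(cif_table, horizon, cause):
    """Step-function readout: per group, the CIF of `cause` at the last time <= horizon.

    Assumes (hypothetically) one row per group / cause / event time with the
    columns listed above. Groups with no tabulated time <= horizon are omitted.
    """
    tab = cif_table[(cif_table["cause"] == cause) & (cif_table["time"] <= horizon)]
    last = tab.sort_values("time").groupby("group").tail(1)  # last row per group
    cols = ["group", "time", "cif", "ci_lower", "ci_upper"]
    return last[cols].sort_values("group").reset_index(drop=True)

# Made-up table shaped like ci.cif_table (not real output)
toy = pd.DataFrame({
    "group":    ["A", "A", "A", "A", "B", "B"],
    "cause":    [1, 1, 1, 2, 1, 1],
    "time":     [2.0, 5.0, 9.0, 4.0, 3.0, 7.0],
    "cif":      [0.10, 0.25, 0.40, 0.08, 0.05, 0.20],
    "se":       [0.03, 0.05, 0.06, 0.03, 0.02, 0.05],
    "ci_lower": [0.05, 0.16, 0.29, 0.03, 0.02, 0.11],
    "ci_upper": [0.18, 0.36, 0.52, 0.17, 0.12, 0.32],
})
res = cif_at(toy, horizon=6.0, cause=1)
print(res)
```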
Comparing groups: Gray's test
Pass group= to estimate per-group CIFs and get Gray's (1988) K-sample test for equality of cumulative incidence, reported per cause:
```python
import statspai as sp

ci = sp.cuminc(df, duration="time", event="status", group="arm")
print(ci.gray_test[1])  # {'statistic', 'df', 'p_value'} for cause 1
ci.plot(cause=1)        # one CIF curve per arm
```
Gray's test targets the subdistribution hazard, so it answers the clinically relevant question "do the groups differ in cumulative incidence of cause 1?" rather than the cause-specific-hazard question.
3. Regression: sp.finegray
The Fine-Gray model puts covariates on the subdistribution hazard, so its coefficients exponentiate to subdistribution hazard ratios (sHR) that map monotonically to the cumulative incidence — a covariate with sHR > 1 increases the cause-of-interest CIF. (Cause-specific Cox coefficients do not have this property, which is why they are easy to misread in a competing-risks setting.)
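Under the Fine-Gray model, $1 - F_k(t \mid x) = \{1 - F_{k0}(t)\}^{\mathrm{sHR}}$ with $\mathrm{sHR} = \exp(\beta^\top x)$ and $F_{k0}$ the baseline CIF. This is why a larger sHR always means a larger CIF. Below is a minimal sketch of that mapping. `cif_under_shr` is a hypothetical helper, and the baseline CIF of 0.20 is made up.

```python
import numpy as np

def cif_under_shr(baseline_cif, shr):
    """CIF implied by a Fine-Gray model: 1 - (1 - F0(t)) ** sHR.

    baseline_cif: baseline cumulative incidence F0(t) at a fixed t (0 <= F0 < 1)
    shr: subdistribution hazard ratio exp(beta' x)
    """
    baseline_cif = np.asarray(baseline_cif, dtype=float)
    return 1.0 - (1.0 - baseline_cif) ** shr

f0 = 0.20  # hypothetical baseline CIF of cause 1 at some horizon t
for shr in (0.5, 1.0, 1.5, 2.0):
    print(shr, round(float(cif_under_shr(f0, shr)), 3))
```

With these numbers, an sHR of 1.5 moves the CIF from 0.20 to about 0.28, and an sHR of 0.5 lowers it to about 0.11. The absolute change depends on the baseline level, so the same sHR implies a different CIF shift at different horizons.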
```python
import statspai as sp

# df as above: "time", integer "status", plus the covariate columns
fg = sp.finegray(df, duration="time", event="status",
                 x=["treatment", "age", "stage"], cause=1)
print(fg.summary())
print(fg.tidy())  # term / coef / shr / std_err / z / p_value / shr_lower / shr_upper
```
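Subjects who fail from a competing cause are kept in the risk set with time-decaying inverse-probability-of-censoring weights Ĝ(t)/Ĝ(T_i) (Fine & Gray 1999). Here Ĝ is the Kaplan-Meier estimate of the censoring survival function, and T_i is the subject's competing-event time. The weight equals 1 at T_i and shrinks as t grows, reflecting the falling chance that the subject would still have been under observation. The weighted partial likelihood is maximised by Newton-Raphson.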
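The weighting scheme can be sketched in a few lines. Ĝ is the Kaplan-Meier estimate of the censoring survival function, with censorings treated as the "events". Below is a minimal NumPy sketch on toy data. The helper names are hypothetical. Implementations differ in how they handle ties and left limits of Ĝ, so this right-continuous version is an illustration, not necessarily what sp.finegray does internally.

```python
import numpy as np

def censoring_km(time, event):
    """Kaplan-Meier estimate of the censoring survival function G(t).

    Censorings (event == 0) play the role of 'events'; failures of any cause
    count as censorings of the censoring process.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=int)
    cens_times = np.unique(time[event == 0])
    factors = [1.0 - np.sum((time == t) & (event == 0)) / np.sum(time >= t)
               for t in cens_times]
    return cens_times, np.cumprod(factors)

def g_at(t, cens_times, g):
    """Evaluate the right-continuous step function G at time t."""
    idx = np.searchsorted(cens_times, t, side="right") - 1
    return 1.0 if idx < 0 else float(g[idx])

def fg_weight(t, t_i, cens_times, g):
    """Weight at time t >= t_i for a subject who failed from a competing cause at t_i."""
    return g_at(t, cens_times, g) / g_at(t_i, cens_times, g)

# Toy data (hypothetical): 0 = censored, 1 = cause of interest, 2 = competing cause
time = [1, 2, 3, 4, 5, 6]
event = [1, 2, 1, 0, 2, 1]
cens_times, g = censoring_km(time, event)
# The subject with a competing event at t = 2 stays in the cause-1 risk set,
# with weight 1 until the censoring at t = 4 and 2/3 afterwards
for t in (3, 5, 6):
    print(t, fg_weight(t, 2, cens_times, g))
```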
4. Cause-specific vs. subdistribution: which to report?
Both are legitimate and answer different questions:
| Question | Tool |
|---|---|
| "What is the probability of failing from cause 1 by time t?" | sp.cuminc (CIF) |
| "Does treatment change the cumulative incidence of cause 1?" | sp.finegray (sHR) or Gray's test |
| "Does treatment change the rate of cause 1 among those still event-free?" | cause-specific sp.cox (censor competing events) |
A common, defensible reporting choice is to present the CIF curves descriptively, alongside both a cause-specific Cox model and a Fine-Gray model. Together they separate the effect on the event rate from the effect on the cumulative incidence.
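For the cause-specific Cox row of the table, the recoding is mechanical: every competing event becomes a censoring. Below is a minimal pandas sketch with a hypothetical `cs_event_1` column. The exact arguments sp.cox takes are not shown here, so check its documentation before passing the recoded indicator to it.

```python
import pandas as pd

# Hypothetical data: 0 = censored, 1 = cause of interest, 2 = competing cause
df = pd.DataFrame({"time": [1, 2, 3, 4, 5, 6],
                   "status": [1, 2, 1, 0, 2, 1]})

# Cause-specific view of cause 1: competing events (status 2) become censorings
df["cs_event_1"] = (df["status"] == 1).astype(int)

# The Fine-Gray model instead keeps the original multi-cause "status" column
print(df)
```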
5. Limitations (read before you publish)
- Standard errors for sp.finegray are model-based (inverse information). A fully robust sandwich variance that accounts for estimating the censoring distribution Ĝ is not yet implemented; for small samples or heavy censoring, validate the SEs against R's cmprsk::crr.
- No time-varying covariates in sp.finegray yet.
- Parity is not yet certified against cmprsk / survival.