Power & sample size for epidemiological designs
Design the study before you run it. StatsPAI's power family already covers econometric designs (RCT, DiD, RD, IV, cluster RCT, OLS); this guide covers the three sample-size questions epidemiologists and clinical trialists ask most: a binary outcome compared across two arms, a time-to-event (log-rank) comparison, and an unmatched case-control study.
Every function returns a PowerResult (with .power, .n,
.summary()), accepts a scalar or an array for the sample-size
argument (for power curves), and solves for the required sample size when
you pass power_target= instead of n.
These calculators are validated against Monte Carlo simulations of the
tests they approximate and against the closed-form Schoenfeld events
requirement (see tests/test_power_study_designs.py).
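A Monte Carlo check of this kind can be sketched in plain Python. This is an illustration only, not the code in tests/test_power_study_designs.py: the helper name simulated_power, the pooled-variance z-test, and the equal-allocation setup are all assumptions made for the example.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulated_power(n_per_arm, p1, p2, alpha=0.05, reps=4000, seed=0):
    """Estimate power of a two-sided pooled two-proportion z-test by simulation.

    Hypothetical helper for illustration; equal allocation is assumed.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        # Draw the number of events in each arm as a sum of Bernoulli trials.
        x1 = sum(rng.random() < p1 for _ in range(n_per_arm))
        x2 = sum(rng.random() < p2 for _ in range(n_per_arm))
        pooled = (x1 + x2) / (2 * n_per_arm)
        se = sqrt(2 * pooled * (1 - pooled) / n_per_arm)
        # Reject when the standardized difference exceeds the critical value.
        if se > 0 and abs(x2 - x1) / n_per_arm / se > z_crit:
            rejections += 1
    return rejections / reps


# 100 subjects per arm, i.e. a total sample size of 200
estimate = simulated_power(n_per_arm=100, p1=0.30, p2=0.50)
print(round(estimate, 3))
```

The simulated rejection rate can then be compared with the analytic power for the same design; small discrepancies are expected from simulation noise and from the choice of variance estimator.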
1. Two proportions (cohort / RCT with a binary outcome)
import statspai as sp
# Power for a fixed sample size
r = sp.power_two_proportions(n=200, p1=0.30, p2=0.50)
print(r.power) # ≈ 0.84
# Solve for the total sample size that gives 80% power
r = sp.power_two_proportions(p1=0.30, p2=0.50, power_target=0.80)
print(r.n) # smallest n with power >= 0.80
# Unequal allocation (ratio = n2/n1 = 2: twice as many in the comparison arm)
# and a power curve over a range of total sample sizes
import numpy as np
curve = sp.power_two_proportions(n=np.arange(50, 500, 50),
                                 p1=0.30, p2=0.50, ratio=2.0)
print(curve.power)
p1 is the reference (control) outcome probability, p2 the comparison
arm; ratio = n2/n1.
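For intuition, the normal-approximation power behind a calculation like this can be written in a few lines. The sketch below is not StatsPAI's implementation: the function name two_proportion_power is hypothetical, it assumes n is the total sample size split according to ratio, and it uses the unpooled variance (the source does not say which variance estimator the library uses).

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_power(n_total, p1, p2, ratio=1.0, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test.

    Hypothetical helper: unpooled variance, ratio = n2/n1.
    """
    n1 = n_total / (1.0 + ratio)  # reference (control) arm
    n2 = n_total - n1             # comparison arm
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # The probability of rejecting in the wrong direction is ignored.
    return NormalDist().cdf(abs(p2 - p1) / se - z_crit)


power = two_proportion_power(n_total=200, p1=0.30, p2=0.50)
print(round(power, 2))
```

With these inputs the sketch lands close to the ≈ 0.84 quoted in the first example; a pooled-variance variant would give a slightly lower value.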
2. Survival / log-rank (Schoenfeld)
Power for a time-to-event comparison depends on the number of events, not just the sample size, so you supply the hazard ratio and the overall probability that a subject is observed to have the event:
import statspai as sp
# 80% power to detect HR = 0.5 needs ~66 events (Schoenfeld 1983)
r = sp.power_logrank(hazard_ratio=0.5, prob_event=1.0, power_target=0.80)
print(r.params["n_events"]) # ≈ 66
# If only 60% of subjects are expected to have the event during follow-up:
r = sp.power_logrank(hazard_ratio=0.7, prob_event=0.6, power_target=0.80)
print(r.n) # total sample size (= events / prob_event)
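The closed-form Schoenfeld requirement referred to above can be reproduced by hand. The sketch below assumes a two-sided test at alpha = 0.05 and equal allocation between arms; the helper name schoenfeld_events and the alloc parameter are illustrative, not part of StatsPAI.

```python
from math import ceil, log
from statistics import NormalDist


def schoenfeld_events(hazard_ratio, power=0.80, alpha=0.05, alloc=0.5):
    """Events required by Schoenfeld's formula (hypothetical helper).

    alloc is the fraction of subjects in one arm (0.5 = equal allocation).
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    return (z_alpha + z_beta) ** 2 / (alloc * (1 - alloc) * log(hazard_ratio) ** 2)


events = schoenfeld_events(0.5)
print(ceil(events))  # 66, matching the figure quoted above

# Total sample size when only a fraction of subjects have the event:
# divide the required events by the event probability and round up.
n_total = ceil(schoenfeld_events(0.7) / 0.6)
print(n_total)
```

How the library rounds intermediate quantities is not stated in this guide, so its reported n may differ from this hand calculation by a subject or two.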
3. Unmatched case-control
This calculator is parameterised the way case-control studies are actually
planned, by the exposure odds ratio to detect and the exposure prevalence
among controls, with ratio controls per case:
import statspai as sp
# Power with 150 cases, 1:1 controls, OR = 2, 30% control exposure
r = sp.power_case_control(n_cases=150, odds_ratio=2.0,
                          exposure_prevalence=0.30)
print(r.power)
# Number of cases for 80% power with 2 controls per case
r = sp.power_case_control(odds_ratio=2.0, exposure_prevalence=0.30,
                          ratio=2.0, power_target=0.80)
print(r.n) # required number of cases
The odds ratio and control exposure prevalence imply the case exposure
prevalence p1 = OR·p0 / (1 + p0(OR−1)), and power is computed as a
two-proportion comparison between cases and controls.
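The conversion from odds ratio to case exposure prevalence is easy to check numerically. The helper below is a hypothetical illustration of the formula above, not a StatsPAI function.

```python
def case_exposure_prevalence(odds_ratio, p0):
    """Exposure prevalence among cases implied by the odds ratio and the
    exposure prevalence among controls, p0 (hypothetical helper)."""
    return odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))


p0 = 0.30
p1 = case_exposure_prevalence(2.0, p0)
print(round(p1, 4))  # 0.6 / 1.3 = 0.4615

# Round trip: the odds ratio recovered from p1 and p0 equals the input.
recovered_or = (p1 / (1 - p1)) / (p0 / (1 - p0))
print(round(recovered_or, 6))  # 2.0
```

So with OR = 2 and 30% exposure among controls, the study compares roughly 46% exposure among cases against 30% among controls.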
Notes & limitations
- All three use the normal approximation; for very small samples or very rare outcomes, prefer an exact calculation and treat these as planning approximations.
- Stepped-wedge and other cluster-period designs are not yet covered here (cluster RCTs are available via sp.power_cluster_rct).
Where to next
- Competing risks
- Survival analysis (sp.cox, sp.kaplan_meier, sp.survreg): see the survival reference page.