
statspai.epi

epi

Epidemiology domain primitives (sp.epi).

Fills the gap the article calls out: statspai already has the heavy epidemiological causal machinery (IPW, G-formula, MSM, target trial) but lacked the entry-level statistical primitives that clinicians, epidemiologists, and public-health researchers reach for first.

Modelled after R's epiR, epitools, and fmsb.

    import statspai as sp

    sp.epi.odds_ratio(50, 20, 30, 40)
    sp.epi.relative_risk(50, 950, 10, 990)
    sp.epi.mantel_haenszel(tables_2x2xK)
    sp.epi.direct_standardize(events, pop, standard_weights)
    sp.epi.bradford_hill(strength=1.0, temporality=1.0, consistency=0.5, ...)

OR2x2Result dataclass

Result of a 2x2 odds-ratio calculation.

RR2x2Result dataclass

Result of a 2x2 relative-risk (risk-ratio) calculation.

RD2x2Result dataclass

Result of a 2x2 risk-difference calculation.

ARResult dataclass

Attributable-risk quantities (Levin 1953, Miettinen 1974).

IRRResult dataclass

Incidence rate ratio from person-time data.

odds_ratio

odds_ratio(a, b: Optional[float] = None, c: Optional[float] = None, d: Optional[float] = None, *, method: str = 'woolf', alpha: float = 0.05) -> OR2x2Result

Odds ratio from a 2x2 table.

The standard epidemiology 2x2 layout is:

                 Outcome+   Outcome-
    Exposed         a          b
    Unexposed       c          d

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| a | float | Cell counts, or pass a 2x2 array-like as a. | required |
| b | float | Cell counts, or pass a 2x2 array-like as a. | required |
| c | float | Cell counts, or pass a 2x2 array-like as a. | required |
| d | float | Cell counts, or pass a 2x2 array-like as a. | required |
| method | ('woolf', 'exact') | Confidence-interval method. "woolf" uses the asymptotic log-OR standard error; "exact" uses the Fisher-style conditional non-central hypergeometric CI (via scipy.stats.fisher_exact). | "woolf" |
| alpha | float | | 0.05 |

Returns:

Type Description
OR2x2Result

Examples:

>>> import statspai as sp
>>> res = sp.epi.odds_ratio(50, 20, 30, 40)
>>> round(res.estimate, 3)
3.333
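
For reference, the point estimate above and a log-OR (Woolf) interval can be reproduced by hand. This is a minimal standalone sketch, not statspai's implementation: `woolf_odds_ratio` is a hypothetical helper name, and it assumes all four cells are non-zero.

```python
import math
from statistics import NormalDist

def woolf_odds_ratio(a, b, c, d, alpha=0.05):
    """Odds ratio with a Woolf (asymptotic log-OR) confidence interval."""
    or_hat = (a * d) / (b * c)
    # Standard error of log(OR): square root of the summed reciprocal cells.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lo = math.exp(math.log(or_hat) - z * se_log)
    hi = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lo, hi

or_hat, lo, hi = woolf_odds_ratio(50, 20, 30, 40)
print(round(or_hat, 3))  # 3.333, matching the doctest above
```

The interval is symmetric on the log scale, so it is asymmetric around the odds ratio itself.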

relative_risk

relative_risk(a, b: Optional[float] = None, c: Optional[float] = None, d: Optional[float] = None, *, alpha: float = 0.05) -> RR2x2Result

Relative risk (risk ratio) with Katz log-RR confidence interval.

Uses the Haldane correction when any cell is zero.
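
The Katz interval can be sketched in a few lines of standard-library Python. This is an illustration rather than the library's code: `katz_relative_risk` is a hypothetical name, and the zero-cell handling assumes the correction adds 0.5 to every cell, which the documentation above does not spell out.

```python
import math
from statistics import NormalDist

def katz_relative_risk(a, b, c, d, alpha=0.05):
    """Risk ratio with a Katz log-RR confidence interval."""
    # Assumption: when any cell is zero, add 0.5 to all four cells.
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    n1, n0 = a + b, c + d          # exposed and unexposed totals
    rr = (a / n1) / (c / n0)
    se_log = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n0)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)

rr, lo, hi = katz_relative_risk(50, 950, 10, 990)
rr_zero, _, _ = katz_relative_risk(5, 95, 0, 100)  # zero cell stays finite
```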

risk_difference

risk_difference(a, b: Optional[float] = None, c: Optional[float] = None, d: Optional[float] = None, *, method: str = 'wald', alpha: float = 0.05) -> RD2x2Result

Risk difference with Wald or Newcombe CI.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| method | ('wald', 'newcombe') | Newcombe's hybrid score CI avoids the Wald overshoot problem near 0 or 1. | "wald" |
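
To see the overshoot problem concretely, the sketch below compares the two intervals on a table where the exposed risk is exactly 1. It is a standalone illustration with hypothetical helper names (`wilson_interval`, `risk_difference_ci`), assuming the Newcombe method means the usual hybrid of two Wilson score intervals.

```python
import math
from statistics import NormalDist

def wilson_interval(x, n, z):
    """Wilson score interval for a single proportion x / n."""
    p = x / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

def risk_difference_ci(a, b, c, d, method="wald", alpha=0.05):
    """Risk difference p1 - p0 with a Wald or Newcombe hybrid-score CI."""
    n1, n0 = a + b, c + d
    p1, p0 = a / n1, c / n0
    rd = p1 - p0
    z = NormalDist().inv_cdf(1 - alpha / 2)
    if method == "wald":
        se = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
        return rd, rd - z * se, rd + z * se
    # Newcombe: combine the Wilson limits of the two proportions.
    l1, u1 = wilson_interval(a, n1, z)
    l0, u0 = wilson_interval(c, n0, z)
    lo = rd - math.sqrt((p1 - l1) ** 2 + (u0 - p0) ** 2)
    hi = rd + math.sqrt((u1 - p1) ** 2 + (p0 - l0) ** 2)
    return rd, lo, hi

# 10/10 events among the exposed, 2/10 among the unexposed.
rd, wald_lo, wald_hi = risk_difference_ci(10, 0, 2, 8, method="wald")
_, newc_lo, newc_hi = risk_difference_ci(10, 0, 2, 8, method="newcombe")
```

Here the Wald upper limit exceeds 1, which is impossible for a difference of two risks, while the Newcombe limits stay inside [-1, 1].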

attributable_risk

attributable_risk(a, b: Optional[float] = None, c: Optional[float] = None, d: Optional[float] = None, *, alpha: float = 0.05) -> ARResult

Attributable fraction among the exposed and in the population (Levin 1953).

Computes:

- AF_exposed = (RR - 1) / RR
- PAF = P_e * (RR - 1) / [1 + P_e * (RR - 1)]

where P_e is prevalence of exposure. CI for PAF uses the delta method on log(1 - PAF).
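
The two point estimates can be checked by hand. The sketch below is illustrative only: `attributable_fractions` is a hypothetical name, and it assumes P_e is estimated as the exposed share of the table, which is appropriate for cohort or cross-sectional sampling but not for case-control data.

```python
def attributable_fractions(a, b, c, d):
    """Return (AF_exposed, PAF) from a 2x2 table."""
    n1, n0 = a + b, c + d
    rr = (a / n1) / (c / n0)
    p_e = n1 / (n1 + n0)           # assumed estimate of exposure prevalence
    af_exposed = (rr - 1) / rr
    paf = p_e * (rr - 1) / (1 + p_e * (rr - 1))
    return af_exposed, paf

af_exposed, paf = attributable_fractions(50, 950, 10, 990)

# Cross-check: PAF also equals (overall risk - unexposed risk) / overall risk.
overall_risk = (50 + 10) / 2000
unexposed_risk = 10 / 1000
paf_check = (overall_risk - unexposed_risk) / overall_risk
```

With RR = 5 and half the population exposed, four fifths of cases among the exposed and two thirds of all cases are attributed to the exposure.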

incidence_rate_ratio

incidence_rate_ratio(events_exposed: float, pt_exposed: float, events_unexposed: float, pt_unexposed: float, *, alpha: float = 0.05, method: str = 'exact') -> IRRResult

Person-time incidence rate ratio with exact Poisson CI.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| events_exposed | float | Event counts. | required |
| events_unexposed | float | Event counts. | required |
| pt_exposed | float | Person-time at risk in each group (any time unit, as long as consistent). | required |
| pt_unexposed | float | Person-time at risk in each group (any time unit, as long as consistent). | required |
| method | ('exact', 'wald') | "exact" uses the F-distribution-based Poisson CI (Breslow-Day); "wald" uses log-rate SE. | "exact" |
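
As a rough illustration of the `"wald"` option, the log-rate standard error depends only on the two event counts. The helper name `wald_incidence_rate_ratio` is hypothetical, and the sketch does not attempt the exact interval.

```python
import math
from statistics import NormalDist

def wald_incidence_rate_ratio(events_exposed, pt_exposed,
                              events_unexposed, pt_unexposed, alpha=0.05):
    """Incidence rate ratio with a Wald CI on the log scale."""
    irr = (events_exposed / pt_exposed) / (events_unexposed / pt_unexposed)
    # Treating counts as Poisson, Var(log rate) is about 1 / events.
    se_log = math.sqrt(1 / events_exposed + 1 / events_unexposed)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return irr, irr * math.exp(-z * se_log), irr * math.exp(z * se_log)

# 30 events in 1000 person-years vs 15 events in 1500 person-years.
irr, lo, hi = wald_incidence_rate_ratio(30, 1000, 15, 1500)
```

Person-time enters the point estimate but not the standard error, so rescaling both denominators to another time unit leaves the whole interval unchanged.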

number_needed_to_treat

number_needed_to_treat(a, b: Optional[float] = None, c: Optional[float] = None, d: Optional[float] = None, *, alpha: float = 0.05) -> NNTResult

Number needed to treat (or harm), defined as |1 / RD|.

Propagates the Wald CI for RD. Interpretation convention: negative RD -> NNT-Benefit (treatment reduces risk); positive RD -> NNT-Harm.
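
The sign convention and the CI propagation can be sketched as follows. This is an assumption-laden illustration, not the library's logic: `nnt_from_risk_difference` is a hypothetical helper, and returning `None` when the RD interval contains zero is one possible convention (the NNT interval is then unbounded).

```python
def nnt_from_risk_difference(rd, rd_lo, rd_hi):
    """Return (NNT, kind, CI) from a risk difference and its CI."""
    if rd == 0:
        raise ValueError("NNT is undefined when the risk difference is 0")
    kind = "benefit" if rd < 0 else "harm"   # negative RD: treatment reduces risk
    nnt = abs(1 / rd)
    if rd_lo <= 0 <= rd_hi:
        ci = None                            # interval spans no effect
    else:
        # Inverting the RD limits swaps their order.
        ci = tuple(sorted((abs(1 / rd_lo), abs(1 / rd_hi))))
    return nnt, kind, ci

nnt, kind, ci = nnt_from_risk_difference(-0.10, -0.15, -0.05)
```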

prevalence_ratio

prevalence_ratio(*args, **kwargs) -> RR2x2Result

Prevalence ratio (cross-sectional RR); mathematically identical to relative_risk when called on a 2x2 prevalence table. Provided under a separate name for semantic clarity in cross-sectional studies.

mantel_haenszel

mantel_haenszel(tables: Union[Sequence, ndarray], *, measure: str = 'OR', alpha: float = 0.05) -> MantelHaenszelResult

Mantel-Haenszel pooled OR / RR across K strata.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| tables | array-like, shape (K, 2, 2) | Each stratum's 2x2 table with layout [[a_k, b_k], [c_k, d_k]] (exposure x outcome). | required |
| measure | ('OR', 'RR') | Pooled measure. Use mantel_haenszel_rate for person-time IRR. | "OR" |
| alpha | float | Two-sided CI level. | 0.05 |

Returns:

Type Description
MantelHaenszelResult
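
The pooled point estimates follow the standard Mantel-Haenszel weighting. The sketch below computes only the estimates, with no variance or CI; `mantel_haenszel_estimate` is a hypothetical name and the strata are made-up numbers.

```python
def mantel_haenszel_estimate(tables, measure="OR"):
    """Mantel-Haenszel pooled OR or RR over strata [[a, b], [c, d]]."""
    num = den = 0.0
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        if measure == "OR":
            num += a * d / n
            den += b * c / n
        elif measure == "RR":
            num += a * (c + d) / n
            den += c * (a + b) / n
        else:
            raise ValueError("measure must be 'OR' or 'RR'")
    return num / den

# Two illustrative strata with stratum-specific ORs of 4.0 and 3.5.
tables = [[[10, 20], [5, 40]], [[30, 15], [20, 35]]]
or_mh = mantel_haenszel_estimate(tables, measure="OR")
rr_mh = mantel_haenszel_estimate(tables, measure="RR")
```

The pooled OR lands between the stratum-specific values, as expected for a weighted average.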

breslow_day_test

breslow_day_test(tables: Union[Sequence, ndarray], *, tarone_correction: bool = True) -> tuple[float, float]

Breslow-Day test for homogeneity of the odds ratio across strata.

Parameters:

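| Name | Type | Description | Default |
|------|------|-------------|---------|
| tables | array-like, shape (K, 2, 2) | | required |
| tarone_correction | bool | Apply Tarone's correction (recommended; Tarone 1985). | True |

Returns:

| Name | Type | Description |
|------|------|-------------|
| chi2 | float | |
| p_value | float | |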

direct_standardize

direct_standardize(events: Sequence[float], population: Sequence[float], standard_weights: Sequence[float], *, alpha: float = 0.05) -> StandardizedRateResult

Direct standardization of a rate.

The standardized rate is:

r_std = sum_k (w_k * events_k / population_k)

where w_k are the relative weights of a standard population (they are normalized internally to sum to 1).

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| events | array-like | Event counts in each stratum of the study population. | required |
| population | array-like | Denominator (person-time or population size) in each stratum. | required |
| standard_weights | array-like | Standard population size or proportion per stratum. Will be normalized to sum to 1. | required |
| alpha | float | | 0.05 |

Returns:

Type Description
StandardizedRateResult

Notes

SE is computed by the delta method on the weighted sum of stratum rates, treating events as Poisson.
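
Under the Poisson assumption, Var(events_k) is approximated by events_k, so the variance of the weighted sum is sum_k (w_k^2 * events_k / population_k^2). A minimal sketch of that calculation follows; `direct_standardized_rate` is a hypothetical name, and the normal-approximation interval is an assumption, since the notes above describe only the SE.

```python
import math
from statistics import NormalDist

def direct_standardized_rate(events, population, standard_weights, alpha=0.05):
    """Directly standardized rate with a delta-method (Poisson) SE."""
    total = sum(standard_weights)
    weights = [w / total for w in standard_weights]   # normalize to sum to 1
    rate = sum(w * e / p for w, e, p in zip(weights, events, population))
    var = sum(w * w * e / (p * p) for w, e, p in zip(weights, events, population))
    se = math.sqrt(var)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Assumption: symmetric normal-approximation interval.
    return rate, se, rate - z * se, rate + z * se

# Stratum rates 0.01 and 0.02; the standard population is 75% / 25%.
rate, se, lo, hi = direct_standardized_rate([10, 40], [1000, 2000], [3000, 1000])
```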

indirect_standardize

indirect_standardize(observed: float, events_reference: Sequence[float], population_reference: Sequence[float], population_study: Sequence[float], *, alpha: float = 0.05) -> SMRResult

Indirect standardization, yielding the standardized morbidity/mortality ratio (SMR).

Expected events = sum_k (rate_ref_k * pop_study_k), where rate_ref_k = events_reference_k / population_reference_k. SMR = observed / expected.

CI uses exact Poisson (Byar's approximation / Garwood).
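
The expected-events calculation and one of the intervals named above can be sketched as follows. This illustration uses Byar's approximation only and assumes at least one observed event; `smr_with_byar_ci` is a hypothetical helper name, and which interval the library actually returns is not confirmed here.

```python
import math
from statistics import NormalDist

def smr_with_byar_ci(observed, events_reference, population_reference,
                     population_study, alpha=0.05):
    """SMR = observed / expected, with Byar's approximate Poisson limits."""
    expected = sum(
        e_ref / p_ref * p_study
        for e_ref, p_ref, p_study in zip(
            events_reference, population_reference, population_study)
    )
    smr = observed / expected
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Byar's approximation to the Poisson limits for the observed count.
    lower_count = observed * (
        1 - 1 / (9 * observed) - z / (3 * math.sqrt(observed))) ** 3
    up = observed + 1
    upper_count = up * (1 - 1 / (9 * up) + z / (3 * math.sqrt(up))) ** 3
    return smr, expected, lower_count / expected, upper_count / expected

# Reference rates 0.01 and 0.02 applied to study strata of 1000 and 500.
smr, expected, lo, hi = smr_with_byar_ci(30, [100, 400], [10000, 20000], [1000, 500])
```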

diagnostic_test

diagnostic_test(*args, **kwargs) -> DiagnosticTestResult

Alias for sensitivity_specificity.

sensitivity_specificity

sensitivity_specificity(y_true=None, y_pred=None, *, tp: Optional[int] = None, fn: Optional[int] = None, fp: Optional[int] = None, tn: Optional[int] = None, alpha: float = 0.05) -> DiagnosticTestResult

Sensitivity and specificity with Wilson-score CIs.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| y_true | array-like | Reference and predicted binary labels (0/1). | None |
| y_pred | array-like | Reference and predicted binary labels (0/1). | None |
| tp | int | Pre-computed confusion cells. Use instead of y_true/y_pred when you already have counts. | None |
| fn | int | Pre-computed confusion cells. Use instead of y_true/y_pred when you already have counts. | None |
| fp | int | Pre-computed confusion cells. Use instead of y_true/y_pred when you already have counts. | None |
| tn | int | Pre-computed confusion cells. Use instead of y_true/y_pred when you already have counts. | None |
| alpha | float | | 0.05 |
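
Both input styles reduce to the same four counts. The sketch below tallies the confusion cells from labels and applies a Wilson score interval to each proportion; the helper names (`confusion_counts`, `wilson_interval`, `sens_spec`) are hypothetical and this is not the library's implementation.

```python
import math
from statistics import NormalDist

def confusion_counts(y_true, y_pred):
    """Tally (tp, fn, fp, tn) from 0/1 reference and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fn, fp, tn

def wilson_interval(x, n, alpha=0.05):
    """Wilson score interval for the proportion x / n."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p = x / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half

def sens_spec(tp, fn, fp, tn, alpha=0.05):
    """Sensitivity and specificity, each with a Wilson CI."""
    return {
        "sensitivity": (tp / (tp + fn), wilson_interval(tp, tp + fn, alpha)),
        "specificity": (tn / (tn + fp), wilson_interval(tn, tn + fp, alpha)),
    }

result = sens_spec(tp=90, fn=10, fp=20, tn=180)
counts = confusion_counts([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```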

roc_curve

roc_curve(y_true, scores, *, alpha: float = 0.05) -> ROCResult

ROC curve with Hanley-McNeil (1982) AUC standard error.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| y_true | array-like of {0, 1} | | required |
| scores | array-like | Continuous predictions (higher = more "positive"). | required |

auc

auc(y_true, scores) -> float

Shortcut: just return the AUC.
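
The AUC can be read as the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counted as one half. The sketch below computes it by pairwise comparison and adds the Hanley-McNeil (1982) standard error mentioned under roc_curve. It is a quadratic-time illustration with a hypothetical name (`auc_hanley_mcneil`), not the library's algorithm.

```python
import math

def auc_hanley_mcneil(y_true, scores):
    """Return (AUC, SE) using pairwise comparison and Hanley-McNeil's SE."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    n_pos, n_neg = len(pos), len(neg)
    # Count positive-over-negative wins; a tie is worth half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    a = wins / (n_pos * n_neg)
    q1 = a / (2 - a)
    q2 = 2 * a * a / (1 + a)
    var = (a * (1 - a)
           + (n_pos - 1) * (q1 - a * a)
           + (n_neg - 1) * (q2 - a * a)) / (n_pos * n_neg)
    return a, math.sqrt(var)

auc_value, auc_se = auc_hanley_mcneil([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc_value)  # 0.75: three of the four positive/negative pairs are ordered correctly
```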

cohen_kappa

cohen_kappa(rater_a, rater_b, *, weights: str = 'unweighted', alpha: float = 0.05) -> KappaResult

Cohen's (1960) kappa for two raters.

Parameters:

| Name | Type | Description | Default |
|------|------|-------------|---------|
| rater_a | array-like | Same-length sequences of category labels from two raters. | required |
| rater_b | array-like | Same-length sequences of category labels from two raters. | required |
| weights | ('unweighted', 'linear', 'quadratic') | Weighting scheme for disagreements across an ordered category scale. "unweighted" recovers the classic Cohen kappa. | "unweighted" |
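
All three schemes can be written as kappa = 1 - (observed weighted disagreement) / (expected weighted disagreement). The sketch below computes the point estimate only. `cohen_kappa_sketch` is a hypothetical name, and it assumes the category order is the sorted order of the labels and that at least two categories occur.

```python
def cohen_kappa_sketch(rater_a, rater_b, weights="unweighted"):
    """Cohen's kappa (point estimate) with optional disagreement weights."""
    cats = sorted(set(rater_a) | set(rater_b))   # assumed category ordering
    idx = {c: i for i, c in enumerate(cats)}
    k, n = len(cats), len(rater_a)
    counts = [[0] * k for _ in range(k)]
    for x, y in zip(rater_a, rater_b):
        counts[idx[x]][idx[y]] += 1
    row = [sum(counts[i]) for i in range(k)]
    col = [sum(counts[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        if weights == "unweighted":
            return 0.0 if i == j else 1.0
        if weights == "linear":
            return abs(i - j) / (k - 1)
        if weights == "quadratic":
            return ((i - j) / (k - 1)) ** 2
        raise ValueError("unknown weighting scheme")

    observed = sum(disagreement(i, j) * counts[i][j] / n
                   for i in range(k) for j in range(k))
    expected = sum(disagreement(i, j) * row[i] * col[j] / (n * n)
                   for i in range(k) for j in range(k))
    return 1 - observed / expected

a = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
b = [0, 0, 1, 1, 1, 2, 2, 2, 2, 0]
kappa_unweighted = cohen_kappa_sketch(a, b)
kappa_linear = cohen_kappa_sketch(a, b, weights="linear")
kappa_quadratic = cohen_kappa_sketch(a, b, weights="quadratic")
```

In this example the raters agree on 7 of 10 items against a chance agreement of 0.34, giving an unweighted kappa of about 0.545.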