statspai.epi
epi
Epidemiology domain primitives (sp.epi).
Fills the gap the article calls out: statspai already has the heavy epidemiological causal machinery (IPW, G-formula, MSM, target trial) but lacked the entry-level statistical primitives that clinicians, epidemiologists, and public-health researchers reach for first.
Modelled after R's epiR, epitools, and fmsb.
```python
import statspai as sp

sp.epi.odds_ratio(50, 20, 30, 40)        # cells a, b, c, d of a 2x2 table
sp.epi.relative_risk(50, 950, 10, 990)
sp.epi.mantel_haenszel(tables_2x2xK)     # K stacked 2x2 tables, shape (K, 2, 2)
sp.epi.direct_standardize(events, pop, standard_weights)  # per-stratum sequences
sp.epi.bradford_hill(strength=1.0, temporality=1.0, consistency=0.5, ...)
```
OR2x2Result (dataclass)
Result of a 2x2 odds-ratio calculation.
RR2x2Result (dataclass)
Result of a 2x2 relative-risk (risk-ratio) calculation.
RD2x2Result (dataclass)
Result of a 2x2 risk-difference calculation.
ARResult (dataclass)
Attributable-risk quantities (Levin 1953, Miettinen 1974).
IRRResult (dataclass)
Incidence rate ratio from person-time data.
odds_ratio
odds_ratio(a, b: Optional[float] = None, c: Optional[float] = None, d: Optional[float] = None, *, method: str = 'woolf', alpha: float = 0.05) -> OR2x2Result
Odds ratio from a 2x2 table.
The standard epidemiological 2x2 layout is:

| | Outcome+ | Outcome- |
|---|---|---|
| Exposed | a | b |
| Unexposed | c | d |
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `a` | `float` | Cell counts, or pass a 2x2 array-like as … | required |
| `b` | `float` | Cell counts, or pass a 2x2 array-like as … | `None` |
| `c` | `float` | Cell counts, or pass a 2x2 array-like as … | `None` |
| `d` | `float` | Cell counts, or pass a 2x2 array-like as … | `None` |
| `method` | `('woolf', 'exact')` | Confidence-interval method. `"woolf"` uses the asymptotic log-OR standard error; `"exact"` uses the Fisher-style conditional non-central hypergeometric CI (via :func: …). | `"woolf"` |
| `alpha` | `float` | | `0.05` |

Returns:

| Type | Description |
|---|---|
| `OR2x2Result` | |
Examples:
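As an illustration of the `method="woolf"` interval, the sketch below computes the odds ratio and its Woolf (log-OR) confidence interval from scratch. It is a minimal, hypothetical helper (`woolf_odds_ratio` is not part of `sp.epi`) that assumes all four cells are positive; how `sp.epi.odds_ratio` handles zero cells is not stated here.

```python
import math
from statistics import NormalDist


def woolf_odds_ratio(a: float, b: float, c: float, d: float, alpha: float = 0.05):
    """Hypothetical sketch: odds ratio with a Woolf (log-OR) confidence interval.

    Layout: a, b = exposed outcome+/outcome-; c, d = unexposed outcome+/outcome-.
    Assumes every cell is > 0; no continuity correction is applied.
    """
    odds_ratio = (a * d) / (b * c)
    # Woolf's asymptotic standard error of log(OR)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided normal quantile
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


or_, lo, hi = woolf_odds_ratio(50, 20, 30, 40)
print(f"OR = {or_:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

Because the interval is built on the log scale, it is symmetric around log(OR) and asymmetric around the OR itself.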
relative_risk
relative_risk(a, b: Optional[float] = None, c: Optional[float] = None, d: Optional[float] = None, *, alpha: float = 0.05) -> RR2x2Result
Relative risk (risk ratio) with Katz log-RR confidence interval.
Uses the Haldane correction when any cell is zero.
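A from-scratch sketch of the Katz interval follows. The helper name `katz_relative_risk` is hypothetical, and the zero-cell rule (adding 0.5 to all four cells when any one is zero) is an assumption about how the Haldane correction is applied, not a statement of what `sp.epi` does internally.

```python
import math
from statistics import NormalDist


def katz_relative_risk(a, b, c, d, alpha=0.05):
    """Hypothetical sketch: risk ratio with a Katz log-RR confidence interval.

    Assumption: if any cell is zero, 0.5 is added to all four cells
    (Haldane-style correction) before anything is computed.
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    rr = risk_exposed / risk_unexposed
    # Katz standard error of log(RR)
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    z = NormalDist().inv_cdf(1 - alpha / 2)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


rr, lo, hi = katz_relative_risk(50, 950, 10, 990)  # risks 0.05 vs 0.01
print(rr, lo, hi)
```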
risk_difference
risk_difference(a, b: Optional[float] = None, c: Optional[float] = None, d: Optional[float] = None, *, method: str = 'wald', alpha: float = 0.05) -> RD2x2Result
Risk difference with Wald or Newcombe CI.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `method` | `('wald', 'newcombe')` | CI method. Newcombe's hybrid score CI avoids the Wald interval's overshoot and degenerate limits when the risks are near 0 or 1. | `"wald"` |
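The sketch below contrasts the two interval methods in plain Python. `risk_difference_sketch` and `wilson_interval` are hypothetical helpers; the Newcombe branch follows Newcombe's hybrid score construction, which combines one Wilson score interval per group. With zero events in both groups, the Wald interval collapses to a single point while the Newcombe interval does not.

```python
import math
from statistics import NormalDist


def wilson_interval(x, n, z):
    """Wilson score interval for a single binomial proportion x / n."""
    p = x / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z / denom * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return max(0.0, centre - half), min(1.0, centre + half)


def risk_difference_sketch(a, b, c, d, method="wald", alpha=0.05):
    """Hypothetical sketch: RD = a/(a+b) - c/(c+d) with a Wald or Newcombe CI."""
    n1, n0 = a + b, c + d
    p1, p0 = a / n1, c / n0
    rd = p1 - p0
    z = NormalDist().inv_cdf(1 - alpha / 2)
    if method == "wald":
        se = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
        return rd, rd - z * se, rd + z * se
    if method == "newcombe":
        # Hybrid score: combine the per-group Wilson limits
        l1, u1 = wilson_interval(a, n1, z)
        l0, u0 = wilson_interval(c, n0, z)
        lower = rd - math.sqrt((p1 - l1) ** 2 + (u0 - p0) ** 2)
        upper = rd + math.sqrt((u1 - p1) ** 2 + (p0 - l0) ** 2)
        return rd, lower, upper
    raise ValueError("method must be 'wald' or 'newcombe'")


print(risk_difference_sketch(0, 10, 0, 10, method="wald"))      # zero-width interval
print(risk_difference_sketch(0, 10, 0, 10, method="newcombe"))  # non-degenerate interval
```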
attributable_risk
attributable_risk(a, b: Optional[float] = None, c: Optional[float] = None, d: Optional[float] = None, *, alpha: float = 0.05) -> ARResult
Attributable fractions among the exposed and in the population (Levin 1953).
Computes:

- AF_exposed = (RR - 1) / RR
- PAF = P_e * (RR - 1) / [1 + P_e * (RR - 1)]

where P_e is the prevalence of exposure. The CI for PAF uses the delta method on log(1 - PAF).
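The two formulas can be checked numerically with a small sketch. The helper `attributable_fractions` is hypothetical, and estimating P_e as the exposed share of the whole table, (a + b) / N, is an assumption that only makes sense when the sample represents the target population (for example, a cohort or cross-sectional sample). The last lines confirm that Levin's formula equals the excess-risk fraction (overall risk - unexposed risk) / overall risk.

```python
def attributable_fractions(a, b, c, d):
    """Hypothetical sketch: AF among the exposed and Levin's PAF from a 2x2 table.

    Assumption: P_e = (a + b) / N, i.e. the sample's exposure prevalence
    stands in for the population's.
    """
    n = a + b + c + d
    rr = (a / (a + b)) / (c / (c + d))
    p_e = (a + b) / n
    af_exposed = (rr - 1) / rr
    paf = p_e * (rr - 1) / (1 + p_e * (rr - 1))
    return af_exposed, paf


af_e, paf = attributable_fractions(50, 950, 10, 990)
# Cross-check: Levin's PAF equals the share of all cases in excess of the unexposed risk
overall_risk = (50 + 10) / 2000
unexposed_risk = 10 / 1000
print(af_e, paf, (overall_risk - unexposed_risk) / overall_risk)
```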
incidence_rate_ratio
incidence_rate_ratio(events_exposed: float, pt_exposed: float, events_unexposed: float, pt_unexposed: float, *, alpha: float = 0.05, method: str = 'exact') -> IRRResult
Person-time incidence rate ratio, with an exact Poisson CI by default.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `events_exposed` | `float` | Event count in the exposed group. | required |
| `events_unexposed` | `float` | Event count in the unexposed group. | required |
| `pt_exposed` | `float` | Person-time at risk in the exposed group (any time unit, as long as both groups use the same one). | required |
| `pt_unexposed` | `float` | Person-time at risk in the unexposed group (same time unit as `pt_exposed`). | required |
| `method` | `('exact', 'wald')` | `"exact"` uses the F-distribution-based Poisson CI (Breslow-Day); `"wald"` uses the standard error of the log rate ratio. | `"exact"` |
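The sketch below shows one standard way to get both intervals. For the exact branch it uses the conditional-binomial construction: given n = e1 + e0 events, e1 follows Binomial(n, pi) with pi = IRR * pt1 / (IRR * pt1 + pt0), so Clopper-Pearson limits for pi (which can equivalently be written with F-distribution quantiles) map back to IRR limits. Whether `sp.epi` computes its exact CI this way is an assumption; `irr_sketch` and its helpers are hypothetical and assume positive integer event counts of modest size.

```python
import math
from statistics import NormalDist


def _binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), by direct summation (modest n only)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve_on_unit_interval(f, target, increasing, iters=200):
    """Bisection for f(p) = target on (0, 1), where f is monotone in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid  # root lies above mid
        else:
            hi = mid  # root lies below mid
    return (lo + hi) / 2


def irr_sketch(e1, pt1, e0, pt0, method="exact", alpha=0.05):
    """Hypothetical sketch: IRR = (e1/pt1) / (e0/pt0) with an exact or Wald CI.

    Assumes e1 and e0 are positive integer event counts.
    """
    irr = (e1 / pt1) / (e0 / pt0)
    if method == "wald":
        z = NormalDist().inv_cdf(1 - alpha / 2)
        se_log_irr = math.sqrt(1 / e1 + 1 / e0)
        return irr, irr * math.exp(-z * se_log_irr), irr * math.exp(z * se_log_irr)

    # Exact: Clopper-Pearson limits for pi, mapped back via IRR = pi/(1-pi) * pt0/pt1
    n = e1 + e0
    if e1 == 0:
        pi_lo = 0.0
    else:
        pi_lo = _solve_on_unit_interval(lambda p: 1 - _binom_cdf(e1 - 1, n, p), alpha / 2, True)
    if e1 == n:
        pi_hi = 1.0
    else:
        pi_hi = _solve_on_unit_interval(lambda p: _binom_cdf(e1, n, p), alpha / 2, False)

    def to_irr(pi):
        return math.inf if pi >= 1 else pi / (1 - pi) * pt0 / pt1

    return irr, to_irr(pi_lo), to_irr(pi_hi)


print(irr_sketch(30, 1000, 15, 1000))
print(irr_sketch(30, 1000, 15, 1000, method="wald"))
```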
number_needed_to_treat
number_needed_to_treat(a, b: Optional[float] = None, c: Optional[float] = None, d: Optional[float] = None, *, alpha: float = 0.05) -> NNTResult
Number needed to treat (or harm), defined as |1 / RD|.
The CI is derived from the Wald CI for RD. Interpretation convention: a negative RD gives an NNT-Benefit (treatment reduces risk); a positive RD gives an NNT-Harm.
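A sketch of the NNT calculation, assuming (hypothetically) that the exposed row holds the treated group, that RD is non-zero, and that the NNT limits are the reciprocals of the absolute Wald RD limits. When the RD interval includes 0, the matching NNT interval is not one finite range (it passes through infinity), so this sketch returns no CI in that case; how `sp.epi` reports it is not stated here.

```python
import math
from statistics import NormalDist


def nnt_sketch(a, b, c, d, alpha=0.05):
    """Hypothetical sketch: NNT = 1/|RD| with a CI from the Wald RD limits.

    Rows are (treated, control); RD = risk_treated - risk_control.
    """
    n1, n0 = a + b, c + d
    p1, p0 = a / n1, c / n0
    rd = p1 - p0
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)
    rd_lo, rd_hi = rd - z * se, rd + z * se
    label = "NNT-Benefit" if rd < 0 else "NNT-Harm"
    nnt = 1 / abs(rd)
    if rd_lo < 0 < rd_hi:
        # RD CI straddles 0: no single finite NNT interval, so report none here
        return nnt, label, None
    ci = tuple(sorted((1 / abs(rd_lo), 1 / abs(rd_hi))))
    return nnt, label, ci


nnt, label, ci = nnt_sketch(10, 90, 20, 80)  # risks 0.10 (treated) vs 0.20 (control)
print(nnt, label, ci)
```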
prevalence_ratio
prevalence_ratio(*args, **kwargs) -> RR2x2Result
Prevalence ratio (cross-sectional RR); mathematically identical to :func:`relative_risk` when called on a 2x2 prevalence table. Distinguished for semantic clarity in cross-sectional studies.
mantel_haenszel
mantel_haenszel(tables: Union[Sequence, ndarray], *, measure: str = 'OR', alpha: float = 0.05) -> MantelHaenszelResult
Mantel-Haenszel pooled OR / RR across K strata.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `tables` | array-like, shape `(K, 2, 2)` | Each stratum's 2x2 table with layout … | required |
| `measure` | `('OR', 'RR')` | Pooled measure. Use :func: … | `"OR"` |
| `alpha` | `float` | Two-sided significance level; the CI has nominal coverage `1 - alpha`. | `0.05` |

Returns:

| Type | Description |
|---|---|
| `MantelHaenszelResult` | |
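The pooled point estimates are ratios of weighted sums over strata: OR_MH = sum_k(a_k d_k / n_k) / sum_k(b_k c_k / n_k) and RR_MH = sum_k(a_k (c_k + d_k) / n_k) / sum_k(c_k (a_k + b_k) / n_k), where n_k is the stratum total. The hypothetical `mantel_haenszel_sketch` below computes only these point estimates and assumes each stratum uses the a, b / c, d layout shown for `odds_ratio`; the CI method used by `sp.epi.mantel_haenszel` is not described here, so it is omitted.

```python
def mantel_haenszel_sketch(tables, measure="OR"):
    """Hypothetical sketch: Mantel-Haenszel pooled OR or RR (point estimate only).

    Each stratum is [[a, b], [c, d]] with rows = exposed/unexposed and
    columns = outcome+/outcome- (assumed layout).
    """
    num = den = 0.0
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        if measure == "OR":
            num += a * d / n
            den += b * c / n
        elif measure == "RR":
            num += a * (c + d) / n
            den += c * (a + b) / n
        else:
            raise ValueError("measure must be 'OR' or 'RR'")
    return num / den


# Two strata that share the same stratum-specific OR (= 4)
strata = [[[10, 20], [5, 40]], [[20, 10], [10, 20]]]
print(mantel_haenszel_sketch(strata))
```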
breslow_day_test
breslow_day_test(tables: Union[Sequence, ndarray], *, tarone_correction: bool = True) -> tuple[float, float]
Breslow-Day test for homogeneity of the odds ratio across strata.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `tables` | array-like, shape `(K, 2, 2)` | | required |
| `tarone_correction` | `bool` | Apply Tarone's correction (recommended; Tarone 1985). | `True` |

Returns:

| Name | Type | Description |
|---|---|---|
| `chi2` | `float` | |
| `p_value` | `float` | |
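The Breslow-Day statistic compares each stratum's observed a_k with the count A_k expected when every stratum shares the Mantel-Haenszel OR and the stratum margins are held fixed: chi2 = sum_k (a_k - A_k)^2 / Var_k, with Var_k = 1 / (1/A_k + 1/B_k + 1/C_k + 1/D_k). Tarone's correction subtracts (sum_k a_k - sum_k A_k)^2 / sum_k Var_k. The hypothetical `breslow_day_sketch` below finds A_k by bisection instead of the closed-form quadratic root and returns the statistic with its K - 1 degrees of freedom. The p-value would be the upper chi-squared tail with that many degrees of freedom (for example `scipy.stats.chi2.sf(chi2, df)`), left out here to keep the sketch dependency-free.

```python
def _mh_or(tables):
    """Mantel-Haenszel pooled odds ratio."""
    num = sum(a * d / (a + b + c + d) for (a, b), (c, d) in tables)
    den = sum(b * c / (a + b + c + d) for (a, b), (c, d) in tables)
    return num / den


def _expected_a(a, b, c, d, psi, iters=200):
    """Expected exposed-case count A with the table margins fixed and OR = psi."""
    n1, m1, n = a + b, a + c, a + b + c + d  # exposed total, case total, grand total
    lo, hi = max(0.0, n1 + m1 - n), min(n1, m1)
    for _ in range(iters):
        mid = (lo + hi) / 2
        # f(A) = A * D(A) - psi * B(A) * C(A) is increasing on [lo, hi]
        if mid * (n - n1 - m1 + mid) - psi * (n1 - mid) * (m1 - mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def breslow_day_sketch(tables, tarone_correction=True):
    """Hypothetical sketch: Breslow-Day homogeneity statistic and its df."""
    psi = _mh_or(tables)
    chi2 = sum_a = sum_e = sum_var = 0.0
    for (a, b), (c, d) in tables:
        e_a = _expected_a(a, b, c, d, psi)
        e_b, e_c, e_d = (a + b) - e_a, (a + c) - e_a, d - a + e_a
        var = 1 / (1 / e_a + 1 / e_b + 1 / e_c + 1 / e_d)
        chi2 += (a - e_a) ** 2 / var
        sum_a, sum_e, sum_var = sum_a + a, sum_e + e_a, sum_var + var
    if tarone_correction:
        chi2 -= (sum_a - sum_e) ** 2 / sum_var
    return chi2, len(tables) - 1


# Stratum ORs of 4 and 0.25: strongly heterogeneous
print(breslow_day_sketch([[[10, 20], [5, 40]], [[20, 10], [40, 5]]]))
```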
direct_standardize
direct_standardize(events: Sequence[float], population: Sequence[float], standard_weights: Sequence[float], *, alpha: float = 0.05) -> StandardizedRateResult
Direct standardization of a rate.
The standardized rate is:
r_std = sum_k (w_k * events_k / population_k)
where w_k are the relative weights of a standard population
(they are normalized internally to sum to 1).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `events` | array-like | Event counts in each stratum of the study population. | required |
| `population` | array-like | Denominator (person-time or population size) in each stratum. | required |
| `standard_weights` | array-like | Standard population size or proportion per stratum. Will be normalized to sum to 1. | required |
| `alpha` | `float` | | `0.05` |

Returns:

| Type | Description |
|---|---|
| `StandardizedRateResult` | |
Notes
SE is computed by the delta method on the weighted sum of stratum rates, treating events as Poisson.
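A minimal sketch of the formula above, including the Poisson variance of the weighted sum, Var(r_std) = sum_k w_k^2 * events_k / population_k^2. The function name `direct_standardize_sketch` is hypothetical, and the normal-approximation CI it returns is an assumption; the notes above only describe how the SE is obtained.

```python
import math
from statistics import NormalDist


def direct_standardize_sketch(events, population, standard_weights, alpha=0.05):
    """Hypothetical sketch: directly standardized rate with a Poisson-based SE."""
    total_w = sum(standard_weights)
    w = [x / total_w for x in standard_weights]  # normalize weights to sum to 1
    rate = sum(wk * ek / pk for wk, ek, pk in zip(w, events, population))
    # Poisson events: Var(events_k / population_k) is estimated by events_k / population_k**2
    var = sum(wk**2 * ek / pk**2 for wk, ek, pk in zip(w, events, population))
    se = math.sqrt(var)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return rate, se, (rate - z * se, rate + z * se)  # normal-approximation CI (assumed)


print(direct_standardize_sketch([5, 30], [1000, 1000], [1, 1]))
```

If every stratum has the same rate, the standardized rate equals that rate whatever the weights, which is a quick sanity check.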
indirect_standardize
indirect_standardize(observed: float, events_reference: Sequence[float], population_reference: Sequence[float], population_study: Sequence[float], *, alpha: float = 0.05) -> SMRResult
Indirect standardization, yielding the Standardized Morbidity/Mortality Ratio (SMR).
Expected events = sum_k (rate_ref_k * pop_study_k), where rate_ref_k = events_reference_k / population_reference_k. SMR = observed / expected.
CI uses exact Poisson (Byar's approximation / Garwood).
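The expected-count formula and Byar's approximation can be sketched as follows. `smr_sketch` is a hypothetical name, and using Byar's approximation (rather than exact Garwood limits) for the observed count is a choice made for this stdlib-only sketch; it is not a claim about which variant `sp.epi` uses.

```python
import math
from statistics import NormalDist


def smr_sketch(observed, events_reference, population_reference, population_study, alpha=0.05):
    """Hypothetical sketch: SMR = observed / expected with Byar's approximate Poisson CI."""
    expected = sum(
        e / p * ps  # reference rate applied to the study stratum
        for e, p, ps in zip(events_reference, population_reference, population_study)
    )
    smr = observed / expected
    z = NormalDist().inv_cdf(1 - alpha / 2)
    o = observed
    # Byar's approximation to the exact Poisson limits for the observed count
    o_lo = 0.0 if o == 0 else o * (1 - 1 / (9 * o) - z / (3 * math.sqrt(o))) ** 3
    o_hi = (o + 1) * (1 - 1 / (9 * (o + 1)) + z / (3 * math.sqrt(o + 1))) ** 3
    return smr, expected, (o_lo / expected, o_hi / expected)


# Reference rates 0.01 and 0.02 applied to study strata of 500 and 1000 -> 25 expected
print(smr_sketch(30, [10, 40], [1000, 2000], [500, 1000]))
```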
diagnostic_test
Alias for :func:`sensitivity_specificity`.
sensitivity_specificity
sensitivity_specificity(y_true=None, y_pred=None, *, tp: Optional[int] = None, fn: Optional[int] = None, fp: Optional[int] = None, tn: Optional[int] = None, alpha: float = 0.05) -> DiagnosticTestResult
Sensitivity and specificity with Wilson-score CIs.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `y_true` | array-like | Reference binary labels (0/1). | `None` |
| `y_pred` | array-like | Predicted binary labels (0/1). | `None` |
| `tp` | `int` | Pre-computed confusion cells. Use instead of … | `None` |
| `fn` | `int` | Pre-computed confusion cells. Use instead of … | `None` |
| `fp` | `int` | Pre-computed confusion cells. Use instead of … | `None` |
| `tn` | `int` | Pre-computed confusion cells. Use instead of … | `None` |
| `alpha` | `float` | | `0.05` |
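The two input modes (label vectors or pre-computed cells) reduce to the same four counts. The sketch below, with hypothetical helper names, counts the cells from labels and then attaches a Wilson score interval to each proportion; it illustrates the calculation rather than reproducing `sp.epi`'s code.

```python
import math
from statistics import NormalDist


def wilson_ci(successes, n, alpha=0.05):
    """Wilson score interval for the proportion successes / n."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z / denom * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return max(0.0, centre - half), min(1.0, centre + half)


def confusion_cells(y_true, y_pred):
    """Count (tp, fn, fp, tn) from paired 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fn, fp, tn


def sens_spec_sketch(tp, fn, fp, tn, alpha=0.05):
    """Hypothetical sketch: sensitivity and specificity with Wilson-score CIs."""
    sensitivity = tp / (tp + fn)  # share of reference positives flagged positive
    specificity = tn / (tn + fp)  # share of reference negatives flagged negative
    return (
        (sensitivity, wilson_ci(tp, tp + fn, alpha)),
        (specificity, wilson_ci(tn, tn + fp, alpha)),
    )


(sens, sens_ci), (spec, spec_ci) = sens_spec_sketch(90, 10, 20, 80)
print(sens, sens_ci, spec, spec_ci)
```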
roc_curve
ROC curve with Hanley-McNeil (1982) AUC standard error.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `y_true` | array-like of `{0, 1}` | | required |
| `scores` | array-like | Continuous predictions (higher = more "positive"). | required |
cohen_kappa
Cohen's (1960) kappa for two raters.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `rater_a` | array-like | Category labels from the first rater. | required |
| `rater_b` | array-like | Category labels from the second rater; must have the same length as `rater_a`. | required |
| `weights` | `('unweighted', 'linear', 'quadratic')` | Weighting scheme for disagreements across an ordered category scale. `"unweighted"` recovers the classic Cohen kappa. | `"unweighted"` |
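A sketch of the three weighting schemes: kappa = (p_o - p_e) / (1 - p_e), where p_o and p_e are the weighted observed and chance-expected agreement. Agreement weights are 1 on the diagonal and, off the diagonal, 0 (unweighted), 1 - |i - j| / (k - 1) (linear) or 1 - ((i - j) / (k - 1))^2 (quadratic). `kappa_sketch` is hypothetical, and it assumes that sorting the observed labels gives the intended category order.

```python
def kappa_sketch(rater_a, rater_b, weights="unweighted"):
    """Hypothetical sketch of Cohen's kappa with optional agreement weights."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must label the same items")
    # Assumption: sorted labels define the ordered category scale
    cats = sorted(set(rater_a) | set(rater_b))
    k, n = len(cats), len(rater_a)
    if k < 2:
        raise ValueError("need at least two categories")
    pos = {c: i for i, c in enumerate(cats)}

    # Observed joint proportions and each rater's marginal proportions
    obs = [[0.0] * k for _ in range(k)]
    for x, y in zip(rater_a, rater_b):
        obs[pos[x]][pos[y]] += 1 / n
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def agreement_weight(i, j):
        if weights == "unweighted":
            return 1.0 if i == j else 0.0
        if weights == "linear":
            return 1 - abs(i - j) / (k - 1)
        if weights == "quadratic":
            return 1 - ((i - j) / (k - 1)) ** 2
        raise ValueError("weights must be 'unweighted', 'linear' or 'quadratic'")

    cells = [(i, j) for i in range(k) for j in range(k)]
    p_obs = sum(agreement_weight(i, j) * obs[i][j] for i, j in cells)
    p_exp = sum(agreement_weight(i, j) * row[i] * col[j] for i, j in cells)
    return (p_obs - p_exp) / (1 - p_exp)


print(kappa_sketch([0, 0, 1, 1], [0, 1, 1, 1]))
```

With only two categories, the linear and quadratic weights coincide with the unweighted ones, so the weighting choice matters only for three or more ordered categories.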