Skip to content

statspai.mendelian

mendelian

Mendelian Randomization methods.

Uses genetic variants as instrumental variables to estimate causal effects in epidemiological/health economics studies.

Core estimators (mr): - IVW (inverse-variance weighted) - MR-Egger regression - Weighted / penalized median

Diagnostics (diagnostics): - Cochran's Q / Rucker's Q' (heterogeneity) - MR-Egger intercept test (directional pleiotropy) - Leave-one-out sensitivity - Steiger directionality test - MR-PRESSO global test + outlier correction - Radial MR

Frontier (frontier, v1.6): - MR-Lap (Burgess-Davies-Thompson 2016 sample-overlap correction) - MR-Clust (Foley et al. 2021 clustered pleiotropy) - GRAPPLE (Wang et al. 2021 profile-likelihood MR) - MR-cML-BIC (Xue-Shen-Pan 2021 constrained maximum likelihood)

MRResult

Results from Mendelian Randomization analysis.

plot

plot(ax=None, **kwargs)

Scatter plot of SNP effects with MR lines.

MVMRResult dataclass

Multivariable MR output.

MediationMRResult dataclass

Two-step MR output.

MRBMAResult dataclass

MR-BMA (Bayesian Model Averaging) output.

MRLapResult dataclass

Output of :func:mr_lap.

Attributes:

Name Type Description
estimate float

Overlap-corrected causal estimate.

se float

First-order SE (identical to IVW SE; overlap correction does not change the variance under the Burgess-Davies-Thompson 2016 approximation).

ci_lower, ci_upper float
p_value float
estimate_ivw float

The uncorrected IVW point estimate (for comparison).

bias_correction float

estimate_ivw - estimate, i.e. the amount subtracted from IVW.

overlap_fraction float
overlap_rho float
f_mean float

Mean first-stage F-statistic across SNPs.

n_snps int

MRClustResult dataclass

Output of :func:mr_clust.

Attributes:

Name Type Description
cluster_estimates DataFrame

One row per cluster, columns [cluster, estimate, se, ci_lower, ci_upper, weight, n_snps]. Cluster 0 is a 'null' component at θ=0 in the Foley et al. 2021 specification.

assignments ndarray

Integer cluster id (0..K-1) for each SNP.

responsibilities ndarray

(n_snps, K) matrix of posterior cluster-membership probabilities.

K int

Selected number of clusters (including the null cluster).

bic dict

BIC per K tried; keys are ints.

loglik float

Final log-likelihood at the selected K.

wald_ratios ndarray

Per-SNP Wald ratio (β_y / β_x).

wald_se ndarray

Per-SNP Wald-ratio first-order SE.

n_snps int

GrappleResult dataclass

Output of :func:grapple.

Attributes:

Name Type Description
estimate float

Profile-likelihood MLE of the causal effect β.

se float

SE from observed Fisher information.

ci_lower, ci_upper float
p_value float
tau2 float

Estimated pleiotropy variance (balanced pleiotropy SD² = τ²).

loglik float

Profile log-likelihood at the MLE.

converged bool
n_snps int

MRcMLResult dataclass

Output of :func:mr_cml.

Attributes:

Name Type Description
estimate float

BIC-selected MR-cML-BIC point estimate.

se float

Profile-likelihood SE at the selected K.

ci_lower, ci_upper, p_value float
K_selected int

Number of SNPs flagged as invalid (pleiotropic) by BIC.

invalid_snps ndarray

Boolean mask of the K_selected SNPs flagged as invalid.

path DataFrame

Full path: one row per K in [0, K_max] with columns [K, estimate, se, loglik, bic].

loglik float

Final log-likelihood at K_selected.

n_snps int

MRRapsResult dataclass

Output of :func:mr_raps.

Attributes:

Name Type Description
estimate float
se float

Sandwich SE from the robust M-estimator.

ci_lower, ci_upper float
p_value float
tau2 float
loglik_robust float

Final robust (Tukey) loss value.

converged bool
tuning_c float
n_snps int

mendelian_randomization

mendelian_randomization(data: DataFrame = None, beta_exposure: str = None, beta_outcome: str = None, se_exposure: str = None, se_outcome: str = None, exposure_name: str = 'Exposure', outcome_name: str = 'Outcome', methods: List[str] = None, alpha: float = 0.05, seed: int = None) -> MRResult

Mendelian Randomization analysis using summary statistics.

Equivalent to R's MendelianRandomization::mr_allmethods() and TwoSampleMR::mr().

Parameters:

Name Type Description Default
data DataFrame

Summary statistics with one row per SNP/instrument.

None
beta_exposure str

Column name for SNP-exposure association beta.

None
beta_outcome str

Column name for SNP-outcome association beta.

None
se_exposure str

Column name for SNP-exposure SE.

None
se_outcome str

Column name for SNP-outcome SE.

None
exposure_name str
'Exposure'
outcome_name str
'Outcome'
methods list of str

MR methods to use. Default: ['ivw', 'egger', 'weighted_median'].

None
alpha float
0.05
seed int
None

Returns:

Type Description
MRResult

Results with .summary(), .plot(), estimates table.

Examples:

>>> import statspai as sp
>>> result = sp.mendelian_randomization(
...     data=snp_stats,
...     beta_exposure='beta_x', beta_outcome='beta_y',
...     se_exposure='se_x', se_outcome='se_y',
...     exposure_name='BMI', outcome_name='T2D',
... )
>>> print(result.summary())
>>> result.plot()

mr_egger

mr_egger(beta_exposure: ndarray, beta_outcome: ndarray, se_exposure: ndarray, se_outcome: ndarray, alpha: float = 0.05) -> Dict[str, float]

MR-Egger regression.

Allows for directional pleiotropy via a non-zero intercept.

mr_ivw

mr_ivw(beta_exposure: ndarray, beta_outcome: ndarray, se_exposure: ndarray, se_outcome: ndarray, alpha: float = 0.05) -> Dict[str, float]

Inverse-Variance Weighted (IVW) MR estimator.

Fixed-effects meta-analysis of Wald ratios.

mr_median

mr_median(beta_exposure: ndarray, beta_outcome: ndarray, se_exposure: ndarray, se_outcome: ndarray, penalized: bool = False, n_boot: int = 1000, alpha: float = 0.05, seed: int = None) -> Dict[str, float]

Weighted median MR estimator.

Consistent when at least 50% of the weight comes from valid instruments.

mr_plot

mr_plot(result: MRResult = None, ax=None, **kwargs)

Scatter plot of SNP-exposure vs SNP-outcome effects with MR lines.

mr_heterogeneity

mr_heterogeneity(beta_exposure: ndarray, beta_outcome: ndarray, se_outcome: ndarray, *, method: str = 'ivw', se_exposure: Optional[ndarray] = None) -> HeterogeneityResult

Cochran's Q (IVW) or Rücker's Q' (Egger) for pleiotropic heterogeneity.

Parameters:

Name Type Description Default
beta_exposure array - like
required
beta_outcome array - like
required
se_outcome array - like
required
method ('ivw', 'egger')
"ivw"
se_exposure array - like

Required for Rücker's Q' (Egger); if omitted we default to first-order weights and raise for method='egger'.

None

mr_pleiotropy_egger

mr_pleiotropy_egger(beta_exposure: ndarray, beta_outcome: ndarray, se_outcome: ndarray) -> PleiotropyResult

Test the MR-Egger intercept for directional (unbalanced) pleiotropy.

H0: intercept == 0 (no directional pleiotropy).

mr_leave_one_out

mr_leave_one_out(beta_exposure: ndarray, beta_outcome: ndarray, se_outcome: ndarray, *, snp_ids: Optional[List[str]] = None, alpha: float = 0.05) -> LeaveOneOutResult

IVW estimate with each SNP dropped in turn.

Identifies SNPs that drive the overall estimate disproportionately.

mr_steiger

mr_steiger(beta_exposure: ndarray, se_exposure: ndarray, n_exposure, beta_outcome: ndarray, se_outcome: ndarray, n_outcome, *, eaf: Optional[ndarray] = None) -> SteigerResult

Steiger test for the direction of the causal effect (Hemani 2017).

Compares the R^2 of the SNPs on the exposure with their R^2 on the outcome. If R^2_exposure > R^2_outcome the assumed causal direction (exposure -> outcome) is supported.

mr_presso

mr_presso(beta_exposure: ndarray, beta_outcome: ndarray, se_exposure: ndarray, se_outcome: ndarray, *, n_boot: int = 1000, sig_threshold: float = 0.05, seed: Optional[int] = None) -> MRPressoResult

MR-PRESSO global test + outlier detection + corrected estimate.

Implementation notes

Follows Verbanck et al. (2018). For each SNP i, the "residual sum of squares" (RSS_obs) is computed using leave-one-out predictions. The null distribution is obtained by simulating SNP-outcome effects with the same SEs and no pleiotropy. SNPs with individual test p < sig_threshold are flagged as outliers and re-analyzed.

mr_radial

mr_radial(beta_exposure: ndarray, beta_outcome: ndarray, se_outcome: ndarray, *, snp_ids: Optional[List[str]] = None) -> RadialResult

Radial IVW MR (Bowden et al. 2018).

Reparameterizes each SNP's Wald ratio as a coordinate in a "radial" space. SNPs whose individual chi-square contribution to the Cochran Q exceeds the Bonferroni threshold at alpha=0.05 are flagged as outliers.

mr_mode

mr_mode(beta_exposure: ndarray, beta_outcome: ndarray, se_exposure: ndarray, se_outcome: ndarray, *, method: str = 'weighted', n_boot: int = 1000, alpha: float = 0.05, seed: Optional[int] = None) -> ModeBasedResult

Mode-based MR estimator (Hartwig, Davey Smith & Bowden 2017).

Parameters:

Name Type Description Default
method ('weighted', 'simple')

weighted uses IVW weights when finding the mode; simple uses unit weights (robust but less efficient).

"weighted"
n_boot int

Bootstrap replicates for SE.

1000

mr_f_statistic

mr_f_statistic(beta_exposure: ndarray, se_exposure: ndarray, *, n_samples: Optional[int] = None) -> FStatisticResult

Per-SNP F-statistic as an instrument-strength summary.

Uses the standard summary-stat approximation F_i = (beta_i / se_i)^2, which is valid for large GWAS where the first-stage regression is asymptotically Normal. Flags weak-IV risk when any SNP's F falls below 10 (Staiger-Stock 1997).

Parameters:

Name Type Description Default
n_samples int

GWAS sample size. If provided, R^2 is reported via R^2 = F / (F + n - 2); otherwise a small-sample approximation R^2 ≈ F / (F + n_snps - 2) is used.

None

mr_funnel_plot

mr_funnel_plot(beta_exposure: ndarray, beta_outcome: ndarray, se_outcome: ndarray, *, snp_ids: Optional[List[str]] = None, ax=None)

Funnel plot of SNP-specific Wald ratios vs. precision.

An asymmetric funnel around the IVW estimate suggests directional horizontal pleiotropy. Returns the matplotlib axis.

mr_scatter_plot

mr_scatter_plot(beta_exposure: ndarray, beta_outcome: ndarray, se_exposure: ndarray, se_outcome: ndarray, *, ax=None)

Classic MR scatter plot with IVW and MR-Egger lines.

mr_multivariable

mr_multivariable(snp_associations: DataFrame, *, outcome: str = 'beta_y', outcome_se: str = 'se_y', exposures: Optional[Sequence[str]] = None, alpha: float = 0.05) -> MVMRResult

Multivariable Mendelian Randomization (Sanderson et al. 2019).

Parameters:

Name Type Description Default
snp_associations DataFrame

One row per SNP. Must contain:

  • outcome — outcome-SNP association (β_Y).
  • outcome_se — its standard error.
  • exposure columns beta_X1, beta_X2, ... (configurable via the exposures argument).
required
outcome str
'beta_y'
outcome_se str
'beta_y'
exposures sequence of str

Exposure-beta column names. If omitted, all beta_* columns other than the outcome are used.

None
alpha float
0.05

Returns:

Type Description
MVMRResult

Contains direct-effect estimates for each exposure and conditional F-statistics (Sanderson-Windmeijer instrument-strength diagnostic).

Notes

Model:

β_Y = Σ_j α_j β_{X_j} + ε,  ε ~ Normal(0, σ_Y²)

fitted by weighted least squares with weights 1 / se_y². The α_j are the direct causal effects, holding other exposures fixed.

References

Sanderson, E., Davey Smith, G., Windmeijer, F. & Bowden, J. (2019). "An examination of multivariable Mendelian randomization in the single-sample and two-sample summary data settings." IJE 48(3). [@sanderson2019examination]

mr_mediation

mr_mediation(snp_associations: DataFrame, *, beta_exposure: str = 'beta_x', se_exposure: str = 'se_x', beta_mediator: str = 'beta_m', se_mediator: str = 'se_m', beta_outcome: str = 'beta_y', se_outcome: str = 'se_y', exposure_name: str = 'exposure', mediator_name: str = 'mediator', outcome_name: str = 'outcome') -> MediationMRResult

Two-step MR mediation: decompose total effect into direct + indirect.

Parameters:

Name Type Description Default
snp_associations DataFrame

One row per SNP. Must contain SNP-associations with the exposure, mediator and outcome (both beta and SE).

required
beta_exposure str
'beta_x'
se_exposure str
'beta_x'
beta_mediator str
'beta_x'
se_mediator str
'beta_x'
beta_outcome str
'beta_x'
se_outcome str
'beta_x'

Returns:

Type Description
MediationMRResult
Notes

Two-step approach (Burgess et al. 2015):

  1. Step 1 — total effect (IVW): α_total = IVW(β_Y ~ β_X).
  2. Step 2a — exposure → mediator: α_XM = IVW(β_M ~ β_X).
  3. Step 2b — direct effect via MVMR: α_direct = MVMR(β_Y ~ β_X, β_M).
  4. Indirect: α_indirect = α_total − α_direct.

Delta-method SE for the indirect effect combines SEs from steps 1 and 3.

References

Burgess, S., Daniel, R. M., Butterworth, A. S. & Thompson, S. G. (2015). "Network Mendelian randomization: using genetic variants as instrumental variables to investigate mediation in causal pathways." IJE 44(2). [@burgess2015network]

mr_bma

mr_bma(snp_associations: DataFrame, *, outcome: str = 'beta_y', outcome_se: str = 'se_y', exposures: Optional[Sequence[str]] = None, max_model_size: Optional[int] = None, prior_inclusion: float = 0.5) -> MRBMAResult

Mendelian Randomization with Bayesian Model Averaging.

Parameters:

Name Type Description Default
snp_associations DataFrame
required
outcome see :func:`mr_multivariable`
'beta_y'
outcome_se see :func:`mr_multivariable`
'beta_y'
exposures see :func:`mr_multivariable`
'beta_y'
max_model_size int

Maximum number of exposures in any one model. Defaults to len(exposures). Set a smaller value for very high-dimensional exposure sets (restricts to at most k at a time).

None
prior_inclusion float

Prior probability that each exposure is in the true causal model.

0.5

Returns:

Type Description
MRBMAResult

Marginal inclusion probabilities and top posterior models.

Notes

Iterates over the 2^k − 1 non-empty subsets of exposures. For each model, fits the MVMR WLS regression and computes the BIC:

BIC_M = n * log(RSS_M / n) + k_M * log(n)

Model posterior ∝ exp(-BIC_M / 2) * prior(M).

Runs quickly for k ≤ 12. For more exposures use max_model_size.

References

Zuber, V., Colijn, J. M., Staley, J. R. & Burgess, S. (2020). "Selecting likely causal risk factors from high-throughput experiments using multivariable Mendelian randomization." Nature Communications 11, 29. [@zuber2020selecting]

mr_lap

mr_lap(beta_exposure: ndarray, beta_outcome: ndarray, se_exposure: ndarray, se_outcome: ndarray, *, overlap_fraction: float = 1.0, overlap_rho: float = 0.0, alpha: float = 0.05) -> MRLapResult

Sample-overlap-corrected inverse-variance-weighted MR.

Corrects the first-order bias introduced when the exposure and outcome GWAS share participants. Under Burgess, Davies & Thompson (2016) Eq. 8, the bias in two-sample IVW satisfies

.. math::

E[\hat\beta_{IVW}] - \beta \approx \frac{p \, \rho_{obs}}{F_{mean}}

where :math:p is the overlap fraction (share of samples appearing in both GWAS), :math:\rho_{obs} is the phenotypic correlation of exposure and outcome in the overlapping sub-sample, and :math:F_{mean} is the mean first-stage F-statistic of the SNPs.

When overlap_fraction=0 (clean two-sample design) the correction vanishes and the estimate equals standard IVW.

Parameters:

Name Type Description Default
beta_exposure ndarray

Per-SNP GWAS effect sizes.

required
beta_outcome ndarray

Per-SNP GWAS effect sizes.

required
se_exposure ndarray

Per-SNP GWAS standard errors.

required
se_outcome ndarray

Per-SNP GWAS standard errors.

required
overlap_fraction float

Share of the outcome sample that also appears in the exposure sample. 0 = clean two-sample; 1 = full overlap; in between = partial overlap. Must be in [0, 1].

1.0
overlap_rho float

Phenotypic correlation between exposure and outcome measurements in the overlapping sub-sample. In practice estimated by LD-score regression (ldsc --rg). Must be in [-1, 1].

0.0
alpha float

Two-sided significance level for the CI and p-value.

0.05

Returns:

Type Description
class:`MRLapResult`
Notes

This implements the closed-form Burgess-Davies-Thompson (2016) correction. The more elaborate MR-Lap of Mounier & Kutalik (2023) additionally regresses bias on a non-linear function of (F_mean, overlap_fraction) to absorb weak-instrument curvature; that refinement is deferred to a future release. When the user- supplied overlap_rho is obtained from LD-score regression this simplified correction captures most of the overlap bias on real-world GWAS scales (F > 30).

The SE is kept equal to the naive-IVW SE: the correction shifts the point estimate only. Monte-Carlo studies (Burgess 2016 Figs. 2-3) show coverage remains nominal under this conservative choice because the correction itself has second-order variance.

Examples:

>>> import statspai as sp
>>> res = sp.mr_lap(bx, by, sx, sy,
...                 overlap_fraction=0.4, overlap_rho=0.18)
>>> print(res.summary())

mr_clust

mr_clust(beta_exposure: ndarray, beta_outcome: ndarray, se_exposure: ndarray, se_outcome: ndarray, *, K_range: Tuple[int, int] = (1, 5), include_null: bool = True, alpha: float = 0.05, seed: int = 0, max_iter: int = 200, tol: float = 1e-06) -> MRClustResult

Clustered Mendelian randomization via Gaussian mixture on Wald ratios.

Implements the MR-Clust estimator of Foley, Mason, Kirk & Burgess (2021) Bioinformatics 37(4). Assigns each SNP to one of K clusters based on its Wald ratio, where cluster 0 is a 'null' component (SNPs with zero effect on the outcome). Cluster means represent distinct causal-effect estimates — multiple non-null clusters are evidence of heterogeneous / pathway-specific pleiotropy.

Model

For SNP i in cluster k:

.. math::

r_i = \frac{\beta_{y,i}}{\beta_{x,i}}, \qquad r_i \mid z_i = k \ \sim\ \mathcal{N}(\theta_k, \tau_i^2)

with :math:\tau_i = s_{y,i} / |\beta_{x,i}| the first-order Wald SE. When include_null=True cluster 0 is fixed at :math:\theta_0 = 0. Parameters fitted by EM; K selected by BIC.

Parameters:

Name Type Description Default
beta_exposure ndarray
required
beta_outcome ndarray
required
se_exposure ndarray
required
se_outcome ndarray
required
K_range (int, int)

Inclusive range of cluster counts to try. K=1 with include_null=True is "everything null" and K=1 without is a single cluster at median ratio.

(1, 5)
include_null bool

If True the first cluster is fixed at 0 (null).

True
alpha float

For per-cluster CIs.

0.05
seed int

EM initialisation rng.

0
max_iter EM controls.
200
tol EM controls.
200

Returns:

Type Description
class:`MRClustResult`

Examples:

>>> import statspai as sp
>>> res = sp.mr_clust(bx, by, sx, sy, K_range=(1, 4))
>>> print(res.summary())

mr_cml

mr_cml(beta_exposure: ndarray, beta_outcome: ndarray, se_exposure: ndarray, se_outcome: ndarray, *, K_max: Optional[int] = None, alpha: float = 0.05, max_iter: int = 200, tol: float = 1e-08) -> MRcMLResult

Constrained maximum-likelihood MR (MR-cML-BIC).

Implements MR-cML with BIC-selected sparsity (Xue, Shen & Pan 2021, AJHG 108(7)). The model lets each SNP have its own pleiotropy effect :math:r_i subject to :math:\|r\|_0 \le K; the number of pleiotropic SNPs K is selected by BIC.

Model

.. math::

\beta_{x,i}^{obs} &= \beta_{x,i}^{true} + e_i, \quad e_i \sim \mathcal{N}(0, s_{x,i}^2) \ \beta_{y,i}^{obs} &= \beta\,\beta_{x,i}^{true} + r_i + \epsilon_i, \quad \epsilon_i \sim \mathcal{N}(0, s_{y,i}^2) \ & \text{with } |r|_0 \le K.

Jointly MLE over :math:(\beta, \{\beta_{x,i}^{true}\}, r) by block-coordinate descent; repeat for K=0..K_max and select K* minimising BIC = -2 logL + K log(n).

Parameters:

Name Type Description Default
beta_exposure ndarray
required
beta_outcome ndarray
required
se_exposure ndarray
required
se_outcome ndarray
required
K_max int

Maximum pleiotropy cardinality to try. Defaults to n_snps - 3 (ensures at least 3 valid SNPs). Must be in [0, n_snps - 1].

None
alpha float
0.05
max_iter inner block-CD controls.
200
tol inner block-CD controls.
200

Returns:

Type Description
class:`MRcMLResult`
Notes

When K_max=0 and the DGP is free of pleiotropy MR-cML reduces to a measurement-error-corrected IVW (equivalent to the Bowden 2019 attenuation-corrected IVW). Compare with :func:grapple for a random-effects formulation of the same pleiotropy-robust target.

Examples:

>>> import statspai as sp
>>> res = sp.mr_cml(bx, by, sx, sy)
>>> print(res.summary())

mr_raps

mr_raps(beta_exposure: ndarray, beta_outcome: ndarray, se_exposure: ndarray, se_outcome: ndarray, *, tuning_c: float = 4.685, alpha: float = 0.05, beta_init: Optional[float] = None, tau2_init: float = 0.0001) -> MRRapsResult

Robust Adjusted Profile Score MR (Zhao et al. 2020).

Profile-likelihood MR with a Tukey biweight loss instead of Gaussian, making the estimator resistant to a small fraction of gross pleiotropy outliers. Same weak-instrument correction as :func:grapple (i.e. includes :math:\beta^2 s_x^2).

Parameters:

Name Type Description Default
beta_exposure ndarray
required
beta_outcome ndarray
required
se_exposure ndarray
required
se_outcome ndarray
required
tuning_c float

Tukey tuning constant. 4.685 gives 95% asymptotic efficiency under Gaussian errors; smaller values → more robust but less efficient. Zhao et al. 2020 recommend c=4.685 as default.

4.685
alpha float
0.05
beta_init float

Default IVW.

None
tau2_init float
1e-4

Returns:

Type Description
class:`MRRapsResult`
Notes

SE is computed from the sandwich formula for M-estimators:

.. math::

\mathrm{Var}(\hat\beta) \approx J^{-1} V J^{-1},

where :math:J = -\sum \psi'(r_i) \partial r_i / \partial\beta and :math:V = \sum \psi(r_i)^2 (\partial r_i/\partial\beta)^2. Evaluated numerically at the MLE with :math:\tau^2 profiled out.

Examples:

>>> import statspai as sp
>>> res = sp.mr_raps(bx, by, sx, sy)
>>> print(res.summary())

mr_available_methods

mr_available_methods() -> list[str]

Return the full list of registered MR method names (incl. aliases).