Skip to content

statspai.multilevel

multilevel

Multilevel / mixed-effects models.

Exports

mixed, MixedResult Linear mixed effects (Gaussian response) meglm, melogit, mepoisson, menbreg, megamma, meologit, MEGLMResult Generalised linear mixed models via Laplace approximation or adaptive Gauss-Hermite quadrature (nAGQ). icc Intra-class correlation with 95% CI lrtest Likelihood-ratio test between two fitted mixed models (with chi-bar² boundary correction when variance components are being tested)

MixedResult dataclass

Container for sp.mixed() estimation results.

The public attributes mirror Stata mixed and R lme4::lmer output; the underscored fields carry implementation state used by methods like predict() and r_squared().

params property

params: Series

Alias used by the unified StatsPAI result contract.

bse property

bse: Series

Fixed-effect standard errors (Stata style alias).

n_params property

n_params: int

Total free parameters (fixed + variance components + sigma²).

conf_int

conf_int(alpha: Optional[float] = None) -> DataFrame

Wald confidence intervals for the fixed effects.

ranef

ranef(conditional_se: bool = False) -> DataFrame

Return BLUPs (posterior means of the random effects).

Parameters:

Name Type Description Default
conditional_se bool

If True, return a second DataFrame of posterior standard errors computed from the conditional variance Var(u_j | y) = G − G Z_j' V_j^{−1} Z_j G.

False

predict

predict(data: Optional[DataFrame] = None, include_random: bool = True) -> Series

Fitted / predicted values.

include_random=True returns X β̂ + Z û (conditional on the realised random effects), which is the Stata predict , fitted default. include_random=False returns the marginal / population prediction X β̂.

For unseen groups (i.e. groups absent from training data), the random-effect contribution is treated as zero, which is the best-linear unbiased prediction when nothing is known about the new cluster.

r_squared

r_squared() -> Dict[str, float]

Nakagawa & Schielzeth (2013) marginal and conditional R².

Definitions, in our Gaussian LMM:

σ²_f  = variance of the fixed-effect predictions Xβ
σ²_r  = sum of random-effect variance contributions
       (trace of G weighted by Z'Z/n for random slopes)
σ²_ε  = residual variance

R²_marginal    = σ²_f / (σ²_f + σ²_r + σ²_ε)
R²_conditional = (σ²_f + σ²_r) / (σ²_f + σ²_r + σ²_ε)

wald_test

wald_test(restrictions: 'Sequence[str] | np.ndarray | pd.DataFrame') -> Dict[str, float]

Wald joint test of linear restrictions on the fixed effects.

restrictions can be:

  • A list of parameter names — tests those coefficients are zero.
  • A 2-D array / DataFrame R with R β = 0 (rows = restrictions, columns ordered as self.fixed_effects.index).

summary

summary() -> str

Stata-style formatted summary.

plot

plot(kind: str = 'caterpillar', variable: Optional[str] = None, **kwargs)

Quick diagnostic plots.

kind='caterpillar' produces a forest plot of BLUPs with posterior-SE intervals (the lme4 standard) for a chosen random effect (default: intercept). kind='residuals' plots the conditional residuals against fitted values.

MEGLMResult dataclass

Container for GLMM fits (meglm, melogit, mepoisson, megamma, menbreg, meologit).

dispersion property

dispersion: Optional[float]

Estimated dispersion (φ for gamma, α for nbinomial); None otherwise.

odds_ratios

odds_ratios() -> DataFrame

Exponentiated fixed effects with Wald CIs (binomial / ordinal).

incidence_rate_ratios

incidence_rate_ratios() -> DataFrame

Exponentiated coefficients for log-link count GLMMs (Poisson / NB).

to_latex

to_latex() -> str

Booktabs LaTeX fragment; mirrors MixedResult.to_latex.

plot

plot(kind: str = 'caterpillar', variable: Optional[str] = None, **kwargs)

Diagnostic plot for the GLMM fit.

Only kind='caterpillar' is currently supported: a forest plot of the BLUPs for one random effect (default: the intercept). Posterior SEs are not returned by meglm at present, so the error bars are omitted.

predict

predict(data: Optional[DataFrame] = None, include_random: bool = True, type: str = 'response') -> Series

Predict response or linear predictor.

type='response' returns μ = g⁻¹(η); type='linear' returns η itself. include_random=True adds Z û for groups seen at fit time.

meglm

meglm(data: DataFrame, y: str, x_fixed: Sequence[str], group: str, family: str = 'gaussian', x_random: Optional[Sequence[str]] = None, cov_type: str = 'unstructured', trials: Optional[str] = None, offset: Optional[str] = None, nAGQ: int = 1, maxiter: int = 300, tol: float = 1e-06, alpha: float = 0.05) -> MEGLMResult

Fit a generalised linear mixed model.

Parameters:

Name Type Description Default
data DataFrame

Long-format dataframe.

required
y str

Outcome column. For binomial models this is the number of successes; pair it with trials= to model proportions.

required
x_fixed Sequence[str]

Fixed-effect regressors (intercept added automatically).

required
group str

Grouping variable for random effects.

required
family str

'gaussian', 'binomial', 'poisson', 'gamma', or 'nbinomial' (alias 'negbin').

'gaussian'
x_random Optional[Sequence[str]]

Random-slope variables; defaults to random intercept only.

None
cov_type str

Random-effect covariance: 'unstructured' (default), 'diagonal', 'identity'.

'unstructured'
trials Optional[str]

Column of trial counts for binomial responses. Defaults to 1 (Bernoulli).

None
offset Optional[str]

Column of fixed offsets added to the linear predictor (e.g. log(exposure) for Poisson rate models).

None
nAGQ int

Number of adaptive Gauss-Hermite quadrature points per scalar random effect. 1 (default) ≡ Laplace approximation. Use nAGQ=7 to match Stata meglm intpoints(7); values > 1 require a single scalar random effect (no random slopes).

1
maxiter int

Optimisation controls / CI width.

300
tol int

Optimisation controls / CI width.

300
alpha int

Optimisation controls / CI width.

300

Returns:

Type Description
MEGLMResult

melogit

melogit(data: DataFrame, y: str, x_fixed: Sequence[str], group: str, x_random: Optional[Sequence[str]] = None, trials: Optional[str] = None, nAGQ: int = 1, **kw: Any) -> MEGLMResult

Random-effects logistic regression (Stata melogit).

mepoisson

mepoisson(data: DataFrame, y: str, x_fixed: Sequence[str], group: str, x_random: Optional[Sequence[str]] = None, offset: Optional[str] = None, nAGQ: int = 1, **kw: Any) -> MEGLMResult

Random-effects Poisson regression (Stata mepoisson).

menbreg

menbreg(data: DataFrame, y: str, x_fixed: Sequence[str], group: str, x_random: Optional[Sequence[str]] = None, offset: Optional[str] = None, nAGQ: int = 1, **kw: Any) -> MEGLMResult

Random-effects negative-binomial regression (Stata menbreg).

megamma

megamma(data: DataFrame, y: str, x_fixed: Sequence[str], group: str, x_random: Optional[Sequence[str]] = None, offset: Optional[str] = None, nAGQ: int = 1, **kw: Any) -> MEGLMResult

Random-effects Gamma GLMM with log link (Stata meglm family(gamma)).

meologit

meologit(data: DataFrame, y: str, x_fixed: Sequence[str], group: str, x_random: Optional[Sequence[str]] = None, cov_type: str = 'unstructured', offset: Optional[str] = None, nAGQ: int = 1, maxiter: int = 300, tol: float = 1e-06, alpha: float = 0.05) -> MEGLMResult

Random-effects ordinal logit (Stata meologit, R ordinal::clmm).

The outcome y is treated as ordered categorical with K levels. If y is numeric it is coerced to integer codes 1..K based on sorted unique values. Other dtypes (str, pandas categorical) are coerced via pd.Categorical and ordered by appearance.

No intercept enters β — its role is taken by the K-1 thresholds. Returns an :class:MEGLMResult with family='ordinal' and thresholds populated.

icc

icc(result, component: str = '_cons', alpha: float = 0.05, n_boot: int = 0, seed: Optional[int] = None) -> ICCResult

Intra-class correlation for a fitted mixed model.

Parameters:

Name Type Description Default
result

A MixedResult returned by :func:statspai.mixed.

required
component str

Name of the random-effect variance to put in the numerator. Defaults to the random intercept ("_cons").

'_cons'
alpha float

Significance level for the confidence interval. Default 0.05.

0.05
n_boot int

Number of parametric bootstrap replicates used to compute the CI. 0 (default) uses the delta-method approximation on the log-variance scale, which is faster and usually within a few decimals of the parametric-bootstrap answer for moderate N.

0
seed Optional[int]

RNG seed forwarded to :func:numpy.random.default_rng.

None

Returns:

Type Description
ICCResult

lrtest

lrtest(restricted: Any, full: Any, boundary: 'bool | None' = None) -> LRTestResult

Likelihood-ratio test comparing a restricted and a full model.

Parameters:

Name Type Description Default
restricted Any

Two fitted mixed models. full should strictly nest restricted — i.e. the parameter space of the restricted model is a subset of the full model's.

required
full Any

Two fitted mixed models. full should strictly nest restricted — i.e. the parameter space of the restricted model is a subset of the full model's.

required
boundary 'bool | None'

Whether to apply the χ̄² boundary correction. When None (default) we infer it from whether the restriction touches a variance component — the only parameters that live on the boundary of their support.

None

Returns:

Type Description
LRTestResult

Raises:

Type Description
ValueError

If the two fits have different response families (e.g. one is binomial, the other Poisson) — their log-likelihoods are not comparable and a naive LR statistic is meaningless.

ValueError

If either fit used REML and the fixed-effect design differs between the two models. REML log-likelihoods are only comparable across fits that share the same fixed-effect design matrix; always use ML for LR tests of fixed effects.