`statspai.multilevel`¶

multilevel ¶

Multilevel / mixed-effects models.

Exports

mixed, MixedResult Linear mixed effects (Gaussian response) meglm, melogit, mepoisson, menbreg, megamma, meologit, MEGLMResult Generalised linear mixed models via Laplace approximation or adaptive Gauss-Hermite quadrature (nAGQ). icc Intra-class correlation with 95% CI lrtest Likelihood-ratio test between two fitted mixed models (with chi-bar² boundary correction when variance components are being tested)

MixedResult `dataclass` ¶

Bases: ResultProtocolMixin

Container for sp.mixed() estimation results.

The public attributes mirror Stata mixed and R lme4::lmer output; the underscored fields carry implementation state used by methods like predict() and r_squared().

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(42)
>>> g = np.repeat(np.arange(30), 10)
>>> x = rng.normal(size=300)
>>> y = 2.0 + 0.5 * x + rng.normal(0, 1.0, 30)[g] + rng.normal(0, 0.5, 300)
>>> df = pd.DataFrame({"y": y, "x": x, "school": g})
>>> res = sp.mixed(df, y="y", x_fixed=["x"], group="school")
>>> type(res).__name__
'MixedResult'
>>> bool("x" in res.fixed_effects.index)
True
>>> bool(0.0 <= res.icc <= 1.0)  # share of variance at the school level
True

params `property` ¶

params: Series

Alias used by the unified StatsPAI result contract.

bse `property` ¶

bse: Series

Fixed-effect standard errors (Stata style alias).

n_params `property` ¶

n_params: int

Total free parameters (fixed + variance components + sigma²).

conf_int ¶

conf_int(alpha: Optional[float] = None) -> DataFrame

Wald confidence intervals for the fixed effects.

ranef ¶

ranef(conditional_se: bool = False) -> DataFrame

Return BLUPs (posterior means of the random effects).

Parameters:

Name	Type	Description	Default
`conditional_se`	`bool`	If `True`, return a second DataFrame of posterior standard errors computed from the conditional variance `Var(u_j \| y) = G − G Z_j' V_j^{−1} Z_j G`.	`False`

predict ¶

predict(data: Optional[DataFrame] = None, include_random: bool = True) -> Series

Fitted / predicted values.

include_random=True returns X β̂ + Z û (conditional on the realised random effects), which is the Stata predict , fitted default. include_random=False returns the marginal / population prediction X β̂.

For unseen groups (i.e. groups absent from training data), the random-effect contribution is treated as zero, which is the best-linear unbiased prediction when nothing is known about the new cluster.

r_squared ¶

r_squared() -> Dict[str, float]

Nakagawa & Schielzeth (2013) marginal and conditional R².

Definitions, in our Gaussian LMM:

σ²_f  = variance of the fixed-effect predictions Xβ
σ²_r  = sum of random-effect variance contributions
       (trace of G weighted by Z'Z/n for random slopes)
σ²_ε  = residual variance

R²_marginal    = σ²_f / (σ²_f + σ²_r + σ²_ε)
R²_conditional = (σ²_f + σ²_r) / (σ²_f + σ²_r + σ²_ε)

wald_test ¶

wald_test(restrictions: 'Sequence[str] | np.ndarray | pd.DataFrame') -> Dict[str, float]

Wald joint test of linear restrictions on the fixed effects.

restrictions can be:

A list of parameter names — tests those coefficients are zero.
A 2-D array / DataFrame R with R β = 0 (rows = restrictions, columns ordered as self.fixed_effects.index).

summary ¶

summary() -> str

Stata-style formatted summary.

plot ¶

plot(kind: str = 'caterpillar', variable: Optional[str] = None, **kwargs: Any) -> Any

Quick diagnostic plots.

kind='caterpillar' produces a forest plot of BLUPs with posterior-SE intervals (the lme4 standard) for a chosen random effect (default: intercept). kind='residuals' plots the conditional residuals against fitted values.

MEGLMResult `dataclass` ¶

Bases: ResultProtocolMixin

Container for GLMM fits (meglm, melogit, mepoisson, megamma, menbreg, meologit).

Exposes the fitted fixed_effects / random_effects, variance_components, BLUPs, and information criteria (aic / bic). Family-specific helpers include :meth:odds_ratios (binomial), :meth:incidence_rate_ratios (Poisson / NB), plus :meth:summary, :meth:predict, :meth:to_latex, and :meth:plot.

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> g = np.repeat(np.arange(20), 12)
>>> x = rng.normal(size=240)
>>> u = rng.normal(0, 0.7, 20)[g]
>>> p = 1.0 / (1.0 + np.exp(-(-0.3 + 0.8 * x + u)))
>>> y = (rng.uniform(size=240) < p).astype(float)
>>> df = pd.DataFrame({"y": y, "x": x, "gid": g})
>>> res = sp.melogit(df, y="y", x_fixed=["x"], group="gid")
>>> isinstance(res, sp.MEGLMResult)
True
>>> bool(np.isfinite(res.aic))
True

dispersion `property` ¶

dispersion: Optional[float]

Estimated dispersion (φ for gamma, α for nbinomial); None otherwise.

odds_ratios ¶

odds_ratios() -> DataFrame

Exponentiated fixed effects with Wald CIs (binomial / ordinal).

incidence_rate_ratios ¶

incidence_rate_ratios() -> DataFrame

Exponentiated coefficients for log-link count GLMMs (Poisson / NB).

to_latex ¶

to_latex(*, caption: Optional[str] = None, label: Optional[str] = None) -> str

Booktabs LaTeX fragment; mirrors MixedResult.to_latex.

plot ¶

plot(kind: str = 'caterpillar', variable: Optional[str] = None, **kwargs: Any) -> tuple[Any, Any]

Diagnostic plot for the GLMM fit.

Only kind='caterpillar' is currently supported: a forest plot of the BLUPs for one random effect (default: the intercept). Posterior SEs are not returned by meglm at present, so the error bars are omitted.

predict ¶

predict(data: Optional[DataFrame] = None, include_random: bool = True, type: str = 'response') -> Series

Predict response or linear predictor.

type='response' returns μ = g⁻¹(η); type='linear' returns η itself. include_random=True adds Z û for groups seen at fit time.

meglm ¶

meglm(data: DataFrame, y: str, x_fixed: Sequence[str], group: str, family: str = 'gaussian', x_random: Optional[Sequence[str]] = None, cov_type: str = 'unstructured', trials: Optional[str] = None, offset: Optional[str] = None, nAGQ: int = 1, maxiter: int = 300, tol: float = 1e-08, alpha: float = 0.05) -> MEGLMResult

Fit a generalised linear mixed model.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Long-format dataframe.	required
`y`	`str`	Outcome column. For binomial models this is the number of successes; pair it with `trials=` to model proportions.	required
`x_fixed`	`Sequence[str]`	Fixed-effect regressors (intercept added automatically).	required
`group`	`str`	Grouping variable for random effects.	required
`family`	`str`	`'gaussian'`, `'binomial'`, `'poisson'`, `'gamma'`, or `'nbinomial'` (alias `'negbin'`).	`'gaussian'`
`x_random`	`Optional[Sequence[str]]`	Random-slope variables; defaults to random intercept only.	`None`
`cov_type`	`str`	Random-effect covariance: `'unstructured'` (default), `'diagonal'`, `'identity'`.	`'unstructured'`
`trials`	`Optional[str]`	Column of trial counts for binomial responses. Defaults to 1 (Bernoulli).	`None`
`offset`	`Optional[str]`	Column of fixed offsets added to the linear predictor (e.g. `log(exposure)` for Poisson rate models).	`None`
`nAGQ`	`int`	Number of adaptive Gauss-Hermite quadrature points per scalar random effect. `1` (default) ≡ Laplace approximation. Use `nAGQ=7` to match Stata `meglm intpoints(7)`; values `> 1` require a single scalar random effect (no random slopes).	`1`
`maxiter`	`int`	Optimisation controls / CI width. The default `tol=1e-8` keeps the Laplace fixed-effect solution aligned with lme4/Stata reference likelihood optima on the parity fixtures. For AGHQ (`nAGQ > 1`), the default optimiser budget is internally tightened to `maxiter=5000` and `tol=1e-12`; explicit user-supplied controls are respected.	`300`
`tol`	`int`	Optimisation controls / CI width. The default `tol=1e-8` keeps the Laplace fixed-effect solution aligned with lme4/Stata reference likelihood optima on the parity fixtures. For AGHQ (`nAGQ > 1`), the default optimiser budget is internally tightened to `maxiter=5000` and `tol=1e-12`; explicit user-supplied controls are respected.	`300`
`alpha`	`int`	Optimisation controls / CI width. The default `tol=1e-8` keeps the Laplace fixed-effect solution aligned with lme4/Stata reference likelihood optima on the parity fixtures. For AGHQ (`nAGQ > 1`), the default optimiser budget is internally tightened to `maxiter=5000` and `tol=1e-12`; explicit user-supplied controls are respected.	`300`

Returns:

Type	Description
`MEGLMResult`

References

breslow1993

Examples:

Random-intercept logistic GLMM on simulated grouped binary data:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> g = np.repeat(np.arange(20), 12)
>>> x = rng.normal(size=240)
>>> u = rng.normal(0, 0.7, 20)[g]
>>> p = 1.0 / (1.0 + np.exp(-(-0.3 + 0.8 * x + u)))
>>> y = (rng.uniform(size=240) < p).astype(float)
>>> df = pd.DataFrame({"y": y, "x": x, "gid": g})
>>> res = sp.meglm(df, y="y", x_fixed=["x"], group="gid", family="binomial")
>>> res.family, res.link
('binomial', 'logit')
>>> bool(res.fixed_effects["x"] > 0)  # recovers the positive slope
True

melogit ¶

melogit(data: DataFrame, y: str, x_fixed: Sequence[str], group: str, x_random: Optional[Sequence[str]] = None, trials: Optional[str] = None, nAGQ: int = 1, **kw: Any) -> MEGLMResult

Random-effects logistic regression (Stata melogit).

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> g = np.repeat(np.arange(20), 12)
>>> x = rng.normal(size=240)
>>> u = rng.normal(0, 0.7, 20)[g]
>>> p = 1.0 / (1.0 + np.exp(-(-0.3 + 0.8 * x + u)))
>>> y = (rng.uniform(size=240) < p).astype(float)
>>> df = pd.DataFrame({"y": y, "x": x, "gid": g})
>>> res = sp.melogit(df, y="y", x_fixed=["x"], group="gid")
>>> res.family
'binomial'
>>> bool(res.fixed_effects["x"] > 0)
True

mepoisson ¶

mepoisson(data: DataFrame, y: str, x_fixed: Sequence[str], group: str, x_random: Optional[Sequence[str]] = None, offset: Optional[str] = None, nAGQ: int = 1, **kw: Any) -> MEGLMResult

Random-effects Poisson regression (Stata mepoisson).

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> g = np.repeat(np.arange(20), 12)
>>> x = rng.normal(size=240)
>>> u = rng.normal(0, 0.7, 20)[g]
>>> y = rng.poisson(np.exp(0.2 + 0.5 * x + u)).astype(float)
>>> df = pd.DataFrame({"y": y, "x": x, "gid": g})
>>> res = sp.mepoisson(df, y="y", x_fixed=["x"], group="gid")
>>> res.family, res.link
('poisson', 'log')
>>> bool(res.fixed_effects["x"] > 0)
True

menbreg ¶

menbreg(data: DataFrame, y: str, x_fixed: Sequence[str], group: str, x_random: Optional[Sequence[str]] = None, offset: Optional[str] = None, nAGQ: int = 1, **kw: Any) -> MEGLMResult

Random-effects negative-binomial regression (Stata menbreg).

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> g = np.repeat(np.arange(20), 12)
>>> x = rng.normal(size=240)
>>> u = rng.normal(0, 0.7, 20)[g]
>>> y = rng.poisson(np.exp(0.2 + 0.5 * x + u)).astype(float)
>>> df = pd.DataFrame({"y": y, "x": x, "gid": g})
>>> res = sp.menbreg(df, y="y", x_fixed=["x"], group="gid")
>>> res.family
'nbinomial'
>>> res.dispersion is not None  # NB-2 overdispersion alpha
True

megamma ¶

megamma(data: DataFrame, y: str, x_fixed: Sequence[str], group: str, x_random: Optional[Sequence[str]] = None, offset: Optional[str] = None, nAGQ: int = 1, **kw: Any) -> MEGLMResult

Random-effects Gamma GLMM with log link (Stata meglm family(gamma)).

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> g = np.repeat(np.arange(20), 12)
>>> x = rng.normal(size=240)
>>> u = rng.normal(0, 0.7, 20)[g]
>>> y = np.exp(0.3 + 0.4 * x + u) * rng.gamma(4.0, 0.25, size=240)
>>> df = pd.DataFrame({"y": y, "x": x, "gid": g})
>>> res = sp.megamma(df, y="y", x_fixed=["x"], group="gid")
>>> res.family, res.link
('gamma', 'log')
>>> res.dispersion is not None  # phi estimated by ML
True

meologit ¶

meologit(data: DataFrame, y: str, x_fixed: Sequence[str], group: str, x_random: Optional[Sequence[str]] = None, cov_type: str = 'unstructured', offset: Optional[str] = None, nAGQ: int = 1, maxiter: int = 300, tol: float = 1e-06, alpha: float = 0.05) -> MEGLMResult

Random-effects ordinal logit (Stata meologit, R ordinal::clmm).

The outcome y is treated as ordered categorical with K levels. If y is numeric it is coerced to integer codes 1..K based on sorted unique values. Other dtypes (str, pandas categorical) are coerced via pd.Categorical and ordered by appearance.

No intercept enters β — its role is taken by the K-1 thresholds. Returns an :class:MEGLMResult with family='ordinal' and thresholds populated.

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> g = np.repeat(np.arange(25), 12)
>>> x = rng.normal(size=300)
>>> u = rng.normal(0, 0.6, 25)[g]
>>> latent = 0.9 * x + u + rng.logistic(size=300)
>>> y = np.ones(300, dtype=int)
>>> y[latent > -1.0] = 2
>>> y[latent > 1.0] = 3
>>> df = pd.DataFrame({"y": y, "x": x, "gid": g})
>>> res = sp.meologit(df, y="y", x_fixed=["x"], group="gid")
>>> isinstance(res, sp.MEGLMResult)
True
>>> res.family
'ordinal'
>>> bool(res.fixed_effects["x"] > 0)  # recovers the positive slope
True

icc ¶

icc(result: Any, component: str = '_cons', alpha: float = 0.05, n_boot: int = 0, seed: Optional[int] = None) -> ICCResult

Intra-class correlation for a fitted mixed model.

Parameters:

Name	Type	Description	Default
`result`	`Any`	A `MixedResult` returned by :func:`statspai.mixed`.	required
`component`	`str`	Name of the random-effect variance to put in the numerator. Defaults to the random intercept (`"_cons"`).	`'_cons'`
`alpha`	`float`	Significance level for the confidence interval. Default 0.05.	`0.05`
`n_boot`	`int`	Number of parametric bootstrap replicates used to compute the CI. `0` (default) uses the delta-method approximation on the log-variance scale, which is faster and usually within a few decimals of the parametric-bootstrap answer for moderate N.	`0`
`seed`	`Optional[int]`	RNG seed forwarded to :func:`numpy.random.default_rng`.	`None`

Returns:

Type	Description
`ICCResult`

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(42)
>>> g = np.repeat(np.arange(30), 10)
>>> x = rng.normal(size=300)
>>> y = 2.0 + 0.5 * x + rng.normal(0, 1.0, 30)[g] + rng.normal(0, 0.5, 300)
>>> df = pd.DataFrame({"y": y, "x": x, "school": g})
>>> res = sp.mixed(df, y="y", x_fixed=["x"], group="school")
>>> rho = sp.icc(res)
>>> bool(0.0 <= float(rho) <= 1.0)  # share of variance at the school level
True

lrtest ¶

lrtest(restricted: Any, full: Any, boundary: 'bool | None' = None) -> LRTestResult

Likelihood-ratio test comparing a restricted and a full model.

Parameters:

Name	Type	Description	Default
`restricted`	`Any`	Two fitted mixed models. `full` should strictly nest `restricted` — i.e. the parameter space of the restricted model is a subset of the full model's.	required
`full`	`Any`	Two fitted mixed models. `full` should strictly nest `restricted` — i.e. the parameter space of the restricted model is a subset of the full model's.	required
`boundary`	`'bool \| None'`	Whether to apply the χ̄² boundary correction. When `None` (default) we infer it from whether the restriction touches a variance component — the only parameters that live on the boundary of their support.	`None`

Returns:

Type	Description
`LRTestResult`

Raises:

Type	Description
`ValueError`	If the two fits have different response families (e.g. one is binomial, the other Poisson) — their log-likelihoods are not comparable and a naive LR statistic is meaningless.
`ValueError`	If either fit used REML and the fixed-effect design differs between the two models. REML log-likelihoods are only comparable across fits that share the same fixed-effect design matrix; always use ML for LR tests of fixed effects.

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(42)
>>> g = np.repeat(np.arange(30), 10)
>>> x = rng.normal(size=300)
>>> y = 2.0 + 0.5 * x + rng.normal(0, 1.0, 30)[g] + rng.normal(0, 0.5, 300)
>>> df = pd.DataFrame({"y": y, "x": x, "school": g})
>>> restricted = sp.mixed(df, y="y", x_fixed=["x"], group="school",
...                       method="ml")
>>> full = sp.mixed(df, y="y", x_fixed=["x"], group="school",
...                 x_random=["x"], method="ml")
>>> res = sp.lrtest(restricted, full)  # test the random slope on x
>>> bool(0.0 <= res.p_value <= 1.0)
True

statspai.multilevel¶

multilevel ¶

MixedResult dataclass ¶

params property ¶

bse property ¶

n_params property ¶

conf_int ¶

ranef ¶

predict ¶

r_squared ¶

wald_test ¶

summary ¶

plot ¶

MEGLMResult dataclass ¶

dispersion property ¶

odds_ratios ¶

incidence_rate_ratios ¶

to_latex ¶

plot ¶

predict ¶

meglm ¶

melogit ¶

mepoisson ¶

menbreg ¶

megamma ¶

meologit ¶

icc ¶

lrtest ¶

`statspai.multilevel`¶

MixedResult `dataclass` ¶

params `property` ¶

bse `property` ¶

n_params `property` ¶

MEGLMResult `dataclass` ¶

dispersion `property` ¶