Skip to content

statspai.inference

inference

Inference module for StatsPAI.

Provides robust inference methods for supported estimator results: - Wild Cluster Bootstrap (Cameron, Gelbach & Miller 2008) - Randomization Inference (Fisher 1935; Young 2019)

FisherResult

Result container for Fisher's exact permutation test.

Attributes:

Name Type Description
statistic float

Observed test statistic.

p_value float

Two-sided permutation p-value.

p_one_sided float

One-sided (greater) permutation p-value.

ci tuple of float

Confidence interval from Hodges-Lehmann inversion.

perm_dist ndarray

Full permutation distribution of the test statistic.

statistic_type str

Name of the test statistic used.

n_perm int

Number of permutations performed.

n_obs int

Number of observations.

n_treated int

Number of treated units.

summary

summary() -> str

Return a formatted summary string.

plot

plot(ax=None, figsize=(8, 5))

Plot the permutation distribution with observed value.

Parameters:

Name Type Description Default
ax matplotlib Axes

Axes to plot on. If None, creates a new figure.

None
figsize tuple

Figure size if creating a new figure.

(8, 5)

Returns:

Type Description
tuple of (fig, ax)

BootstrapResult dataclass

Container for bootstrap inference results.

PATEEstimator

Population Average Treatment Effect estimator.

fit

fit() -> CausalResult

Estimate PATE and return a CausalResult.

MetaAnalysisResult dataclass

Result of a summary-data meta-analysis.

Attributes:

Name Type Description
estimate float

Pooled effect under the chosen model (random-effects by default).

se, ci (float, tuple)

Standard error and confidence interval of the pooled effect.

p_value float

Two-sided p-value for the pooled effect being zero.

method str

"fixed" or "DL" (DerSimonian-Laird random effects).

fixed_estimate, fixed_se float

Fixed-effect pooled estimate and SE (always reported).

random_estimate, random_se float

Random-effects pooled estimate and SE (always reported).

tau2, q, q_df, q_pvalue, i2, h2 float

Between-study variance and heterogeneity statistics.

prediction_interval tuple or None

Random-effects prediction interval (where a future study's true effect is expected to lie); None when fewer than 3 studies.

weights ndarray

Per-study weights under the chosen model (normalised to sum to 1).

egger_test

egger_test() -> Dict[str, float]

Egger's regression test for funnel-plot asymmetry.

Regresses the standard normal deviate y_i / se_i on precision 1 / se_i; a non-zero intercept indicates small-study effects. Requires at least 3 studies.

forest_plot

forest_plot(ax: Any = None, **kwargs: Any) -> Any

Forest plot: per-study effects + CIs with the pooled diamond.

funnel_plot

funnel_plot(ax: Any = None, **kwargs: Any) -> Any

Funnel plot: effect size vs. standard error (precision).

wild_cluster_bootstrap

wild_cluster_bootstrap(data: DataFrame, y: str, x: List[str], cluster: str, test_var: Optional[str] = None, h0: float = 0, n_boot: int = 999, weight_type: str = 'rademacher', seed: Optional[int] = None, alpha: float = 0.05) -> Dict[str, Any]

Wild Cluster Bootstrap p-value and confidence interval.

Provides valid inference when the number of clusters is small, where conventional cluster-robust standard errors are unreliable.

Implements the WCR (Wild Cluster Restricted) bootstrap from Cameron, Gelbach & Miller (2008), with Rademacher weights (default) or Webb (2014) 6-point weights for very few clusters.

Parameters:

Name Type Description Default
data DataFrame

Input data.

required
y str

Outcome variable.

required
x list of str

All regressors (including the variable being tested).

required
cluster str

Cluster variable name.

required
test_var str

Variable to test. Default: last variable in x.

None
h0 float

Null hypothesis value for the test variable coefficient.

0
n_boot int

Number of bootstrap replications. Use odd number for exact p-value computation.

999
weight_type str

Bootstrap weight distribution: - 'rademacher': ±1 with equal probability. Standard choice. - 'webb': 6-point distribution from Webb (2014). Recommended when G < 12 clusters. - 'mammen': Mammen (1993) 2-point distribution.

'rademacher'
seed int

Random seed for reproducibility.

None
alpha float

Significance level for the confidence interval.

0.05

Returns:

Type Description
dict

Keys: - beta_hat: OLS point estimate of test_var - se_cluster: conventional cluster-robust SE - t_stat: t-statistic under H0 - p_boot: wild cluster bootstrap p-value (two-sided) - ci_boot: bootstrap percentile-t confidence interval - n_clusters: number of clusters - n_boot: number of replications used - weight_type: weight distribution used - recommendation: human-readable guidance

Examples:

>>> result = wild_cluster_bootstrap(
...     df, y='wage', x=['education', 'experience'],
...     cluster='state', test_var='education')
>>> print(f"Bootstrap p = {result['p_boot']:.4f}")
>>> print(f"Bootstrap CI = {result['ci_boot']}")
Notes

The WCR bootstrap imposes the null H0: β_test = h0 when generating bootstrap samples. This yields better size control than the unrestricted wild bootstrap, especially with few clusters.

For G < 12 clusters, Webb (2014) weights are recommended over Rademacher. The function will auto-warn if Rademacher is used with very few clusters.

See Cameron, Gelbach & Miller (2008, REStat) Section III for the full algorithm description.

ri_test

ri_test(data: DataFrame, y: str, treat: str, stat: str = 'diff_means', n_perms: int = 1000, cluster: Optional[str] = None, seed: Optional[int] = None, alpha: float = 0.05) -> Dict[str, Any]

Randomization inference p-value.

Computes the Fisher exact p-value by permuting the treatment vector and recalculating the test statistic.

Parameters:

Name Type Description Default
data DataFrame
required
y str

Outcome variable.

required
treat str

Binary treatment indicator (0/1).

required
stat str or callable

Test statistic: - 'diff_means': difference in means (Y_bar_1 - Y_bar_0) - 'ks': Kolmogorov-Smirnov statistic - 't': t-statistic - A callable f(Y, D) -> float for custom statistics.

'diff_means'
n_perms int

Number of random permutations. Use 10000+ for publications.

1000
cluster str

Cluster-level permutation (permute treatment at cluster level).

None
seed int

Random seed.

None
alpha float
0.05

Returns:

Type Description
dict

'observed': observed test statistic 'p_value': two-sided randomization p-value 'p_one_sided': one-sided (greater) p-value 'n_perms': number of permutations used 'perm_distribution': array of permuted statistics

Examples:

>>> result = sp.ri_test(df, y='outcome', treat='treatment',
...                     n_perms=5000, seed=42)
>>> print(f"RI p-value: {result['p_value']:.4f}")
Notes

Under the sharp null H0: Y_i(1) = Y_i(0) for all i, the treatment assignment is the only source of randomness. The RI p-value is the fraction of permutation statistics at least as extreme as the observed statistic.

For cluster-randomized experiments, treatment is permuted at the cluster level (all units in a cluster get the same permuted status).

See Young (2019, QJE) for why RI p-values should be reported alongside asymptotic p-values in experimental papers.

fisher_exact

fisher_exact(data: DataFrame, y: str, treatment: str, statistic: str = 'ate', controls: Optional[List[str]] = None, n_perm: int = 10000, stratify: Optional[str] = None, cluster: Optional[str] = None, seed: Optional[int] = None, alpha: float = 0.05) -> FisherResult

Fisher's exact randomization test with enhanced features.

Computes a permutation-based p-value and Hodges-Lehmann confidence interval for the treatment effect under the sharp null hypothesis.

Parameters:

Name Type Description Default
data DataFrame

Input data.

required
y str

Outcome variable name.

required
treatment str

Binary treatment variable name (0/1).

required
statistic str

Test statistic to use: - 'ate': Average treatment effect (difference in means). - 'ks': Kolmogorov-Smirnov statistic. - 'rank_sum': Wilcoxon rank-sum statistic.

'ate'
controls list of str

Control variables for covariate-adjusted inference. When provided, the test statistic is computed on residuals from regressing Y on controls.

None
n_perm int

Number of random permutations.

10000
stratify str

Variable for stratified permutation (permute within strata).

None
cluster str

Variable for cluster-level randomization (permute by cluster).

None
seed int

Random seed for reproducibility.

None
alpha float

Significance level for the Hodges-Lehmann confidence interval.

0.05

Returns:

Type Description
FisherResult

Object with statistic, p_value, ci, perm_dist, and methods for summary() and plot().

Examples:

>>> result = sp.fisher_exact(
...     data=df, y="outcome", treatment="treated",
...     statistic="ate", n_perm=10000, seed=42)
>>> result.summary()
>>> result.plot()
>>> # Stratified permutation
>>> result = sp.fisher_exact(
...     data=df, y="outcome", treatment="treated",
...     stratify="block", n_perm=5000)
>>> # Covariate-adjusted
>>> result = sp.fisher_exact(
...     data=df, y="outcome", treatment="treated",
...     controls=["age", "income"], n_perm=10000)
Notes

Under the sharp null H0: Y_i(1) = Y_i(0) for all i, treatment assignment is the only source of randomness. The p-value is the proportion of permuted statistics at least as extreme as observed.

The Hodges-Lehmann CI is constructed by inverting the permutation test: find the set of tau_0 values for which the test of Y - tau_0 * D does not reject at level alpha.

For stratified experiments, treatment is permuted within strata. For cluster-randomized experiments, treatment is permuted at the cluster level.

See Young (2019, QJE) for why randomization p-values should accompany asymptotic p-values in experimental papers.

jackknife_se

jackknife_se(result: EconometricResults, data: DataFrame, cluster: str, alpha: float = 0.05) -> EconometricResults

Leave-one-cluster-out jackknife standard errors.

Computes cluster jackknife variance by re-estimating the model G times, each time dropping one cluster. Valid when G is small and conventional cluster-robust SEs are unreliable.

Parameters:

Name Type Description Default
result EconometricResults

A fitted regression result from sp.regress().

required
data DataFrame

The original data used for estimation.

required
cluster str

Name of the cluster variable in data.

required
alpha float

Significance level for confidence intervals.

0.05

Returns:

Type Description
EconometricResults

New results object with jackknife standard errors, t-statistics based on effective DoF, and adjusted p-values.

Examples:

>>> result = sp.regress("y ~ x1 + x2", data=df, cluster="state")
>>> jk = sp.jackknife_se(result, data=df, cluster="state")
>>> print(jk.summary())
Notes

The cluster jackknife variance is:

V_jk = ((G-1)/G) * sum_g (theta_g - theta_bar)^2

where theta_g is the estimate from dropping cluster g, and theta_bar is the mean of the leave-one-out estimates.

The effective degrees of freedom are G - 1, which gives wider confidence intervals than asymptotic cluster-robust SEs.

cr2_se

cr2_se(result: EconometricResults, data: DataFrame, cluster: str, alpha: float = 0.05) -> EconometricResults

CR2 bias-corrected cluster-robust standard errors (Bell & McCaffrey 2002).

Applies a bias correction to the cluster-robust variance estimator that accounts for leverage and improves finite-sample performance. The correction factor uses the inverse square root of (I - H_gg), where H_gg is the within-cluster hat matrix block.

Parameters:

Name Type Description Default
result EconometricResults

A fitted regression result from sp.regress().

required
data DataFrame

The original data used for estimation.

required
cluster str

Name of the cluster variable in data.

required
alpha float

Significance level for confidence intervals.

0.05

Returns:

Type Description
EconometricResults

New results object with CR2-corrected standard errors.

Notes

The CR2 estimator replaces the raw residuals e_g with adjusted residuals:

e*_g = (I - H_gg)^{-1/2} e_g

where H_gg = X_g (X'X)^{-1} X_g' is the within-cluster block of the hat matrix. This removes the downward bias in the conventional cluster-robust variance estimator.

See Bell & McCaffrey (2002) and Pustejovsky & Tipton (2018).

wild_cluster_boot

wild_cluster_boot(result: EconometricResults, data: DataFrame, cluster: str, variable: str, n_boot: int = 999, weight_type: str = 'rademacher', seed: Optional[int] = None, alpha: float = 0.05) -> Dict[str, Any]

Wild cluster bootstrap t-test for a single coefficient.

Implements the WCR (Wild Cluster Restricted) bootstrap from Cameron, Gelbach & Miller (2008), designed for inference with few clusters. Imposes the null hypothesis when generating bootstrap samples for better finite-sample performance.

Parameters:

Name Type Description Default
result EconometricResults

A fitted regression result from sp.regress().

required
data DataFrame

The original data used for estimation.

required
cluster str

Name of the cluster variable.

required
variable str

Name of the coefficient to test (H0: beta = 0).

required
n_boot int

Number of bootstrap replications (use odd number).

999
weight_type str

Bootstrap weight distribution: - 'rademacher': +/-1 with equal probability. - 'webb': Webb (2014) 6-point distribution for G < 12. - 'mammen': Mammen (1993) 2-point distribution.

'rademacher'
seed int

Random seed for reproducibility.

None
alpha float

Significance level for confidence interval.

0.05

Returns:

Type Description
dict

Keys: - beta_hat: point estimate - se_cluster: conventional cluster-robust SE - t_stat: observed t-statistic - p_boot: bootstrap p-value (two-sided) - ci_boot: bootstrap percentile-t confidence interval - t_distribution: array of bootstrap t-statistics - n_clusters: number of clusters - n_boot: number of replications

Examples:

>>> result = sp.regress("y ~ x1 + x2", data=df, cluster="state")
>>> boot = sp.wild_cluster_boot(result, data=df, cluster="state",
...                              variable="x1", n_boot=999)
>>> print(f"Bootstrap p = {boot['p_boot']:.4f}")
>>> print(f"Bootstrap CI = {boot['ci_boot']}")
Notes

The WCR bootstrap imposes H0: beta_variable = 0 when constructing bootstrap samples. Under few clusters, this yields much better size control than unrestricted wild bootstrap or cluster-robust SEs.

For G < 12, use weight_type='webb' per Webb (2014).

subcluster_wild_bootstrap

subcluster_wild_bootstrap(data: DataFrame, y: str, x: List[str], cluster: str, subcluster: Optional[str] = None, test_var: Optional[str] = None, h0: float = 0.0, n_boot: int = 999, weight_type: str = 'webb', seed: Optional[int] = None, alpha: float = 0.05) -> Dict[str, Any]

Subcluster wild cluster bootstrap for few-treated-clusters.

When treatment varies within cluster (or when you want to re-expand the randomization at a finer grain), sign-flips happen at the sub-cluster level. SEs are still computed clustered at the coarse cluster level.

Parameters:

Name Type Description Default
data DataFrame
required
y str
required
x list of str
required
cluster str

Primary cluster column (for SE computation).

required
subcluster str or None

Finer grouping at which sign-flips occur. If None, every observation is its own subcluster (pure Rademacher at obs level).

None
test_var str or None

Parameter to test; default last element of x.

None
h0 float

Null value.

0.0
n_boot int

Bootstrap replications.

999
weight_type ('rademacher', 'webb', 'mammen')

Distribution of sign flips. 'webb' (6-point) recommended when treatment has <= 5 treated clusters.

'rademacher'
seed int
None
alpha float
0.05

Returns:

Type Description
dict with ``p_boot``, ``ci_boot``, ``beta_hat``, ``t_stat``,
``se_cluster``, ``n_sub``, ``recommendation``.

wild_cluster_ci_inv

wild_cluster_ci_inv(data: DataFrame, y: str, x: List[str], cluster: str, test_var: Optional[str] = None, n_boot: int = 999, weight_type: str = 'webb', alpha: float = 0.05, grid_size: int = 41, grid_span: float = 6.0, seed: Optional[int] = None) -> Dict[str, Any]

Confidence interval via bootstrap p-value inversion.

Runs the wild cluster bootstrap repeatedly across a grid of null values and finds the boundary where the two-sided bootstrap p-value equals alpha. This yields a WCR-inverted CI with better small-cluster coverage than the percentile-t CI.

The grid is centered on the OLS point estimate with half-width grid_span * se_cluster and grid_size evenly-spaced points. Linear interpolation refines the boundary.

Shares the data / model arguments of :func:subcluster_wild_bootstrap; the grid-specific parameters are documented below.

Parameters:

Name Type Description Default
grid_size int

Number of null-value grid points to evaluate (odd preferred).

41
grid_span float

Half-width of the search grid in units of cluster-robust SE.

6.0

Returns:

Type Description
dict with ``ci``, ``p_grid`` (grid_size,), ``h0_grid``, ``beta_hat``,
``se_cluster``.

multiway_cluster_vcov

multiway_cluster_vcov(X: ndarray, resid: ndarray, clusters: Union[ndarray, List[ndarray]], df_adjust: bool = True, n_params: Optional[int] = None, psd_correct: bool = True) -> ndarray

Compute N-way cluster-robust variance of an OLS coefficient vector.

Parameters:

Name Type Description Default
X (ndarray, shape(n, k))

Design matrix used in the regression.

required
resid (ndarray, shape(n))

OLS residuals.

required
clusters ndarray or list of ndarrays

One or more cluster variables, one per dimension. Non-numeric labels are supported.

required
df_adjust bool

If True, apply the G/(G-1) * (n-1)/(n-k) CR1 finite-sample correction per component variance. If False, uses raw sandwich (useful when the caller has already degreed-freedom adjusted).

True
n_params int

Override for the k used in DOF adjustment; useful when FEs have been absorbed (pass total absorbed DOF here).

None
psd_correct bool

Project V onto PSD cone by zeroing negative eigenvalues.

True

Returns:

Type Description
(ndarray, shape(k, k))

Multiway-cluster-robust variance-covariance matrix.

cluster_robust_se

cluster_robust_se(X: ndarray, resid: ndarray, clusters: Union[ndarray, List[ndarray]], **kwargs) -> ndarray

Return cluster-robust standard errors (diagonal sqrt of vcov).

cr3_jackknife_vcov

cr3_jackknife_vcov(X: ndarray, y: ndarray, cluster: ndarray) -> ndarray

CR3 cluster-jackknife variance (delete-one-cluster).

Drops each cluster in turn, refits OLS on the reduced sample, and forms the empirical variance of the leave-one-out coefficient vector. Scales by (G-1)/G. Often more conservative than CR1/CR2 and useful when clusters are unbalanced (Bell-McCaffrey 2002).

Complexity is O(G · k²) refits; suitable when G <= a few hundred.

Parameters:

Name Type Description Default
X (ndarray, shape(n, k))
required
y (ndarray, shape(n))
required
cluster (ndarray, shape(n))
required

Returns:

Type Description
(ndarray, shape(k, k))