statspai.inference¶
inference ¶
Inference module for StatsPAI.
Provides robust inference methods for supported estimator results: - Wild Cluster Bootstrap (Cameron, Gelbach & Miller 2008) - Randomization Inference (Fisher 1935; Young 2019)
FisherResult ¶
Result container for Fisher's exact permutation test.
Attributes:
| Name | Type | Description |
|---|---|---|
statistic |
float
|
Observed test statistic. |
p_value |
float
|
Two-sided permutation p-value. |
p_one_sided |
float
|
One-sided (greater) permutation p-value. |
ci |
tuple of float
|
Confidence interval from Hodges-Lehmann inversion. |
perm_dist |
ndarray
|
Full permutation distribution of the test statistic. |
statistic_type |
str
|
Name of the test statistic used. |
n_perm |
int
|
Number of permutations performed. |
n_obs |
int
|
Number of observations. |
n_treated |
int
|
Number of treated units. |
plot ¶
Plot the permutation distribution with observed value.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
ax
|
matplotlib Axes
|
Axes to plot on. If None, creates a new figure. |
None
|
figsize
|
tuple
|
Figure size if creating a new figure. |
(8, 5)
|
Returns:
| Type | Description |
|---|---|
tuple of (fig, ax)
|
|
BootstrapResult
dataclass
¶
Container for bootstrap inference results.
PATEEstimator ¶
Population Average Treatment Effect estimator.
MetaAnalysisResult
dataclass
¶
Result of a summary-data meta-analysis.
Attributes:
| Name | Type | Description |
|---|---|---|
estimate |
float
|
Pooled effect under the chosen model (random-effects by default). |
se, ci |
(float, tuple)
|
Standard error and confidence interval of the pooled effect. |
p_value |
float
|
Two-sided p-value for the pooled effect being zero. |
method |
str
|
|
fixed_estimate, fixed_se |
float
|
Fixed-effect pooled estimate and SE (always reported). |
random_estimate, random_se |
float
|
Random-effects pooled estimate and SE (always reported). |
tau2, q, q_df, q_pvalue, i2, h2 |
float
|
Between-study variance and heterogeneity statistics. |
prediction_interval |
tuple or None
|
Random-effects prediction interval (where a future study's true
effect is expected to lie); |
weights |
ndarray
|
Per-study weights under the chosen model (normalised to sum to 1). |
egger_test ¶
Egger's regression test for funnel-plot asymmetry.
Regresses the standard normal deviate y_i / se_i on precision
1 / se_i; a non-zero intercept indicates small-study effects.
Requires at least 3 studies.
forest_plot ¶
Forest plot: per-study effects + CIs with the pooled diamond.
funnel_plot ¶
Funnel plot: effect size vs. standard error (precision).
wild_cluster_bootstrap ¶
wild_cluster_bootstrap(data: DataFrame, y: str, x: List[str], cluster: str, test_var: Optional[str] = None, h0: float = 0, n_boot: int = 999, weight_type: str = 'rademacher', seed: Optional[int] = None, alpha: float = 0.05) -> Dict[str, Any]
Wild Cluster Bootstrap p-value and confidence interval.
Provides valid inference when the number of clusters is small, where conventional cluster-robust standard errors are unreliable.
Implements the WCR (Wild Cluster Restricted) bootstrap from Cameron, Gelbach & Miller (2008), with Rademacher weights (default) or Webb (2014) 6-point weights for very few clusters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
Input data. |
required |
y
|
str
|
Outcome variable. |
required |
x
|
list of str
|
All regressors (including the variable being tested). |
required |
cluster
|
str
|
Cluster variable name. |
required |
test_var
|
str
|
Variable to test. Default: last variable in |
None
|
h0
|
float
|
Null hypothesis value for the test variable coefficient. |
0
|
n_boot
|
int
|
Number of bootstrap replications. Use odd number for exact p-value computation. |
999
|
weight_type
|
str
|
Bootstrap weight distribution:
- |
'rademacher'
|
seed
|
int
|
Random seed for reproducibility. |
None
|
alpha
|
float
|
Significance level for the confidence interval. |
0.05
|
Returns:
| Type | Description |
|---|---|
dict
|
Keys:
- |
Examples:
>>> result = wild_cluster_bootstrap(
... df, y='wage', x=['education', 'experience'],
... cluster='state', test_var='education')
>>> print(f"Bootstrap p = {result['p_boot']:.4f}")
>>> print(f"Bootstrap CI = {result['ci_boot']}")
Notes
The WCR bootstrap imposes the null H0: β_test = h0 when generating bootstrap samples. This yields better size control than the unrestricted wild bootstrap, especially with few clusters.
For G < 12 clusters, Webb (2014) weights are recommended over Rademacher. The function will auto-warn if Rademacher is used with very few clusters.
See Cameron, Gelbach & Miller (2008, REStat) Section III for the full algorithm description.
ri_test ¶
ri_test(data: DataFrame, y: str, treat: str, stat: str = 'diff_means', n_perms: int = 1000, cluster: Optional[str] = None, seed: Optional[int] = None, alpha: float = 0.05) -> Dict[str, Any]
Randomization inference p-value.
Computes the Fisher exact p-value by permuting the treatment vector and recalculating the test statistic.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
|
required |
y
|
str
|
Outcome variable. |
required |
treat
|
str
|
Binary treatment indicator (0/1). |
required |
stat
|
str or callable
|
Test statistic:
- |
'diff_means'
|
n_perms
|
int
|
Number of random permutations. Use 10000+ for publications. |
1000
|
cluster
|
str
|
Cluster-level permutation (permute treatment at cluster level). |
None
|
seed
|
int
|
Random seed. |
None
|
alpha
|
float
|
|
0.05
|
Returns:
| Type | Description |
|---|---|
dict
|
|
Examples:
>>> result = sp.ri_test(df, y='outcome', treat='treatment',
... n_perms=5000, seed=42)
>>> print(f"RI p-value: {result['p_value']:.4f}")
Notes
Under the sharp null H0: Y_i(1) = Y_i(0) for all i, the treatment assignment is the only source of randomness. The RI p-value is the fraction of permutation statistics at least as extreme as the observed statistic.
For cluster-randomized experiments, treatment is permuted at the cluster level (all units in a cluster get the same permuted status).
See Young (2019, QJE) for why RI p-values should be reported alongside asymptotic p-values in experimental papers.
fisher_exact ¶
fisher_exact(data: DataFrame, y: str, treatment: str, statistic: str = 'ate', controls: Optional[List[str]] = None, n_perm: int = 10000, stratify: Optional[str] = None, cluster: Optional[str] = None, seed: Optional[int] = None, alpha: float = 0.05) -> FisherResult
Fisher's exact randomization test with enhanced features.
Computes a permutation-based p-value and Hodges-Lehmann confidence interval for the treatment effect under the sharp null hypothesis.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
Input data. |
required |
y
|
str
|
Outcome variable name. |
required |
treatment
|
str
|
Binary treatment variable name (0/1). |
required |
statistic
|
str
|
Test statistic to use:
- |
'ate'
|
controls
|
list of str
|
Control variables for covariate-adjusted inference. When provided, the test statistic is computed on residuals from regressing Y on controls. |
None
|
n_perm
|
int
|
Number of random permutations. |
10000
|
stratify
|
str
|
Variable for stratified permutation (permute within strata). |
None
|
cluster
|
str
|
Variable for cluster-level randomization (permute by cluster). |
None
|
seed
|
int
|
Random seed for reproducibility. |
None
|
alpha
|
float
|
Significance level for the Hodges-Lehmann confidence interval. |
0.05
|
Returns:
| Type | Description |
|---|---|
FisherResult
|
Object with statistic, p_value, ci, perm_dist, and methods for summary() and plot(). |
Examples:
>>> result = sp.fisher_exact(
... data=df, y="outcome", treatment="treated",
... statistic="ate", n_perm=10000, seed=42)
>>> result.summary()
>>> result.plot()
>>> # Stratified permutation
>>> result = sp.fisher_exact(
... data=df, y="outcome", treatment="treated",
... stratify="block", n_perm=5000)
>>> # Covariate-adjusted
>>> result = sp.fisher_exact(
... data=df, y="outcome", treatment="treated",
... controls=["age", "income"], n_perm=10000)
Notes
Under the sharp null H0: Y_i(1) = Y_i(0) for all i, treatment assignment is the only source of randomness. The p-value is the proportion of permuted statistics at least as extreme as observed.
The Hodges-Lehmann CI is constructed by inverting the permutation test: find the set of tau_0 values for which the test of Y - tau_0 * D does not reject at level alpha.
For stratified experiments, treatment is permuted within strata. For cluster-randomized experiments, treatment is permuted at the cluster level.
See Young (2019, QJE) for why randomization p-values should accompany asymptotic p-values in experimental papers.
jackknife_se ¶
jackknife_se(result: EconometricResults, data: DataFrame, cluster: str, alpha: float = 0.05) -> EconometricResults
Leave-one-cluster-out jackknife standard errors.
Computes cluster jackknife variance by re-estimating the model G times, each time dropping one cluster. Valid when G is small and conventional cluster-robust SEs are unreliable.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
result
|
EconometricResults
|
A fitted regression result from |
required |
data
|
DataFrame
|
The original data used for estimation. |
required |
cluster
|
str
|
Name of the cluster variable in |
required |
alpha
|
float
|
Significance level for confidence intervals. |
0.05
|
Returns:
| Type | Description |
|---|---|
EconometricResults
|
New results object with jackknife standard errors, t-statistics based on effective DoF, and adjusted p-values. |
Examples:
>>> result = sp.regress("y ~ x1 + x2", data=df, cluster="state")
>>> jk = sp.jackknife_se(result, data=df, cluster="state")
>>> print(jk.summary())
Notes
The cluster jackknife variance is:
V_jk = ((G-1)/G) * sum_g (theta_g - theta_bar)^2
where theta_g is the estimate from dropping cluster g, and theta_bar is the mean of the leave-one-out estimates.
The effective degrees of freedom are G - 1, which gives wider confidence intervals than asymptotic cluster-robust SEs.
cr2_se ¶
cr2_se(result: EconometricResults, data: DataFrame, cluster: str, alpha: float = 0.05) -> EconometricResults
CR2 bias-corrected cluster-robust standard errors (Bell & McCaffrey 2002).
Applies a bias correction to the cluster-robust variance estimator that accounts for leverage and improves finite-sample performance. The correction factor uses the inverse square root of (I - H_gg), where H_gg is the within-cluster hat matrix block.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
result
|
EconometricResults
|
A fitted regression result from |
required |
data
|
DataFrame
|
The original data used for estimation. |
required |
cluster
|
str
|
Name of the cluster variable in |
required |
alpha
|
float
|
Significance level for confidence intervals. |
0.05
|
Returns:
| Type | Description |
|---|---|
EconometricResults
|
New results object with CR2-corrected standard errors. |
Notes
The CR2 estimator replaces the raw residuals e_g with adjusted residuals:
e*_g = (I - H_gg)^{-1/2} e_g
where H_gg = X_g (X'X)^{-1} X_g' is the within-cluster block of the hat matrix. This removes the downward bias in the conventional cluster-robust variance estimator.
See Bell & McCaffrey (2002) and Pustejovsky & Tipton (2018).
wild_cluster_boot ¶
wild_cluster_boot(result: EconometricResults, data: DataFrame, cluster: str, variable: str, n_boot: int = 999, weight_type: str = 'rademacher', seed: Optional[int] = None, alpha: float = 0.05) -> Dict[str, Any]
Wild cluster bootstrap t-test for a single coefficient.
Implements the WCR (Wild Cluster Restricted) bootstrap from Cameron, Gelbach & Miller (2008), designed for inference with few clusters. Imposes the null hypothesis when generating bootstrap samples for better finite-sample performance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
result
|
EconometricResults
|
A fitted regression result from |
required |
data
|
DataFrame
|
The original data used for estimation. |
required |
cluster
|
str
|
Name of the cluster variable. |
required |
variable
|
str
|
Name of the coefficient to test (H0: beta = 0). |
required |
n_boot
|
int
|
Number of bootstrap replications (use odd number). |
999
|
weight_type
|
str
|
Bootstrap weight distribution:
- |
'rademacher'
|
seed
|
int
|
Random seed for reproducibility. |
None
|
alpha
|
float
|
Significance level for confidence interval. |
0.05
|
Returns:
| Type | Description |
|---|---|
dict
|
Keys:
- |
Examples:
>>> result = sp.regress("y ~ x1 + x2", data=df, cluster="state")
>>> boot = sp.wild_cluster_boot(result, data=df, cluster="state",
... variable="x1", n_boot=999)
>>> print(f"Bootstrap p = {boot['p_boot']:.4f}")
>>> print(f"Bootstrap CI = {boot['ci_boot']}")
Notes
The WCR bootstrap imposes H0: beta_variable = 0 when constructing bootstrap samples. Under few clusters, this yields much better size control than unrestricted wild bootstrap or cluster-robust SEs.
For G < 12, use weight_type='webb' per Webb (2014).
subcluster_wild_bootstrap ¶
subcluster_wild_bootstrap(data: DataFrame, y: str, x: List[str], cluster: str, subcluster: Optional[str] = None, test_var: Optional[str] = None, h0: float = 0.0, n_boot: int = 999, weight_type: str = 'webb', seed: Optional[int] = None, alpha: float = 0.05) -> Dict[str, Any]
Subcluster wild cluster bootstrap for few-treated-clusters.
When treatment varies within cluster (or when you want to re-expand
the randomization at a finer grain), sign-flips happen at the
sub-cluster level. SEs are still computed clustered at the coarse
cluster level.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
|
required |
y
|
str
|
|
required |
x
|
list of str
|
|
required |
cluster
|
str
|
Primary cluster column (for SE computation). |
required |
subcluster
|
str or None
|
Finer grouping at which sign-flips occur. If |
None
|
test_var
|
str or None
|
Parameter to test; default last element of |
None
|
h0
|
float
|
Null value. |
0.0
|
n_boot
|
int
|
Bootstrap replications. |
999
|
weight_type
|
('rademacher', 'webb', 'mammen')
|
Distribution of sign flips. |
'rademacher'
|
seed
|
int
|
|
None
|
alpha
|
float
|
|
0.05
|
Returns:
| Type | Description |
|---|---|
dict with ``p_boot``, ``ci_boot``, ``beta_hat``, ``t_stat``,
|
|
``se_cluster``, ``n_sub``, ``recommendation``.
|
|
wild_cluster_ci_inv ¶
wild_cluster_ci_inv(data: DataFrame, y: str, x: List[str], cluster: str, test_var: Optional[str] = None, n_boot: int = 999, weight_type: str = 'webb', alpha: float = 0.05, grid_size: int = 41, grid_span: float = 6.0, seed: Optional[int] = None) -> Dict[str, Any]
Confidence interval via bootstrap p-value inversion.
Runs the wild cluster bootstrap repeatedly across a grid of null
values and finds the boundary where the two-sided bootstrap p-value
equals alpha. This yields a WCR-inverted CI with better
small-cluster coverage than the percentile-t CI.
The grid is centered on the OLS point estimate with half-width
grid_span * se_cluster and grid_size evenly-spaced points.
Linear interpolation refines the boundary.
Shares the data / model arguments of :func:subcluster_wild_bootstrap;
the grid-specific parameters are documented below.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
grid_size
|
int
|
Number of null-value grid points to evaluate (odd preferred). |
41
|
grid_span
|
float
|
Half-width of the search grid in units of cluster-robust SE. |
6.0
|
Returns:
| Type | Description |
|---|---|
dict with ``ci``, ``p_grid`` (grid_size,), ``h0_grid``, ``beta_hat``,
|
|
``se_cluster``.
|
|
multiway_cluster_vcov ¶
multiway_cluster_vcov(X: ndarray, resid: ndarray, clusters: Union[ndarray, List[ndarray]], df_adjust: bool = True, n_params: Optional[int] = None, psd_correct: bool = True) -> ndarray
Compute N-way cluster-robust variance of an OLS coefficient vector.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
X
|
(ndarray, shape(n, k))
|
Design matrix used in the regression. |
required |
resid
|
(ndarray, shape(n))
|
OLS residuals. |
required |
clusters
|
ndarray or list of ndarrays
|
One or more cluster variables, one per dimension. Non-numeric labels are supported. |
required |
df_adjust
|
bool
|
If True, apply the G/(G-1) * (n-1)/(n-k) CR1 finite-sample correction per component variance. If False, uses raw sandwich (useful when the caller has already degreed-freedom adjusted). |
True
|
n_params
|
int
|
Override for the |
None
|
psd_correct
|
bool
|
Project V onto PSD cone by zeroing negative eigenvalues. |
True
|
Returns:
| Type | Description |
|---|---|
(ndarray, shape(k, k))
|
Multiway-cluster-robust variance-covariance matrix. |
cluster_robust_se ¶
cluster_robust_se(X: ndarray, resid: ndarray, clusters: Union[ndarray, List[ndarray]], **kwargs) -> ndarray
Return cluster-robust standard errors (diagonal sqrt of vcov).
cr3_jackknife_vcov ¶
CR3 cluster-jackknife variance (delete-one-cluster).
Drops each cluster in turn, refits OLS on the reduced sample, and forms the empirical variance of the leave-one-out coefficient vector. Scales by (G-1)/G. Often more conservative than CR1/CR2 and useful when clusters are unbalanced (Bell-McCaffrey 2002).
Complexity is O(G · k²) refits; suitable when G <= a few hundred.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
X
|
(ndarray, shape(n, k))
|
|
required |
y
|
(ndarray, shape(n))
|
|
required |
cluster
|
(ndarray, shape(n))
|
|
required |
Returns:
| Type | Description |
|---|---|
(ndarray, shape(k, k))
|
|