Skip to content

statspai.fast

fast

statspai.fast — performance-instrumentation and native-kernel home.

Contents (v1.8 / Phase 1+):

  • :func:hdfe_bench — wall-time + correctness regression harness.
  • :func:demean — multi-way HDFE within-transform with Aitken acceleration, backed by a Rust kernel (NumPy fallback).

The module exposes building blocks that Phase 2+ (PPML / GLM HDFE), Phase 3 (sp.within), and Phase 5 (Polars/Arrow direct) sit on top of.

HDFEBenchResult dataclass

Output of :func:hdfe_bench.

Attributes:

Name Type Description
results list of dict

One row per (backend, n) combination with wall_time_s, relative_to_numpy, and correctness metadata.

backends dict of name -> _Backend

Availability / notes for each backend probed.

reference str

Name of the reference backend used for correctness comparison (always 'numpy').

DemeanInfo dataclass

Diagnostics returned from :func:demean.

Attributes:

Name Type Description
n int

Original row count (before any singleton drop).

n_kept int

Rows retained after singleton pruning.

n_dropped int

n - n_kept.

iters list of int

AP iterations per output column.

converged list of bool

Whether each column hit the tolerance threshold within max_iter.

max_dx list of float

Final max|dx| per column (post-stop).

keep_mask ndarray of bool, shape (n,)

True for rows that survived singleton pruning (input alignment).

backend str

"rust" or "numpy".

accel str

"aitken" or "none".

n_fe list of int

Surviving group cardinality per FE dimension (post-prune, post-densification).

FePoisResult dataclass

Outcome of :func:fepois.

Attributes mirror pyfixest / fixest naming so that downstream code that reaches for .coef() / .se() / .tidy() keeps working.

FeolsResult dataclass

Outcome of :func:feols.

Attribute names mirror :class:FePoisResult so downstream code using .coef() / .se() / .tidy() keeps working.

WithinTransformer

Reusable within-transform for a fixed FE structure.

Build once with sp.fast.within(data, fe), then call :meth:transform for any X you want residualised against the same FEs. Singleton pruning happens at construction; keep_mask surfaces which input rows survived.

Parameters:

Name Type Description Default
data DataFrame

Source DataFrame. Required when fe is given as a list of column names. Otherwise can be omitted.

None
fe DataFrame | ndarray (n, K) | list of column names | list of arrays

Fixed-effect specification. Strings, ints, categoricals are all accepted (factorised internally).

None
drop_singletons bool

Iteratively prune rows whose FE level appears once.

True
accel ('aitken', 'none')
"aitken"
max_iter int
1000
tol float
1e-8
accel_period int
5
backend ('auto', 'rust', 'numpy')
"auto"

transform

transform(X: Union[ndarray, Series, DataFrame], *, already_masked: bool = False) -> Tuple[ndarray, DemeanInfo]

Residualise X against the cached FE structure.

Returns (X_dem, info) where X_dem has shape (n_kept, ...) and info has per-column convergence stats.

transform_columns

transform_columns(data: DataFrame, columns: Sequence[str], *, already_masked: bool = False) -> DataFrame

Residualise specified DataFrame columns; return a new DataFrame.

Useful when feeding the within-residualised X into a downstream estimator that prefers DataFrames (DML, Lasso, etc.).

BootTestResult dataclass

Outcome of :func:boottest.

BootWaldResult dataclass

Outcome of :func:boottest_wald (joint hypothesis bootstrap).

WaldTestResult dataclass

Outcome of a cluster-robust Wald test under CR2 sandwich.

Mirrors the contents of R clubSandwich::Wald_test(...) so that cross-language replication can compare any field directly.

Attributes:

Name Type Description
test str

Test variant. "HTZ" only in v1; future variants ("HTA", "KZ", "Naive", "EDF") may be added.

q int

Number of restrictions (rank of R).

eta float

Pustejovsky-Tipton 2018 §3.2 moment-matching DOF (η).

F_stat float

Hotelling-T²-scaled F statistic (η - q + 1) / (η · q) · Q.

p_value float

Right-tail probability 1 - F_{q, η-q+1}.cdf(F_stat).

Q float

Raw cluster-robust Wald statistic (Rβ̂ - r)' V_R^{-1} (Rβ̂ - r).

R (ndarray, shape(q, k))

Restriction matrix.

r (ndarray, shape(q))

Null value.

V_R (ndarray, shape(q, q))

R · V^CR2 · R^T.

EventStudyResult dataclass

Outcome of :func:event_study.

FeolsBootstrapResult dataclass

Outcome of :func:feols_jax_bootstrap.

Attributes:

Name Type Description
coef Series

Point estimate from the original (un-resampled) data — same as :func:feols_jax would return.

se_boot Series

Bootstrap standard errors (per-coefficient std-dev across the n_boot resamples).

ci_lower, ci_upper Series

Percentile-method bootstrap confidence intervals at level 1 - ci_alpha.

boot_betas DataFrame

(n_boot, p) table of per-resample coefficient vectors. Useful for custom CI methods (BCa, basic bootstrap, etc.) and for permutation-style inference.

n_boot int
bootstrap_type str

'pairs' or 'cluster'.

backend str

Always 'statspai-jax-bootstrap'.

hdfe_bench

hdfe_bench(n_list: Sequence[int] = (1000, 10000, 100000), n_groups: int = 50, repeat: int = 3, seed: int = 0, atol: float = 1e-10) -> HDFEBenchResult

Benchmark group-demean wall time across available backends.

Parameters:

Name Type Description Default
n_list sequence of int

Sample sizes to time at.

(1000, 10000, 100000)
n_groups int

Number of FE groups for the synthetic DGP. Code uniformly over [0, n_groups).

50
repeat int

Number of timing repetitions per (backend, n). The reported wall_time_s is the minimum across repeats.

3
seed int

RNG seed for the synthetic DGP.

0
atol float

Absolute tolerance used when checking that each non-reference backend agrees with the NumPy reference.

1e-10

Returns:

Type Description
HDFEBenchResult

Results + backend availability metadata.

i

i(var: Union[Series, ndarray], ref: object = None, *, drop_first: bool = True, prefix: Optional[str] = None) -> DataFrame

One-hot indicators for var with an optional reference category dropped.

Mirrors fixest's i(var, ref=...).

Parameters:

Name Type Description Default
var Series or ndarray

Categorical variable (numeric or string). Cast via pd.Categorical.

required
ref scalar

Level to drop (the reference). If None and drop_first is True, drops the first level alphabetically/sortedly.

None
drop_first bool

If ref is None, whether to drop the first level (matches statsmodels / patsy convention).

True
prefix str

Column name prefix. Defaults to the Series name (or "i" for ndarrays).

None

Returns:

Type Description
DataFrame

Columns named "<prefix>::<level>" for each non-reference level. Float64 values (so they can stack with continuous regressors).

fe_interact

fe_interact(*cols: Union[Series, ndarray, Sequence]) -> ndarray

Combine K FE columns into a single int64 code per (c_1, ..., c_K) tuple.

Mirrors fixest's i^j operator: fe_interact(df['firm'], df['year']) produces one integer code per unique (firm, year) combination, suitable to drop into the fe slot of any HDFE estimator.

Returns:

Type Description
ndarray of int64

sw

sw(*specs: Iterable[str]) -> List[List[str]]

Stepwise: each spec emitted independently.

sw(['x1'], ['x2'], ['x1', 'x2']) → three separate variable lists.

csw

csw(*specs: Iterable[str]) -> List[List[str]]

Cumulative-stepwise: each spec is the union of itself and all prior.

csw(['x1'], ['x2'], ['x3'])[['x1'], ['x1', 'x2'], ['x1', 'x2', 'x3']].

crve

crve(X: ndarray, residuals: ndarray, cluster: ndarray, *, weights: Optional[ndarray] = None, bread: Optional[ndarray] = None, type: str = 'cr1', extra_df: int = 0) -> ndarray

Cluster-robust variance-covariance matrix for OLS / WLS.

Computes the sandwich

V = (X'WX)^{-1} M (X'WX)^{-1}

where the meat M depends on the requested type:

  • "cr1" (Liang-Zeger): M = Σ_g (X_g' u_g)(X_g' u_g)', then scaled by the small-sample factor c = (G/(G-1)) * (n-1)/(n - k - extra_df).
  • "cr2" (Bell-McCaffrey): replaces the cluster scores with X_g' (I_{n_g} - H_gg)^{-1/2} u_g, where the cluster-leverage matrix H_gg = X_g (X' W X)^{-1} X_g' diag(w_g) adjusts each cluster for the rest of the sample. Recommended for designs with few clusters (G < 50). No additional small-sample multiplier.
  • "cr3" (jackknife): same outer sum as CR1, scaled by (G-1)/G. Closer to leave-one-cluster-out variance; reduces over-rejection in few-cluster settings. extra_df is ignored for CR3.
HDFE callers must pass extra_df

When X has been FE-residualised (sp.fast.event_study, DML / Lasso pipelines built on sp.fast.within), the small-sample factor's denominator must subtract the absorbed FE rank in addition to k = X.shape[1]. Otherwise CR1 SEs are systematically too small (matches the reghdfe / fixest convention). Pass extra_df = sum(G_k - 1) over the K FE dimensions to recover the correct factor. extra_df is only consumed by CR1.

Parameters:

Name Type Description Default
X (ndarray, shape(n, k))

Regressors (already FE-residualised if applicable).

required
residuals (ndarray, shape(n))

Model residuals.

required
cluster (ndarray, shape(n))

Cluster identifiers (any dtype factorisable by pd.factorize).

required
weights (ndarray, shape(n))

Observation weights. Default: 1.

None
bread (ndarray, shape(k, k))

Pre-computed inverse Hessian / (X'WX)^{-1}. Saves one solve.

None
type ('cr1', 'cr2', 'cr3')
"cr1"
extra_df int

Additional residual-DOF charge for CR1's small-sample factor. Ignored for CR2 / CR3.

0
References

Bell, R. M., McCaffrey, D. F. (2002). Bias reduction in standard errors for linear regression with multi-stage samples. Survey Methodology 28(2), 169–179.

boottest

boottest(X: ndarray, y: ndarray, cluster: ndarray, *, null_coef: int = 0, null_value: float = 0.0, weights: str = 'rademacher', B: int = 9999, seed: Optional[int] = None, obs_weights: Optional[ndarray] = None, extra_df: int = 0) -> BootTestResult

Wild cluster bootstrap of H0: β[null_coef] = null_value (Davidson-Flachaire / MacKinnon 2019).

Algorithm
  1. Fit the unrestricted model β_full = (X'WX)^{-1} X'Wy.
  2. Compute the t-statistic of β[null_coef] using CR1 cluster SE.
  3. Fit the restricted model β_R that imposes β[null_coef] = null_value (closed-form via partialling out the constrained column).
  4. For b = 1..B: a. Draw cluster-level weights v_g ~ {Rademacher | Webb6}. b. Form y_b = X β_R + v_{g(i)} u_R,i where u_R = y - X β_R. c. Refit OLS on y_b → β_b; compute t_b under the same CR1 formula.
  5. p-value = 2 * min(P(t_b > t_obs), P(t_b <= t_obs)) clipped to [0, 1].

Parameters:

Name Type Description Default
X (ndarray, shape(n, k))
required
y (ndarray, shape(n))
required
cluster (ndarray, shape(n))

Cluster ids.

required
null_coef int

Zero-based index of the coefficient under H0.

0
null_value float

Hypothesised value (default 0).

0.0
weights ('rademacher', 'webb')

Wild-weight distribution.

"rademacher"
B int

Number of bootstrap replications.

9999
seed int

RNG seed for reproducibility.

None
obs_weights (ndarray, shape(n))

Observation weights for WLS. Default uniform.

None
extra_df int

Additional residual-DOF charge in the CR1 small-sample factor (used both for the observed t-stat and uniformly across the bootstrap replicas). Pass Σ(G_k - 1) when X is FE-residualised — same convention as :func:crve. Without it, bootstrap p-values for HDFE designs are biased downward because the t-pivot uses an under-corrected SE.

0

Returns:

Type Description
BootTestResult

boottest_wald

boottest_wald(X: ndarray, y: ndarray, cluster: ndarray, R: ndarray, r: Optional[ndarray] = None, *, weights: str = 'rademacher', B: int = 9999, seed: Optional[int] = None, obs_weights: Optional[ndarray] = None, extra_df: int = 0) -> BootWaldResult

Wild cluster bootstrap of the joint linear hypothesis R β = r.

Computes the Wald statistic W = (R β - r)' (R V R')^{-1} (R β - r) using the CR1 cluster-robust variance, then bootstraps under the null by imposing the constraint R β_R = r on the restricted fit and drawing wild cluster weights as in :func:boottest. The p-value is one-sided on the upper tail of the Wald distribution.

Parameters:

Name Type Description Default
X (ndarray, shape(n, k))
required
y (ndarray, shape(n))
required
cluster (ndarray, shape(n))

Cluster ids.

required
R (ndarray, shape(q, k))

Restriction matrix; full row rank required (q ≤ k).

required
r (ndarray, shape(q))

Right-hand side. Default zero vector.

None
weights ('rademacher', 'webb')
"rademacher"
B int
9999
seed int
None
obs_weights ndarray
None
extra_df int

Additional residual-DOF charge in the CR1 small-sample factor (used both for the observed Wald and uniformly across bootstrap replicas). Pass Σ(G_k - 1) for FE-residualised X — same convention as :func:crve and :func:boottest.

0

Returns:

Type Description
BootWaldResult

cluster_dof_bm

cluster_dof_bm(X: ndarray, cluster: ndarray, *, contrast: ndarray, weights: Optional[ndarray] = None, bread: Optional[ndarray] = None) -> float

Bell-McCaffrey / Imbens-Kolesar Satterthwaite DOF for a single linear contrast under CR2 cluster-robust inference.

For testing H0: c'β = c0 with the CR2 sandwich variance, the cluster-robust t-statistic is approximated by t_{ν_BM} where

ν_BM = (Σ_g λ_g)² / Σ_g λ_g²

and λ_g = c' V_g c is the cluster-g contribution to the CR2 sandwich under IID null errors:

V_g = bread X_g'^T W_g (I - H_gg)^{-1} W_g X_g bread,
H_gg = X_g · bread · X_g'^T · diag(w_g).

Equivalently, with A_g = (I - H_gg)^{-1/2} and v = X_g · bread · c, λ_g = ||A_g · W_g · v||² — the form used by this implementation for numerical stability (eigendecomposition of the symmetrised I - 0.5(H + H') with eigenvalue floor 1e-12, matching :func:crve's CR2 path).

Parameters:

Name Type Description Default
X (ndarray, shape(n, k))

Regressors (already FE-residualised if applicable).

required
cluster (ndarray, shape(n))

Cluster identifiers.

required
contrast (ndarray, shape(k))

Linear restriction c being tested.

required
weights (ndarray, shape(n))

Observation weights. Default 1.

None
bread (ndarray, shape(k, k))

Pre-computed inverse Hessian / (X'WX)^{-1}.

None

Returns:

Type Description
float

ν_BM, bounded between 1 and G - 1 for well-conditioned problems. Use this in place of the naive G - 1 cluster DOF when calling scipy.stats.t for finite-sample inference.

Notes

Only ships the single-contrast Satterthwaite DOF. Multi-contrast (Wald / matrix Satterthwaite) DOF is the natural follow-up and lives next to :func:boottest_wald in scope.

Convention vs clubSandwich

Both this implementation and R's clubSandwich::coef_test(..., test="Satterthwaite") are "Bell-McCaffrey-style" Satterthwaite approximations. The CR2 variance matches clubSandwich to the visible precision (verified separately). The DOF formula has documented implementation variants:

  • StatsPAI / BM 2002 simplified: ν = (Σ λ_g)² / Σ λ_g² — exactly what falls out of 2 [E(V̂)]² / Var(V̂) under a Gaussian IID null on residuals.
  • clubSandwich (Pustejovsky-Tipton 2018): adds cross-cluster adjustment terms involving the working covariance Φ over the full residual vector.

On a typical panel the two agree within ~5–10%. clubSandwich is slightly more conservative (smaller ν); StatsPAI's value is between that and the naive G - 1. Use boottest for tight finite-sample inference if the difference matters.

References

Bell, R. M., McCaffrey, D. F. (2002). Bias reduction in standard errors for linear regression with multi-stage samples. Survey Methodology 28(2), 169–179.

Imbens, G. W., Kolesár, M. (2016). Robust standard errors in small samples: some practical advice. Review of Economics and Statistics 98(4), 701–712.

Pustejovsky, J. E., Tipton, E. (2018). Small-sample methods for cluster-robust variance estimation and hypothesis testing in fixed effects models. Journal of Business & Economic Statistics 36(4), 672–683.

cluster_dof_wald_bm

cluster_dof_wald_bm(X: ndarray, cluster: ndarray, *, R: ndarray, weights: Optional[ndarray] = None, bread: Optional[ndarray] = None) -> float

Bell-McCaffrey / Pustejovsky-Tipton matrix Satterthwaite DOF for a multi-restriction Wald test under CR2 cluster-robust inference.

Generalises :func:cluster_dof_bm from a 1-D contrast to a q-row restriction matrix R. For testing H0: R β = r with the CR2 sandwich variance, the cluster-robust Wald statistic W = (Rβ̂ - r)' (R V_CR2 R')^{-1} (Rβ̂ - r) is approximated by a (scaled) F_{q, ν_W} distribution where

ν_W = [tr(M)]² / tr(M²)

and M = Σ_g E_g E_g' with cluster contributions

E_g = R · bread · X_g'^T · A_g · sqrt(W_g)    (q × n_g)
A_g = (I_{n_g} - H_gg)^{-1/2}
H_gg = X_g · bread · X_g'^T · diag(w_g).

For q = 1 this collapses to ν = (Σ λ_g)² / Σ λ_g² — bit-identical to :func:cluster_dof_bm.

Parameters:

Name Type Description Default
X (ndarray, shape(n, k))

Regressors (already FE-residualised if applicable).

required
cluster (ndarray, shape(n))

Cluster identifiers.

required
R (ndarray, shape(q, k))

Restriction matrix; full row rank required (q ≤ k).

required
weights (ndarray, shape(n))

Observation weights. Default 1.

None
bread (ndarray, shape(k, k))

Pre-computed inverse Hessian / (X'WX)^{-1}.

None

Returns:

Type Description
float

ν_W. For q ≥ 1, bounded below by q and above by q · (G - q) / G × something — in practice well-behaved problems have ν_W in (q, G - q + 1).

Notes

Not equivalent to clubSandwich::Wald_test(..., test="HTZ"). The HTZ test uses a Hotelling-T² approximation with the more elaborate Pustejovsky-Tipton 2018 §3.2 moment-matching DOF, which is typically much smaller (more conservative) than the BM 2002 §3 simplified (Σ tr)² / Σ ||·||_F² formula implemented here. On moderate panels the two can differ by 50–100%.

For the clubSandwich-equivalent HTZ DOF, see :func:cluster_dof_wald_htz (matches R clubSandwich::Wald_test(test="HTZ") to rtol < 1e-8). For the strictest small-G inference prefer :func:boottest_wald, which avoids any asymptotic approximation.

References

Bell, R. M., McCaffrey, D. F. (2002). Bias reduction in standard errors for linear regression with multi-stage samples. Survey Methodology 28(2), 169–179.

Pustejovsky, J. E., Tipton, E. (2018). Small-sample methods for cluster-robust variance estimation and hypothesis testing in fixed effects models. Journal of Business & Economic Statistics 36(4), 672–683.

cluster_dof_wald_htz

cluster_dof_wald_htz(X: ndarray, cluster: ndarray, *, R: ndarray, weights: Optional[ndarray] = None, bread: Optional[ndarray] = None) -> float

clubSandwich-equivalent HTZ Wald DOF (Pustejovsky-Tipton 2018 §3.2).

For testing H0: R β = r with the CR2 sandwich variance, the cluster-robust Wald statistic Q = (Rβ̂ - r)' V_R^{-1} (Rβ̂ - r) is approximated by Hotelling's T² with denominator DOF η such that (η - q + 1) / (η · q) · Q ~ F(q, η - q + 1).

The DOF η is computed by moment-matching the first two moments of V_R = R V^CR2 R^T under the working covariance Φ = I (OLS+CR2 path; clubSandwich's default).

Equivalent to R::clubSandwich::Wald_test(test="HTZ")$df_denom to rtol < 1e-8 on the verified fixtures (tests/fixtures/htz_clubsandwich.json).

Parameters:

Name Type Description Default
X (ndarray, shape(n, k))

Regressors (already FE-residualised if applicable).

required
cluster (ndarray, shape(n))

Cluster identifiers.

required
R (ndarray, shape(q, k))

Restriction matrix; full row rank required (q ≤ k).

required
weights (ndarray, shape(n))

Observation weights. Must be all 1 in v1 (Φ = I working cov).

None
bread (ndarray, shape(k, k))

Pre-computed inverse Hessian / (X'WX)^{-1}.

None

Returns:

Type Description
float

η, the HTZ moment-matching DOF.

Raises:

Type Description
ValueError

If R is rank-deficient, G ≤ q, or η ≤ q − 1 (Hotelling-T² scaling diverges).

References

Pustejovsky and Tipton (2018) — see pustejovsky2018small in paper.bib for the canonical citation.

cluster_wald_htz

cluster_wald_htz(X: ndarray, residuals: ndarray, cluster: ndarray, *, R: ndarray, beta: ndarray, r: Optional[ndarray] = None, weights: Optional[ndarray] = None, bread: Optional[ndarray] = None) -> WaldTestResult

clubSandwich-equivalent HTZ Wald test under CR2 cluster-robust sandwich.

Tests H0: R β = r with the Pustejovsky-Tipton 2018 §3.2 moment-matching DOF and Hotelling-T² scaling. Equivalent to R::clubSandwich::Wald_test(fit, constraints=R, vcov="CR2", test="HTZ") to rtol < 1e-8 on the verified fixtures.

torch_device_info

torch_device_info() -> str

One-line diagnostic mirroring :func:statspai.fast.jax_device_info.

Safe to call even when torch is missing — returns a clear status string instead of raising. Useful for CLI sp doctor-style health checks.

demean_polars

demean_polars(df: Any, X_cols: Sequence[str], fe_cols: Sequence[str], **kwargs: Any) -> Tuple[ndarray, DemeanInfo]

Within-transform a polars DataFrame's columns by HDFE.

Parameters:

Name Type Description Default
df DataFrame or LazyFrame
required
X_cols list of str

Continuous columns to residualise.

required
fe_cols list of str

Fixed-effect columns. Any dtype factorisable by pd.factorize.

required
**kwargs forwarded to :func:`sp.fast.demean`.
{}

Returns:

Type Description
(X_dem, info) : same as :func:`sp.fast.demean`.

fepois_polars

fepois_polars(df: Any, formula: str, **kwargs: Any) -> FePoisResult

Fit a Poisson HDFE model directly on a polars DataFrame.

Internally collects the LazyFrame (if needed), materialises the referenced columns as a pandas DataFrame, and dispatches to sp.fast.fepois. Materialisation is on a projected subset of columns — we don't pull the whole frame into pandas.