Performance: an honest cross-library benchmark
StatsPAI's position on performance is simple: publish where we win and where we lose, on one machine, with the numbers proving every backend computes the same estimator. Selectively quoting only the regimes where a library is fast is the cheapest way to lose an econometrician's trust, so the benchmark that backs this page is built to make that impossible — it always prints the rows where StatsPAI is slower.
The harness
`benchmarks/cross_library/hdfe_benchmark.py` runs a single two-way (unit × time) fixed-effects OLS on one generated panel and times four backends head-to-head:
| backend | what it is |
|---|---|
| `sp.absorb_ols` | StatsPAI's own HDFE kernel (alternating projections) |
| `sp.feols` | StatsPAI's feols API (pyfixest-backed) |
| `pyfixest` | pyfixest directly |
| R `fixest` | R `fixest::feols` via `Rscript` |
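To make the estimator concrete, here is a minimal pure-Python sketch of two-way fixed-effects OLS by alternating projections: demean by unit, then by time, and repeat until nothing changes. It is an illustration of the general technique only, not StatsPAI's kernel; the function names, the tolerance, and the toy panel are all assumptions made for this example.

```python
import random


def demean_by(values, groups):
    """Subtract each group's mean from its members (one projection step)."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]


def absorb_two_way(values, unit, period, tol=1e-12, max_iter=1000):
    """Alternate unit and period demeaning until a full sweep changes nothing."""
    cur = list(values)
    for _ in range(max_iter):
        nxt = demean_by(demean_by(cur, unit), period)
        change = max(abs(a - b) for a, b in zip(cur, nxt))
        cur = nxt
        if change < tol:
            break
    return cur


def two_way_fe_slope(y, x, unit, period):
    """Slope of y on x after absorbing unit and period fixed effects."""
    y_t = absorb_two_way(y, unit, period)
    x_t = absorb_two_way(x, unit, period)
    return sum(a * b for a, b in zip(x_t, y_t)) / sum(a * a for a in x_t)


# Toy unbalanced panel (hypothetical data), noise-free with a true slope of 2.0.
rng = random.Random(0)
period_effect = [rng.gauss(0, 1) for _ in range(8)]
unit, period, x, y = [], [], [], []
for i in range(30):
    unit_effect = rng.gauss(0, 1)
    for t in range(8):
        if rng.random() < 0.2:  # drop roughly 20% of cells to unbalance the panel
            continue
        x_it = rng.gauss(0, 1) + 0.5 * unit_effect  # x correlates with the unit effect
        unit.append(i)
        period.append(t)
        x.append(x_it)
        y.append(2.0 * x_it + unit_effect + period_effect[t])

slope = two_way_fe_slope(y, x, unit, period)
print(f"estimated slope: {slope:.6f}")
```

Because the toy outcome has no noise, the absorbed regression should recover the slope of 2.0 up to the convergence tolerance. On a balanced panel a single sweep already converges; the loop matters for unbalanced data.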
Design choices that keep it trustworthy:
- Correctness gates speed. Every row carries the estimated slope and SE, and the report states the maximum relative disagreement against a reference. A fast wrong answer is reported as `✗ DISAGREE`, never as a win.
- Median + IQR over repeats, after a warm-up. One untimed call absorbs JIT/import cost; the reported time is the median with the inter-quartile range, so warm-up and noise cannot masquerade as signal.
- No silent scope. A missing baseline (no `pyfixest`, no `Rscript` + `fixest`) is recorded as `skipped` with the reason, never dropped invisibly.
- Provenance embedded. Library versions, Python, platform, and CPU count are written into every result file.
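The timing, skip-recording, and provenance rules above can be sketched with the standard library alone. This is a hedged illustration of the pattern, not the harness's actual code: `time_backend`, `run_or_skip`, `provenance`, the dictionary keys, and the stand-in workload are all hypothetical names chosen for this example.

```python
import importlib.util
import os
import platform
import statistics
import time


def time_backend(fn, repeats=7):
    """Median and IQR of wall-clock time over `repeats` calls, after one warm-up."""
    fn()  # untimed warm-up: absorbs import and JIT cost
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    q1, _, q3 = statistics.quantiles(samples, n=4)
    return {
        "status": "ok",
        "median_s": statistics.median(samples),
        "iqr_s": q3 - q1,
        "repeats": repeats,
    }


def run_or_skip(name, required_module, fn, repeats=7):
    """Time a backend, or record why it was skipped instead of dropping it."""
    if importlib.util.find_spec(required_module) is None:
        return {
            "backend": name,
            "status": "skipped",
            "reason": f"module {required_module!r} is not installed",
        }
    return {"backend": name, **time_backend(fn, repeats)}


def provenance():
    """Environment details to store next to the timings."""
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "machine": platform.machine(),
        "cpu_count": os.cpu_count(),
    }


def workload():
    """Stand-in for a regression call; any zero-argument callable works."""
    return sum(i * i for i in range(50_000))


report = {
    "provenance": provenance(),
    "rows": [
        run_or_skip("stdlib-workload", "json", workload),
        run_or_skip("missing-baseline", "surely_not_installed_pkg", workload),
    ],
}
for row in report["rows"]:
    print(row)
```

The second row is deliberately pointed at a module that does not exist, so the report contains an explicit `skipped` entry with a reason rather than a silently shorter table.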
Reproduce it yourself, and please do so before quoting any number:

    python benchmarks/cross_library/hdfe_benchmark.py \
      --scales 10000,100000,1000000 --repeats 7 \
      --json benchmarks/cross_library/results.json \
      --markdown benchmarks/cross_library/RESULTS.md
What we see (and what we don't)
On a representative balanced two-way panel (Apple-silicon arm64, the run checked into `benchmarks/cross_library/RESULTS.md`):
- All four backends agree to ~1e-11 on the slope and SE. This is the precondition for any timing claim to be meaningful.
- At n ≈ 10k–100k balanced panels, StatsPAI's native `sp.absorb_ols` kernel is the fastest backend, with R `fixest` and `pyfixest` a small multiple behind. `sp.feols` carries a thin wrapper overhead on top of pyfixest, so it trails pyfixest slightly; that overhead is reported, not hidden.
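The agreement check behind the first bullet can be sketched as a maximum relative disagreement against a reference row. The sketch below is illustrative only: the tolerance of 1e-8, the `✓ agree` label, the backend names, and every number in `rows` are made up for the example and are not taken from `RESULTS.md`.

```python
def max_relative_disagreement(rows, reference):
    """Largest relative gap in slope or SE between any backend and the reference."""
    ref = rows[reference]
    worst = 0.0
    for est in rows.values():
        for key in ("slope", "se"):
            # Assumes the reference value is non-zero.
            worst = max(worst, abs(est[key] - ref[key]) / abs(ref[key]))
    return worst


def verdict(rows, reference, tol=1e-8):
    """Return a label and the gap; a disagreeing run can never count as a win."""
    gap = max_relative_disagreement(rows, reference)
    return ("✓ agree" if gap <= tol else "✗ DISAGREE"), gap


# Hypothetical estimates, invented purely to exercise both outcomes.
rows = {
    "backend_ref": {"slope": 0.5, "se": 0.01},
    "backend_a": {"slope": 0.5 + 1e-12, "se": 0.01},
    "backend_fast_but_wrong": {"slope": 0.51, "se": 0.01},
}

agreeing = {k: rows[k] for k in ("backend_ref", "backend_a")}
label_ok, gap_ok = verdict(agreeing, "backend_ref")
label_bad, gap_bad = verdict(rows, "backend_ref")
print(label_ok, gap_ok)
print(label_bad, gap_bad)
```

Running the gate before looking at timings is the point of the design: the fast-but-wrong row fails the check regardless of how quickly it ran.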
What this run does not establish, and where we are explicitly not claiming a win:
- Very large n (≥ 1M) and high-cardinality fixed effects. R `fixest` is a mature, heavily-optimised C++ engine; at large scale it is competitive with or faster than the StatsPAI/pyfixest paths in our broader profiling. Treat the small-to-medium-panel lead above as exactly that, a small-to-medium-panel lead, until you have re-run the harness at your own target scale.
- Unbalanced panels, weights, IV, and clustered VCOV at scale. The harness covers the canonical two-way FE OLS case; other regimes need their own runs.
- Memory. Peak RSS is recorded best-effort but is process-cumulative, so treat it as indicative, not authoritative.
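To see why a process-cumulative peak is only indicative, consider one common way to read peak RSS on Unix, the standard library's `resource.getrusage`. Whether the harness uses this exact call is an assumption; the sketch only shows that such a high-water mark never goes down, so a backend measured late in a process inherits the peak set by earlier ones.

```python
import resource
import sys


def peak_rss_bytes():
    """Peak resident set size of this process so far (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    return peak if sys.platform == "darwin" else peak * 1024


before = peak_rss_bytes()
buffer = bytearray(b"x") * (64 * 1024 * 1024)  # allocate and write about 64 MiB
during = peak_rss_bytes()
del buffer  # the memory is released, but the high-water mark is not reset
after = peak_rss_bytes()

print(f"before={before}  during={during}  after={after}")
```

Isolating each backend in its own subprocess is one way to get a per-backend peak, at the cost of extra start-up time.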
The honest summary: StatsPAI is numerically in lockstep with the reference
implementations, and its native kernel is genuinely fast on small-to-medium
balanced panels — but R fixest remains the one to beat at the largest scales,
and we say so rather than quietly choosing scales where we look best.