statspai.panel

panel

Unified panel regression module for StatsPAI.

Provides a single entry point panel() covering all panel estimators:

  • Static models: FE, RE, Between, First Difference, Pooled OLS, Two-way FE
  • Correlated RE: Mundlak (1978), Chamberlain (1982)
  • Dynamic panel: Arellano-Bond, Blundell-Bond (System GMM)
  • HDFE absorption: high-dimensional fixed-effects OLS (Stata's reghdfe / R's fixest)

All results return PanelResults with built-in diagnostics:

    result = sp.panel(df, "y ~ x1 + x2", entity='id', time='t')
    result.hausman_test()      # FE vs RE
    result.bp_lm_test()        # Pooled vs RE
    result.f_test_effects()    # Joint significance of FE
    result.compare('re')       # Side-by-side comparison

References

Wooldridge, J.M. (2010). Econometric Analysis of Cross Section and Panel Data.
Mundlak, Y. (1978). "On the Pooling of Time Series and Cross Section Data."
Chamberlain, G. (1982). "Multivariate Regression Models for Panel Data."
Arellano, M. and Bond, S. (1991). "Some Tests of Specification for Panel Data."
Blundell, R. and Bond, S. (1998). "Initial Conditions and Moment Restrictions."
Hausman, J.A. (1978). "Specification Tests in Econometrics."
Breusch, T.S. and Pagan, A.R. (1980). "The Lagrange Multiplier Test."
Pesaran, M.H. (2004). "General Diagnostic Tests for Cross Section Dependence."
Correia, S. (2017). "Linear Models with High-Dimensional Fixed Effects."

PanelResults

Bases: EconometricResults

Panel regression results with built-in diagnostics.

Extends EconometricResults with panel-specific tests that can be called directly on the result object:

    result = sp.panel(df, "y ~ x1 + x2", entity='id', time='t')
    result.hausman_test()      # FE vs RE
    result.bp_lm_test()        # Pooled vs RE (Breusch-Pagan LM)
    result.f_test_effects()    # Joint significance of entity FE
    result.compare('re')       # Compare with RE side by side

hausman_test

hausman_test(alpha: float = 0.05) -> Dict[str, Any]

Hausman (1978) specification test: FE vs RE.

Under H0 (RE consistent), both FE and RE are consistent but RE is efficient. Under H1, only FE is consistent.

Returns:

Type Description
dict

'statistic', 'df', 'pvalue', 'recommendation', 'interpretation'
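To make the statistic concrete, here is a minimal sketch of the Hausman test for a single coefficient. It is not StatsPAI's implementation: the function name `hausman_scalar` and the input values are hypothetical, and the one-coefficient case is chosen so that the chi-square(1) p-value can be computed with the standard library alone.

```python
import math

def hausman_scalar(b_fe, b_re, var_fe, var_re):
    """Hausman statistic for one coefficient (1 degree of freedom).

    Hypothetical helper for illustration; not part of StatsPAI.
    """
    diff_var = var_fe - var_re
    if diff_var <= 0:
        raise ValueError("var_fe must exceed var_re for the statistic to be defined")
    # H = (b_FE - b_RE)^2 / (Var(b_FE) - Var(b_RE))
    stat = (b_fe - b_re) ** 2 / diff_var
    # Survival function of chi-square(1): P(X > s) = erfc(sqrt(s / 2)).
    pvalue = math.erfc(math.sqrt(stat / 2.0))
    return stat, pvalue

# Made-up estimates, used only to exercise the formula.
stat, pvalue = hausman_scalar(b_fe=0.50, b_re=0.40, var_fe=0.004, var_re=0.0015)
print(f"H = {stat:.3f}, p = {pvalue:.4f}")
```

A small p-value rejects H0 and therefore points toward FE. With several coefficients the same idea uses the quadratic form of the coefficient difference in the inverse of the variance difference.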

bp_lm_test

bp_lm_test() -> Dict[str, Any]

Breusch-Pagan (1980) Lagrange Multiplier test for random effects.

Tests H0: Var(alpha_i) = 0 (Pooled OLS is appropriate) vs H1: Var(alpha_i) > 0 (Random Effects needed).

Returns:

Type Description
dict

'statistic', 'df', 'pvalue', 'recommendation', 'interpretation'
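The textbook form of the Breusch-Pagan LM statistic for a balanced panel can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `bp_lm` is a hypothetical name, the input is a list of per-entity pooled-OLS residual lists, and every entity is assumed to have the same number of periods.

```python
import math

def bp_lm(residuals):
    """Breusch-Pagan LM statistic from pooled-OLS residuals (balanced panel).

    residuals: list of per-entity lists. Hypothetical helper for illustration.
    """
    n = len(residuals)
    t = len(residuals[0])
    if any(len(r) != t for r in residuals):
        raise ValueError("this sketch assumes a balanced panel")
    sum_sq_of_sums = sum(sum(r) ** 2 for r in residuals)   # sum_i (sum_t e_it)^2
    sum_of_sq = sum(e * e for r in residuals for e in r)   # sum_i sum_t e_it^2
    stat = n * t / (2.0 * (t - 1)) * (sum_sq_of_sums / sum_of_sq - 1.0) ** 2
    # Chi-square(1) survival function.
    pvalue = math.erfc(math.sqrt(stat / 2.0))
    return stat, pvalue

# Two entities whose residuals share a sign within entity (made-up data).
stat, pvalue = bp_lm([[1.0, 1.0], [-1.0, -1.0]])
print(f"LM = {stat:.3f}, p = {pvalue:.4f}")
```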

f_test_effects

f_test_effects() -> Dict[str, Any]

F-test for joint significance of entity fixed effects.

Tests H0: all alpha_i = 0 (entity effects not needed).

Returns:

Type Description
dict

'statistic', 'df1', 'df2', 'pvalue', 'interpretation'
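The F statistic compares the residual sum of squares of the pooled model (restricted) with that of the FE model (unrestricted). The sketch below assumes a pooled model with an intercept and K slope regressors; the function name and the numbers are hypothetical and only illustrate how 'statistic', 'df1', and 'df2' relate.

```python
def f_stat_effects(rss_pooled, rss_fe, n_entities, n_obs, n_regressors):
    """F statistic for H0: all entity effects are equal.

    Hypothetical helper; n_regressors excludes the intercept.
    """
    df1 = n_entities - 1                       # number of restrictions
    df2 = n_obs - n_entities - n_regressors    # residual df of the FE model
    stat = ((rss_pooled - rss_fe) / df1) / (rss_fe / df2)
    return stat, df1, df2

# Made-up sums of squares for 11 entities, 110 observations, 2 regressors.
stat, df1, df2 = f_stat_effects(150.0, 100.0, n_entities=11, n_obs=110, n_regressors=2)
print(stat, df1, df2)
```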

pesaran_cd_test

pesaran_cd_test() -> Dict[str, Any]

Pesaran (2004) CD test for cross-sectional dependence in residuals.

Returns:

Type Description
dict

'statistic', 'pvalue', 'interpretation'
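For a balanced panel, the CD statistic is the scaled sum of all pairwise residual correlations across entities and is asymptotically standard normal under the null of no cross-sectional dependence. The following sketch assumes a balanced panel and uses hypothetical helper names; it is not taken from the StatsPAI source.

```python
import math
from itertools import combinations

def _corr(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def pesaran_cd(residuals):
    """CD = sqrt(2T / (N(N-1))) * sum_{i<j} rho_ij (balanced panel assumed)."""
    n, t = len(residuals), len(residuals[0])
    rho_sum = sum(_corr(residuals[i], residuals[j])
                  for i, j in combinations(range(n), 2))
    stat = math.sqrt(2.0 * t / (n * (n - 1))) * rho_sum
    # Two-sided p-value from the standard normal distribution.
    pvalue = math.erfc(abs(stat) / math.sqrt(2.0))
    return stat, pvalue

# Three entities with perfectly co-moving residuals (made-up data).
stat, pvalue = pesaran_cd([[1, -1, 1, -1], [2, -2, 2, -2], [1, -1, 1, -1]])
print(f"CD = {stat:.3f}, p = {pvalue:.5f}")
```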

plot

plot(type: str = 'coef', **kwargs)

Generate panel-specific plots.

Parameters:

Name Type Description Default
type str

  • 'coef': Coefficient forest plot (default)
  • 'effects': Distribution of entity fixed effects
  • 'residuals': Residual diagnostics (2x2 grid)
  • 'hausman': Visual FE vs RE comparison

'coef'
**kwargs

Passed to the underlying plot function.

{}

Returns:

Type Description
(fig, ax)

plot_effects

plot_effects(**kwargs)

Shortcut for .plot(type='effects'). Distribution of entity FE.

plot_residuals

plot_residuals(**kwargs)

Shortcut for .plot(type='residuals'). Residual diagnostics (2x2).

plot_hausman

plot_hausman(**kwargs)

Shortcut for .plot(type='hausman'). Visual FE vs RE comparison.

compare

compare(method: str, **kwargs) -> PanelCompareResults

Re-estimate with a different method and compare side by side.

Parameters:

Name Type Description Default
method str

Alternative method to compare against.

required

Returns:

Type Description
PanelCompareResults

Side-by-side comparison with diagnostics.

PanelCompareResults

Side-by-side comparison of two panel models.

plot

plot(variables: Optional[List[str]] = None, **kwargs)

Side-by-side coefficient comparison plot.

Returns:

Type Description
(fig, ax)

PanelRegression

Deprecated: use panel() directly. Kept for backward compatibility.

Absorber

Reusable HDFE demean operator.

Build once from a DataFrame's FE columns; reuse demean to sweep any outcome / regressor vector or matrix. Useful when fitting many models that share the same absorbing FEs (e.g. event-study coefficient paths).

Parameters:

Name Type Description Default
fe_data DataFrame or ndarray(n, K)

FE columns. Must have no NaN.

required
weights ndarray(n)

Observation weights. If given, weighted means are used.

None
drop_singletons bool

If True, singleton observations (FE groups of size 1) are pruned before building the absorber. keep_mask stores the surviving rows.

True
tol float

Convergence threshold on max |dx| per iteration.

1e-8
maxiter int

Maximum alternating-projection iterations.

10000
accelerate bool

Enable Irons-Tuck Δ² acceleration.

True
solver ('map', 'lsmr', 'lsqr')

Within-transformation backend. "map" uses alternating projections with Irons-Tuck acceleration (default, typically fastest on well-conditioned panels). "lsmr" / "lsqr" delegate to scipy.sparse.linalg.lsmr / lsqr on the sparse FE design matrix — more robust for ill-conditioned or highly nested FE structures. See the migration guide for how this maps to pyreghdfe.

"map"

Attributes:

Name Type Description
keep_mask ndarray of bool

Rows retained after singleton pruning. Callers must apply this mask to y, X, and any weights before passing to demean.

n_kept int

Number of surviving observations.

n_dropped int

Number of singleton observations removed.

n_fe list of int

Number of groups per FE dimension (post-prune).
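The relationship between keep_mask, n_kept, and n_dropped can be illustrated with a small sketch of singleton pruning. It assumes reghdfe-style iterative pruning (dropping one singleton can create another in a different FE dimension); whether Absorber iterates in exactly this way is an assumption, and `singleton_keep_mask` is a hypothetical name.

```python
from collections import Counter

def singleton_keep_mask(fe_columns):
    """Return a boolean keep mask after iteratively dropping singletons.

    fe_columns: list of equal-length lists, one per FE dimension.
    Hypothetical helper for illustration only.
    """
    n = len(fe_columns[0])
    keep = [True] * n
    changed = True
    while changed:
        changed = False
        for col in fe_columns:
            # Group sizes among the rows that are still kept.
            counts = Counter(g for g, k in zip(col, keep) if k)
            for i in range(n):
                if keep[i] and counts[col[i]] == 1:
                    keep[i] = False
                    changed = True
    return keep

firm = ["a", "a", "b", "c", "c"]
year = [1, 2, 1, 3, 3]
keep = singleton_keep_mask([firm, year])
n_kept, n_dropped = keep.count(True), keep.count(False)
print(keep, n_kept, n_dropped)
```

In this toy input, dropping the lone observation of firm "b" leaves years 1 and 2 with one observation each, so those rows are pruned as well.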

demean

demean(x: ndarray, copy: bool = True, already_masked: bool = False) -> ndarray

Within-transform x by sweeping out all absorbed FEs.

Parameters:

Name Type Description Default
x (ndarray, shape(n) or (n, p))

Variable(s) to residualize. n must equal either the full input size (then keep_mask is applied) or n_kept (when already_masked=True).

required
copy bool

If True, operate on a copy; if False, modify x in place. Callers passing fresh arrays can set False to save memory.

True
already_masked bool

Skip application of keep_mask.

False

Returns:

Type Description
ndarray

Residualized x with shape (n_kept,) or (n_kept, p).
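As a rough picture of what the "map" backend does, the sketch below sweeps out group means one FE dimension at a time and repeats until the largest adjustment falls below tol. It omits weights, singleton pruning, and Irons-Tuck acceleration, and `demean_map` is a hypothetical name rather than the library's routine.

```python
def demean_map(x, fe_columns, tol=1e-8, maxiter=10000):
    """Within-transform x by plain alternating projections (no acceleration).

    Hypothetical helper; fe_columns is a list of group-label lists.
    """
    x = [float(v) for v in x]
    for _ in range(maxiter):
        max_change = 0.0
        for col in fe_columns:
            # Group means for this FE dimension.
            sums, counts = {}, {}
            for g, v in zip(col, x):
                sums[g] = sums.get(g, 0.0) + v
                counts[g] = counts.get(g, 0) + 1
            # Subtract them and track the largest adjustment.
            for i, g in enumerate(col):
                mean = sums[g] / counts[g]
                max_change = max(max_change, abs(mean))
                x[i] -= mean
        if max_change < tol:
            return x
    raise RuntimeError("demeaning did not converge")

entity_ids = [1, 1, 2, 2]
years = [1, 2, 1, 2]
resid = demean_map([1.0, 2.0, 3.0, 5.0], [entity_ids, years])
print(resid)
```

On a balanced two-way panel like this one, the result equals x minus the entity mean minus the time mean plus the grand mean, and the loop converges after a single sweep.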

residualize

residualize(x: ndarray, copy: bool = True) -> ndarray

Alias for demean — returns FE-residualized version of x.

FEOLSResult dataclass

Result of sp.feols().

Attributes:

Name Type Description
params Series

Coefficient estimates indexed by regressor name.

std_errors Series

Standard errors indexed by regressor name.

vcov ndarray

Variance-covariance matrix of the coefficients.

tvalues, pvalues Series
conf_int_lower, conf_int_upper Series
residuals ndarray

In-sample residuals (after FE absorption).

fitted_within ndarray

Predicted values from X β (excludes FE contribution).

n_obs int
n_singletons_dropped int
n_fe List[int]

Number of groups per absorbed FE dimension.

dof_fe int

Degrees of freedom consumed by the FEs.

df_resid int
r2_within float
se_type str

'iid' | 'cluster' | 'multiway_cluster' | 'wild_cluster'

cluster_info dict

Metadata (cluster names, counts).

formula str
absorber Absorber

Reusable absorber (includes keep_mask to subset rows).

panel_compare

panel_compare(data: DataFrame, formula: str, entity: str, time: str, methods: Optional[List[str]] = None, robust: str = 'nonrobust', cluster: Optional[str] = None, **kwargs) -> DataFrame

Estimate the same model with multiple methods and compare.

Parameters:

Name Type Description Default
data DataFrame
required
formula str
required
entity str
required
time str
required
methods list of str

Methods to compare. Default: ['pooled', 'fe', 're', 'twoway', 'mundlak'].

None
robust str

Passed to each panel() call.

'nonrobust'
cluster str

Passed to each panel() call.

None

Returns:

Type Description
DataFrame

Comparison table with coefficients, SEs, and diagnostics.

Examples:

>>> comparison = sp.panel_compare(
...     df, "wage ~ edu + exp", entity='id', time='year',
...     methods=['pooled', 'fe', 're', 'twoway', 'mundlak']
... )
>>> print(comparison)

balance_panel

balance_panel(data: DataFrame, entity: str, time: str) -> DataFrame

Balance a panel by keeping only units observed in every time period.

Parameters:

Name Type Description Default
data DataFrame

Panel data in long format.

required
entity str

Entity (unit) identifier column.

required
time str

Time period column.

required

Returns:

Type Description
DataFrame

Balanced panel (same column order, sorted by entity then time).

Examples:

>>> import statspai as sp
>>> balanced = sp.balance_panel(df, entity='id', time='year')
>>> balanced.groupby('id')['year'].count().nunique()  # all same count
1
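The balancing rule itself is simple enough to sketch without pandas. The snippet below works on a list of row dictionaries and keeps only entities observed in every period that appears in the data; `balance_rows` and the sample rows are hypothetical and only mirror the behaviour described above.

```python
def balance_rows(rows, entity, time):
    """Keep entities observed in every period; sort by entity, then time.

    Hypothetical helper operating on a list of dicts.
    """
    periods = {r[time] for r in rows}
    seen = {}
    for r in rows:
        seen.setdefault(r[entity], set()).add(r[time])
    complete = {e for e, ts in seen.items() if ts == periods}
    kept = [r for r in rows if r[entity] in complete]
    return sorted(kept, key=lambda r: (r[entity], r[time]))

rows = [
    {"id": 2, "year": 2000, "wage": 9.0},   # id 2 is missing year 2001
    {"id": 1, "year": 2001, "wage": 11.0},
    {"id": 1, "year": 2000, "wage": 10.0},
]
balanced = balance_rows(rows, entity="id", time="year")
print(balanced)
```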

panel_logit

panel_logit(data: DataFrame, y: str, x: list, id: str = 'id', time: str = 'time', method: str = 'fe', n_quadrature: int = 12, robust: str = 'nonrobust', cluster: str = None, maxiter: int = 200, tol: float = 1e-08, alpha: float = 0.05) -> EconometricResults

Panel logit model.

Parameters:

Name Type Description Default
data DataFrame

Panel data in long format.

required
y str

Binary dependent variable (0/1).

required
x list of str

Regressors.

required
id str

Unit identifier column.

'id'
time str

Time identifier column.

'time'
method str

'fe' (conditional FE logit), 're' (random effects), 'cre' (Mundlak).

'fe'
n_quadrature int

Gauss-Hermite quadrature points (RE/CRE only).

12
robust str

'nonrobust' or 'robust'.

'nonrobust'
cluster str or None

Column for cluster-robust SEs.

None
maxiter int

Maximum optimizer iterations.

200
tol float

Gradient tolerance.

1e-08
alpha float

Significance level for confidence intervals.

0.05

Returns:

Type Description
EconometricResults

panel_probit

panel_probit(data: DataFrame, y: str, x: list, id: str = 'id', time: str = 'time', method: str = 're', n_quadrature: int = 12, robust: str = 'nonrobust', cluster: str = None, maxiter: int = 200, tol: float = 1e-08, alpha: float = 0.05) -> EconometricResults

Panel probit model.

Parameters:

Name Type Description Default
data DataFrame

Panel data in long format.

required
y str

Binary dependent variable (0/1).

required
x list of str

Regressors.

required
id str

Unit identifier column.

'id'
time str

Time identifier column.

'time'
method str

're' (random effects) or 'cre' (Mundlak). FE probit not supported (incidental parameters problem).

're'
n_quadrature int

Gauss-Hermite quadrature points.

12
robust str

'nonrobust' or 'robust'.

'nonrobust'
cluster str or None

Column for cluster-robust SEs.

None
maxiter int

Maximum optimizer iterations.

200
tol float

Gradient tolerance.

1e-08
alpha float

Significance level for confidence intervals.

0.05

Returns:

Type Description
EconometricResults

plot_within_between

plot_within_between(data: DataFrame, variables: List[str], entity: str, ax=None, figsize: tuple = (8, 6), color: str = '#2C3E50', title: Optional[str] = None) -> Tuple

Bar chart comparing within vs between variation for each variable.

Helps assess which estimator is appropriate:

  • High between / low within → BE may be efficient
  • High within / low between → FE captures the action
  • Similar → both capture similar information

Parameters:

Name Type Description Default
data DataFrame
required
variables list of str

Variables to decompose.

required
entity str

Entity identifier column.

required
ax matplotlib Axes
None
figsize tuple
(8, 6)
color str
'#2C3E50'
title str
None

Returns:

Type Description
(fig, ax)
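The quantities behind such a chart can be sketched with a sum-of-squares decomposition: total variation around the grand mean splits into a between part (entity means around the grand mean) and a within part (observations around their entity mean). How plot_within_between scales these quantities (variance, standard deviation, or shares) is not stated here, so the sketch below is an assumption with a hypothetical function name.

```python
def within_between_ss(values, entities):
    """Return (within SS, between SS) for one variable.

    Hypothetical helper; the two parts add up to the total sum of squares.
    """
    grand = sum(values) / len(values)
    sums, counts = {}, {}
    for v, e in zip(values, entities):
        sums[e] = sums.get(e, 0.0) + v
        counts[e] = counts.get(e, 0) + 1
    means = {e: sums[e] / counts[e] for e in sums}
    within = sum((v - means[e]) ** 2 for v, e in zip(values, entities))
    between = sum((means[e] - grand) ** 2 for e in entities)
    return within, between

values = [1.0, 3.0, 10.0, 12.0]
entities = ["a", "a", "b", "b"]
within, between = within_between_ss(values, entities)
print(within, between)
```

Here almost all variation is between entities, which is the case where FE discards most of the signal in the variable.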

demean

demean(x: ndarray, fe: Union[DataFrame, ndarray], weights: Optional[ndarray] = None, drop_singletons: bool = True, tol: float = 1e-08, maxiter: int = 10000, solver: str = 'map') -> Tuple[ndarray, ndarray]

Return the within-transformed x and the singleton keep mask.

Convenience wrapper around Absorber. See Absorber for the solver kwarg semantics.

absorb_ols

absorb_ols(y: ndarray, X: ndarray, fe: Union[DataFrame, ndarray], weights: Optional[ndarray] = None, cluster: Optional[Union[ndarray, List[ndarray]]] = None, drop_singletons: bool = True, tol: float = 1e-08, maxiter: int = 10000, return_absorber: bool = False, solver: str = 'map') -> dict

OLS with absorbed high-dimensional fixed effects (reghdfe-style).

Solves y = X β + Σ_k α_{g_k} + ε by sweeping out the FEs from both y and X (Frisch-Waugh-Lovell) and running OLS on residuals.

Parameters:

Name Type Description Default
y (ndarray, shape(n))
required
X (ndarray, shape(n, p))

Regressors excluding the absorbed FEs and the constant (the constant is absorbed by any FE dimension).

required
fe DataFrame or ndarray(n, K)

Fixed-effect columns.

required
weights ndarray(n)

Observation weights.

None
cluster ndarray or list of ndarrays

One-way or multi-way cluster variables for robust SEs. If provided, returns cluster-robust SEs (one-way: Liang-Zeger sandwich; multi-way: inclusion-exclusion Cameron-Gelbach-Miller).

None
drop_singletons bool
True
tol float

Convergence tolerance for demeaning.

1e-08
maxiter int

Maximum number of demeaning iterations.

10000
return_absorber bool

If True, also return the Absorber object for reuse.

False
solver ('map', 'lsmr', 'lsqr')

Within-transformation backend. See Absorber.

"map"

Returns:

Type Description
dict with keys:

coef (p,), se (p,), vcov (p,p), resid (n_kept,), n (n_kept), df_resid, dof_fe, r2_within, n_singletons_dropped, converged, iters, absorber (if requested)
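The Frisch-Waugh-Lovell step can be shown on the smallest possible case: one regressor and one FE dimension. The sketch demeans y and x within groups and then runs OLS on the residualized data. Function names and data are hypothetical, and standard errors, weights, and multi-way FEs are left out.

```python
def group_demean(v, groups):
    """Subtract the group mean from each element of v."""
    sums, counts = {}, {}
    for g, val in zip(groups, v):
        sums[g] = sums.get(g, 0.0) + val
        counts[g] = counts.get(g, 0) + 1
    return [val - sums[g] / counts[g] for g, val in zip(groups, v)]

def fwl_slope(y, x, groups):
    """OLS slope of y on x after sweeping out one set of group effects."""
    y_t = group_demean(y, groups)
    x_t = group_demean(x, groups)
    beta = sum(a * b for a, b in zip(x_t, y_t)) / sum(a * a for a in x_t)
    resid = [b - beta * a for a, b in zip(x_t, y_t)]
    return beta, resid

# Made-up data generated as y = 2 * x + alpha_g with alpha_a = 10, alpha_b = -5.
groups = ["a", "a", "a", "b", "b", "b"]
x = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
y = [12.0, 14.0, 16.0, -3.0, -1.0, 1.0]
beta, resid = fwl_slope(y, x, groups)
print(beta)
```

Because the group effects are removed from both sides, the slope is recovered without estimating either group intercept.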

hdfe_feols

hdfe_feols(formula: str, data: DataFrame, *, weights: Optional[Union[str, ndarray]] = None, cluster: Optional[Union[str, List[str]]] = None, se_type: Optional[str] = None, wild: bool = False, wild_n_boot: int = 999, wild_weight_type: str = 'webb', wild_seed: Optional[int] = None, alpha: float = 0.05, drop_singletons: bool = True, tol: float = 1e-08, maxiter: int = 10000) -> FEOLSResult

reghdfe-style OLS with high-dimensional fixed effects.

Parameters:

Name Type Description Default
formula str

"y ~ x1 + x2 | fe1 + fe2 + fe3". The | fe... part is optional.

required
data DataFrame
required
weights str or ndarray

Observation weights. Column name or raw array.

None
cluster str or list

One-way or multi-way cluster column(s).

None
se_type ('iid', 'cluster', 'multiway_cluster', 'wild_cluster')

Override automatic inference of SE type. Usually inferred from cluster / wild.

None
wild bool

If True (and cluster is given), return wild-cluster-bootstrap p-values / CIs alongside the classical cluster SEs. Applied variable by variable. Only supported with a single cluster column.

False
wild_n_boot int

Bootstrap replications.

999
wild_weight_type ('rademacher', 'webb', 'mammen')
'webb'
wild_seed int
None
alpha float
0.05
drop_singletons bool
True
tol float

Convergence tolerance for the absorber.

1e-08
maxiter int

Maximum number of absorber iterations.

10000

Returns:

Type Description
FEOLSResult
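The three wild_weight_type options name standard auxiliary distributions for the wild bootstrap, each with mean 0 and variance 1. The sketch below writes out their usual definitions (Rademacher two-point, Mammen two-point, Webb six-point) and draws one weight per cluster. The dictionary and function names are hypothetical, and the snippet does not reproduce how StatsPAI draws or applies the weights.

```python
import math
import random

SQRT5 = math.sqrt(5.0)

# (support, probabilities) for each auxiliary distribution.
WILD_WEIGHTS = {
    "rademacher": ([-1.0, 1.0], [0.5, 0.5]),
    "mammen": (
        [-(SQRT5 - 1) / 2, (SQRT5 + 1) / 2],
        [(SQRT5 + 1) / (2 * SQRT5), (SQRT5 - 1) / (2 * SQRT5)],
    ),
    "webb": (
        [-math.sqrt(1.5), -1.0, -math.sqrt(0.5),
         math.sqrt(0.5), 1.0, math.sqrt(1.5)],
        [1 / 6] * 6,
    ),
}

def draw_cluster_weights(kind, n_clusters, seed=None):
    """Draw one bootstrap weight per cluster (hypothetical helper)."""
    support, probs = WILD_WEIGHTS[kind]
    rng = random.Random(seed)
    return rng.choices(support, weights=probs, k=n_clusters)

def moments(kind):
    """Exact mean and second moment of the chosen distribution."""
    support, probs = WILD_WEIGHTS[kind]
    mean = sum(p * w for w, p in zip(support, probs))
    second = sum(p * w * w for w, p in zip(support, probs))
    return mean, second

weights = draw_cluster_weights("webb", n_clusters=8, seed=0)
print(weights)
```

The six-point Webb distribution is often preferred when there are few clusters, because two-point weights then generate only a small number of distinct bootstrap samples.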

Examples:

>>> import statspai as sp
>>> res = sp.feols("lwage ~ educ + exper | firm + year", data=df,
...                cluster='firm')
>>> print(res.summary())

panel

panel(data: Any = None, formula: Optional[str] = None, entity: Optional[str] = None, time: Optional[str] = None, *, method: str = 'fe', **kwargs: Any)

Unified panel-regression dispatcher.

Parameters:

Name Type Description Default
data DataFrame
None
formula str

Patsy-style outcome ~ regressors specification.

None
entity str

Cross-section identifier column.

None
time str

Time identifier column.

None
method str

Estimator family:

  • Static: fe / fixed, re / random, be / between, fd / first_difference, pooled / pooled_ols, twoway / two_way.
  • Correlated random effects: mundlak, chamberlain.
  • Dynamic GMM: ab / arellano_bond / gmm, system / blundell_bond / system_gmm.
  • HDFE absorption: hdfe / feols / reghdfe (high-dimensional fixed-effects OLS).
'fe'
**kwargs Any

Forwarded to the chosen estimator. Classical methods accept robust / cluster / weights / alpha / balance / lags / gmm_lags. HDFE accepts cluster / se_type / wild / alpha etc.

{}

Returns:

Type Description
Result object whose type depends on method.

Examples:

>>> # Default: within (FE) estimator
>>> r = sp.panel(df, "wage ~ exp + edu", entity='id', time='year')
>>> # Random effects (call r.hausman_test() afterwards to compare with FE)
>>> r = sp.panel(df, "wage ~ exp + edu", entity='id', time='year',
...              method='re')
>>> # Friendly alias (case insensitive)
>>> r = sp.panel(df, "wage ~ exp", entity='id', time='year',
...              method='Fixed')
>>> # Arellano-Bond dynamic panel
>>> r = sp.panel(df, "wage ~ wage_lag + edu", entity='id', time='year',
...              method='gmm', lags=1)
>>> # HDFE absorption (multiple FEs in formula)
>>> r = sp.panel(df, "wage ~ exp | id + year", method='hdfe',
...              cluster='id')
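Case-insensitive alias handling of the kind listed under method can be sketched with a lookup table. The alias groups below come from the parameter description; the canonical keys, the table name, and `normalize_method` are hypothetical and do not describe the dispatcher's internals.

```python
# Canonical key -> accepted spellings (canonical keys are an assumption).
METHOD_ALIASES = {
    "fe": ("fe", "fixed"),
    "re": ("re", "random"),
    "be": ("be", "between"),
    "fd": ("fd", "first_difference"),
    "pooled": ("pooled", "pooled_ols"),
    "twoway": ("twoway", "two_way"),
    "mundlak": ("mundlak",),
    "chamberlain": ("chamberlain",),
    "ab": ("ab", "arellano_bond", "gmm"),
    "system": ("system", "blundell_bond", "system_gmm"),
    "hdfe": ("hdfe", "feols", "reghdfe"),
}

_LOOKUP = {alias: canon
           for canon, names in METHOD_ALIASES.items()
           for alias in names}

def normalize_method(method):
    """Map a user-supplied method string to its canonical key."""
    key = method.strip().lower()
    try:
        return _LOOKUP[key]
    except KeyError:
        raise ValueError(f"unknown panel method: {method!r}") from None

print(normalize_method("Fixed"), normalize_method("system_gmm"))
```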