statspai.panel¶
panel ¶
Unified panel regression module for StatsPAI.
Provides a single entry point panel() covering all panel estimators:
- Static models: FE, RE, Between, First Difference, Pooled OLS, Two-way FE
- Correlated RE: Mundlak (1978), Chamberlain (1982)
- Dynamic panel: Arellano-Bond, Blundell-Bond (System GMM)
- HDFE absorption: high-dimensional fixed-effects OLS (Stata's reghdfe / R's fixest)
All results return PanelResults with built-in diagnostics:
>>> result = sp.panel(df, "y ~ x1 + x2", entity='id', time='t')
>>> result.hausman_test()     # FE vs RE
>>> result.bp_lm_test()       # Pooled vs RE
>>> result.f_test_effects()   # Joint significance of FE
>>> result.compare('re')      # Side-by-side comparison
References
- Wooldridge, J.M. (2010). Econometric Analysis of Cross Section and Panel Data.
- Mundlak, Y. (1978). "On the Pooling of Time Series and Cross Section Data."
- Chamberlain, G. (1982). "Multivariate Regression Models for Panel Data."
- Arellano, M. and Bond, S. (1991). "Some Tests of Specification for Panel Data."
- Blundell, R. and Bond, S. (1998). "Initial Conditions and Moment Restrictions."
- Hausman, J.A. (1978). "Specification Tests in Econometrics."
- Breusch, T.S. and Pagan, A.R. (1980). "The Lagrange Multiplier Test."
- Pesaran, M.H. (2004). "General Diagnostic Tests for Cross Section Dependence."
- Correia, S. (2017). "Linear Models with High-Dimensional Fixed Effects."
PanelResults ¶
Bases: EconometricResults
Panel regression results with built-in diagnostics.
Extends EconometricResults with panel-specific tests that can be called directly on the result object:
>>> result = sp.panel(df, "y ~ x1 + x2", entity='id', time='t')
>>> result.hausman_test()     # FE vs RE
>>> result.bp_lm_test()       # Pooled vs RE (Breusch-Pagan LM)
>>> result.f_test_effects()   # Joint significance of entity FE
>>> result.compare('re')      # Compare with RE side by side
hausman_test ¶
Hausman (1978) specification test: FE vs RE.
Under H0 (RE consistent), both FE and RE are consistent but RE is efficient. Under H1, only FE is consistent.
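For reference, the textbook form of the Hausman statistic is sketched below. Whether hausman_test uses exactly this variance difference or a robust variant is not stated here, so read it as the standard definition rather than a description of the implementation. The symbol k (the number of coefficients compared) is presumably what the 'df' entry reports.

```latex
H = (\hat\beta_{FE} - \hat\beta_{RE})^{\top}
    \left[\widehat{\operatorname{Var}}(\hat\beta_{FE})
        - \widehat{\operatorname{Var}}(\hat\beta_{RE})\right]^{-1}
    (\hat\beta_{FE} - \hat\beta_{RE})
    \;\xrightarrow{d}\; \chi^2(k) \quad \text{under } H_0
```

A large H (small p-value) is evidence against H0 and therefore favours FE.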
Returns:
| Type | Description |
|---|---|
| dict | 'statistic', 'df', 'pvalue', 'recommendation', 'interpretation' |
bp_lm_test ¶
Breusch-Pagan (1980) Lagrange Multiplier test for random effects.
Tests H0: Var(alpha_i) = 0 (Pooled OLS is appropriate) vs H1: Var(alpha_i) > 0 (Random Effects needed).
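As a rough illustration of what this test computes, the sketch below evaluates the classical Breusch-Pagan LM statistic for a balanced panel from pooled-OLS residuals, using only the standard library. The helper name breusch_pagan_lm and its input layout (one list of T residuals per unit) are assumptions made for this example; bp_lm_test may handle unbalanced panels or use a different variant.

```python
import math


def breusch_pagan_lm(residuals):
    """Classical BP LM statistic for a balanced panel (hypothetical helper).

    residuals: list of N lists, each holding the T pooled-OLS residuals of one unit.
    Returns (statistic, pvalue); the statistic is chi-square with 1 df under H0.
    """
    n = len(residuals)
    t = len(residuals[0])
    if t < 2 or any(len(row) != t for row in residuals):
        raise ValueError("a balanced panel with T >= 2 is required")

    # Numerator: squared per-unit residual sums. Denominator: all squared residuals.
    sum_sq_of_sums = sum(sum(row) ** 2 for row in residuals)
    sum_of_squares = sum(e * e for row in residuals for e in row)

    lm = n * t / (2.0 * (t - 1)) * (sum_sq_of_sums / sum_of_squares - 1.0) ** 2
    # Survival function of a chi-square(1) variable: P(X > lm) = erfc(sqrt(lm / 2)).
    pvalue = math.erfc(math.sqrt(lm / 2.0))
    return lm, pvalue


# Two units whose residuals keep the same sign within each unit.
lm, pvalue = breusch_pagan_lm([[1.0, 1.0], [-1.0, -1.0]])
print(lm, pvalue)
```

A small p-value rejects H0 and points towards random effects over pooled OLS.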
Returns:
| Type | Description |
|---|---|
| dict | 'statistic', 'df', 'pvalue', 'recommendation', 'interpretation' |
f_test_effects ¶
F-test for joint significance of entity fixed effects.
Tests H0: all alpha_i = 0 (entity effects not needed).
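The usual form of this F statistic compares the residual sum of squares of pooled OLS (with an intercept) against that of the FE model. The notation below (n total observations, N entities, K slope regressors) is introduced for illustration; the 'df1' and 'df2' entries presumably correspond to the two degrees of freedom shown, though the source does not spell this out.

```latex
F = \frac{(RSS_{\text{pooled}} - RSS_{FE}) / (N - 1)}
         {RSS_{FE} / (n - N - K)}
    \;\sim\; F(N - 1,\; n - N - K) \quad \text{under } H_0
```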
Returns:
| Type | Description |
|---|---|
| dict | 'statistic', 'df1', 'df2', 'pvalue', 'interpretation' |
pesaran_cd_test ¶
Pesaran (2004) CD test for cross-sectional dependence in residuals.
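A minimal sketch of the CD statistic for a balanced panel follows, assuming residuals are supplied as one list of T values per unit and that pairwise correlations are computed without re-centring (residuals from a model with an intercept already average to zero). The name pesaran_cd is hypothetical; the library's handling of unbalanced panels is not covered.

```python
import math
from itertools import combinations


def pesaran_cd(residuals):
    """Pesaran CD statistic for a balanced panel (hypothetical helper).

    residuals: list of N lists with T residuals each.
    Returns (statistic, two-sided pvalue); CD is asymptotically N(0, 1) under H0.
    """
    n = len(residuals)
    t = len(residuals[0])
    if n < 2 or any(len(row) != t for row in residuals):
        raise ValueError("need at least two units and a balanced panel")

    norms = [math.sqrt(sum(e * e for e in row)) for row in residuals]

    # Sum the pairwise residual correlations over all unit pairs i < j.
    total = 0.0
    for i, j in combinations(range(n), 2):
        cross = sum(a * b for a, b in zip(residuals[i], residuals[j]))
        total += cross / (norms[i] * norms[j])

    cd = math.sqrt(2.0 * t / (n * (n - 1))) * total
    pvalue = math.erfc(abs(cd) / math.sqrt(2.0))  # two-sided normal tail
    return cd, pvalue


# Three units with identical residual paths: every pairwise correlation is 1.
cd, pvalue = pesaran_cd([[1.0, -1.0, 1.0, -1.0]] * 3)
print(cd, pvalue)
```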
Returns:
| Type | Description |
|---|---|
| dict | 'statistic', 'pvalue', 'interpretation' |
plot ¶
Generate panel-specific plots.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| type | str | Plot type, e.g. 'coef', 'effects', 'residuals', 'hausman'. | 'coef' |
| **kwargs | | Passed to the underlying plot function. | {} |
Returns:
| Type | Description |
|---|---|
| (fig, ax) | |
plot_effects ¶
Shortcut for .plot(type='effects'). Distribution of entity FE.
plot_residuals ¶
Shortcut for .plot(type='residuals'). Residual diagnostics (2x2).
plot_hausman ¶
Shortcut for .plot(type='hausman'). Visual FE vs RE comparison.
compare ¶
compare(method: str, **kwargs) -> PanelCompareResults
Re-estimate with a different method and compare side by side.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| method | str | Alternative method to compare against. | required |
Returns:
| Type | Description |
|---|---|
| PanelCompareResults | Side-by-side comparison with diagnostics. |
PanelCompareResults ¶
Side-by-side comparison of two panel models.
plot ¶
Side-by-side coefficient comparison plot.
Returns:
| Type | Description |
|---|---|
| (fig, ax) | |
PanelRegression ¶
Deprecated: use panel() directly. Kept for backward compatibility.
Absorber ¶
Reusable HDFE demean operator.
Build once from a DataFrame's FE columns; reuse demean to sweep any
outcome / regressor vector or matrix. Useful when fitting many models
that share the same absorbing FEs (e.g. event-study coefficient paths).
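To make the idea of a demean operator concrete, the sketch below implements plain alternating projections in pure Python: each FE dimension's group means are subtracted in turn until the largest adjustment falls below a tolerance. This is an illustration only, assuming unweighted data and no acceleration; demean_map is a hypothetical name and is not the Absorber implementation.

```python
def demean_map(x, fe_columns, tol=1e-8, maxiter=10_000):
    """Sweep out group means for every FE dimension by alternating projections.

    x: sequence of floats; fe_columns: list of equally long sequences of group labels.
    """
    x = [float(v) for v in x]
    for _ in range(maxiter):
        max_change = 0.0
        for codes in fe_columns:
            # Group means of the current working vector for this dimension.
            sums, counts = {}, {}
            for value, g in zip(x, codes):
                sums[g] = sums.get(g, 0.0) + value
                counts[g] = counts.get(g, 0) + 1
            # Subtract each row's group mean and track the largest adjustment.
            for idx, g in enumerate(codes):
                mean = sums[g] / counts[g]
                max_change = max(max_change, abs(mean))
                x[idx] -= mean
        if max_change < tol:
            return x
    raise RuntimeError("alternating projections did not converge")


# Small unbalanced two-way example (entity and time effects).
entity = ["a", "a", "a", "b", "b", "c", "c"]
period = [1, 2, 3, 1, 2, 2, 3]
values = [1.0, 4.0, 2.0, 7.0, 3.0, 5.0, 9.0]
resid = demean_map(values, [entity, period])
```

After convergence the residualized vector has (numerically) zero mean within every entity and every period, which is the property a reusable absorber relies on when the same FE structure is applied to many variables.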
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| fe_data | DataFrame or ndarray(n, K) | FE columns. Must have no NaN. | required |
| weights | ndarray(n) | Observation weights. If given, weighted means are used. | None |
| drop_singletons | bool | If True, singleton observations (FE groups of size 1) are pruned before building the absorber. | True |
| tol | float | Convergence threshold on max \|dx\| per iteration. | 1e-8 |
| maxiter | int | Maximum alternating-projection iterations. | 10000 |
| accelerate | bool | Enable Irons-Tuck Δ² acceleration. | True |
| solver | ('map', 'lsmr', 'lsqr') | Within-transformation backend. | "map" |
Attributes:
| Name | Type | Description |
|---|---|---|
| keep_mask | ndarray of bool | Rows retained after singleton pruning. Callers must apply this mask to |
| n_kept | int | Number of surviving observations. |
| n_dropped | int | Number of singleton observations removed. |
| n_fe | list of int | Number of groups per FE dimension (post-prune). |
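The relationship between keep_mask, n_kept and n_dropped can be illustrated with a small pure-Python sketch of singleton pruning. It assumes pruning is repeated until no singleton remains (removing a row can create a new singleton in another FE dimension), which is how reghdfe behaves; the source does not say whether Absorber iterates, and singleton_keep_mask is a hypothetical helper.

```python
from collections import Counter


def singleton_keep_mask(fe_columns):
    """Return a boolean keep mask that drops singleton groups iteratively."""
    n = len(fe_columns[0])
    keep = [True] * n
    changed = True
    while changed:
        changed = False
        for codes in fe_columns:
            # Group sizes among the rows that are still kept.
            counts = Counter(c for c, k in zip(codes, keep) if k)
            for i, c in enumerate(codes):
                if keep[i] and counts[c] == 1:
                    keep[i] = False
                    changed = True
    return keep


firm = ["f1", "f1", "f2", "f3", "f3", "f3"]
year = [1, 2, 3, 1, 2, 3]
keep_mask = singleton_keep_mask([firm, year])
n_kept = sum(keep_mask)
n_dropped = len(keep_mask) - n_kept
```

In this example firm f2 appears once and is dropped first; that leaves year 3 with a single row, which is then dropped as well.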
demean ¶
Within-transform x by sweeping out all absorbed FEs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| x | ndarray, shape (n,) or (n, p) | Variable(s) to residualize. | required |
| copy | bool | If True, operate on a copy; if False, modify | True |
| already_masked | bool | Skip application of | False |
Returns:
| Type | Description |
|---|---|
| ndarray | Residualized |
residualize ¶
Alias for demean — returns FE-residualized version of x.
FEOLSResult dataclass ¶
Result of sp.feols().
Attributes:
| Name | Type | Description |
|---|---|---|
| params | Series | Coefficient estimates indexed by regressor name. |
| std_errors | Series | Standard errors indexed by regressor name. |
| vcov | ndarray | Variance-covariance matrix of the coefficients. |
| tvalues, pvalues | Series | |
| conf_int_lower, conf_int_upper | Series | |
| residuals | ndarray | In-sample residuals (after FE absorption). |
| fitted_within | ndarray | Predicted values from X β (excludes FE contribution). |
| n_obs | int | |
| n_singletons_dropped | int | |
| n_fe | List[int] | Number of groups per absorbed FE dimension. |
| dof_fe | int | Degrees of freedom consumed by the FEs. |
| df_resid | int | |
| r2_within | float | |
| se_type | str | One of 'iid', 'cluster', 'multiway_cluster', 'wild_cluster'. |
| cluster_info | dict | Metadata (cluster names, counts). |
| formula | str | |
| absorber | Absorber | Reusable absorber (includes |
panel_compare ¶
panel_compare(data: DataFrame, formula: str, entity: str, time: str, methods: Optional[List[str]] = None, robust: str = 'nonrobust', cluster: Optional[str] = None, **kwargs) -> DataFrame
Estimate the same model with multiple methods and compare.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | | required |
| formula | str | | required |
| entity | str | | required |
| time | str | | required |
| methods | list of str | Methods to compare. Default: ['pooled', 'fe', 're', 'twoway', 'mundlak']. | None |
| robust | str | Passed to each | 'nonrobust' |
| cluster | str | Passed to each | None |
Returns:
| Type | Description |
|---|---|
| DataFrame | Comparison table with coefficients, SEs, and diagnostics. |
balance_panel ¶
Balance a panel by keeping only units observed in every time period.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Panel data in long format. | required |
| entity | str | Entity (unit) identifier column. | required |
| time | str | Time period column. | required |
Returns:
| Type | Description |
|---|---|
| DataFrame | Balanced panel (same column order, sorted by entity then time). |
Examples:
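The library's own example is not reproduced here. As a rough illustration of the rule stated above (keep only units observed in every time period, then sort by entity and time), the sketch below applies it to plain Python records rather than a DataFrame. The helper balance_records is hypothetical and not part of StatsPAI.

```python
def balance_records(rows, entity, time):
    """Keep only units observed in every period; sort by entity, then time."""
    all_periods = {row[time] for row in rows}

    # Collect the set of periods in which each unit is observed.
    observed = {}
    for row in rows:
        observed.setdefault(row[entity], set()).add(row[time])

    complete = {unit for unit, periods in observed.items() if periods == all_periods}
    kept = [row for row in rows if row[entity] in complete]
    return sorted(kept, key=lambda row: (row[entity], row[time]))


rows = [
    {"id": 2, "t": 1, "y": 0.5},
    {"id": 1, "t": 2, "y": 1.2},
    {"id": 1, "t": 1, "y": 1.0},
    {"id": 3, "t": 2, "y": 2.0},
    {"id": 3, "t": 1, "y": 1.8},
]
balanced = balance_records(rows, entity="id", time="t")
```

Unit 2 is observed only in period 1, so it is dropped; units 1 and 3 remain with both periods each.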
panel_logit ¶
panel_logit(data: DataFrame, y: str, x: list, id: str = 'id', time: str = 'time', method: str = 'fe', n_quadrature: int = 12, robust: str = 'nonrobust', cluster: str = None, maxiter: int = 200, tol: float = 1e-08, alpha: float = 0.05) -> EconometricResults
Panel logit model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Panel data in long format. | required |
| y | str | Binary dependent variable (0/1). | required |
| x | list of str | Regressors. | required |
| id | str | Unit identifier column. | 'id' |
| time | str | Time identifier column. | 'time' |
| method | str | 'fe' (conditional FE logit), 're' (random effects), 'cre' (Mundlak). | 'fe' |
| n_quadrature | int | Gauss-Hermite quadrature points (RE/CRE only). | 12 |
| robust | str | 'nonrobust' or 'robust'. | 'nonrobust' |
| cluster | str or None | Column for cluster-robust SEs. | None |
| maxiter | int | Maximum optimizer iterations. | 200 |
| tol | float | Gradient tolerance. | 1e-08 |
| alpha | float | Significance level for confidence intervals. | 0.05 |
Returns:
| Type | Description |
|---|---|
| EconometricResults | |
panel_probit ¶
panel_probit(data: DataFrame, y: str, x: list, id: str = 'id', time: str = 'time', method: str = 're', n_quadrature: int = 12, robust: str = 'nonrobust', cluster: str = None, maxiter: int = 200, tol: float = 1e-08, alpha: float = 0.05) -> EconometricResults
Panel probit model.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | Panel data in long format. | required |
| y | str | Binary dependent variable (0/1). | required |
| x | list of str | Regressors. | required |
| id | str | Unit identifier column. | 'id' |
| time | str | Time identifier column. | 'time' |
| method | str | 're' (random effects) or 'cre' (Mundlak). FE probit not supported (incidental parameters problem). | 're' |
| n_quadrature | int | Gauss-Hermite quadrature points. | 12 |
| robust | str | 'nonrobust' or 'robust'. | 'nonrobust' |
| cluster | str or None | Column for cluster-robust SEs. | None |
| maxiter | int | Maximum optimizer iterations. | 200 |
| tol | float | Gradient tolerance. | 1e-08 |
| alpha | float | Significance level for confidence intervals. | 0.05 |
Returns:
| Type | Description |
|---|---|
| EconometricResults | |
plot_within_between ¶
plot_within_between(data: DataFrame, variables: List[str], entity: str, ax=None, figsize: tuple = (8, 6), color: str = '#2C3E50', title: Optional[str] = None) -> Tuple
Bar chart comparing within vs between variation for each variable.
Helps assess which estimator is appropriate:
- High between / low within → BE may be efficient
- High within / low between → FE captures the action
- Similar → both capture similar information
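One common way to quantify within versus between variation is to split the total sum of squares of a variable into a within-entity part and a between-entity part. The sketch below does this in pure Python; the exact measure that plot_within_between draws (sums of squares, variances, or standard deviations) is not stated in the source, so within_between_ss is a hypothetical stand-in.

```python
def within_between_ss(values, entity):
    """Split the total sum of squares into within- and between-entity parts."""
    groups = {}
    for value, unit in zip(values, entity):
        groups.setdefault(unit, []).append(float(value))

    grand_mean = sum(values) / len(values)
    within = 0.0
    between = 0.0
    for members in groups.values():
        unit_mean = sum(members) / len(members)
        # Deviations of observations from their own unit mean.
        within += sum((v - unit_mean) ** 2 for v in members)
        # Deviation of the unit mean from the grand mean, weighted by unit size.
        between += len(members) * (unit_mean - grand_mean) ** 2
    return within, between


values = [1.0, 3.0, 5.0, 7.0]
entity = ["a", "a", "b", "b"]
within_ss, between_ss = within_between_ss(values, entity)
total_ss = sum((v - sum(values) / len(values)) ** 2 for v in values)
```

Here the between part dominates, so most of the variation in the variable comes from differences across entities rather than changes over time within an entity.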
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | | required |
| variables | list of str | Variables to decompose. | required |
| entity | str | Entity identifier column. | required |
| ax | matplotlib Axes | | None |
| figsize | tuple | | (8, 6) |
| color | str | | '#2C3E50' |
| title | str | | None |
Returns:
| Type | Description |
|---|---|
| (fig, ax) | |
demean ¶
demean(x: ndarray, fe: Union[DataFrame, ndarray], weights: Optional[ndarray] = None, drop_singletons: bool = True, tol: float = 1e-08, maxiter: int = 10000, solver: str = 'map') -> Tuple[ndarray, ndarray]
Return the within-transformed x and the singleton keep mask.
Convenience wrapper around Absorber. See Absorber for
the solver kwarg semantics.
absorb_ols ¶
absorb_ols(y: ndarray, X: ndarray, fe: Union[DataFrame, ndarray], weights: Optional[ndarray] = None, cluster: Optional[Union[ndarray, List[ndarray]]] = None, drop_singletons: bool = True, tol: float = 1e-08, maxiter: int = 10000, return_absorber: bool = False, solver: str = 'map') -> dict
OLS with absorbed high-dimensional fixed effects (reghdfe-style).
Solves y = X β + Σ_k α_{g_k} + ε by sweeping out the FEs from
both y and X (Frisch-Waugh-Lovell) and running OLS on residuals.
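The Frisch-Waugh-Lovell step can be illustrated in its simplest form: one regressor and one FE dimension, where sweeping out the FE is just group demeaning. The sketch below is pure Python with hypothetical helper names (group_demean, absorbed_slope); absorb_ols itself handles matrices, several FE dimensions, weights and clustered standard errors, none of which are shown.

```python
def group_demean(values, codes):
    """Subtract each group's mean from its members."""
    sums, counts = {}, {}
    for v, g in zip(values, codes):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    return [v - sums[g] / counts[g] for v, g in zip(values, codes)]


def absorbed_slope(y, x, codes):
    """OLS slope of y on x after sweeping the group effects out of both."""
    y_tilde = group_demean(y, codes)
    x_tilde = group_demean(x, codes)
    return sum(a * b for a, b in zip(x_tilde, y_tilde)) / sum(a * a for a in x_tilde)


# Data generated as y = 2 * x + alpha_g with alpha_a = 10 and alpha_b = -5.
codes = ["a", "a", "a", "b", "b", "b"]
x = [1.0, 2.0, 3.0, 2.0, 4.0, 6.0]
y = [12.0, 14.0, 16.0, -1.0, 3.0, 7.0]
beta = absorbed_slope(y, x, codes)
```

Because the group effects are removed from both sides before the regression, the slope recovers the coefficient used to generate the data even though the group intercepts differ.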
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| y | ndarray, shape (n,) | | required |
| X | ndarray, shape (n, p) | Regressors excluding the absorbed FEs and the constant (the constant is absorbed by any FE dimension). | required |
| fe | DataFrame or ndarray(n, K) | Fixed-effect columns. | required |
| weights | ndarray(n) | Observation weights. | None |
| cluster | ndarray or list of ndarrays | One-way or multi-way cluster variables for robust SEs. If provided, returns cluster-robust SEs (one-way: Liang-Zeger sandwich; multi-way: inclusion-exclusion Cameron-Gelbach-Miller). | None |
| drop_singletons | bool | | True |
| tol | float | Demean convergence tolerance. | 1e-08 |
| maxiter | int | Maximum demean iterations. | 10000 |
| return_absorber | bool | If True, also return the | False |
| solver | ('map', 'lsmr', 'lsqr') | Within-transformation backend. See :class: | "map" |
Returns:
| Type | Description |
|---|---|
| dict with keys: | |
hdfe_feols ¶
hdfe_feols(formula: str, data: DataFrame, *, weights: Optional[Union[str, ndarray]] = None, cluster: Optional[Union[str, List[str]]] = None, se_type: Optional[str] = None, wild: bool = False, wild_n_boot: int = 999, wild_weight_type: str = 'webb', wild_seed: Optional[int] = None, alpha: float = 0.05, drop_singletons: bool = True, tol: float = 1e-08, maxiter: int = 10000) -> FEOLSResult
reghdfe-style OLS with high-dimensional fixed effects.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| formula | str | | required |
| data | DataFrame | | required |
| weights | str or ndarray | Observation weights. Column name or raw array. | None |
| cluster | str or list | One-way or multi-way cluster column(s). | None |
| se_type | ('iid', 'cluster', 'multiway_cluster', 'wild_cluster') | Override automatic inference of SE type. Usually inferred from | None |
| wild | bool | If True (and | False |
| wild_n_boot | int | Bootstrap replications. | 999 |
| wild_weight_type | ('rademacher', 'webb', 'mammen') | | 'webb' |
| wild_seed | int | | None |
| alpha | float | | 0.05 |
| drop_singletons | bool | | True |
| tol | float | Convergence tolerance for the absorber. | 1e-08 |
| maxiter | int | Maximum iterations for the absorber. | 10000 |
Returns:
| Type | Description |
|---|---|
| FEOLSResult | |
Examples:
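The library's own usage example is not reproduced here. To illustrate the three wild_weight_type options, the sketch below draws one bootstrap weight per cluster from the standard Rademacher, Webb six-point and Mammen two-point distributions, each of which has mean 0 and variance 1. The function draw_wild_weights is a hypothetical helper written for this illustration; how hdfe_feols draws and applies its weights internally is not documented above.

```python
import math
import random


def draw_wild_weights(n_clusters, weight_type="webb", rng=None):
    """Draw one wild-bootstrap weight per cluster (illustrative only)."""
    if rng is None:
        rng = random.Random()

    if weight_type == "rademacher":
        # Two points, -1 and +1, with equal probability.
        return [rng.choice((-1.0, 1.0)) for _ in range(n_clusters)]

    if weight_type == "webb":
        # Six equally likely points: +/- sqrt(1/2), +/- 1, +/- sqrt(3/2).
        points = [s * math.sqrt(v) for s in (-1.0, 1.0) for v in (0.5, 1.0, 1.5)]
        return [rng.choice(points) for _ in range(n_clusters)]

    if weight_type == "mammen":
        # Two points with unequal probabilities chosen so that mean = 0, variance = 1.
        root5 = math.sqrt(5.0)
        low, high = (1.0 - root5) / 2.0, (1.0 + root5) / 2.0
        p_low = (root5 + 1.0) / (2.0 * root5)
        return [low if rng.random() < p_low else high for _ in range(n_clusters)]

    raise ValueError(f"unknown weight_type: {weight_type!r}")


weights = draw_wild_weights(8, "webb", random.Random(0))
```

Webb weights are often suggested when the number of clusters is small, because the Rademacher distribution then yields only a limited number of distinct bootstrap samples.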
panel ¶
panel(data: Any = None, formula: Optional[str] = None, entity: Optional[str] = None, time: Optional[str] = None, *, method: str = 'fe', **kwargs: Any)
Unified panel-regression dispatcher.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| data | DataFrame | | None |
| formula | str | Patsy-style outcome ~ regressors specification. | None |
| entity | str | Cross-section identifier column. | None |
| time | str | Time identifier column. | None |
| method | str | Estimator family: | 'fe' |
| **kwargs | Any | Forwarded to the chosen estimator. Classical methods accept | {} |
Returns:
| Type | Description |
|---|---|
| | Result object whose type depends on method. |
Examples:
>>> # Default: within (FE) estimator
>>> r = sp.panel(df, "wage ~ exp + edu", entity='id', time='year')
>>> # Random effects with Hausman test
>>> r = sp.panel(df, "wage ~ exp + edu", entity='id', time='year',
... method='re')
>>> # Friendly alias (case insensitive)
>>> r = sp.panel(df, "wage ~ exp", entity='id', time='year',
... method='Fixed')