`statspai.panel`¶

panel ¶

Unified panel regression module for StatsPAI.

Provides a single entry point panel() covering all panel estimators:

- Static models: FE, RE, Between, First Difference, Pooled OLS, Two-way FE
- Correlated RE: Mundlak (1978), Chamberlain (1982)
- Dynamic panel: Arellano-Bond, Blundell-Bond (System GMM)
- HDFE absorption: high-dimensional fixed-effects OLS (Stata's reghdfe / R's fixest)

All results return PanelResults with built-in diagnostics:

result = sp.panel(df, "y ~ x1 + x2", entity='id', time='t')
result.hausman_test()        # FE vs RE
result.bp_lm_test()          # Pooled vs RE
result.f_test_effects()      # Joint significance of FE
result.compare('re')         # Side-by-side comparison

References

- Wooldridge, J.M. (2010). Econometric Analysis of Cross Section and Panel Data.
- Mundlak, Y. (1978). "On the Pooling of Time Series and Cross Section Data."
- Chamberlain, G. (1982). "Multivariate Regression Models for Panel Data."
- Arellano, M. and Bond, S. (1991). "Some Tests of Specification for Panel Data."
- Blundell, R. and Bond, S. (1998). "Initial Conditions and Moment Restrictions."
- Hausman, J.A. (1978). "Specification Tests in Econometrics."
- Breusch, T.S. and Pagan, A.R. (1980). "The Lagrange Multiplier Test."
- Pesaran, M.H. (2004). "General Diagnostic Tests for Cross Section Dependence."
- Correia, S. (2017). "Linear Models with High-Dimensional Fixed Effects."

FEOLSResult `dataclass` ¶

Bases: ResultProtocolMixin

Result of sp.feols().

Attributes:

Name	Type	Description
`params`	`Series`	Coefficient estimates indexed by regressor name.
`std_errors`	`Series`	Standard errors indexed by regressor name.
`vcov`	`ndarray`	Variance-covariance matrix of the coefficients.
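`tvalues, pvalues`	`Series`	t-statistics and p-values indexed by regressor name.
`conf_int_lower, conf_int_upper`	`Series`	Lower and upper confidence-interval bounds indexed by regressor name.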
`residuals`	`ndarray`	In-sample residuals (after FE absorption).
`fitted_within`	`ndarray`	Predicted values from X β (excludes FE contribution).
`n_obs`	`int`
`n_singletons_dropped`	`int`
`n_fe`	`List[int]`	Number of groups per absorbed FE dimension.
`dof_fe`	`int`	Degrees of freedom consumed by the FEs.
`df_resid`	`int`
`r2_within`	`float`
`se_type`	`str`	'iid' \| 'cluster' \| 'multiway_cluster' \| 'wild_cluster'
`cluster_info`	`dict`	Metadata (cluster names, counts).
`formula`	`str`
`absorber`	`Absorber`	Reusable absorber (includes `keep_mask` to subset rows).

Examples:

>>> import statspai as sp
>>> from statspai.panel.feols import feols
>>> df = sp.mincer_wage_panel()
>>> res = feols("log_wage ~ education + experience | period",
...             data=df, cluster="period")
>>> type(res).__name__
'FEOLSResult'
>>> res.se_type
'cluster'
>>> bool(res.params["education"] > 0)
True

Absorber ¶

Reusable HDFE demean operator.

Build once from a DataFrame's FE columns; reuse demean to sweep any outcome / regressor vector or matrix. Useful when fitting many models that share the same absorbing FEs (e.g. event-study coefficient paths).

Parameters:

Name	Type	Description	Default
`fe_data`	`DataFrame, ndarray (n, K), or None`	FE columns. Must have no NaN. May be `None` (or an `(n, 0)` array) when only varying-slope terms are absorbed, in which case `n_obs` is inferred from `slopes`.	required
`weights`	`ndarray(n)`	Observation weights. If given, weighted means are used.	`None`
`drop_singletons`	`bool`	If True, singleton observations (FE groups of size 1) are pruned before building the absorber. `keep_mask` stores the surviving rows.	`True`
`tol`	`float`	Convergence threshold on max \|dx\| per iteration.	`1e-8`
`maxiter`	`int`	Maximum alternating-projection iterations.	`10000`
`accelerate`	`bool`	Enable Irons-Tuck Δ² acceleration.	`True`
`solver`	`('map', 'lsmr', 'lsqr')`	Within-transformation backend. `"map"` uses alternating projections with Irons-Tuck acceleration (default, typically fastest on well-conditioned panels). `"lsmr"` / `"lsqr"` delegate to `scipy.sparse.linalg.lsmr` / `lsqr` on the sparse FE design matrix — more robust for ill-conditioned or highly nested FE structures. See the migration guide for how this maps to `pyreghdfe`.	`"map"`
`slopes`	`sequence of SlopeSpec`	Varying-slope terms to absorb alongside `fe_data`. Each counts as one more absorbed dimension in the alternating projections.	`None`
`n_obs`	`int`	Row count, required only when `fe_data` is `None` and `slopes` is empty.	`None`

Attributes:

Name	Type	Description
`keep_mask`	`ndarray of bool`	Rows retained after singleton pruning. Callers must apply this mask to `y`, `X`, and any weights before passing to `demean`.
`n_kept`	`int`	Number of surviving observations.
`n_dropped`	`int`	Number of singleton observations removed.
`n_fe`	`list of int`	Number of groups per FE dimension (post-prune).
`slope_ops`	`list`	Pre-conditioned varying-slope projectors (post-prune). Each exposes `n_levels` and `n_degenerate` (levels whose within-level design is rank-deficient and absorbs nothing).
`n_slope_levels`	`list of int`	Number of levels per varying-slope term (post-prune).

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> n = 300
>>> fe_data = pd.DataFrame({
...     "firm": rng.integers(0, 20, n),
...     "year": rng.integers(0, 5, n),
... })
>>> y = (fe_data["firm"] * 0.5 + fe_data["year"] * 0.3
...      + rng.normal(0, 1, n))
>>> absorber = sp.Absorber(fe_data)
>>> y_within = absorber.demean(y.to_numpy())  # keep_mask applied
>>> absorber.n_fe
[20, 5]
>>> bool(abs(y_within.mean()) < 1e-6)
True
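The alternating-projection idea behind the `"map"` solver can be sketched in plain NumPy. This is a minimal illustration, not StatsPAI's implementation: the helper name `demean_map` is hypothetical, and it omits weights, singleton pruning, and Irons-Tuck acceleration.

```python
import numpy as np

def demean_map(x, fe_columns, tol=1e-8, maxiter=10_000):
    """Sweep group means of every FE column out of x by alternating projections."""
    x = np.asarray(x, dtype=float).copy()
    # Integer codes 0..G-1 per FE dimension, plus the size of each group.
    codes = [np.unique(np.asarray(c), return_inverse=True)[1] for c in fe_columns]
    sizes = [np.bincount(c) for c in codes]
    for _ in range(maxiter):
        previous = x.copy()
        for c, size in zip(codes, sizes):
            # Subtract the current group means for this FE dimension.
            x -= (np.bincount(c, weights=x) / size)[c]
        # Stop once the largest change over a full sweep is below tol.
        if np.max(np.abs(x - previous)) < tol:
            break
    return x

rng = np.random.default_rng(0)
n = 300
firm = rng.integers(0, 20, n)
year = rng.integers(0, 5, n)
y = 0.5 * firm + 0.3 * year + rng.normal(size=n)
y_within = demean_map(y, [firm, year])
```

After convergence, the firm means and year means of `y_within` are both numerically zero, which is the property the `Absorber` example checks through the overall mean.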

demean ¶

demean(x: ndarray, copy: bool = True, already_masked: bool = False) -> ndarray

Within-transform x by sweeping out all absorbed FEs.

Parameters:

Name	Type	Description	Default
`x`	`(ndarray, shape(n) or (n, p))`	Variable(s) to residualize. `n` must equal either the full input size (then `keep_mask` is applied) or `n_kept` (when `already_masked=True`).	required
`copy`	`bool`	If True, operate on a copy; if False, modify `x` in place. Callers passing fresh arrays can set False to save memory.	`True`
`already_masked`	`bool`	Skip application of `keep_mask`.	`False`

Returns:

Type	Description
`ndarray`	Residualized `x` with shape `(n_kept,)` or `(n_kept, p)`.

residualize ¶

residualize(x: ndarray, copy: bool = True) -> ndarray

Alias for demean — returns FE-residualized version of x.

SlopeSpec ¶

Bases: NamedTuple

A varying-slope absorbed term.

Represents Stata's absorb(i.g#c.x) / absorb(i.g##c.x) and fixest's | g[[x]] / | g[x].

Attributes:

Name	Type	Description
`group`	`ndarray(n)`	Raw group labels (numeric or object). Factorized internally.
`x`	`ndarray(n)`	Continuous variable whose coefficient varies across levels of `group`.
`with_intercept`	`bool`	False → absorb only the `G` slope columns `x · 1[g = j]` (Stata `#`, fixest `[[x]]`). True → additionally absorb the `G` level intercepts `1[g = j]` (Stata `##`, fixest `[x]`).
`name`	`str`	Display name, used in diagnostics and warnings.

Examples:

>>> import numpy as np
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> g = np.repeat(np.arange(4), 25)
>>> x = rng.normal(size=100)
>>> spec = sp.SlopeSpec(group=g, x=x, with_intercept=False, name="i.g#c.x")
>>> spec.with_intercept
False
>>> spec.name
'i.g#c.x'
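To make the `with_intercept` distinction concrete, the sketch below residualizes a vector on group-specific slopes using a separate least-squares fit per group. The function `absorb_slope` is a hypothetical helper written for illustration; it is not part of StatsPAI and ignores the interaction with other absorbed FEs.

```python
import numpy as np

def absorb_slope(y, g, x, with_intercept=False):
    """Residualize y on x * 1[g = j] (and optionally 1[g = j]) for every level j."""
    y = np.asarray(y, dtype=float)
    out = np.empty_like(y)
    for level in np.unique(g):
        idx = g == level
        cols = [x[idx]]
        if with_intercept:
            cols.insert(0, np.ones(idx.sum()))  # level intercept 1[g = j]
        Z = np.column_stack(cols)
        beta, *_ = np.linalg.lstsq(Z, y[idx], rcond=None)
        out[idx] = y[idx] - Z @ beta
    return out

rng = np.random.default_rng(0)
g = np.repeat(np.arange(4), 25)
x = rng.normal(size=100)
true_slopes = np.array([0.5, -1.0, 2.0, 0.0])
y = true_slopes[g] * x + rng.normal(0, 0.1, size=100)

r_slope = absorb_slope(y, g, x, with_intercept=False)  # Stata `#` analogue
r_both = absorb_slope(y, g, x, with_intercept=True)    # Stata `##` analogue
```

With `with_intercept=False` the residual is orthogonal to `x` within each group; with `with_intercept=True` it additionally has zero mean within each group.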

PanelCompareResults ¶

Bases: ResultProtocolMixin

Side-by-side comparison of two panel models.

Produced by `PanelResults.compare`. Holds the two fitted models and renders a shared coefficient/SE table plus per-model diagnostics via `summary`, or a side-by-side coefficient plot via `plot`.

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> rows = []
>>> for i in range(40):
...     alpha = rng.normal()
...     for t in range(5):
...         x1, x2 = rng.normal(), rng.normal()
...         y = 1.0 + 0.5 * x1 - 0.3 * x2 + alpha + rng.normal(0, 0.5)
...         rows.append({"id": i, "year": t, "y": y, "x1": x1, "x2": x2})
>>> df = pd.DataFrame(rows)
>>> r = sp.panel(df, "y ~ x1 + x2", entity="id", time="year", method="fe")
>>> cmp = r.compare("re")                      # FE vs RE side by side
>>> isinstance(cmp, sp.PanelCompareResults)
True
>>> bool("Panel Comparison" in cmp.summary())
True

plot ¶

plot(variables: Optional[List[str]] = None, **kwargs: Any) -> Any

Side-by-side coefficient comparison plot.

Returns:

Type	Description
`(fig, ax)`

PanelRegression ¶

Deprecated: use panel() directly. Kept for backward compatibility.

Construct with the same keyword arguments as `panel`, then call `fit`; the result is an ordinary `PanelResults`. New code should call sp.panel(...) instead.

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> rows = []
>>> for i in range(40):
...     alpha = rng.normal()
...     for t in range(5):
...         x1, x2 = rng.normal(), rng.normal()
...         y = 1.0 + 0.5 * x1 - 0.3 * x2 + alpha + rng.normal(0, 0.5)
...         rows.append({"id": i, "year": t, "y": y, "x1": x1, "x2": x2})
>>> df = pd.DataFrame(rows)
>>> model = sp.PanelRegression(data=df, formula="y ~ x1 + x2",
...                            entity="id", time="year", method="fe")
>>> res = model.fit()
>>> isinstance(res, sp.PanelResults)
True

PanelResults ¶

Bases: EconometricResults

Panel regression results with built-in diagnostics.

Extends EconometricResults with panel-specific tests that can be called directly on the result object (see Examples for a runnable setup):

result = sp.panel(df, "y ~ x1 + x2", entity='id', time='t')
result.hausman_test()        # FE vs RE
result.bp_lm_test()          # Pooled vs RE (Breusch-Pagan LM)
result.f_test_effects()      # Joint significance of entity FE
result.compare('re')         # Compare with RE side by side

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> rows = []
>>> for i in range(40):
...     alpha = rng.normal()
...     for t in range(5):
...         x1, x2 = rng.normal(), rng.normal()
...         y = 1.0 + 0.5 * x1 - 0.3 * x2 + alpha + rng.normal(0, 0.5)
...         rows.append({"id": i, "year": t, "y": y, "x1": x1, "x2": x2})
>>> df = pd.DataFrame(rows)
>>> r = sp.panel(df, "y ~ x1 + x2", entity="id", time="year", method="fe")
>>> isinstance(r, sp.PanelResults)
True
>>> sorted(r.params.index)
['x1', 'x2']
>>> h = r.hausman_test()                       # FE vs RE specification test
>>> set(["statistic", "df", "pvalue"]).issubset(h)
True

hausman_test ¶

hausman_test(alpha: float = 0.05) -> Dict[str, Any]

Hausman (1978) specification test: FE vs RE.

Under H0 (RE consistent), both FE and RE are consistent but RE is efficient. Under H1, only FE is consistent.

Returns:

Type	Description
`dict`	'statistic', 'df', 'pvalue', 'recommendation', 'interpretation'
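The textbook form of the statistic is H = d' (V_FE - V_RE)^(-1) d with d = b_FE - b_RE, compared against a chi-square distribution with as many degrees of freedom as coefficients tested. The sketch below computes it from hypothetical inputs; the function name and the use of a pseudo-inverse are assumptions for illustration, not a description of how `hausman_test` is implemented.

```python
import numpy as np

def hausman_statistic(b_fe, b_re, v_fe, v_re):
    """Return (H, df) for the FE-vs-RE contrast on the shared coefficients."""
    d = np.asarray(b_fe, dtype=float) - np.asarray(b_re, dtype=float)
    v_diff = np.asarray(v_fe, dtype=float) - np.asarray(v_re, dtype=float)
    # pinv guards against a numerically singular variance difference.
    stat = float(d @ np.linalg.pinv(v_diff) @ d)
    return stat, d.size

# Hypothetical estimates, chosen only to make the arithmetic easy to follow.
b_fe, b_re = [0.52, -0.31], [0.50, -0.30]
v_fe = [[0.0016, 0.0], [0.0, 0.0016]]
v_re = [[0.0012, 0.0], [0.0, 0.0012]]
stat, df = hausman_statistic(b_fe, b_re, v_fe, v_re)
```

Here d = (0.02, -0.01) and the variance difference is 0.0004 on the diagonal, so H = (0.0004 + 0.0001) / 0.0004 = 1.25 with 2 degrees of freedom, well below the 5% critical value of about 5.99.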

bp_lm_test ¶

bp_lm_test() -> Dict[str, Any]

Breusch-Pagan (1980) Lagrange Multiplier test for random effects.

Tests H0: Var(alpha_i) = 0 (Pooled OLS is appropriate) vs H1: Var(alpha_i) > 0 (Random Effects needed).

Returns:

Type	Description
`dict`	'statistic', 'df', 'pvalue', 'recommendation', 'interpretation'
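For a balanced panel, the Breusch-Pagan statistic can be written in terms of pooled-OLS residuals e_it as LM = NT / (2(T-1)) * [ sum_i (sum_t e_it)^2 / sum_i sum_t e_it^2 - 1 ]^2, which is chi-square with 1 degree of freedom under H0. The sketch below assumes residuals arranged as an (N, T) array; `bp_lm` is a hypothetical helper, and the library may handle unbalanced panels differently.

```python
import math
import numpy as np

def bp_lm(resid):
    """Breusch-Pagan LM statistic and p-value from (N, T) pooled-OLS residuals."""
    e = np.asarray(resid, dtype=float)
    n, t = e.shape
    ratio = (e.sum(axis=1) ** 2).sum() / (e ** 2).sum()
    lm = n * t / (2.0 * (t - 1)) * (ratio - 1.0) ** 2
    # Survival function of chi-square(1): P(Z^2 > lm) = erfc(sqrt(lm / 2)).
    pvalue = math.erfc(math.sqrt(lm / 2.0))
    return lm, pvalue

# Simulated residuals with a unit-level component, so H0 should be rejected.
rng = np.random.default_rng(0)
e = rng.normal(size=(40, 1)) + rng.normal(0, 0.5, size=(40, 5))
e -= e.mean()  # mimic residuals from a regression with an intercept
lm, pvalue = bp_lm(e)
```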

f_test_effects ¶

f_test_effects() -> Dict[str, Any]

F-test for joint significance of entity fixed effects.

Tests H0: all alpha_i = 0 (entity effects not needed).

Returns:

Type	Description
`dict`	'statistic', 'df1', 'df2', 'pvalue', 'interpretation'
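The usual construction compares the residual sum of squares of pooled OLS (restricted) with that of the within estimator (unrestricted): F = [(RSS_pooled - RSS_FE) / (N - 1)] / [RSS_FE / (NT - N - K)]. The following sketch reproduces that arithmetic on a simulated balanced panel; it is an illustration under those assumptions rather than the code behind `f_test_effects`.

```python
import numpy as np

rng = np.random.default_rng(0)
n_id, n_t, k = 40, 5, 2
ids = np.repeat(np.arange(n_id), n_t)  # rows sorted by entity, balanced
X = rng.normal(size=(n_id * n_t, k))
alpha = rng.normal(size=n_id)[ids]
y = 1.0 + X @ np.array([0.5, -0.3]) + alpha + rng.normal(0, 0.5, n_id * n_t)

def rss(y, X):
    """Residual sum of squares from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)

def within(a):
    """Subtract entity means (relies on the balanced, sorted layout above)."""
    a3 = a.reshape(n_id, n_t, -1)
    return (a3 - a3.mean(axis=1, keepdims=True)).reshape(a.shape)

rss_pooled = rss(y, np.column_stack([np.ones(len(y)), X]))  # restricted model
rss_fe = rss(within(y), within(X))                          # unrestricted model

df1 = n_id - 1
df2 = n_id * n_t - n_id - k
f_stat = ((rss_pooled - rss_fe) / df1) / (rss_fe / df2)
```

The statistic is referred to an F(df1, df2) distribution; with unit effects of variance 1 in the simulation, it lands far above conventional critical values.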

pesaran_cd_test ¶

pesaran_cd_test() -> Dict[str, Any]

Pesaran (2004) CD test for cross-sectional dependence in residuals.

Returns:

Type	Description
`dict`	'statistic', 'pvalue', 'interpretation'
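For a balanced panel, Pesaran's statistic is CD = sqrt(2T / (N(N-1))) * sum over i<j of rho_ij, where rho_ij is the sample correlation between the residual series of units i and j; it is asymptotically standard normal under no cross-sectional dependence. A minimal sketch, assuming residuals in an (N, T) array and using the hypothetical helper name `pesaran_cd`:

```python
import math
import numpy as np

def pesaran_cd(resid):
    """Pesaran CD statistic and two-sided normal p-value from (N, T) residuals."""
    e = np.asarray(resid, dtype=float)
    n, t = e.shape
    rho = np.corrcoef(e)              # N x N pairwise correlations across units
    upper = np.triu_indices(n, k=1)   # each pair i < j once
    cd = math.sqrt(2.0 * t / (n * (n - 1))) * rho[upper].sum()
    pvalue = math.erfc(abs(cd) / math.sqrt(2.0))
    return cd, pvalue

# Simulated residuals sharing a common time shock, so dependence is strong.
rng = np.random.default_rng(0)
e = rng.normal(size=(1, 30)) + rng.normal(size=(20, 30))
cd, pvalue = pesaran_cd(e)
```

Note that `np.corrcoef` demeans each series, which matches the residual-correlation definition when the regression includes an intercept.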

plot ¶

plot(type: str = 'coef', **kwargs: Any) -> Any

Generate panel-specific plots.

Parameters:

Name	Type	Description	Default
`type`	`str`	One of: `'coef'`, coefficient forest plot (default); `'effects'`, distribution of entity fixed effects; `'residuals'`, residual diagnostics (2x2 grid); `'hausman'`, visual FE vs RE comparison.	`'coef'`
`**kwargs`	`Any`	Passed to the underlying plot function.	`{}`

Returns:

Type	Description
`(fig, ax)`

plot_effects ¶

plot_effects(**kwargs: Any) -> Any

Shortcut for .plot(type='effects'). Distribution of entity FE.

plot_residuals ¶

plot_residuals(**kwargs: Any) -> Any

Shortcut for .plot(type='residuals'). Residual diagnostics (2x2).

plot_hausman ¶

plot_hausman(**kwargs: Any) -> Any

Shortcut for .plot(type='hausman'). Visual FE vs RE comparison.

compare ¶

compare(method: str, **kwargs: Any) -> PanelCompareResults

Re-estimate with a different method and compare side by side.

Parameters:

Name	Type	Description	Default
`method`	`str`	Alternative method to compare against.	required

Returns:

Type	Description
`PanelCompareResults`	Side-by-side comparison with diagnostics.

hdfe_feols ¶

hdfe_feols(formula: str, data: DataFrame, *, weights: Optional[Union[str, ndarray]] = None, cluster: Optional[Union[str, List[str]]] = None, se_type: Optional[str] = None, vce: Optional[str] = None, wild: bool = False, wild_n_boot: int = 999, wild_weight_type: str = 'webb', wild_seed: Optional[int] = None, conley_lat: Optional[str] = None, conley_lon: Optional[str] = None, conley_cutoff: Optional[float] = None, alpha: float = 0.05, drop_singletons: bool = True, tol: float = 1e-08, maxiter: int = 10000) -> FEOLSResult

reghdfe-style OLS with high-dimensional fixed effects.

Parameters:

Name	Type	Description	Default
`formula`	`str`	`"y ~ x1 + x2 \| fe1 + fe2 + fe3"`. The `\| fe...` part is optional. Both sides accept bare names, `c.x` / `i.f`, `a:b`, `a*b`, `f1^f2` and the varying-slope forms `i.f#c.x` / `i.f##c.x` / `f[[x]]` / `f[x]` — see the module docstring for the full grammar.	required
`data`	`DataFrame`		required
`weights`	`str or ndarray`	Observation weights. Column name or raw array.	`None`
`cluster`	`str or list`	One-way or multi-way cluster column(s).	`None`
`se_type`	`('iid', 'cluster', 'multiway_cluster', 'wild_cluster')`	Override automatic inference of SE type. Usually inferred from `cluster` / `wild`.	`None`
`vce`	`str`	Canonical SE-menu keyword (matches `sp.regress` / `sp.feols`). `"robust"` / `"hc1"`: heteroskedasticity-robust on the FE-absorbed design with reghdfe's small-sample factor `N/(N-k-df_a)`; matches Stata `reghdfe ..., vce(robust)`. `"hc0"`: no small-sample factor. `"CR2"` / `"CR3"` / `"jackknife"`: Pustejovsky-Tipton (2018) bias-reduced cluster-robust on the within design (requires `cluster=`, one-way); matches R `clubSandwich::vcovCR(plm)`. `"conley"`: Conley spatial HAC on the within design (requires `conley_lat=/conley_lon=/conley_cutoff=`; Stata `acreg` planar distance convention). `"wild"`: shorthand for `wild=True` (requires `cluster=`).	`None`
`wild`	`bool`	If True (and `cluster` is given), return wild-cluster-bootstrap p-values / CIs alongside classical cluster SE. Applied variable-by-variable. Only supported with a single cluster column.	`False`
`wild_n_boot`	`int`	Bootstrap replications.	`999`
`wild_weight_type`	`('rademacher', 'webb', 'mammen')`	Bootstrap weight distribution.	`'webb'`
`wild_seed`	`int`		`None`
`conley_lat`	`str`	Coordinate columns (decimal degrees) for `vce="conley"`.	`None`
`conley_lon`	`str`	Coordinate columns (decimal degrees) for `vce="conley"`.	`None`
`conley_cutoff`	`float`	Conley distance cutoff in km for `vce="conley"`.	`None`
`alpha`	`float`		`0.05`
`drop_singletons`	`bool`		`True`
`tol`	`float`	Convergence tolerance for the absorber.	`1e-08`
`maxiter`	`int`	Maximum number of absorber iterations.	`10000`

Returns:

Type	Description
`FEOLSResult`

Examples:

Two-way fixed effects (firm and year) with cluster-robust SE. This function is exported at top level as `statspai.hdfe_ols`.

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> firm_fe = rng.normal(0, 1, 40)
>>> year_fe = rng.normal(0, 0.5, 6)
>>> rows = []
>>> for i in range(40):
...     for t in range(6):
...         educ = rng.normal(12, 2)
...         exper = rng.normal(10, 3)
...         lwage = (1.0 + 0.08 * educ + 0.02 * exper
...                  + firm_fe[i] + year_fe[t] + rng.normal(0, 0.3))
...         rows.append((i, t, lwage, educ, exper))
>>> df = pd.DataFrame(rows, columns=['firm', 'year', 'lwage',
...                                  'educ', 'exper'])
>>> res = sp.hdfe_ols("lwage ~ educ + exper | firm + year", data=df,
...                   cluster='firm')
>>> sorted(res.params.index.tolist())
['educ', 'exper']

absorb_ols ¶

absorb_ols(y: ndarray, X: ndarray, fe: Union[DataFrame, ndarray, None], weights: Optional[ndarray] = None, cluster: Optional[Union[ndarray, List[ndarray]]] = None, drop_singletons: bool = True, tol: float = 1e-08, maxiter: int = 10000, return_absorber: bool = False, solver: str = 'map', slopes: Optional[Sequence[SlopeSpec]] = None) -> dict

OLS with absorbed high-dimensional fixed effects (reghdfe-style).

Solves y = X β + Σ_k α_{g_k} + ε by sweeping out the FEs from both y and X (Frisch-Waugh-Lovell) and running OLS on residuals.

Parameters:

Name	Type	Description	Default
`y`	`(ndarray, shape(n))`		required
`X`	`(ndarray, shape(n, p))`	Regressors excluding the absorbed FEs and the constant (the constant is absorbed by any FE dimension).	required
`fe`	`DataFrame or ndarray(n, K)`	Fixed-effect columns.	required
`weights`	`ndarray(n)`	Observation weights.	`None`
`cluster`	`ndarray or list of ndarrays`	One-way or multi-way cluster variables for robust SEs. If provided, returns cluster-robust SEs (one-way: Liang-Zeger sandwich; multi-way: inclusion-exclusion Cameron-Gelbach-Miller).	`None`
`drop_singletons`	`bool`		`True`
`tol`	`float`	Demeaning convergence tolerance.	`1e-08`
`maxiter`	`int`	Maximum number of demeaning iterations.	`10000`
`return_absorber`	`bool`	If True, also return the `Absorber` object for reuse.	`False`
`solver`	`('map', 'lsmr', 'lsqr')`	Within-transformation backend. See `Absorber`.	`"map"`
`slopes`	`sequence of SlopeSpec`	Varying-slope terms to absorb alongside `fe`. See `Absorber`.	`None`

Returns:

Type	Description
`dict with keys:`	`coef` (p,), `se` (p,), `vcov` (p,p), `resid` (n_kept,), `n` (n_kept), `df_resid`, `dof_fe`, `r2_within`, `n_singletons_dropped`, `converged`, `iters`, `absorber` (if requested)

Examples:

Two-way (firm + year) fixed effects with firm-clustered SEs:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> n = 500
>>> firm = rng.integers(0, 30, n)
>>> year = rng.integers(0, 10, n)
>>> X = rng.normal(size=(n, 2))
>>> y = (X @ np.array([1.5, -0.7]) + rng.normal(size=30)[firm]
...      + rng.normal(size=10)[year] + rng.normal(0, 0.5, n))
>>> fe = pd.DataFrame({"firm": firm, "year": year})
>>> out = sp.absorb_ols(y, X, fe, cluster=firm)
>>> print(np.round(out["coef"], 2))   # [ 1.53 -0.71] — truth [1.5, -0.7]
>>> print(out["n"], out["converged"])  # 500 True
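The Frisch-Waugh-Lovell step can be checked numerically: sweeping the fixed effects out of `y` and `X` and regressing residual on residual gives the same slopes as a regression that includes one dummy per group. The sketch uses a single FE dimension and plain NumPy so that it stands alone; the helper `sweep` is illustrative and is not the library's absorber.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
firm = rng.integers(0, 30, n)
X = rng.normal(size=(n, 2))
y = X @ np.array([1.5, -0.7]) + rng.normal(size=30)[firm] + rng.normal(0, 0.5, n)

# Dummy matrix D with one column per observed firm.
codes = np.unique(firm, return_inverse=True)[1]
D = np.eye(codes.max() + 1)[codes]

# (a) Least-squares dummy-variable regression: y on [X, D].
beta_lsdv = np.linalg.lstsq(np.column_stack([X, D]), y, rcond=None)[0][:2]

# (b) FWL: project y and X off the dummies, then run OLS on the residuals.
def sweep(a):
    """Residual of a after projecting on the firm dummies (i.e. firm-demeaning)."""
    return a - D @ np.linalg.lstsq(D, a, rcond=None)[0]

beta_fwl = np.linalg.lstsq(sweep(X), sweep(y), rcond=None)[0]
```

Both routes return the same two coefficients up to floating-point error; the FWL route never has to invert a matrix whose size grows with the number of groups, which is the point of absorbing the FEs.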

demean ¶

demean(x: ndarray, fe: Union[DataFrame, ndarray, None], weights: Optional[ndarray] = None, drop_singletons: bool = True, tol: float = 1e-08, maxiter: int = 10000, solver: str = 'map', slopes: Optional[Sequence[SlopeSpec]] = None) -> Tuple[ndarray, ndarray]

Return the within-transformed x and the singleton keep mask.

Convenience wrapper around `Absorber`. See `Absorber` for the solver kwarg semantics.

Examples:

Sweep firm and year fixed effects out of a column:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> n = 200
>>> fe = pd.DataFrame({"firm": rng.integers(0, 20, n),
...                    "year": rng.integers(0, 5, n)})
>>> x = rng.normal(size=n) + 2.0 * fe["firm"].to_numpy()
>>> xw, keep = sp.demean(x, fe)
>>> print(xw.shape, int(keep.sum()))  # (200,) 200 — no singletons dropped
>>> print(abs(xw.mean()) < 1e-8)      # True — FE means swept out

panel_logit ¶

panel_logit(data: DataFrame, y: str, x: List[str], id: str = 'id', time: str = 'time', method: str = 'fe', n_quadrature: int = 12, robust: str = 'nonrobust', cluster: Optional[str] = None, maxiter: int = 200, tol: float = 1e-08, alpha: float = 0.05) -> EconometricResults

Panel logit model.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Panel data in long format.	required
`y`	`str`	Binary dependent variable (0/1).	required
`x`	`list of str`	Regressors.	required
`id`	`str`	Unit identifier column.	`'id'`
`time`	`str`	Time identifier column.	`'time'`
`method`	`str`	'fe' (conditional FE logit), 're' (random effects), 'cre' (Mundlak).	`'fe'`
`n_quadrature`	`int`	Gauss-Hermite quadrature points (RE/CRE only).	`12`
`robust`	`str`	'nonrobust' or 'robust'.	`'nonrobust'`
`cluster`	`str or None`	Column for cluster-robust SEs.	`None`
`maxiter`	`int`	Maximum optimizer iterations.	`200`
`tol`	`float`	Gradient tolerance.	`1e-08`
`alpha`	`float`	Significance level for confidence intervals.	`0.05`

Returns:

Type	Description
`EconometricResults`

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> import pandas as pd
>>> rng = np.random.default_rng(0)
>>> rows = []
>>> for i in range(60):
...     a = rng.normal()  # unit effect
...     for t in range(5):
...         xit = rng.normal()
...         p = 1.0 / (1.0 + np.exp(-(0.8 * xit + a)))
...         rows.append({"id": i, "time": t,
...                      "y": int(rng.uniform() < p), "x": xit})
>>> df = pd.DataFrame(rows)
>>> res = sp.panel_logit(df, y="y", x=["x"], id="id", time="time", method="fe")
>>> bool("x" in res.params.index)
True

panel_probit ¶

panel_probit(data: DataFrame, y: str, x: List[str], id: str = 'id', time: str = 'time', method: str = 're', n_quadrature: int = 12, robust: str = 'nonrobust', cluster: Optional[str] = None, maxiter: int = 200, tol: float = 1e-08, alpha: float = 0.05) -> EconometricResults

Panel probit model.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Panel data in long format.	required
`y`	`str`	Binary dependent variable (0/1).	required
`x`	`list of str`	Regressors.	required
`id`	`str`	Unit identifier column.	`'id'`
`time`	`str`	Time identifier column.	`'time'`
`method`	`str`	're' (random effects) or 'cre' (Mundlak). FE probit not supported (incidental parameters problem).	`'re'`
`n_quadrature`	`int`	Gauss-Hermite quadrature points.	`12`
`robust`	`str`	'nonrobust' or 'robust'.	`'nonrobust'`
`cluster`	`str or None`	Column for cluster-robust SEs.	`None`
`maxiter`	`int`	Maximum optimizer iterations.	`200`
`tol`	`float`	Gradient tolerance.	`1e-08`
`alpha`	`float`	Significance level for confidence intervals.	`0.05`

Returns:

Type	Description
`EconometricResults`

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> import pandas as pd
>>> from scipy.stats import norm
>>> rng = np.random.default_rng(0)
>>> rows = []
>>> for i in range(50):
...     a = rng.normal()  # unit effect
...     for t in range(4):
...         xit = rng.normal()
...         p = norm.cdf(0.7 * xit + a)
...         rows.append({"id": i, "time": t,
...                      "y": int(rng.uniform() < p), "x": xit})
>>> df = pd.DataFrame(rows)
>>> res = sp.panel_probit(df, y="y", x=["x"], id="id", time="time",
...                       method="re", n_quadrature=8)
>>> bool("x" in res.params.index)
True
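The role of `n_quadrature` can be illustrated with the likelihood contribution of a single unit in a random-effects probit. With a normal unit effect of standard deviation sigma_a, the integral over the effect is approximated by Gauss-Hermite nodes z_k and weights w_k as (1/sqrt(pi)) * sum_k w_k * prod_t Phi(q_t (x_t'b + sqrt(2) sigma_a z_k)), where q_t = 2 y_t - 1. This is a textbook sketch with a hypothetical function name, not the estimator's internal code.

```python
import math
import numpy as np

# Standard normal CDF built from math.erf so that SciPy is not required.
_norm_cdf = np.vectorize(lambda v: 0.5 * (1.0 + math.erf(v / math.sqrt(2.0))))

def re_probit_unit_likelihood(y, xb, sigma_a, n_quadrature=12):
    """Likelihood of one unit's 0/1 outcomes, integrating out the random effect."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_quadrature)
    q = 2.0 * np.asarray(y, dtype=float) - 1.0           # +1 for y=1, -1 for y=0
    xb = np.asarray(xb, dtype=float)
    # index[t, k]: signed linear index for period t at quadrature node k.
    index = q[:, None] * (xb[:, None] + sigma_a * math.sqrt(2.0) * nodes[None, :])
    per_node = _norm_cdf(index).prod(axis=0)             # product over periods
    return float((weights * per_node).sum() / math.sqrt(math.pi))

lik = re_probit_unit_likelihood(y=[1, 0, 1], xb=[0.3, -0.2, 0.5], sigma_a=1.0)
```

Two sanity checks follow from the formula: with `sigma_a=0` the result collapses to the pooled probit product, and summing the likelihood over every possible outcome pattern returns 1.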

plot_within_between ¶

plot_within_between(data: DataFrame, variables: List[str], entity: str, ax: Any = None, figsize: Tuple[float, float] = (8, 6), color: str = '#2C3E50', title: Optional[str] = None) -> Tuple[Any, Any]

Bar chart comparing within vs between variation for each variable.

Helps assess which estimator is appropriate:

- High between / low within → BE may be efficient
- High within / low between → FE captures the action
- Similar → both capture similar information

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`		required
`variables`	`list of str`	Variables to decompose.	required
`entity`	`str`	Entity identifier column.	required
`ax`	`matplotlib Axes`		`None`
`figsize`	`tuple`		`(8, 6)`
`color`	`str`		`'#2C3E50'`
`title`	`str`		`None`

Returns:

Type	Description
`(fig, ax)`
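The quantities behind such a chart can be computed directly with pandas. The sketch below uses one common definition (between = standard deviation of entity means, within = standard deviation of deviations from entity means); the function name is hypothetical, and the exact definition used by `plot_within_between` may differ.

```python
import numpy as np
import pandas as pd

def within_between_sd(data, variables, entity):
    """Between and within standard deviations for each variable."""
    rows = {}
    for v in variables:
        entity_mean = data.groupby(entity)[v].transform("mean")
        rows[v] = {
            "between_sd": data.groupby(entity)[v].mean().std(ddof=1),
            "within_sd": (data[v] - entity_mean).std(ddof=1),
        }
    return pd.DataFrame.from_dict(rows, orient="index")

rng = np.random.default_rng(0)
ids = np.repeat(np.arange(10), 4)
df = pd.DataFrame({
    "id": ids,
    "fixed": ids.astype(float),        # constant within entity
    "noisy": rng.normal(size=len(ids)),  # varies freely over time
})
table = within_between_sd(df, ["fixed", "noisy"], entity="id")
```

A variable such as `fixed` has zero within variation, so a fixed-effects estimator cannot identify its coefficient, which is the situation the chart is meant to flag.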

balance_panel ¶

balance_panel(data: DataFrame, entity: str, time: str) -> DataFrame

Balance a panel by keeping only units observed in every time period.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Panel data in long format.	required
`entity`	`str`	Entity (unit) identifier column.	required
`time`	`str`	Time period column.	required

Returns:

Type	Description
`DataFrame`	Balanced panel (same column order, sorted by entity then time).

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> rows = []
>>> for i in range(5):
...     for t in range(4):
...         if i == 3 and t == 2:
...             continue  # drop one obs -> unbalanced
...         rows.append({"id": i, "year": t, "y": rng.normal()})
>>> df = pd.DataFrame(rows)
>>> df.groupby("id")["year"].count().tolist()
[4, 4, 4, 3, 4]
>>> balanced = sp.balance_panel(df, entity="id", time="year")
>>> int(balanced.groupby("id")["year"].count().nunique())  # all same count
1
>>> balanced["id"].nunique()  # the short unit is dropped
4
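The same rule (keep only units observed in every period) can be expressed in a few lines of pandas. This is a sketch of the logic under the assumption of one row per entity-period; `balance` is a hypothetical helper, and details such as index handling in `balance_panel` are not implied.

```python
import pandas as pd

def balance(data, entity, time):
    """Keep entities observed in every distinct time period."""
    n_periods = data[time].nunique()
    periods_per_unit = data.groupby(entity)[time].transform("nunique")
    out = data[periods_per_unit == n_periods]
    return out.sort_values([entity, time]).reset_index(drop=True)

rows = [
    {"id": i, "year": t, "y": float(i + t)}
    for i in range(5) for t in range(4)
    if not (i == 3 and t == 2)  # unit 3 misses one period
]
df = pd.DataFrame(rows)
balanced = balance(df, "id", "year")
```

Unit 3 is removed entirely, leaving 4 units with 4 periods each.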

panel_compare ¶

panel_compare(data: DataFrame, formula: str, entity: str, time: str, methods: Optional[List[str]] = None, robust: str = 'nonrobust', cluster: Optional[str] = None, **kwargs: Any) -> DataFrame

Estimate the same model with multiple methods and compare.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`		required
`formula`	`str`		required
`entity`	`str`		required
`time`	`str`		required
`methods`	`list of str`	Methods to compare. Default: ['pooled', 'fe', 're', 'twoway', 'mundlak'].	`None`
`robust`	`str`	Passed to each `panel()` call.	`'nonrobust'`
`cluster`	`str or None`	Passed to each `panel()` call.	`None`

Returns:

Type	Description
`DataFrame`	Comparison table with coefficients, SEs, and diagnostics.

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> rows = []
>>> for i in range(40):
...     alpha = rng.normal()
...     for t in range(5):
...         edu, exp = rng.normal(), rng.normal()
...         wage = 1.0 + 0.5 * edu - 0.3 * exp + alpha + rng.normal(0, 0.5)
...         rows.append({"id": i, "year": t, "wage": wage,
...                      "edu": edu, "exp": exp})
>>> df = pd.DataFrame(rows)
>>> comparison = sp.panel_compare(
...     df, "wage ~ edu + exp", entity="id", time="year",
...     methods=["pooled", "fe", "re"],
... )
>>> isinstance(comparison, pd.DataFrame)
True
>>> bool("edu" in comparison.index)  # one row per coefficient
True

panel ¶

panel(data: Any = None, formula: Optional[str] = None, entity: Optional[str] = None, time: Optional[str] = None, *, method: str = 'fe', **kwargs: Any) -> Union[PanelResults, FEOLSResult]

Unified panel-regression dispatcher.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`		`None`
`formula`	`str`	Patsy-style outcome ~ regressors specification.	`None`
`entity`	`str`	Cross-section identifier column.	`None`
`time`	`str`	Time identifier column.	`None`
`method`	`str`	Estimator family: Static: `fe` / `fixed`, `re` / `random`, `be` / `between`, `fd` / `first_difference`, `pooled` / `pooled_ols`, `twoway` / `two_way`. Correlated random effects: `mundlak`, `chamberlain`. Dynamic GMM: `ab` / `arellano_bond` / `gmm`, `system` / `blundell_bond` / `system_gmm`. HDFE absorption: `hdfe` / `feols` / `reghdfe` (high-dimensional fixed-effects OLS).	`'fe'`
`**kwargs`	`Any`	Forwarded to the chosen estimator. Classical methods accept `robust` / `cluster` / `weights` / `alpha` / `balance` / `lags` / `gmm_lags`, plus (for `method='fe'`) the extended SE menu `vce='CR2'/'CR3'/'jackknife'/'conley'/'wild'`, two-way `cluster=['a', 'b']`, `conley_lat/lon/cutoff` and `wild_reps` / `seed`. HDFE accepts `cluster` / `vce` / `se_type` / `wild` / `alpha` etc.	`{}`

Returns:

Type	Description
`PanelResults or FEOLSResult`	Result object whose type depends on `method`.

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> n_id, n_t = 80, 8
>>> ids = np.repeat(np.arange(n_id), n_t)
>>> years = np.tile(np.arange(2000, 2000 + n_t), n_id)
>>> exp = rng.normal(10, 3, n_id * n_t)
>>> edu = rng.normal(12, 2, n_id * n_t)
>>> alpha_i = np.repeat(rng.normal(0, 1, n_id), n_t)
>>> wage = 1.0 + 0.05 * exp + 0.08 * edu + alpha_i + rng.normal(0, 0.5, n_id * n_t)
>>> df = pd.DataFrame({'id': ids, 'year': years, 'wage': wage,
...                    'exp': exp, 'edu': edu})

>>> # Default: within (FE) estimator
>>> r = sp.panel(df, "wage ~ exp + edu", entity='id', time='year')

>>> # Random effects
>>> r = sp.panel(df, "wage ~ exp + edu", entity='id', time='year',
...              method='re')

>>> # Friendly alias (case insensitive)
>>> r = sp.panel(df, "wage ~ exp", entity='id', time='year',
...              method='Fixed')

>>> # Arellano-Bond dynamic panel
>>> df['wage_lag'] = df.groupby('id')['wage'].shift(1)
>>> r = sp.panel(df, "wage ~ wage_lag + edu", entity='id', time='year',
...              method='gmm', lags=1)

>>> # HDFE absorption (multiple FEs in formula)
>>> r = sp.panel(df, "wage ~ exp | id + year", method='hdfe',
...              cluster='id')
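As background for the correlated-random-effects option, the Mundlak (1978) device adds the entity mean of each regressor to a pooled regression; the coefficient on the original regressor then coincides with the within (FE) estimate. The NumPy sketch below verifies this on simulated data and is independent of how `method='mundlak'` is implemented in StatsPAI.

```python
import numpy as np

rng = np.random.default_rng(0)
n_id, n_t = 80, 8
ids = np.repeat(np.arange(n_id), n_t)
alpha = rng.normal(size=n_id)
# The regressor is correlated with the unit effect, so plain pooled OLS is biased.
x = 0.8 * alpha[ids] + rng.normal(size=n_id * n_t)
y = 1.0 + 0.5 * x + alpha[ids] + rng.normal(0, 0.5, n_id * n_t)

# Entity means, broadcast back to the observation level.
x_bar = (np.bincount(ids, weights=x) / n_t)[ids]
y_bar = (np.bincount(ids, weights=y) / n_t)[ids]

# Within (FE) slope: OLS on entity-demeaned data.
xw, yw = x - x_bar, y - y_bar
beta_fe = float(xw @ yw / (xw @ xw))

# Mundlak regression: pooled OLS of y on [1, x, x_bar]; take the coefficient on x.
Z = np.column_stack([np.ones_like(x), x, x_bar])
beta_mundlak = float(np.linalg.lstsq(Z, y, rcond=None)[0][1])

# Naive pooled OLS of y on [1, x] for comparison.
beta_pooled = float(np.linalg.lstsq(Z[:, :2], y, rcond=None)[0][1])
```

`beta_fe` and `beta_mundlak` agree to floating-point precision and sit near the true value 0.5, while `beta_pooled` is pulled upward by the correlation between `x` and the unit effect.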
