statspai.gmm¶
gmm ¶
Dynamic panel GMM estimators for StatsPAI.
Provides:

- Arellano-Bond (1991) first-differenced GMM
- Blundell-Bond (1998) system GMM
- Arellano-Bond test for serial correlation (AR(1)/AR(2))
- Hansen/Sargan test for overidentifying restrictions
xtabond ¶
xtabond(data: DataFrame, y: str, x: Optional[List[str]] = None, id: str = 'id', time: str = 'time', lags: int = 1, gmm_lags: Tuple[int, Optional[int]] = (2, None), method: str = 'difference', twostep: bool = False, robust: bool = True, alpha: float = 0.05) -> CausalResult
Arellano-Bond / Blundell-Bond dynamic panel GMM estimator.
Equivalent to Stata's xtabond / xtabond2.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `DataFrame` | Balanced or unbalanced panel in long format. | required |
| `y` | `str` | Dependent variable. | required |
| `x` | `list of str` | Strictly exogenous regressors. Entered in first differences both as regressors and as their own (standard) instruments. | `None` |
| `id` | `str` | Unit identifier. | `'id'` |
| `time` | `str` | Time period variable. Treated as an ordinal sequence: the sorted distinct values define consecutive periods, so a missing period (gap) is recognised, but non-integer / irregularly-spaced codes are collapsed to their rank order. | `'time'` |
| `lags` | `int` | Number of lags of Y to include (ρ₁ Y_{t-1} + ... + ρ_p Y_{t-p}). | `1` |
| `gmm_lags` | `tuple(min, max)` | Range of lags of Y (in levels) used as GMM instruments. | `(2, None)` |
| `method` | `str` | | `'difference'` |
| `twostep` | `bool` | Use two-step GMM with the efficient weight matrix. When | `False` |
| `robust` | `bool` | Heteroskedasticity-robust standard errors (Windmeijer-corrected for two-step). When | `True` |
| `alpha` | `float` | Significance level. | `0.05` |
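The ordinal treatment of `time` can be illustrated with a small sketch. This is only one reading of the description above, not the library's code: `period_ranks` and `interior_gaps` are hypothetical helpers, and the panel is a plain list of `(unit, time)` pairs rather than a DataFrame.

```python
def period_ranks(times):
    """Map each distinct time code to its 0-based rank in sorted order."""
    return {t: k for k, t in enumerate(sorted(set(times)))}


def interior_gaps(rows):
    """Per unit, list the period ranks missing between its first and last observation."""
    rank = period_ranks(t for _, t in rows)
    observed = {}
    for unit, t in rows:
        observed.setdefault(unit, set()).add(rank[t])
    return {
        unit: [k for k in range(min(r), max(r) + 1) if k not in r]
        for unit, r in observed.items()
    }


# Irregularly spaced codes: 1990, 1995, 2005 become periods 0, 1, 2.
rows = [("a", 1990), ("a", 1995), ("a", 2005), ("b", 1990), ("b", 2005)]
ranks = period_ranks(t for _, t in rows)
gaps = interior_gaps(rows)
print(ranks)  # {1990: 0, 1995: 1, 2005: 2}
print(gaps)   # {'a': [], 'b': [1]}
```

Under this reading, the uneven spacing between 1990, 1995 and 2005 is ignored, while unit `b` is flagged as missing period 1 because another unit observes it.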
Returns:
| Type | Description |
|---|---|
| `CausalResult` | |
Examples:
>>> import statspai as sp
>>> # Arellano-Bond (difference GMM)
>>> result = sp.xtabond(df, y='output', x=['capital', 'labor'],
...                     id='firm', time='year')
>>> print(result.summary())
>>> # Two-step with Windmeijer-corrected SEs
>>> result = sp.xtabond(df, y='output', x=['capital', 'labor'],
...                     id='firm', time='year', twostep=True)
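The examples above assume a long-format DataFrame `df` with columns `firm`, `year`, `output`, `capital` and `labor`. A minimal sketch for simulating such a panel follows; `simulate_panel` and all parameter values (ρ = 0.5, coefficients 0.3 and 0.2) are illustrative assumptions, not part of StatsPAI.

```python
import numpy as np
import pandas as pd


def simulate_panel(n_firms=200, n_years=8, rho=0.5, seed=0):
    """Simulate y_it = alpha_i + rho * y_{i,t-1} + 0.3 * capital + 0.2 * labor + eps_it."""
    rng = np.random.default_rng(seed)
    rows = []
    for firm in range(n_firms):
        alpha_i = rng.normal()  # unit fixed effect
        # Start near the unit's long-run mean so early periods are not atypical.
        y_prev = alpha_i / (1 - rho) + rng.normal()
        for year in range(2000, 2000 + n_years):
            capital, labor = rng.normal(size=2)  # strictly exogenous regressors
            y = alpha_i + rho * y_prev + 0.3 * capital + 0.2 * labor + rng.normal()
            rows.append((firm, year, y, capital, labor))
            y_prev = y
    return pd.DataFrame(rows, columns=["firm", "year", "output", "capital", "labor"])


df = simulate_panel()
print(df.head())
```

The resulting frame is balanced with one row per firm-year, which is the layout the `id` and `time` arguments expect.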
Notes
Arellano-Bond (1991): First-differences the equation to remove fixed effects α_i, then uses lagged levels Y_{i,t-2}, Y_{i,t-3}, ... as a block-diagonal set of GMM instruments for ΔY_{i,t-1}.
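To make the instrument structure concrete, the sketch below lists which lagged levels would instrument each first-differenced equation under the default `gmm_lags=(2, None)` and `lags=1`. It assumes periods are numbered 1..T and that `None` means "all available lags"; `instrument_periods` is a hypothetical helper, not a StatsPAI function.

```python
def instrument_periods(t, gmm_lags=(2, None), first_period=1):
    """Periods s whose level Y_{i,s} instruments the differenced equation at period t."""
    min_lag, max_lag = gmm_lags
    newest = t - min_lag  # closest admissible level, Y_{i,t-2} by default
    oldest = first_period if max_lag is None else max(first_period, t - max_lag)
    return list(range(oldest, newest + 1))


T = 5
# With one lag of Y, the first usable differenced equation is t = 3:
# dY_3 = rho * dY_2 + d(eps)_3, which needs Y_1, Y_2 and Y_3.
blocks = {t: instrument_periods(t) for t in range(3, T + 1)}
n_instruments = sum(len(v) for v in blocks.values())
print(blocks)         # {3: [1], 4: [1, 2], 5: [1, 2, 3]}
print(n_instruments)  # 6
```

Each period gets its own block of instruments, so the count grows quadratically with T; capping the second element of `gmm_lags` limits how deep each block reaches.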
No constant / time trend is included (unlike Stata's default
xtabond, which adds a _cons via a level moment). This matches
Stata's xtabond ..., noconstant; the reported ρ / β coefficients
are identical to Stata's _cons run when the series has no drift.
Balanced vs gapped panels. All standard errors, the Sargan/Hansen
tests, and the one-step AR(1)/AR(2) tests are validated to machine
precision against Stata for balanced (and ragged-but-gap-free) panels.
When a unit is missing an interior period, the estimator stays
consistent but its finite-sample numbers can differ from Stata's
xtabond by ~1% (Stata, xtabond2, and R's plm each use a
slightly different gap-weighting convention); a warning is emitted in
that case.
Key diagnostics:

- AR(1) test: Should reject (expected in first differences).
- AR(2) test: Should NOT reject (validates instrument exogeneity).
- Sargan / Hansen test: Should NOT reject (overidentification). Sargan (one-step) is not robust to heteroskedasticity; prefer the two-step Hansen J when that is a concern.
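The decision rules above can be written as a small checklist. This sketch takes the three p-values as plain floats; how they are read off a `CausalResult` is not shown here, and `check_diagnostics` is a hypothetical helper rather than part of the library.

```python
def check_diagnostics(ar1_p, ar2_p, overid_p, alpha=0.05):
    """Return a list of warnings for diagnostics that go against the expected pattern."""
    notes = []
    if ar1_p >= alpha:
        notes.append("AR(1) not rejected: unexpected for a first-differenced model")
    if ar2_p < alpha:
        notes.append("AR(2) rejected: lagged levels may not be valid instruments")
    if overid_p < alpha:
        notes.append("Sargan/Hansen rejected: overidentifying restrictions in doubt")
    return notes


print(check_diagnostics(ar1_p=0.001, ar2_p=0.40, overid_p=0.25))  # []
print(check_diagnostics(ar1_p=0.50, ar2_p=0.01, overid_p=0.01))
```

An empty list means all three tests match the expected pattern at level `alpha`; it does not by itself establish that the instruments are valid.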
See Roodman (2009, Stata Journal) for practical guidance.
References
Arellano, M. and Bond, S. (1991). Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. Review of Economic Studies. [@arellano1991some]
Roodman, D. (2009). How to do xtabond2: An introduction to difference and system GMM in Stata. Stata Journal. [@roodman2009xtabond]
gmm ¶
gmm(moment_fn: Callable, theta0: ndarray, data: DataFrame = None, W: ndarray = None, method: str = 'twostep', se: str = 'robust', maxiter: int = 200, tol: float = 1e-08, param_names: List[str] = None, alpha: float = 0.05) -> EconometricResults
General GMM estimator for arbitrary moment conditions.
Minimizes Q(θ) = ḡ(θ)' W ḡ(θ) where ḡ(θ) = (1/n) Σ g_i(θ).
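As a minimal sketch of this objective, the code below evaluates Q(θ) with NumPy for a moment function with the documented `g(theta, data) -> (n, q)` shape. `gmm_objective` and `mean_var_moments` are illustrative names, the data is a plain array rather than a DataFrame, and the identity default for `W` is an assumption made for the sketch.

```python
import numpy as np


def gmm_objective(theta, moment_fn, data, W=None):
    """Q(theta) = gbar' W gbar, with gbar the column mean of the (n, q) moment matrix."""
    g = moment_fn(theta, data)  # shape (n, q)
    gbar = g.mean(axis=0)       # shape (q,)
    if W is None:
        W = np.eye(gbar.size)   # identity weighting (assumed default for this sketch)
    return float(gbar @ W @ gbar)


def mean_var_moments(theta, x):
    """Two moments identifying the mean and variance of x."""
    mu, sigma2 = theta
    return np.column_stack([x - mu, (x - mu) ** 2 - sigma2])


x = np.array([1.0, 2.0, 3.0, 4.0])
q_at_solution = gmm_objective(np.array([2.5, 1.25]), mean_var_moments, x)
q_elsewhere = gmm_objective(np.array([0.0, 1.0]), mean_var_moments, x)
print(q_at_solution)  # 0.0
print(q_elsewhere)    # 48.5
```

Here the model is exactly identified (two moments, two parameters), so Q reaches zero at the sample mean and variance regardless of `W`; the weighting matrix only matters when there are more moments than parameters.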
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `moment_fn` | `callable` | Function g(theta, data) -> np.ndarray of shape (n, q) returning moment conditions for each observation. | required |
| `theta0` | `ndarray` | Initial parameter vector. | required |
| `data` | `DataFrame` | Data passed to moment_fn. | `None` |
| `W` | `ndarray` | Weighting matrix (q x q). If None, uses identity (one-step) then optimal (two-step). | `None` |
| `method` | `str` | 'onestep', 'twostep', 'iterative', 'cue' (continuously updated). | `'twostep'` |
| `se` | `str` | 'robust' (HAC-type), 'unadjusted'. | `'robust'` |
| `maxiter` | `int` | | `200` |
| `tol` | `float` | | `1e-8` |
| `param_names` | `list of str` | Names for parameters. | `None` |
| `alpha` | `float` | | `0.05` |
Returns:
| Type | Description |
|---|---|
| `EconometricResults` | |
Examples:
>>> import statspai as sp
>>> import numpy as np
>>>
>>> # IV-GMM example
>>> def moment_fn(theta, data):
... y, X, Z = data['y'].values, data[['x1']].values, data[['z1', 'z2']].values
... X_full = np.column_stack([np.ones(len(y)), X])
... resid = y - X_full @ theta
... Z_full = np.column_stack([np.ones(len(y)), Z])
... return resid[:, np.newaxis] * Z_full # n x q moment conditions
>>>
>>> result = sp.gmm(moment_fn, theta0=np.zeros(2), data=df,
... param_names=['_cons', 'x1'])
>>> print(result.summary())
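Before handing a moment function to an optimizer, it can help to check its output shape and confirm that the sample moments are close to zero at known parameters. The sketch below does this for the IV-GMM moment function from the example, using simulated data; the data-generating process and its coefficients are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd


def moment_fn(theta, data):
    """IV moment conditions: residual times each instrument (including a constant)."""
    y, X, Z = data['y'].values, data[['x1']].values, data[['z1', 'z2']].values
    X_full = np.column_stack([np.ones(len(y)), X])
    resid = y - X_full @ theta
    Z_full = np.column_stack([np.ones(len(y)), Z])
    return resid[:, np.newaxis] * Z_full  # n x q moment conditions


rng = np.random.default_rng(0)
n = 5000
z1, z2 = rng.normal(size=n), rng.normal(size=n)
u = rng.normal(size=n)  # structural error, correlated with x1 below
x1 = 0.8 * z1 + 0.5 * z2 + 0.5 * u + rng.normal(size=n)  # endogenous regressor
y = 1.0 + 2.0 * x1 + u
df = pd.DataFrame({"y": y, "x1": x1, "z1": z1, "z2": z2})

theta_true = np.array([1.0, 2.0])
g = moment_fn(theta_true, df)
gbar = g.mean(axis=0)
print(g.shape)  # (5000, 3): three moments for two parameters
print(gbar)     # each entry should be close to zero at the true parameters
```

With three moments and two parameters the model is overidentified, which is the case where the choice of weighting matrix and the overidentification test become relevant.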