statspai.gmm

gmm

Dynamic panel GMM estimators for StatsPAI.

Provides:
- Arellano-Bond (1991) first-differenced GMM
- Blundell-Bond (1998) system GMM (method='system' currently raises NotImplementedError)
- Arellano-Bond test for serial correlation (AR(1)/AR(2))
- Hansen/Sargan test for overidentifying restrictions

xtabond

xtabond(data: DataFrame, y: str, x: Optional[List[str]] = None, id: str = 'id', time: str = 'time', lags: int = 1, gmm_lags: Tuple[int, Optional[int]] = (2, None), method: str = 'difference', twostep: bool = False, robust: bool = True, alpha: float = 0.05) -> CausalResult

Arellano-Bond / Blundell-Bond dynamic panel GMM estimator.

Equivalent to Stata's xtabond / xtabond2.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| data | DataFrame | Balanced or unbalanced panel in long format. | required |
| y | str | Dependent variable. | required |
| x | list of str | Strictly exogenous regressors. They enter in first differences, both as regressors and as their own (standard) instruments. | None |
| id | str | Unit identifier. | 'id' |
| time | str | Time period variable. Treated as an ordinal sequence: the sorted distinct values define consecutive periods, so a missing period (gap) is recognised, but non-integer / irregularly-spaced codes are collapsed to their rank order. | 'time' |
| lags | int | Number of lags of Y to include (ρ₁ Y_{t-1} + ... + ρ_p Y_{t-p}). | 1 |
| gmm_lags | tuple(min, max) | Range of lags of Y (in levels) used as GMM instruments. min must be ≥ 2 (lags of depth 2 or more are orthogonal to the differenced error). max=None uses all available deeper lags, matching Stata's xtabond default. Setting max caps the instrument count (Stata's maxldep() / collapse-style trimming). | (2, None) |
| method | str | 'difference': Arellano-Bond (first-differenced GMM). This is the validated path (machine-precision parity with Stata's xtabond). 'system' (Blundell-Bond) currently raises NotImplementedError: proper system GMM requires a stacked level equation and its own Stata parity reference, which is planned for a future release. | 'difference' |
| twostep | bool | Use two-step GMM with the efficient weight matrix. When robust=True the Windmeijer (2005) finite-sample correction is applied to the two-step standard errors; with robust=False the conventional (downward-biased) two-step SEs are returned and a warning is issued. | False |
| robust | bool | Heteroskedasticity-robust standard errors (Windmeijer-corrected for two-step). When False, the classical one-step or two-step VCE is used. | True |
| alpha | float | Significance level. | 0.05 |

Returns:

| Type | Description |
| --- | --- |
| CausalResult | estimate / se are the lagged-Y (ρ₁) coefficient and its standard error. detail carries the per-coefficient table (lagged Y first, then exogenous regressors). model_info holds n_obs (number of first-differenced observations entering the GMM, not the raw panel rows), the AR(1)/AR(2) Arellano-Bond test statistics, the Sargan test (one-step, valid under homoskedasticity), and, for two-step, the heteroskedasticity-robust Hansen J statistic. |

Examples:

>>> import statspai as sp
>>> # df: long-format panel with columns firm, year, output, capital, labor
>>> # Arellano-Bond (difference GMM)
>>> result = sp.xtabond(df, y='output', x=['capital', 'labor'],
...                     id='firm', time='year')
>>> print(result.summary())
>>> # Two-step with Windmeijer-corrected SEs
>>> result = sp.xtabond(df, y='output', x=['capital', 'labor'],
...                     id='firm', time='year', twostep=True)
Notes

Arellano-Bond (1991): First-differences the equation to remove fixed effects α_i, then uses lagged levels Y_{i,t-2}, Y_{i,t-3}, ... as a block-diagonal set of GMM instruments for ΔY_{i,t-1}.
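
To make the block-diagonal layout concrete, here is a minimal sketch of the instrument matrix for a single unit with one lag of Y and no gaps. The function name `ab_instrument_block` and its `min_lag` / `max_lag` arguments are hypothetical; they mirror the gmm_lags description but are not taken from the StatsPAI source, which may build its instruments differently.

```python
import numpy as np

def ab_instrument_block(y_levels, min_lag=2, max_lag=None):
    """Illustrative block-diagonal GMM instruments for one gap-free unit.

    Assumption: a single lag of y, so the differenced equation exists
    for 0-based periods t = 2, ..., T-1. Not the library's own code.
    """
    y = np.asarray(y_levels, dtype=float)
    T = len(y)
    rows = []
    for t in range(2, T):
        # Deepest usable lag is t (reaching back to y[0]), optionally capped.
        deepest = t if max_lag is None else min(max_lag, t)
        # Lagged levels y_{t-min_lag}, ..., y_{t-deepest} instrument period t.
        rows.append([y[t - k] for k in range(min_lag, deepest + 1)])

    # Each period gets its own columns, which yields the block-diagonal shape.
    Z = np.zeros((len(rows), sum(len(r) for r in rows)))
    col = 0
    for i, r in enumerate(rows):
        Z[i, col:col + len(r)] = r
        col += len(r)
    return Z

Z_full = ab_instrument_block([1, 2, 3, 4, 5])               # all available lags
Z_capped = ab_instrument_block([1, 2, 3, 4, 5], max_lag=2)  # only lag 2
```

With five periods, the uncapped matrix has 1 + 2 + 3 = 6 columns, while capping at lag 2 leaves one column per period. This growth in column count is why a cap on the maximum lag reduces the instrument count.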

No constant / time trend is included (unlike Stata's default xtabond, which adds a _cons via a level moment). This matches Stata's xtabond ..., noconstant; the reported ρ / β coefficients are identical to Stata's _cons run when the series has no drift.

Balanced vs gapped panels. All standard errors, the Sargan/Hansen tests, and the one-step AR(1)/AR(2) tests are validated to machine precision against Stata for balanced (and ragged-but-gap-free) panels. When a unit is missing an interior period, the estimator stays consistent but its finite-sample numbers can differ from Stata's xtabond by ~1% (Stata, xtabond2, and R's plm each use a slightly different gap-weighting convention); a warning is emitted in that case.
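
Before estimating, it can help to check whether a panel has interior gaps, since those are the cases where results may deviate from Stata. The sketch below is one possible way to do that, assuming time values are rank-coded across the whole panel as described for the time parameter. The helper `find_interior_gaps` is hypothetical and is not part of StatsPAI.

```python
def find_interior_gaps(records):
    """Return {unit: [missing periods]} for units with interior gaps.

    `records` is an iterable of (unit, time) pairs. Periods are the sorted
    distinct time values of the whole panel; a unit has an interior gap when
    a period between its first and last observation is absent.
    """
    records = list(records)
    periods = sorted({t for _, t in records})
    rank = {t: i for i, t in enumerate(periods)}

    ranks_by_unit = {}
    for unit, t in records:
        ranks_by_unit.setdefault(unit, set()).add(rank[t])

    gaps = {}
    for unit, ranks in ranks_by_unit.items():
        missing = [periods[r] for r in range(min(ranks), max(ranks) + 1)
                   if r not in ranks]
        if missing:
            gaps[unit] = missing
    return gaps

panel = [
    ("a", 2000), ("a", 2001), ("a", 2002),  # balanced
    ("b", 2000), ("b", 2002),               # interior gap at 2001
    ("c", 2001), ("c", 2002),               # ragged start, but gap-free
]
gaps = find_interior_gaps(panel)
```

Unit "c" starts late but has no interior gap, so it falls in the ragged-but-gap-free category; only unit "b" is flagged.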

Key diagnostics:
- AR(1) test: Should reject (expected in first differences).
- AR(2) test: Should NOT reject (validates instrument exogeneity).
- Sargan / Hansen test: Should NOT reject (overidentification). Sargan (one-step) is not robust to heteroskedasticity; prefer the two-step Hansen J when that is a concern.
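
The decision rule behind these diagnostics can be written as a small check. This is only a sketch: the text above does not say whether model_info stores p-values or under which keys, so the hypothetical helper `check_ab_diagnostics` takes plain p-values, and the numbers used here are made up for illustration.

```python
def check_ab_diagnostics(ar1_p, ar2_p, overid_p, alpha=0.05):
    """Apply the usual reading of the Arellano-Bond diagnostics.

    All three entries are True when the specification looks acceptable.
    """
    return {
        "ar1_rejects": ar1_p < alpha,            # expected in first differences
        "ar2_not_rejected": ar2_p >= alpha,      # no second-order correlation
        "overid_not_rejected": overid_p >= alpha,  # Sargan or Hansen J
    }

# Illustrative p-values only, not output from any real estimation.
checks = check_ab_diagnostics(ar1_p=0.001, ar2_p=0.40, overid_p=0.25)
all_ok = all(checks.values())
```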

See Roodman (2009, Stata Journal) for practical guidance.

References

Arellano, M. and Bond, S. (1991). Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. Review of Economic Studies. [@arellano1991some]

Roodman, D. (2009). How to do xtabond2: An introduction to difference and system GMM in Stata. Stata Journal. [@roodman2009xtabond]

gmm

gmm(moment_fn: Callable, theta0: ndarray, data: DataFrame = None, W: ndarray = None, method: str = 'twostep', se: str = 'robust', maxiter: int = 200, tol: float = 1e-08, param_names: List[str] = None, alpha: float = 0.05) -> EconometricResults

General GMM estimator for arbitrary moment conditions.

Minimizes Q(θ) = ḡ(θ)' W ḡ(θ) where ḡ(θ) = (1/n) Σ g_i(θ).
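
As a rough illustration of this objective, the sketch below evaluates Q(θ) from a per-observation moment function of shape (n, q). The helper `gmm_objective` and the toy moment `mean_moment` are assumptions made for this example and are not the library's internal code; W defaults to the identity, as in a one-step fit.

```python
import numpy as np

def gmm_objective(theta, moment_fn, data, W=None):
    """Q(theta) = g_bar' W g_bar, with g_bar the column mean of the moments."""
    g = np.asarray(moment_fn(theta, data), dtype=float)  # shape (n, q)
    g_bar = g.mean(axis=0)                               # shape (q,)
    if W is None:
        W = np.eye(g_bar.size)                           # identity weighting
    return float(g_bar @ W @ g_bar)

def mean_moment(theta, x):
    # Single moment condition E[x - theta] = 0, returned as an (n, 1) array.
    return (x - theta[0]).reshape(-1, 1)

x = np.array([1.0, 2.0, 3.0, 6.0])                       # sample mean is 3
q_at_mean = gmm_objective(np.array([3.0]), mean_moment, x)
q_off_mean = gmm_objective(np.array([2.0]), mean_moment, x)
```

In this toy case the objective reduces to the squared distance between θ and the sample mean, so it is zero at θ = 3 and equals 1 at θ = 2.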

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| moment_fn | callable | Function g(theta, data) -> np.ndarray of shape (n, q) returning moment conditions for each observation. | required |
| theta0 | ndarray | Initial parameter vector. | required |
| data | DataFrame | Data passed to moment_fn. | None |
| W | ndarray | Weighting matrix (q x q). If None, uses identity (one-step) then optimal (two-step). | None |
| method | str | 'onestep', 'twostep', 'iterative', 'cue' (continuously updated). | 'twostep' |
| se | str | 'robust' (HAC-type), 'unadjusted'. | 'robust' |
| maxiter | int | | 200 |
| tol | float | | 1e-8 |
| param_names | list of str | Names for parameters. | None |
| alpha | float | | 0.05 |

Returns:

| Type | Description |
| --- | --- |
| EconometricResults | |

Examples:

>>> import statspai as sp
>>> import numpy as np
>>>
>>> # IV-GMM example
>>> def moment_fn(theta, data):
...     y, X, Z = data['y'].values, data[['x1']].values, data[['z1', 'z2']].values
...     X_full = np.column_stack([np.ones(len(y)), X])
...     resid = y - X_full @ theta
...     Z_full = np.column_stack([np.ones(len(y)), Z])
...     return resid[:, np.newaxis] * Z_full  # n x q moment conditions
>>>
>>> result = sp.gmm(moment_fn, theta0=np.zeros(2), data=df,
...                 param_names=['_cons', 'x1'])
>>> print(result.summary())
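
The example above assumes a DataFrame df with columns y, x1, z1 and z2. One way to try it out is with simulated data, as in the sketch below. The data-generating process (coefficients, sample size, seed) is invented for illustration, and the block only checks the moment function itself rather than calling sp.gmm.

```python
import numpy as np
import pandas as pd

# Simulated IV setting (all values are assumptions for illustration):
# x1 is endogenous through v; z1 and z2 are valid instruments.
rng = np.random.default_rng(0)
n = 2000
z1, z2 = rng.normal(size=n), rng.normal(size=n)
v = rng.normal(size=n)
u = 0.5 * v + rng.normal(size=n)          # error correlated with x1 via v
x1 = z1 + z2 + v
y = 1.0 + 2.0 * x1 + u                    # true theta = (_cons=1, x1=2)
df = pd.DataFrame({"y": y, "x1": x1, "z1": z1, "z2": z2})

def moment_fn(theta, data):
    # Same moment function as in the example above.
    y, X, Z = data['y'].values, data[['x1']].values, data[['z1', 'z2']].values
    X_full = np.column_stack([np.ones(len(y)), X])
    resid = y - X_full @ theta
    Z_full = np.column_stack([np.ones(len(y)), Z])
    return resid[:, np.newaxis] * Z_full  # n x q moment conditions

# At the true parameters the sample moments should average close to zero.
g_true = moment_fn(np.array([1.0, 2.0]), df)
g_bar_true = g_true.mean(axis=0)
```

Here q = 3 moment conditions (constant, z1, z2) identify 2 parameters, so the model is overidentified by one restriction. Passing this df and moment_fn to sp.gmm as in the example should then recover estimates near (1, 2), although that call is not executed here.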