`statspai.gmm`¶

gmm ¶

Dynamic panel GMM estimators for StatsPAI.

Provides:

- Arellano-Bond (1991) first-differenced GMM
- Blundell-Bond (1998) system GMM (`method='system'` in `xtabond` currently raises `NotImplementedError`)
- Arellano-Bond test for serial correlation (AR(1)/AR(2))
- Hansen/Sargan test for overidentifying restrictions

xtabond ¶

xtabond(data: DataFrame, y: str, x: Optional[List[str]] = None, id: str = 'id', time: str = 'time', lags: int = 1, gmm_lags: Tuple[int, Optional[int]] = (2, None), method: str = 'difference', twostep: bool = False, robust: bool = True, alpha: float = 0.05) -> CausalResult

Arellano-Bond / Blundell-Bond dynamic panel GMM estimator.

Equivalent to Stata's xtabond / xtabond2.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Balanced or unbalanced panel in long format.	required
`y`	`str`	Dependent variable.	required
`x`	`list of str`	Strictly exogenous regressors. Entered in first differences both as regressors and as their own (standard) instruments.	`None`
`id`	`str`	Unit identifier.	`'id'`
`time`	`str`	Time period variable. Treated as an ordinal sequence: the sorted distinct values define consecutive periods, so a missing period (gap) is recognised, but non-integer / irregularly-spaced codes are collapsed to their rank order.	`'time'`
`lags`	`int`	Number of lags of Y to include (ρ₁ Y_{t-1} + ... + ρ_p Y_{t-p}).	`1`
`gmm_lags`	`tuple(min, max)`	Range of lags of Y (in levels) used as GMM instruments. `min` must be ≥ 2: levels lagged two or more periods are orthogonal to the differenced error, while the first lag is not. `max=None` uses all available deeper lags, matching Stata's `xtabond` default. Setting `max` caps the instrument count (Stata's `maxldep()` / collapse-style trimming).	`(2, None)`
`method`	`str`	`'difference'` — Arellano-Bond (first-differenced GMM). This is the validated path (machine-precision parity with Stata's `xtabond`). `'system'` (Blundell-Bond) currently raises `NotImplementedError`: proper system GMM requires a stacked level equation and its own Stata parity reference, which is planned for a future release.	`'difference'`
`twostep`	`bool`	Use two-step GMM with the efficient weight matrix. When `robust=True` the Windmeijer (2005) finite-sample correction is applied to the two-step standard errors; with `robust=False` the conventional (downward-biased) two-step SEs are returned and a warning is issued.	`False`
`robust`	`bool`	Heteroskedasticity-robust standard errors (Windmeijer-corrected for two-step). When `False`, the classical one-step or two-step VCE is used.	`True`
`alpha`	`float`	Significance level.	`0.05`

Returns:

Type	Description
`CausalResult`	`estimate` / `se` are the lagged-Y (ρ₁) coefficient. `detail` carries the per-coefficient table (lagged Y first, then exogenous regressors). `model_info` holds `n_obs` (number of first-differenced observations entering the GMM, not the raw panel rows), the AR(1)/AR(2) Arellano-Bond test statistics, the Sargan test (one-step, valid under homoskedasticity), and, for two-step, the heteroskedasticity-robust Hansen J statistic.

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> rows = []
>>> for firm in range(40):  # build a dynamic panel
...     alpha = rng.normal(0, 1)
...     y_prev = rng.normal(0, 1)
...     for t in range(8):
...         capital = rng.normal(0, 1)
...         labor = rng.normal(0, 1)
...         y = (0.5 * y_prev + 0.3 * capital + 0.2 * labor
...              + alpha + rng.normal(0, 1))
...         rows.append({'firm': firm, 'year': 2000 + t, 'output': y,
...                      'capital': capital, 'labor': labor})
...         y_prev = y
>>> df = pd.DataFrame(rows)
>>> # Arellano-Bond (difference GMM)
>>> result = sp.xtabond(df, y='output', x=['capital', 'labor'],
...                     id='firm', time='year')
>>> print(result.summary())

>>> # Two-step with Windmeijer-corrected SEs
>>> result = sp.xtabond(df, y='output', x=['capital', 'labor'],
...                     id='firm', time='year', twostep=True)

Notes

Arellano-Bond (1991): First-differences the equation to remove fixed effects α_i, then uses lagged levels Y_{i,t-2}, Y_{i,t-3}, ... as a block-diagonal set of GMM instruments for ΔY_{i,t-1}.
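
The two ideas in this note, differencing away α_i and stacking lagged levels block-diagonally, can be sketched for a single unit as follows. This is an illustrative NumPy sketch, not StatsPAI's implementation: the function `ab_instrument_block`, the parameter values, and the error draws are all assumptions made for the example.

```python
import numpy as np

# Simulate one unit from y_t = rho * y_{t-1} + alpha_i + eps_t (illustrative values).
rho, alpha_i = 0.5, 3.0
eps = np.array([0.0, 0.2, -0.1, 0.4, 0.3])   # eps[0] is unused
y = np.empty(5)
y[0] = 1.0
for t in range(1, 5):
    y[t] = rho * y[t - 1] + alpha_i + eps[t]

# Differencing removes alpha_i: dy_t - rho * dy_{t-1} equals the differenced error.
dy = np.diff(y)
diff_resid = dy[1:] - rho * dy[:-1]
print(np.allclose(diff_resid, np.diff(eps)[1:]))


def ab_instrument_block(levels: np.ndarray) -> np.ndarray:
    """Block-diagonal lagged-level instruments for one unit (hypothetical helper).

    Row r belongs to the differenced equation at 0-based period t = r + 2,
    which is instrumented by the levels y[0], ..., y[t - 2].
    """
    n_rows = len(levels) - 2                 # equations for t = 2, ..., T - 1
    n_cols = n_rows * (n_rows + 1) // 2      # 1 + 2 + ... + (T - 2)
    Z = np.zeros((n_rows, n_cols))
    col = 0
    for r in range(n_rows):
        width = r + 1                        # number of available lagged levels
        Z[r, col:col + width] = levels[:width]
        col += width
    return Z


Z = ab_instrument_block(y)
print(Z.shape)
```

Each row of `Z` has nonzero entries only in its own column block, so every period contributes its own set of moment conditions.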

No constant / time trend is included (unlike Stata's default `xtabond`, which adds a `_cons` via a level moment). This matches Stata's `xtabond ..., noconstant`; the reported ρ / β coefficients are identical to Stata's `_cons` run when the series has no drift.

Balanced vs gapped panels. All standard errors, the Sargan/Hansen tests, and the one-step AR(1)/AR(2) tests are validated to machine precision against Stata for balanced (and ragged-but-gap-free) panels. When a unit is missing an interior period, the estimator stays consistent but its finite-sample numbers can differ from Stata's xtabond by ~1% (Stata, xtabond2, and R's plm each use a slightly different gap-weighting convention); a warning is emitted in that case.
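
Before estimating, it can help to check which units would trigger the gap warning. The sketch below follows the ordinal-time convention described for the `time` parameter (sorted distinct values become consecutive ranks) and flags units that skip a rank between their first and last observation. The function `units_with_interior_gaps` is hypothetical; the source does not show how StatsPAI performs this check.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple


def units_with_interior_gaps(rows: Iterable[Tuple[Hashable, Hashable]]) -> List[Hashable]:
    """Return ids whose observed periods skip a rank between first and last."""
    rows = list(rows)
    # Rank the sorted distinct time codes, e.g. 2000, 2001, 2005 -> 0, 1, 2.
    rank = {t: r for r, t in enumerate(sorted({t for _, t in rows}))}
    seen: Dict[Hashable, Set[int]] = {}
    for unit, t in rows:
        seen.setdefault(unit, set()).add(rank[t])
    gapped = []
    for unit, ranks in seen.items():
        # A gap-free unit covers every rank from its first to its last period.
        if max(ranks) - min(ranks) + 1 != len(ranks):
            gapped.append(unit)
    return gapped


panel = [("a", 2000), ("a", 2001), ("a", 2002),   # complete
         ("b", 2000), ("b", 2002),                # missing 2001: interior gap
         ("c", 2001), ("c", 2002)]                # ragged start, no gap
print(units_with_interior_gaps(panel))            # ['b']
```

Because periods are ranked from the values actually present, a period that is absent for every unit is not counted as a gap under this convention.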

Key diagnostics:

- AR(1) test: Should reject (expected in first differences).
- AR(2) test: Should NOT reject (validates instrument exogeneity).
- Sargan / Hansen test: Should NOT reject (overidentification). Sargan (one-step) is not robust to heteroskedasticity; prefer the two-step Hansen J when that is a concern.
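
The reading of these diagnostics can be written as a small decision rule. The sketch below takes p-values directly because the key names inside `model_info` are not documented here; `diagnostic_flags` and its arguments are hypothetical and only encode the rules of thumb listed above.

```python
from typing import List


def diagnostic_flags(ar1_p: float, ar2_p: float, overid_p: float,
                     alpha: float = 0.05) -> List[str]:
    """Return a list of warnings; an empty list means all checks look healthy."""
    flags = []
    if ar1_p >= alpha:
        flags.append("AR(1) not rejected: unexpected in first differences")
    if ar2_p < alpha:
        flags.append("AR(2) rejected: lagged-level instruments are suspect")
    if overid_p < alpha:
        flags.append("Sargan/Hansen rejected: overidentifying restrictions fail")
    return flags


print(diagnostic_flags(ar1_p=0.001, ar2_p=0.40, overid_p=0.25))   # []
print(diagnostic_flags(ar1_p=0.001, ar2_p=0.01, overid_p=0.25))
```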

See Roodman (2009, Stata Journal) for practical guidance.

References

Arellano, M. and Bond, S. (1991). Some tests of specification for panel data: Monte Carlo evidence and an application to employment equations. Review of Economic Studies. [@arellano1991some]

Roodman, D. (2009). How to do xtabond2: An introduction to difference and system GMM in Stata. Stata Journal. [@roodman2009xtabond]

gmm ¶

gmm(moment_fn: Callable[[ndarray, Optional[DataFrame]], Any], theta0: ndarray, data: Optional[DataFrame] = None, W: Optional[ndarray] = None, method: str = 'twostep', se: str = 'robust', maxiter: int = 200, tol: float = 1e-08, param_names: Optional[List[str]] = None, alpha: float = 0.05) -> EconometricResults

General GMM estimator for arbitrary moment conditions.

Minimizes Q(θ) = ḡ(θ)' W ḡ(θ) where ḡ(θ) = (1/n) Σ g_i(θ).
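
As a concrete reading of this objective, the following sketch evaluates Q(θ) from an (n, q) array of per-observation moments. It is an illustration of the formula only, not StatsPAI's code: `gmm_objective` and `mean_moments` are hypothetical names, and the identity default for `W` is an assumption made for the example.

```python
import numpy as np


def gmm_objective(theta, moments, W=None):
    """Q(theta) = gbar' W gbar for an (n, q) array of per-observation moments."""
    g = np.asarray(moments(theta))       # shape (n, q)
    gbar = g.mean(axis=0)                # (1/n) * sum_i g_i(theta)
    if W is None:
        W = np.eye(gbar.size)            # identity weighting (assumed default)
    return float(gbar @ W @ gbar)


# Toy moment condition E[x - theta] = 0, so the sample mean minimizes Q.
x = np.array([1.0, 2.0, 3.0, 6.0])


def mean_moments(theta):
    return (x - theta[0])[:, None]       # n x 1 moment array


print(gmm_objective(np.array([3.0]), mean_moments))   # 0.0 at the sample mean
print(gmm_objective(np.array([2.0]), mean_moments))   # 1.0
```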

Parameters:

Name	Type	Description	Default
`moment_fn`	`callable`	Function g(theta, data) -> np.ndarray of shape (n, q) returning moment conditions for each observation.	required
`theta0`	`ndarray`	Initial parameter vector.	required
`data`	`DataFrame`	Data passed to moment_fn.	`None`
`W`	`ndarray`	Weighting matrix (q x q). If None, uses identity (one-step) then optimal (two-step).	`None`
`method`	`str`	'onestep', 'twostep', 'iterative', 'cue' (continuously updated).	`'twostep'`
`se`	`str`	'robust' (HAC-type), 'unadjusted'.	`'robust'`
`maxiter`	`int`	Maximum number of iterations.	`200`
`tol`	`float`	Convergence tolerance.	`1e-8`
`param_names`	`list of str`	Names for parameters.	`None`
`alpha`	`float`	Significance level.	`0.05`
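
The identity-then-optimal weighting described for `W` can be sketched for the linear IV case, where each step has a closed-form solution. This is a minimal NumPy illustration under stated assumptions, not how `sp.gmm` is implemented: `linear_gmm` and `optimal_weight` are hypothetical helpers, and the optimal weight is taken to be the inverse of the uncentered moment covariance.

```python
import numpy as np


def linear_gmm(y, X, Z, W):
    """Closed-form minimizer of gbar' W gbar for moments Z_i * (y_i - X_i theta)."""
    A = X.T @ Z @ W @ Z.T @ X
    b = X.T @ Z @ W @ Z.T @ y
    return np.linalg.solve(A, b)


def optimal_weight(y, X, Z, theta):
    """Inverse of the moment covariance S = (1/n) sum_i g_i g_i'."""
    g = Z * (y - X @ theta)[:, None]     # (n, q) per-observation moments
    S = g.T @ g / len(y)
    return np.linalg.inv(S)


# Simulated data with one endogenous regressor and two instruments.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=(n, 2))
u = rng.normal(size=n)
x = z @ np.array([0.7, 0.5]) + u + rng.normal(size=n)
y = 1.0 + 2.0 * x + u + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

theta1 = linear_gmm(y, X, Z, np.eye(Z.shape[1]))   # step 1: identity weight
W2 = optimal_weight(y, X, Z, theta1)               # weight from step-1 residuals
theta2 = linear_gmm(y, X, Z, W2)                   # step 2: efficient weight
print(theta1, theta2)
```

For nonlinear moments the closed form is unavailable and each step would instead use a numerical optimizer, but the sequence of weights is the same.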

Returns:

Type	Description
`EconometricResults`

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> import pandas as pd
>>> rng = np.random.default_rng(0)
>>> n = 400
>>> z1, z2, u = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)
>>> x1 = 0.7 * z1 + 0.5 * z2 + u + rng.normal(size=n)
>>> y = 1.0 + 2.0 * x1 + u + rng.normal(size=n)
>>> df = pd.DataFrame({'y': y, 'x1': x1, 'z1': z1, 'z2': z2})
>>>
>>> # IV-GMM example
>>> def moment_fn(theta, data):
...     y, X, Z = data['y'].values, data[['x1']].values, data[['z1', 'z2']].values
...     X_full = np.column_stack([np.ones(len(y)), X])
...     resid = y - X_full @ theta
...     Z_full = np.column_stack([np.ones(len(y)), Z])
...     return resid[:, np.newaxis] * Z_full  # n x q moment conditions
>>>
>>> result = sp.gmm(moment_fn, theta0=np.zeros(2), data=df,
...                 param_names=['_cons', 'x1'])
>>> bool(result is not None)
True
