`statspai.spatial`¶

spatial ¶

Spatial econometrics — StatsPAI's answer to R's spatialreg + spdep + mgwr and Python's PySAL.

Three layers, flat-imported for convenience:

Weights

W, queen_weights, rook_weights, knn_weights, distance_band, kernel_weights, block_weights.

Exploratory spatial data analysis (ESDA)

moran, moran_local, geary, getis_ord_g, getis_ord_local, join_counts, moran_plot, lisa_cluster_map.

Regression

ML estimators: sar, sem, sdm, slx, sac
Diagnostics: lm_tests, moran_residuals
Effects: impacts (LeSage-Pace direct / indirect / total)

All regression estimators accept a :class:W object, a scipy.sparse matrix, or an (n, n) ndarray for the weights argument.

Examples:

>>> import statspai as sp
>>> w = sp.knn_weights(coords, k=6); w.transform = "R"
>>> moran = sp.moran(df["y"], w)
>>> result = sp.sar(w, data=df, formula='y ~ x1 + x2')
>>> eff = sp.impacts(result)
>>> lms = sp.lm_tests("y ~ x1 + x2", df, w)

SpatialDiDResult `dataclass` ¶

Bases: ResultProtocolMixin

Result object for :func:spatial_did.

The object intentionally exposes both a spatial-DiD-specific surface (direct_effect, spillover_effect, total_effect) and a broom/modelsummary-compatible surface (tidy(), glance(), params, std_errors).

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> n_units, n_time = 12, 6
>>> W = np.zeros((n_units, n_units))  # ring spatial weights over units
>>> for i in range(n_units):
...     W[i, (i + 1) % n_units] = 1
...     W[i, (i - 1) % n_units] = 1
>>> rows = []
>>> for u in range(n_units):
...     fe = rng.normal()
...     for t in range(n_time):
...         d = 1 if (u < 6 and t >= 3) else 0  # half treated from t>=3
...         y = 1.0 + fe + 0.3 * t + 1.5 * d + rng.normal(scale=0.3)
...         rows.append({"unit": u, "time": t, "treat": d, "y": y})
>>> df = pd.DataFrame(rows)
>>> res = sp.spatial_did(df, y="y", treat="treat", unit="unit",
...                     time="time", W=W)
>>> isinstance(res, sp.SpatialDiDResult)
True
>>> res.n_obs
72

plot ¶

plot(kind: str = 'coef', ax: Any = None, figsize: tuple[int, int] = (8, 5), **kwargs: Any) -> Any

Plot coefficient, exposure, or spatial event-study diagnostics.

SpatialIVResult `dataclass` ¶

Bases: ResultProtocolMixin

Container for :func:spatial_iv Kelejian-Prucha S2SLS estimates.

Attributes:

Name	Type	Description
`rho`	`float`	Spatial autoregressive coefficient on `WY` (`nan` when `include_WY=False`).
`rho_se`	`float`	Standard error of `rho`.
`coefficients`	`DataFrame`	Full coefficient table with `variable`, `coef` and `se`.
`n_obs`	`int`	Number of observations after dropping missing rows.

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> n = 60
>>> W = np.zeros((n, n))  # ring network, row-normalised
>>> for i in range(n):
...     W[i, (i + 1) % n] = 1
...     W[i, (i - 1) % n] = 1
>>> W = W / W.sum(1, keepdims=True)
>>> x = rng.normal(size=n)
>>> z = rng.normal(size=n)  # excluded instrument
>>> d = 0.5 * x + 0.8 * z + rng.normal(size=n)  # endogenous regressor
>>> y = 1.0 + 1.5 * d + 0.3 * x + rng.normal(size=n)
>>> df = pd.DataFrame({"y": y, "d": d, "x": x, "z": z})
>>> res = sp.spatial_iv(df, y="y", endog=["d"], exog=["x"], W=W,
...                     instruments=["z"])
>>> isinstance(res, sp.SpatialIVResult)
True
>>> res.n_obs
60

SpatialModel ¶

Unified spatial regression estimator.

Uses concentrated ML: profile out β and σ², optimise over ρ (or λ) in one dimension using eigenvalues of W.

The engine behind :func:sp.sar / :func:sp.sem / :func:sp.sdm. Construct it with a dense (n, n) weights matrix W, the data and a formula, then call :meth:fit to obtain an EconometricResults carrying the spatial parameter (rho for SAR / SDM, lambda for SEM).

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> n = 50
>>> W = np.zeros((n, n))            # nearest-neighbour chain weights
>>> for i in range(n - 1):
...     W[i, i + 1] = W[i + 1, i] = 1.0
>>> df = pd.DataFrame({"x": rng.normal(size=n)})
>>> df["y"] = 1.0 + 0.5 * df["x"] + rng.normal(size=n)
>>> model = sp.SpatialModel(W, df, "y ~ x", model_type="sar")
>>> res = model.fit()
>>> type(res).__name__
'EconometricResults'
>>> model.model_type
'sar'

W ¶

Spatial weights matrix with CSR-sparse backing.

Parameters:

Name	Type	Description	Default
`neighbors`	`dict[int, list[int]]`	Mapping observation id -> list of neighbour ids.	required
`weights`	`dict[int, list[float]]`	Matching weight values. If `None`, binary (1.0) weights are used.	`None`
`id_order`	`sequence`	Explicit ordering of observation ids. Defaults to `sorted(neighbors)`.	`None`

Examples:

>>> import statspai as sp
>>> # A 4-node chain graph 0-1-2-3
>>> neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
>>> w = sp.W(neighbors)
>>> w.n
4
>>> w.islands
[]
>>> w.transform = "R"          # row-standardize the weights
>>> float(w.full()[1].sum())   # each row now sums to 1
1.0

spatial_did ¶

spatial_did(data: DataFrame, y: str, treat: str, unit: str, time: str, W: Any, covariates: Optional[Sequence[str]] = None, cluster: Optional[str] = None, alpha: float = 0.05, *, se_type: str = 'cluster', coords: Optional[Tuple[str, str]] = None, lat: Optional[str] = None, lon: Optional[str] = None, distance_matrix: Optional[ndarray] = None, conley_cutoff: Optional[float] = None, conley_kernel: str = 'bartlett', unit_order: Optional[Sequence[Any]] = None, normalize_W: bool = True, event_study: bool = False, event_window: Optional[Tuple[int, int]] = None, event_base: int = -1) -> SpatialDiDResult

Spatial DiD with direct and spillover treatment effects.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Long panel with one row per unit-period.	required
`y`	`str`	Outcome, treatment, unit-id, and time-id columns. `treat` may be binary or a continuous exposure intensity; the spatial lag uses the same scale.	required
`treat`	`str`	Outcome, treatment, unit-id, and time-id columns. `treat` may be binary or a continuous exposure intensity; the spatial lag uses the same scale.	required
`unit`	`str`	Outcome, treatment, unit-id, and time-id columns. `treat` may be binary or a continuous exposure intensity; the spatial lag uses the same scale.	required
`time`	`str`	Outcome, treatment, unit-id, and time-id columns. `treat` may be binary or a continuous exposure intensity; the spatial lag uses the same scale.	required
`W`	ndarray, sparse matrix, or :class:`statspai.spatial.W`	Spatial weights matrix over units. If a StatsPAI `W` object carries an `id_order`, that order is respected.	required
`covariates`	`sequence of str`	Additional controls included after absorbing unit and time effects.	`None`
`cluster`	`str`	Cluster variable for `se_type='cluster'`. Defaults to `unit`.	`None`
`alpha`	`float`	Significance level for confidence intervals.	`0.05`
`se_type`	`('cluster', 'robust', 'conley')`	Covariance estimator. `conley` uses spatial-HAC dependence within each period and requires `conley_cutoff` plus coordinates or a unit-level distance matrix.	`"cluster"`
`coords`	`tuple(str, str)`	Convenience pair `(lat, lon)` in degrees.	`None`
`lat`	`str`	Latitude/longitude columns in degrees.	`None`
`lon`	`str`	Latitude/longitude columns in degrees.	`None`
`distance_matrix`	`ndarray`	Unit-by-unit distances in the same units as `conley_cutoff`.	`None`
`conley_cutoff`	`float`	Spatial cutoff for Conley-HAC standard errors.	`None`
`conley_kernel`	`('uniform', 'bartlett')`		`"uniform"`
`unit_order`	`sequence`	Explicit row/column order of `W`.	`None`
`normalize_W`	`bool`	Row-normalise weights before constructing `WD`.	`True`
`event_study`	`bool`	Estimate direct and spillover event-study paths by relative time.	`False`
`event_window`	`tuple(int, int)`	Inclusive event-time window. Defaults to observed support capped at [-5, 5] when `event_study=True`.	`None`
`event_base`	`int`	Omitted relative-time period.	`-1`

Returns:

Type	Description
`SpatialDiDResult`	Direct, spillover, total effects with diagnostics, plotting, and export helpers.

References

delgado2015difference

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> n_units, n_time = 12, 6
>>> W = np.zeros((n_units, n_units))  # ring spatial weights over units
>>> for i in range(n_units):
...     W[i, (i + 1) % n_units] = 1
...     W[i, (i - 1) % n_units] = 1
>>> rows = []
>>> for u in range(n_units):
...     fe = rng.normal()
...     for t in range(n_time):
...         d = 1 if (u < 6 and t >= 3) else 0  # half treated from t>=3
...         y = 1.0 + fe + 0.3 * t + 1.5 * d + rng.normal(scale=0.3)
...         rows.append({"unit": u, "time": t, "treat": d, "y": y})
>>> df = pd.DataFrame(rows)
>>> res = sp.spatial_did(df, y="y", treat="treat", unit="unit",
...                     time="time", W=W)
>>> res.coefficients["variable"].tolist()
['treat', 'W_treat']
>>> res.n_obs
72

getis_ord_g ¶

getis_ord_g(y: Any, w: W, permutations: int = 999, seed: Optional[int] = None) -> SpatialStatistic

Global Getis-Ord G — overall concentration of high (or low) values.

The global G statistic summarises whether high values of y are globally concentrated among neighbouring units. It requires non-negative y. Larger-than-expected G indicates clustering of high values; smaller indicates clustering of low values.

Parameters:

Name	Type	Description	Default
`y`	`array_like`	Non-negative variable of interest, length `n`.	required
`w`	`W`	Spatial weights object defining the neighbourhood structure.	required
`permutations`	`int`	Number of random permutations for the pseudo p-value, empirical variance and z-score. `0` skips the permutation test.	`999`
`seed`	`int`	Seed for the permutation RNG for reproducibility.	`None`

Returns:

Type	Description
`SpatialStatistic`	Result with `value` (G), `expectation` (`S0 / (n(n-1))`), `variance`, `z_score`, `p_norm` and `p_sim` / `simulations`.

Raises:

Type	Description
`ValueError`	If any element of `y` is negative, or `y` is degenerate.

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> coords = np.random.default_rng(0).random((50, 2))
>>> w = sp.knn_weights(coords, k=5)
>>> y = np.random.default_rng(4).random(50)  # non-negative
>>> res = sp.getis_ord_g(y, w, seed=0)
>>> res.value > 0
True

See Also

getis_ord_local : Local Gi / Gi* hotspot statistics.

References

Getis, A. and Ord, J. K. (1992). The analysis of spatial association by use of distance statistics. Geographical Analysis, 24(3), 189-206. [@getis1992analysis]

getis_ord_local ¶

getis_ord_local(y: Any, w: W, star: bool = True, permutations: int = 999, seed: Optional[int] = None) -> Dict[str, ndarray]

Local Getis-Ord Gi / Gi* — hotspot and coldspot detection.

Computes a standardised local statistic for each unit measuring whether it is surrounded by high values (a hotspot, large positive z) or low values (a coldspot, large negative z). With star=True the focal unit is included in its own neighbourhood (Gi*); with star=False it is excluded (Gi).

Parameters:

Name	Type	Description	Default
`y`	`array_like`	Variable of interest, length `n`.	required
`w`	`W`	Spatial weights object; densified internally to an `n x n` matrix.	required
`star`	`bool`	If True compute Gi* (self-included); otherwise Gi (self-excluded).	`True`
`permutations`	`int`	Accepted for API symmetry; the current implementation returns the analytic standardised statistic.	`999`
`seed`	`int`	Seed placeholder for API symmetry.	`None`

Returns:

Type	Description
`dict`	`{"Gs": ndarray, "z": ndarray}` — the `n` local statistics, which are already standardised (so `Gs` and `z` coincide).

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> coords = np.random.default_rng(0).random((40, 2))
>>> w = sp.knn_weights(coords, k=4)
>>> y = np.random.default_rng(5).random(40)
>>> out = sp.getis_ord_local(y, w, star=True)
>>> out["Gs"].shape
(40,)

See Also

getis_ord_g : Global G statistic. moran_local : LISA, a cross-product local statistic.

References

Getis, A. and Ord, J. K. (1992). The analysis of spatial association by use of distance statistics. Geographical Analysis, 24(3), 189-206. [@getis1992analysis]

moran_local ¶

moran_local(y: Any, w: W, permutations: int = 999, seed: Optional[int] = None) -> Dict[str, Any]

Local Moran's I (LISA) — per-location spatial association.

Decomposes global Moran's I into a contribution I_i for each spatial unit, identifying local clusters (high-high / low-low) and spatial outliers (high-low / low-high). Each I_i gets a conditional permutation pseudo p-value.

Parameters:

Name	Type	Description	Default
`y`	`array_like`	Variable of interest, length `n`.	required
`w`	`W`	Spatial weights object; `w.sparse` is the `n x n` weights matrix.	required
`permutations`	`int`	Number of conditional permutations for the per-location pseudo p-values. `0` skips the permutation test.	`999`
`seed`	`int`	Seed for the permutation RNG for reproducibility.	`None`

Returns:

Type	Description
`dict`	`{"Is": ndarray, "p_sim": ndarray \| None, "simulations": ndarray \| None}` where `Is` holds the `n` local statistics and `p_sim` the matching pseudo p-values.

Raises:

Type	Description
`ValueError`	If `y` has zero variance.

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> coords = np.random.default_rng(0).random((40, 2))
>>> w = sp.knn_weights(coords, k=4)
>>> y = np.random.default_rng(2).standard_normal(40)
>>> out = sp.moran_local(y, w, permutations=499, seed=0)
>>> out["Is"].shape
(40,)

See Also

moran : Global Moran's I.

References

Anselin, L. (1995). Local indicators of spatial association---LISA. Geographical Analysis, 27(2), 93-115. [@anselin1995local]

lisa_cluster_map ¶

lisa_cluster_map(y: Any, w: W, gdf: Any, ax: Optional[Any] = None, p_threshold: float = 0.05) -> Any

Classify each observation HH/LL/HL/LH/Not significant and colour on a GDF.

Requires geopandas (install with pip install geopandas); the gdf must align row-for-row with y and w.

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> coords = rng.uniform(size=(60, 2))
>>> w = sp.knn_weights(coords, k=5)
>>> y = coords[:, 0] + rng.normal(scale=0.3, size=60)
>>> import geopandas as gpd
>>> from shapely.geometry import Point
>>> pts = [Point(xy) for xy in coords]
>>> gdf = gpd.GeoDataFrame(geometry=pts)
>>> ax = sp.lisa_cluster_map(y, w, gdf)
>>> ax.figure.savefig("lisa.png")

moran_plot ¶

moran_plot(y: Any, w: W, ax: Optional[Any] = None) -> Any

Moran scatter: z vs spatial lag Wz. Slope of OLS line = Moran's I.

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> coords = rng.uniform(size=(60, 2))
>>> w = sp.knn_weights(coords, k=5)
>>> y = coords[:, 0] + rng.normal(scale=0.3, size=60)
>>> ax = sp.moran_plot(y, w)
>>> ax.figure.savefig("moran.png")

gwr_bandwidth ¶

gwr_bandwidth(coords: Any, y: Any, X: Any, kernel: KernelName = 'bisquare', fixed: bool = False, criterion: Criterion = 'AICc', bw_min: Optional[float] = None, bw_max: Optional[float] = None, add_constant: bool = True, tol: float = 0.001) -> float

Select GWR bandwidth via golden-section search.

Parameters:

Name	Type	Description	Default
`criterion`	`('AICc', 'AIC', 'BIC', 'CV')`	Default AICc matches mgwr.	`"AICc"`
`fixed`	`bool`	If False (default), the bandwidth is a nearest-neighbour count (integer-valued; rounded at each candidate).	`False`
`bw_min`	`float`	Override the default search bounds.	`None`
`bw_max`	`float`	Override the default search bounds.	`None`

Examples:

>>> import numpy as np
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> n = 80
>>> coords = rng.uniform(0, 10, size=(n, 2))
>>> x = rng.normal(size=n)
>>> beta = 0.5 + 0.1 * coords[:, 1]  # spatially varying slope
>>> y = 1.0 + beta * x + rng.normal(0, 0.3, n)
>>> bw = sp.gwr_bandwidth(coords, y, x.reshape(-1, 1), criterion="AICc")
>>> bool(bw > 0)
True
>>> res = sp.gwr(coords, y, x.reshape(-1, 1), bw=bw, fixed=False)
>>> type(res).__name__
'GWRResult'

spatial_iv ¶

spatial_iv(data: DataFrame, y: str, endog: Sequence[str], exog: Sequence[str], W: Any, instruments: Optional[Sequence[str]] = None, include_WY: bool = True, alpha: float = 0.05) -> SpatialIVResult

Spatial 2SLS estimator.

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`		required
`y`	`str`		required
`endog`	`sequence of str`	Names of additional endogenous regressors.	required
`exog`	`sequence of str`	Names of exogenous regressors (included instruments).	required
`W`	ndarray or :class:`spatial.W`	Spatial weights matrix (n_obs x n_obs). Should be row-normalised.	required
`instruments`	`sequence of str`	Extra excluded instruments beyond the spatial lags of X.	`None`
`include_WY`	`bool`	If True, include WY as a regressor (spatial autoregressive coefficient ρ).	`True`
`alpha`	`float`		`0.05`

Returns:

Type	Description
`SpatialIVResult`

References

kelejian1998generalized

Examples:

>>> import statspai as sp
>>> import numpy as np, pandas as pd
>>> rng = np.random.default_rng(0)
>>> n = 60
>>> W = np.zeros((n, n))  # ring network, row-normalised
>>> for i in range(n):
...     W[i, (i + 1) % n] = 1
...     W[i, (i - 1) % n] = 1
>>> W = W / W.sum(1, keepdims=True)
>>> x = rng.normal(size=n)
>>> z = rng.normal(size=n)  # excluded instrument for the endogenous d
>>> d = 0.5 * x + 0.8 * z + rng.normal(size=n)
>>> y = 1.0 + 1.5 * d + 0.3 * x + rng.normal(size=n)
>>> df = pd.DataFrame({"y": y, "d": d, "x": x, "z": z})
>>> res = sp.spatial_iv(df, y="y", endog=["d"], exog=["x"], W=W,
...                     instruments=["z"])
>>> res.coefficients["variable"].tolist()
['d', 'rho (WY)', '(Intercept)', 'x']
>>> res.n_obs
60

sar ¶

sar(W: ArrayOrW, data: DataFrame, formula: str, row_normalize: bool = True, alpha: float = 0.05) -> EconometricResults

Spatial Autoregressive (Lag) Model: Y = ρWY + Xβ + ε.

Parameters:

Name	Type	Default
`W`	`ndarray, scipy.sparse matrix, or statspai.spatial.weights.W`	required
`data`	`see legacy implementation.`	required
`formula`	`see legacy implementation.`	required
`row_normalize`	`see legacy implementation.`	required
`alpha`	`see legacy implementation.`	required

Examples:

>>> import numpy as np, pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> n = 30
>>> W = np.zeros((n, n))  # ring contiguity: each unit linked to neighbours
>>> for i in range(n):
...     W[i, (i - 1) % n] = 1.0
...     W[i, (i + 1) % n] = 1.0
>>> x = rng.normal(size=n)
>>> df = pd.DataFrame({"y": 1.0 + 0.8 * x + rng.normal(scale=0.5, size=n),
...                    "x": x})
>>> res = sp.sar(W, data=df, formula="y ~ x")
>>> list(res.params.index)
['const', 'x', 'rho']

References

anselin1988spatial

sdm ¶

sdm(W: ArrayOrW, data: DataFrame, formula: str, row_normalize: bool = True, alpha: float = 0.05) -> EconometricResults

Spatial Durbin Model: Y = ρWY + Xβ + WXθ + ε.

Reports LeSage & Pace direct / indirect / total effects in the diagnostics when n is small enough for the dense (I - ρW)^{-1}.

Examples:

>>> import numpy as np, pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> n = 30
>>> W = np.zeros((n, n))
>>> for i in range(n):
...     W[i, (i - 1) % n] = 1.0
...     W[i, (i + 1) % n] = 1.0
>>> x = rng.normal(size=n)
>>> df = pd.DataFrame({"y": 1.0 + 0.8 * x + rng.normal(scale=0.5, size=n),
...                    "x": x})
>>> res = sp.sdm(W, data=df, formula="y ~ x")
>>> list(res.params.index)  # own + spatially-lagged covariate + rho
['const', 'x', 'W_x', 'rho']

References

lesage2009introduction

sem ¶

sem(W: ArrayOrW, data: DataFrame, formula: str, row_normalize: bool = True, alpha: float = 0.05) -> EconometricResults

Spatial Error Model: Y = Xβ + u, u = λWu + ε.

Examples:

>>> import numpy as np, pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> n = 30
>>> W = np.zeros((n, n))
>>> for i in range(n):
...     W[i, (i - 1) % n] = 1.0
...     W[i, (i + 1) % n] = 1.0
>>> x = rng.normal(size=n)
>>> df = pd.DataFrame({"y": 1.0 + 0.8 * x + rng.normal(scale=0.5, size=n),
...                    "x": x})
>>> res = sp.sem(W, data=df, formula="y ~ x")
>>> list(res.params.index)
['const', 'x', 'lambda']

References

anselin1988spatial

lm_tests ¶

lm_tests(formula: str, data: DataFrame, W: Any, row_normalize: bool = True) -> Dict[str, Tuple[float, float]]

Anselin (1988) Lagrange multiplier battery.

Parameters:

Name	Type	Description	Default
`formula`	`str`	`"y ~ x1 + x2"` style formula (constant added automatically).	required
`data`	`DataFrame`		required
`W`	ndarray, scipy.sparse, or ``statspai.spatial.weights.W``		required
`row_normalize`	`bool`		`True`

Returns:

Type	Description
`dict`	`{name: (statistic, p_value)}` for the five tests.

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> import pandas as pd
>>> rng = np.random.default_rng(0)
>>> n = 80
>>> coords = rng.uniform(size=(n, 2))
>>> w = sp.knn_weights(coords, k=5)
>>> x1 = rng.normal(size=n)
>>> y = 1.0 + 0.5 * x1 + rng.normal(scale=0.5, size=n)
>>> df = pd.DataFrame({"y": y, "x1": x1})
>>> out = sp.lm_tests("y ~ x1", df, w)
>>> sorted(out.keys())
['LM_err', 'LM_lag', 'Robust_LM_err', 'Robust_LM_lag', 'SARMA']

moran_residuals ¶

moran_residuals(residuals: ndarray, W: Any, row_normalize: bool = True) -> Tuple[float, float]

Moran's I applied to regression residuals (quick LM-err companion).

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> rng = np.random.default_rng(0)
>>> coords = rng.uniform(size=(80, 2))
>>> w = sp.knn_weights(coords, k=5)
>>> resid = rng.normal(size=80)
>>> I, p = sp.moran_residuals(resid, w)
>>> bool(np.isfinite(I) and np.isfinite(p))
True

sar_gmm ¶

sar_gmm(W: Any, data: DataFrame, formula: str, row_normalize: bool = True, robust: Optional[str] = None, w_lags: int = 1) -> EconometricResults

Kelejian-Prucha (1998) 2SLS for SAR with spatial-lag instruments.

Instruments: [X, W X, …, W^w_lags X] (dropping constant duplicates). w_lags=1 (default) matches spreg.GM_Lag default. Endogenous regressor: W Y.

Examples:

>>> import numpy as np, pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> n = 40
>>> W = np.zeros((n, n))  # ring contiguity: each unit linked to neighbours
>>> for i in range(n):
...     W[i, (i - 1) % n] = 1.0
...     W[i, (i + 1) % n] = 1.0
>>> x = rng.normal(size=n)
>>> df = pd.DataFrame({"y": 1.0 + 0.8 * x + rng.normal(scale=0.5, size=n),
...                    "x": x})
>>> res = sp.sar_gmm(W, data=df, formula="y ~ x")
>>> list(res.params.index)
['const', 'x', 'rho']

References

kelejian1998generalized

sarar_gmm ¶

sarar_gmm(W: Any, data: DataFrame, formula: str, row_normalize: bool = True, robust: Optional[str] = None) -> EconometricResults

Combined GMM: Kelejian-Prucha SAR 2SLS then SEM GMM on residuals.

Equivalent to spreg.GM_Combo (or GM_Combo_Het with robust='het').

Examples:

>>> import numpy as np, pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> n = 40
>>> W = np.zeros((n, n))  # ring contiguity: each unit linked to neighbours
>>> for i in range(n):
...     W[i, (i - 1) % n] = 1.0
...     W[i, (i + 1) % n] = 1.0
>>> x = rng.normal(size=n)
>>> df = pd.DataFrame({"y": 1.0 + 0.8 * x + rng.normal(scale=0.5, size=n),
...                    "x": x})
>>> res = sp.sarar_gmm(W, data=df, formula="y ~ x")
>>> list(res.params.index)
['const', 'x', 'rho', 'lambda']

References

kelejian1998generalized

sem_gmm ¶

sem_gmm(W: Any, data: DataFrame, formula: str, row_normalize: bool = True, robust: Optional[str] = None) -> EconometricResults

Kelejian-Prucha (1999) GMM for the spatial-error parameter λ.

Stage 1 — OLS on y = Xβ + u ⇒ residuals u. Stage 2 — minimise sum of squared KP moment residuals over (λ, σ²). Stage 3 — GLS with (I - λ̂ W) → final β̂.

Parameters:

Name	Type	Description	Default
`robust`	`(None, 'het')`	When `"het"`, the final β covariance uses the heteroscedasticity- robust sandwich (Arraiz et al. 2010 / `spreg.GM_Error_Het`).	`None`

Examples:

>>> import numpy as np, pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> n = 40
>>> W = np.zeros((n, n))  # ring contiguity: each unit linked to neighbours
>>> for i in range(n):
...     W[i, (i - 1) % n] = 1.0
...     W[i, (i + 1) % n] = 1.0
>>> x = rng.normal(size=n)
>>> df = pd.DataFrame({"y": 1.0 + 0.8 * x + rng.normal(scale=0.5, size=n),
...                    "x": x})
>>> res = sp.sem_gmm(W, data=df, formula="y ~ x")
>>> list(res.params.index)
['const', 'x', 'lambda']

impacts ¶

impacts(result: Any, n_sim: int = 1000, seed: Optional[int] = None) -> DataFrame

Compute direct / indirect / total impacts + simulated SEs.

Parameters:

Name	Type	Description	Default
`result`	`EconometricResults`	Output of `sp.sar`, `sp.sdm`, or `sp.sac`.	required
`n_sim`	`int`	Number of Monte-Carlo draws for simulated SEs.	`1000`
`seed`	`int`	RNG seed.	`None`

Returns:

Type	Description
`DataFrame with one row per non-constant regressor and columns`
``Direct``, ``SE_Direct``, ``Indirect``, ``SE_Indirect``, ``Total``,
``SE_Total``.

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(7)
>>> n = 120
>>> coords = rng.uniform(size=(n, 2))
>>> w = sp.knn_weights(coords, k=6)
>>> w.transform = "R"
>>> W = w.sparse.toarray()
>>> x = rng.standard_normal(n)
>>> y = np.linalg.solve(np.eye(n) - 0.5 * W,
...                     1 + 2 * x + rng.standard_normal(n))
>>> res = sp.sar(w, pd.DataFrame({"y": y, "x": x}), "y ~ x")
>>> imp = sp.impacts(res, n_sim=200, seed=0)
>>> list(imp.columns)
['Direct', 'SE_Direct', 'Indirect', 'SE_Indirect', 'Total', 'SE_Total']
>>> list(imp.index)
['x']

sac ¶

sac(W: ArrayOrW, data: DataFrame, formula: str, row_normalize: bool = True, alpha: float = 0.05) -> EconometricResults

SAC / SARAR: combined spatial lag + spatial error model.

Jointly estimates the autoregressive coefficient ρ (on the dependent variable) and the spatial-error coefficient λ by concentrated maximum likelihood, profiling out (β, σ²) at each (ρ, λ) candidate.

Examples:

>>> import numpy as np, pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> n = 30
>>> W = np.zeros((n, n))
>>> for i in range(n):
...     W[i, (i - 1) % n] = 1.0
...     W[i, (i + 1) % n] = 1.0
>>> x = rng.normal(size=n)
>>> df = pd.DataFrame({"y": 1.0 + 0.8 * x + rng.normal(scale=0.5, size=n),
...                    "x": x})
>>> res = sp.sac(W, data=df, formula="y ~ x")
>>> list(res.params.index)  # both spatial coefficients estimated jointly
['const', 'x', 'rho', 'lambda']

References

anselin1988spatial

slx ¶

slx(W: ArrayOrW, data: DataFrame, formula: str, row_normalize: bool = True, alpha: float = 0.05) -> EconometricResults

Spatial Lag of X (SLX) model: Y = Xβ + WXθ + ε.

No autoregressive term. Estimated by ordinary least squares on the design matrix augmented with spatially-lagged covariates (skip the constant).

Examples:

>>> import numpy as np, pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> n = 30
>>> W = np.zeros((n, n))
>>> for i in range(n):
...     W[i, (i - 1) % n] = 1.0
...     W[i, (i + 1) % n] = 1.0
>>> x = rng.normal(size=n)
>>> df = pd.DataFrame({"y": 1.0 + 0.8 * x + rng.normal(scale=0.5, size=n),
...                    "x": x})
>>> res = sp.slx(W, data=df, formula="y ~ x")
>>> list(res.params.index)
['const', 'x', 'W_x']

References

anselin1988spatial

spatial_panel ¶

spatial_panel(data: DataFrame, formula: str, entity: str, time: str, W: Any, model: ModelKind = 'sar', effects: EffectKind = 'fe', row_normalize: bool = True) -> SpatialPanelResult

Fit a spatial panel model by concentrated ML (Elhorst 2014).

Parameters:

Name	Type	Description	Default
`data`	`DataFrame`	Long-format panel; must be balanced.	required
`formula`	`str`	`"y ~ x1 + x2"`. Constant is dropped (absorbed by the entity FE).	required
`entity`	`str`	Column names identifying entity and time dimensions.	required
`time`	`str`	Column names identifying entity and time dimensions.	required
`W`	sparse matrix, ndarray, or :class:`statspai.spatial.weights.W`	N × N spatial weights matrix aligned with `sorted(data[entity].unique())`.	required
`model`	`('sar', 'sem', 'sdm')`		`"sar"`
`effects`	`('fe', 'twoways')`	`"fe"` = entity FE only. `"twoways"` = entity + time demeaning.	`"fe"`

Examples:

>>> import numpy as np
>>> import pandas as pd
>>> import statspai as sp
>>> rng = np.random.default_rng(0)
>>> N, T = 8, 6
>>> W = np.zeros((N, N))  # ring contiguity weights
>>> for i in range(N):
...     W[i, (i - 1) % N] = 1.0
...     W[i, (i + 1) % N] = 1.0
>>> rows = []
>>> for i in range(N):
...     fe = rng.normal()
...     for t in range(T):
...         x = rng.normal()
...         y = fe + 0.5 * x + rng.normal(0, 0.3)
...         rows.append({"id": i, "year": t, "y": y, "x": x})
>>> df = pd.DataFrame(rows)
>>> res = sp.spatial_panel(
...     df, "y ~ x", entity="id", time="year", W=W, model="sar",
... )
>>> res.spatial_param
'rho'
>>> (res.N, res.T)
(8, 6)
>>> bool(np.isfinite(res.params["x"]))
True

distance_to_feature ¶

distance_to_feature(points_gdf: Any, target_gdf: Any, unit: str = 'km', allow_geographic: bool = False) -> Series

Distance from each point to the nearest target feature.

The canonical use is "distance from each county seat to the canal", the continuous treatment-intensity variable in a historical-GIS design.

Parameters:

Name	Type	Description	Default
`points_gdf`	`GeoDataFrame`	Point features.	required
`target_gdf`	`GeoDataFrame`	Target layer of any geometry type; the nearest distance to the union of its features is returned. Points inside a target polygon get `0`.	required
`unit`	`('km', 'm')`	Output unit.	`'km'`
`allow_geographic`	`bool`	Opt out of the projected-CRS requirement.	`False`

Returns:

Type	Description
`Series`	Named `distance_<unit>`, indexed like `points_gdf`.

Raises:

Type	Description
`ValueError`	On any of the CRS / geometry problems listed in :func:`line_length_in_polygon`.

Examples:

>>> import statspai as sp
>>> sp.distance_to_feature(county_seats, canal, unit="km")

line_length_in_polygon ¶

line_length_in_polygon(lines_gdf: Any, polygons_gdf: Any, polygon_id: Optional[str] = None, unit: str = 'km', allow_geographic: bool = False) -> DataFrame

Length of linear features falling inside each polygon.

The canonical use is "kilometres of canal (or railway, or road) per county", the exposure variable in a historical-GIS panel.

Parameters:

Name	Type	Description	Default
`lines_gdf`	`GeoDataFrame`	Linear features. Must be `LineString` / `MultiLineString`.	required
`polygons_gdf`	`GeoDataFrame`	Areal units. Must be `Polygon` / `MultiPolygon`.	required
`polygon_id`	`str`	Column in `polygons_gdf` identifying each unit. Defaults to the index of `polygons_gdf`.	`None`
`unit`	`('km', 'm')`	Output unit.	`'km'`
`allow_geographic`	`bool`	Opt out of the projected-CRS requirement. Only set this if you have verified the CRS units are metres.	`False`

Returns:

Type	Description
`DataFrame`	One row per polygon with the id column and `line_length_<unit>`. Polygons with no intersecting line get `0.0`, not `NaN`.

Raises:

Type	Description
`ValueError`	If either frame has no CRS, a geographic CRS (and `allow_geographic` is False), a non-metre projected CRS, mismatched CRS, or missing / empty / invalid geometries.

Examples:

>>> import statspai as sp
>>> canal = canal.to_crs("EPSG:4479")
>>> counties = counties.to_crs("EPSG:4479")
>>> sp.line_length_in_polygon(canal, counties, polygon_id="county_id")
...

share_within_buffer ¶

share_within_buffer(points_gdf: Any, lines_gdf: Any, buffer_km: float = 10.0, group_col: Optional[str] = None, allow_geographic: bool = False) -> Any

Share of point features lying within a buffer of the linear features.

The canonical use is "share of market towns within 10 km of the canal", either overall or per county.

Parameters:

Name	Type	Description	Default
`points_gdf`	`GeoDataFrame`	Point features (e.g. market towns).	required
`lines_gdf`	`GeoDataFrame`	Linear features to buffer (e.g. the canal).	required
`buffer_km`	`float`	Buffer radius in kilometres. Must be strictly positive.	`10.0`
`group_col`	`str`	Column in `points_gdf` to group by (e.g. `'county_id'`). If omitted the share is computed over all points.	`None`
`allow_geographic`	`bool`	Opt out of the projected-CRS requirement.	`False`

Returns:

Type	Description
`float or DataFrame`	Without `group_col`, the overall share in `[0, 1]`. With it, a DataFrame with `group_col`, `n_points`, `n_within`, `share`.

Raises:

Type	Description
`ValueError`	On non-positive `buffer_km`, or any of the CRS / geometry problems listed in :func:`line_length_in_polygon`.

Examples:

>>> import statspai as sp
>>> sp.share_within_buffer(towns, canal, buffer_km=10, group_col="county_id")
...

block_weights ¶

block_weights(regimes: Sequence[Any]) -> W

Block (regime) spatial weights — full connectivity within each group.

Builds a weights object in which every unit is a neighbour of every other unit sharing the same regime label, and of no unit outside it. Useful for encoding membership in a region, cluster, or treatment block as a spatial structure.

Parameters:

Name	Type	Description	Default
`regimes`	`array_like`	Length-`n` array of group labels (any hashable dtype). Units with equal labels become mutual neighbours.	required

Returns:

Type	Description
`W`	Spatial weights object with block-diagonal connectivity.

Examples:

>>> import statspai as sp
>>> w = sp.block_weights([0, 0, 1, 1, 1])
>>> sorted(w.neighbors[2])
[3, 4]

distance_band ¶

distance_band(coords: ndarray, threshold: float, binary: bool = True) -> W

Distance-band spatial weights — connect units within a fixed radius.

Every unit is connected to all other units within Euclidean distance threshold. Weights are either binary (1 for every neighbour) or inverse-distance (1/d).

Parameters:

Name	Type	Description	Default
`coords`	`ndarray`	`(n, d)` array of coordinates.	required
`threshold`	`float`	Distance band radius. Units within this distance become neighbours. Choose at least the maximum nearest-neighbour distance to avoid islands (units with no neighbours).	required
`binary`	`bool`	If True all neighbour weights are `1.0`; if False they are inverse Euclidean distance `1/d` (coincident points are dropped via an infinite distance).	`True`

Returns:

Type	Description
`W`	Spatial weights object; neighbour counts vary by local density.

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> coords = np.random.default_rng(0).random((30, 2))
>>> w = sp.distance_band(coords, threshold=0.3, binary=False)
>>> w.n
30

See Also

knn_weights : Fixed neighbour count instead of fixed radius.

kernel_weights ¶

kernel_weights(coords: ndarray, bandwidth: float, kernel: Literal['gaussian', 'bisquare', 'triangular'] = 'gaussian', fixed: bool = True) -> W

Kernel (distance-decay) spatial weights from point coordinates.

Assigns each neighbour a continuous weight that decays with distance according to a kernel function. Supports a fixed bandwidth (same radius everywhere) or an adaptive bandwidth (the bandwidth-th nearest neighbour distance, varying with local density).

Parameters:

Name	Type	Description	Default
`coords`	`ndarray`	`(n, d)` array of coordinates.	required
`bandwidth`	`float`	With `fixed=True`, the bandwidth distance `h` (for the bisquare and triangular kernels, neighbours beyond `h` get zero weight; the gaussian kernel has infinite support so all units are included). With `fixed=False`, the integer number of nearest neighbours that defines the adaptive bandwidth.	required
`kernel`	`('gaussian', 'bisquare', 'triangular')`	Kernel shape. `gaussian` `exp(-u^2/2)`; `bisquare` `(1-u^2)^2` for `u<1`; `triangular` `1-u` for `u<1`, where `u = d / bandwidth`.	`"gaussian"`
`fixed`	`bool`	Fixed (True) versus adaptive (False) bandwidth.	`True`

Returns:

Type	Description
`W`	Spatial weights object with continuous kernel weights.

Raises:

Type	Description
`ValueError`	If `kernel` is not one of the supported names.

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> coords = np.random.default_rng(0).random((30, 2))
>>> w = sp.kernel_weights(coords, bandwidth=0.4, kernel="bisquare")
>>> w.n
30

See Also

distance_band : Binary / inverse-distance fixed-radius weights. knn_weights : Uniform k-nearest-neighbour weights.

knn_weights ¶

knn_weights(coords: ndarray, k: int = 5) -> W

k-nearest-neighbour spatial weights from point coordinates.

Each unit is connected to its k closest units by Euclidean distance (the unit itself is excluded). The resulting graph is generally asymmetric — being someone's nearest neighbour is not reciprocal.

Parameters:

Name	Type	Description	Default
`coords`	`ndarray`	`(n, d)` array of coordinates (typically `d=2`).	required
`k`	`int`	Number of nearest neighbours per unit. Must satisfy `0 < k < n`.	`5`

Returns:

Type	Description
`W`	Spatial weights object with exactly `k` neighbours per unit.

Raises:

Type	Description
`ValueError`	If `k` is not in the open interval `(0, n)`.

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> coords = np.random.default_rng(0).random((30, 2))
>>> w = sp.knn_weights(coords, k=4)
>>> len(w.neighbors[0])
4

See Also

distance_band : Connect all units within a fixed radius. kernel_weights : Continuous distance-decay weights.

queen_weights ¶

queen_weights(gdf: Any) -> W

Queen-contiguity spatial weights from polygon geometries.

Two areal units are queen-contiguous if their geometries share at least one boundary point (an edge or a single vertex). Requires geopandas.

Parameters:

Name	Type	Description	Default
`gdf`	`GeoDataFrame`	Layer of polygon geometries (accessed via `gdf.geometry` and the spatial index `gdf.sindex`).	required

Returns:

Type	Description
`W`	Spatial weights object whose neighbour lists encode queen contiguity.

Raises:

Type	Description
`ImportError`	If `geopandas` is not installed.

Examples:

>>> import statspai as sp
>>> import geopandas as gpd
>>> gdf = gpd.read_file("counties.shp")
>>> w = sp.queen_weights(gdf)

See Also

rook_weights : Edge-only contiguity (excludes shared vertices).

rook_weights ¶

rook_weights(gdf: Any) -> W

Rook-contiguity spatial weights from polygon geometries.

Two areal units are rook-contiguous if their geometries share a boundary segment of positive length (a shared edge); units touching only at a single vertex are not neighbours. Requires geopandas.

Parameters:

Name	Type	Description	Default
`gdf`	`GeoDataFrame`	Layer of polygon geometries.	required

Returns:

Type	Description
`W`	Spatial weights object whose neighbour lists encode rook contiguity.

Raises:

Type	Description
`ImportError`	If `geopandas` is not installed.

Examples:

>>> import statspai as sp
>>> import geopandas as gpd
>>> w = sp.rook_weights(gpd.read_file("counties.shp"))

See Also

queen_weights : Contiguity that also counts shared vertices.

statspai.spatial¶

spatial ¶

SpatialDiDResult dataclass ¶

plot ¶

SpatialIVResult dataclass ¶

SpatialModel ¶

W ¶

spatial_did ¶

getis_ord_g ¶

getis_ord_local ¶

moran_local ¶

lisa_cluster_map ¶

moran_plot ¶

gwr_bandwidth ¶

spatial_iv ¶

sar ¶

sdm ¶

sem ¶

lm_tests ¶

moran_residuals ¶

sar_gmm ¶

sarar_gmm ¶

sem_gmm ¶

impacts ¶

sac ¶

slx ¶

spatial_panel ¶

distance_to_feature ¶

line_length_in_polygon ¶

share_within_buffer ¶

block_weights ¶

distance_band ¶

kernel_weights ¶

knn_weights ¶

queen_weights ¶

rook_weights ¶

`statspai.spatial`¶

SpatialDiDResult `dataclass` ¶

SpatialIVResult `dataclass` ¶