Skip to content

statspai.spatial

spatial

Spatial econometrics — StatsPAI's answer to R's spatialreg + spdep + mgwr and Python's PySAL.

Three layers, flat-imported for convenience:

Weights

W, queen_weights, rook_weights, knn_weights, distance_band, kernel_weights, block_weights.

Exploratory spatial data analysis (ESDA)

moran, moran_local, geary, getis_ord_g, getis_ord_local, join_counts, moran_plot, lisa_cluster_map.

Regression
  • ML estimators: sar, sem, sdm, slx, sac
  • Diagnostics: lm_tests, moran_residuals
  • Effects: impacts (LeSage-Pace direct / indirect / total)

All regression estimators accept a :class:W object, a scipy.sparse matrix, or an (n, n) ndarray for the weights argument.

Examples:

>>> import statspai as sp
>>> w = sp.knn_weights(coords, k=6); w.transform = "R"
>>> moran = sp.moran(df["y"], w)
>>> result = sp.sar(w, data=df, formula='y ~ x1 + x2')
>>> eff = sp.impacts(result)
>>> lms = sp.lm_tests("y ~ x1 + x2", df, w)

SpatialModel

Unified spatial regression estimator.

Uses concentrated ML: profile out β and σ², optimise over ρ (or λ) in one dimension using eigenvalues of W.

W

Spatial weights matrix with CSR-sparse backing.

Parameters:

Name Type Description Default
neighbors dict[int, list[int]]

Mapping observation id -> list of neighbour ids.

required
weights dict[int, list[float]]

Matching weight values. If None, binary (1.0) weights are used.

None
id_order sequence

Explicit ordering of observation ids. Defaults to sorted(neighbors).

None

SpatialDiDResult dataclass

Result object for :func:spatial_did.

The object intentionally exposes both a spatial-DiD-specific surface (direct_effect, spillover_effect, total_effect) and a broom/modelsummary-compatible surface (tidy(), glance(), params, std_errors).

plot

plot(kind: str = 'coef', ax=None, figsize=(8, 5), **kwargs)

Plot coefficient, exposure, or spatial event-study diagnostics.

sar

sar(W: ArrayOrW, data: DataFrame, formula: str, row_normalize: bool = True, alpha: float = 0.05) -> EconometricResults

Spatial Autoregressive (Lag) Model: Y = ρWY + Xβ + ε.

Parameters:

Name Type Description Default
W ndarray, scipy.sparse matrix, or statspai.spatial.weights.W
required
data see legacy implementation.
required
formula see legacy implementation.
required
row_normalize see legacy implementation.
required
alpha see legacy implementation.
required

sem

sem(W: ArrayOrW, data: DataFrame, formula: str, row_normalize: bool = True, alpha: float = 0.05) -> EconometricResults

Spatial Error Model: Y = Xβ + u, u = λWu + ε.

sdm

sdm(W: ArrayOrW, data: DataFrame, formula: str, row_normalize: bool = True, alpha: float = 0.05) -> EconometricResults

Spatial Durbin Model: Y = ρWY + Xβ + WXθ + ε.

slx

slx(W: ArrayOrW, data: DataFrame, formula: str, row_normalize: bool = True, alpha: float = 0.05) -> EconometricResults

Spatial Lag of X (SLX) model: Y = Xβ + WXθ + ε.

No autoregressive term. Estimated by ordinary least squares on the design matrix augmented with spatially-lagged covariates (skip the constant).

sac

sac(W: ArrayOrW, data: DataFrame, formula: str, row_normalize: bool = True, alpha: float = 0.05) -> EconometricResults

SAC / SARAR: combined spatial lag + spatial error model.

Jointly estimates the autoregressive coefficient ρ (on the dependent variable) and the spatial-error coefficient λ by concentrated maximum likelihood, profiling out (β, σ²) at each (ρ, λ) candidate.

sar_gmm

sar_gmm(W, data: DataFrame, formula: str, row_normalize: bool = True, robust: Optional[str] = None, w_lags: int = 1) -> EconometricResults

Kelejian-Prucha (1998) 2SLS for SAR with spatial-lag instruments.

Instruments: [X, W X, …, W^w_lags X] (dropping constant duplicates). w_lags=1 (default) matches spreg.GM_Lag default. Endogenous regressor: W Y.

sem_gmm

sem_gmm(W, data: DataFrame, formula: str, row_normalize: bool = True, robust: Optional[str] = None) -> EconometricResults

Kelejian-Prucha (1999) GMM for the spatial-error parameter λ.

Stage 1 — OLS on y = Xβ + u ⇒ residuals u. Stage 2 — minimise sum of squared KP moment residuals over (λ, σ²). Stage 3 — GLS with (I - λ̂ W) → final β̂.

Parameters:

Name Type Description Default
robust (None, 'het')

When "het", the final β covariance uses the heteroscedasticity- robust sandwich (Arraiz et al. 2010 / spreg.GM_Error_Het).

None

sarar_gmm

sarar_gmm(W, data: DataFrame, formula: str, row_normalize: bool = True, robust: Optional[str] = None) -> EconometricResults

Combined GMM: Kelejian-Prucha SAR 2SLS then SEM GMM on residuals.

Equivalent to spreg.GM_Combo (or GM_Combo_Het with robust='het').

lm_tests

lm_tests(formula: str, data: DataFrame, W, row_normalize: bool = True) -> Dict[str, Tuple[float, float]]

Anselin (1988) Lagrange multiplier battery.

Parameters:

Name Type Description Default
formula str

"y ~ x1 + x2" style formula (constant added automatically).

required
data DataFrame
required
W ndarray, scipy.sparse, or ``statspai.spatial.weights.W``
required
row_normalize bool
True

Returns:

Type Description
dict

{name: (statistic, p_value)} for the five tests.

moran_residuals

moran_residuals(residuals: ndarray, W, row_normalize: bool = True) -> Tuple[float, float]

Moran's I applied to regression residuals (quick LM-err companion).

impacts

impacts(result, n_sim: int = 1000, seed: Optional[int] = None) -> DataFrame

Compute direct / indirect / total impacts + simulated SEs.

Parameters:

Name Type Description Default
result EconometricResults

Output of sp.sar, sp.sdm, or sp.sac.

required
n_sim int

Number of Monte-Carlo draws for simulated SEs.

1000
seed int

RNG seed.

None

Returns:

Type Description
DataFrame with one row per non-constant regressor and columns
``Direct``, ``SE_Direct``, ``Indirect``, ``SE_Indirect``, ``Total``,
``SE_Total``.

gwr_bandwidth

gwr_bandwidth(coords, y, X, kernel: KernelName = 'bisquare', fixed: bool = False, criterion: Criterion = 'AICc', bw_min: float = None, bw_max: float = None, add_constant: bool = True, tol: float = 0.001) -> float

Select GWR bandwidth via golden-section search.

Parameters:

Name Type Description Default
criterion ('AICc', 'AIC', 'BIC', 'CV')

Default AICc matches mgwr.

"AICc"
fixed bool

If False (default), the bandwidth is a nearest-neighbour count (integer-valued; rounded at each candidate).

False
bw_min float

Override the default search bounds.

None
bw_max float

Override the default search bounds.

None

spatial_panel

spatial_panel(data: DataFrame, formula: str, entity: str, time: str, W, model: ModelKind = 'sar', effects: EffectKind = 'fe', row_normalize: bool = True) -> SpatialPanelResult

Fit a spatial panel model by concentrated ML (Elhorst 2014).

Parameters:

Name Type Description Default
data DataFrame

Long-format panel; must be balanced.

required
formula str

"y ~ x1 + x2". Constant is dropped (absorbed by the entity FE).

required
entity str

Column names identifying entity and time dimensions.

required
time str

Column names identifying entity and time dimensions.

required
W sparse matrix, ndarray, or :class:`statspai.spatial.weights.W`

N × N spatial weights matrix aligned with sorted(data[entity].unique()).

required
model ('sar', 'sem', 'sdm')
"sar"
effects ('fe', 'twoways')

"fe" = entity FE only. "twoways" = entity + time demeaning.

"fe"

queen_weights

queen_weights(gdf) -> W

Queen-contiguity spatial weights from polygon geometries.

Two areal units are queen-contiguous if their geometries share at least one boundary point (an edge or a single vertex). Requires geopandas.

Parameters:

Name Type Description Default
gdf GeoDataFrame

Layer of polygon geometries (accessed via gdf.geometry and the spatial index gdf.sindex).

required

Returns:

Type Description
W

Spatial weights object whose neighbour lists encode queen contiguity.

Raises:

Type Description
ImportError

If geopandas is not installed.

Examples:

>>> import statspai as sp
>>> import geopandas as gpd
>>> gdf = gpd.read_file("counties.shp")
>>> w = sp.queen_weights(gdf)
See Also

rook_weights : Edge-only contiguity (excludes shared vertices).

rook_weights

rook_weights(gdf) -> W

Rook-contiguity spatial weights from polygon geometries.

Two areal units are rook-contiguous if their geometries share a boundary segment of positive length (a shared edge); units touching only at a single vertex are not neighbours. Requires geopandas.

Parameters:

Name Type Description Default
gdf GeoDataFrame

Layer of polygon geometries.

required

Returns:

Type Description
W

Spatial weights object whose neighbour lists encode rook contiguity.

Raises:

Type Description
ImportError

If geopandas is not installed.

Examples:

>>> import statspai as sp
>>> import geopandas as gpd
>>> w = sp.rook_weights(gpd.read_file("counties.shp"))
See Also

queen_weights : Contiguity that also counts shared vertices.

knn_weights

knn_weights(coords: ndarray, k: int = 5) -> W

k-nearest-neighbour spatial weights from point coordinates.

Each unit is connected to its k closest units by Euclidean distance (the unit itself is excluded). The resulting graph is generally asymmetric — being someone's nearest neighbour is not reciprocal.

Parameters:

Name Type Description Default
coords ndarray

(n, d) array of coordinates (typically d=2).

required
k int

Number of nearest neighbours per unit. Must satisfy 0 < k < n.

5

Returns:

Type Description
W

Spatial weights object with exactly k neighbours per unit.

Raises:

Type Description
ValueError

If k is not in the open interval (0, n).

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> coords = np.random.default_rng(0).random((30, 2))
>>> w = sp.knn_weights(coords, k=4)
>>> len(w.neighbors[0])
4
See Also

distance_band : Connect all units within a fixed radius. kernel_weights : Continuous distance-decay weights.

distance_band

distance_band(coords: ndarray, threshold: float, binary: bool = True) -> W

Distance-band spatial weights — connect units within a fixed radius.

Every unit is connected to all other units within Euclidean distance threshold. Weights are either binary (1 for every neighbour) or inverse-distance (1/d).

Parameters:

Name Type Description Default
coords ndarray

(n, d) array of coordinates.

required
threshold float

Distance band radius. Units within this distance become neighbours. Choose at least the maximum nearest-neighbour distance to avoid islands (units with no neighbours).

required
binary bool

If True all neighbour weights are 1.0; if False they are inverse Euclidean distance 1/d (coincident points are dropped via an infinite distance).

True

Returns:

Type Description
W

Spatial weights object; neighbour counts vary by local density.

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> coords = np.random.default_rng(0).random((30, 2))
>>> w = sp.distance_band(coords, threshold=0.3, binary=False)
>>> w.n
30
See Also

knn_weights : Fixed neighbour count instead of fixed radius.

kernel_weights

kernel_weights(coords: ndarray, bandwidth: float, kernel: Literal['gaussian', 'bisquare', 'triangular'] = 'gaussian', fixed: bool = True) -> W

Kernel (distance-decay) spatial weights from point coordinates.

Assigns each neighbour a continuous weight that decays with distance according to a kernel function. Supports a fixed bandwidth (same radius everywhere) or an adaptive bandwidth (the bandwidth-th nearest neighbour distance, varying with local density).

Parameters:

Name Type Description Default
coords ndarray

(n, d) array of coordinates.

required
bandwidth float

With fixed=True, the bandwidth distance h (for the bisquare and triangular kernels, neighbours beyond h get zero weight; the gaussian kernel has infinite support so all units are included). With fixed=False, the integer number of nearest neighbours that defines the adaptive bandwidth.

required
kernel ('gaussian', 'bisquare', 'triangular')

Kernel shape. gaussian exp(-u^2/2); bisquare (1-u^2)^2 for u<1; triangular 1-u for u<1, where u = d / bandwidth.

"gaussian"
fixed bool

Fixed (True) versus adaptive (False) bandwidth.

True

Returns:

Type Description
W

Spatial weights object with continuous kernel weights.

Raises:

Type Description
ValueError

If kernel is not one of the supported names.

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> coords = np.random.default_rng(0).random((30, 2))
>>> w = sp.kernel_weights(coords, bandwidth=0.4, kernel="bisquare")
>>> w.n
30
See Also

distance_band : Binary / inverse-distance fixed-radius weights. knn_weights : Uniform k-nearest-neighbour weights.

block_weights

block_weights(regimes) -> W

Block (regime) spatial weights — full connectivity within each group.

Builds a weights object in which every unit is a neighbour of every other unit sharing the same regime label, and of no unit outside it. Useful for encoding membership in a region, cluster, or treatment block as a spatial structure.

Parameters:

Name Type Description Default
regimes array_like

Length-n array of group labels (any hashable dtype). Units with equal labels become mutual neighbours.

required

Returns:

Type Description
W

Spatial weights object with block-diagonal connectivity.

Examples:

>>> import statspai as sp
>>> w = sp.block_weights([0, 0, 1, 1, 1])
>>> sorted(w.neighbors[2])
[3, 4]

moran_local

moran_local(y, w: W, permutations: int = 999, seed: Optional[int] = None)

Local Moran's I (LISA) — per-location spatial association.

Decomposes global Moran's I into a contribution I_i for each spatial unit, identifying local clusters (high-high / low-low) and spatial outliers (high-low / low-high). Each I_i gets a conditional permutation pseudo p-value.

Parameters:

Name Type Description Default
y array_like

Variable of interest, length n.

required
w W

Spatial weights object; w.sparse is the n x n weights matrix.

required
permutations int

Number of conditional permutations for the per-location pseudo p-values. 0 skips the permutation test.

999
seed int

Seed for the permutation RNG for reproducibility.

None

Returns:

Type Description
dict

{"Is": ndarray, "p_sim": ndarray | None, "simulations": ndarray | None} where Is holds the n local statistics and p_sim the matching pseudo p-values.

Raises:

Type Description
ValueError

If y has zero variance.

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> coords = np.random.default_rng(0).random((40, 2))
>>> w = sp.knn_weights(coords, k=4)
>>> y = np.random.default_rng(2).standard_normal(40)
>>> out = sp.moran_local(y, w, permutations=499, seed=0)
>>> out["Is"].shape
(40,)
See Also

moran : Global Moran's I.

References

Anselin, L. (1995). Local indicators of spatial association---LISA. Geographical Analysis, 27(2), 93-115. [@anselin1995local]

getis_ord_g

getis_ord_g(y, w: W, permutations: int = 999, seed: Optional[int] = None) -> SpatialStatistic

Global Getis-Ord G — overall concentration of high (or low) values.

The global G statistic summarises whether high values of y are globally concentrated among neighbouring units. It requires non-negative y. Larger-than-expected G indicates clustering of high values; smaller indicates clustering of low values.

Parameters:

Name Type Description Default
y array_like

Non-negative variable of interest, length n.

required
w W

Spatial weights object defining the neighbourhood structure.

required
permutations int

Number of random permutations for the pseudo p-value, empirical variance and z-score. 0 skips the permutation test.

999
seed int

Seed for the permutation RNG for reproducibility.

None

Returns:

Type Description
SpatialStatistic

Result with value (G), expectation (S0 / (n(n-1))), variance, z_score, p_norm and p_sim / simulations.

Raises:

Type Description
ValueError

If any element of y is negative, or y is degenerate.

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> coords = np.random.default_rng(0).random((50, 2))
>>> w = sp.knn_weights(coords, k=5)
>>> y = np.random.default_rng(4).random(50)  # non-negative
>>> res = sp.getis_ord_g(y, w, seed=0)
>>> res.value > 0
True
See Also

getis_ord_local : Local Gi / Gi* hotspot statistics.

References

Getis, A. and Ord, J. K. (1992). The analysis of spatial association by use of distance statistics. Geographical Analysis, 24(3), 189-206. [@getis1992analysis]

getis_ord_local

getis_ord_local(y, w: W, star: bool = True, permutations: int = 999, seed: Optional[int] = None)

Local Getis-Ord Gi / Gi* — hotspot and coldspot detection.

Computes a standardised local statistic for each unit measuring whether it is surrounded by high values (a hotspot, large positive z) or low values (a coldspot, large negative z). With star=True the focal unit is included in its own neighbourhood (Gi*); with star=False it is excluded (Gi).

Parameters:

Name Type Description Default
y array_like

Variable of interest, length n.

required
w W

Spatial weights object; densified internally to an n x n matrix.

required
star bool

If True compute Gi* (self-included); otherwise Gi (self-excluded).

True
permutations int

Accepted for API symmetry; the current implementation returns the analytic standardised statistic.

999
seed int

Seed placeholder for API symmetry.

None

Returns:

Type Description
dict

{"Gs": ndarray, "z": ndarray} — the n local statistics, which are already standardised (so Gs and z coincide).

Examples:

>>> import statspai as sp
>>> import numpy as np
>>> coords = np.random.default_rng(0).random((40, 2))
>>> w = sp.knn_weights(coords, k=4)
>>> y = np.random.default_rng(5).random(40)
>>> out = sp.getis_ord_local(y, w, star=True)
>>> out["Gs"].shape
(40,)
See Also

getis_ord_g : Global G statistic. moran_local : LISA, a cross-product local statistic.

References

Getis, A. and Ord, J. K. (1992). The analysis of spatial association by use of distance statistics. Geographical Analysis, 24(3), 189-206. [@getis1992analysis]

moran_plot

moran_plot(y, w: W, ax=None)

Moran scatter: z vs spatial lag Wz. Slope of OLS line = Moran's I.

lisa_cluster_map

lisa_cluster_map(y, w: W, gdf, ax=None, p_threshold: float = 0.05)

Classify each observation HH/LL/HL/LH/Not significant and colour on a GDF.

spatial_did

spatial_did(data: DataFrame, y: str, treat: str, unit: str, time: str, W, covariates: Optional[Sequence[str]] = None, cluster: Optional[str] = None, alpha: float = 0.05, *, se_type: str = 'cluster', coords: Optional[Tuple[str, str]] = None, lat: Optional[str] = None, lon: Optional[str] = None, distance_matrix: Optional[ndarray] = None, conley_cutoff: Optional[float] = None, conley_kernel: str = 'bartlett', unit_order: Optional[Sequence[Any]] = None, normalize_W: bool = True, event_study: bool = False, event_window: Optional[Tuple[int, int]] = None, event_base: int = -1) -> SpatialDiDResult

Spatial DiD with direct and spillover treatment effects.

Parameters:

Name Type Description Default
data DataFrame

Long panel with one row per unit-period.

required
y str

Outcome, treatment, unit-id, and time-id columns. treat may be binary or a continuous exposure intensity; the spatial lag uses the same scale.

required
treat str

Outcome, treatment, unit-id, and time-id columns. treat may be binary or a continuous exposure intensity; the spatial lag uses the same scale.

required
unit str

Outcome, treatment, unit-id, and time-id columns. treat may be binary or a continuous exposure intensity; the spatial lag uses the same scale.

required
time str

Outcome, treatment, unit-id, and time-id columns. treat may be binary or a continuous exposure intensity; the spatial lag uses the same scale.

required
W ndarray, sparse matrix, or :class:`statspai.spatial.W`

Spatial weights matrix over units. If a StatsPAI W object carries an id_order, that order is respected.

required
covariates sequence of str

Additional controls included after absorbing unit and time effects.

None
cluster str

Cluster variable for se_type='cluster'. Defaults to unit.

None
alpha float

Significance level for confidence intervals.

0.05
se_type ('cluster', 'robust', 'conley')

Covariance estimator. conley uses spatial-HAC dependence within each period and requires conley_cutoff plus coordinates or a unit-level distance matrix.

"cluster"
coords tuple(str, str)

Convenience pair (lat, lon) in degrees.

None
lat str

Latitude/longitude columns in degrees.

None
lon str

Latitude/longitude columns in degrees.

None
distance_matrix ndarray

Unit-by-unit distances in the same units as conley_cutoff.

None
conley_cutoff float

Spatial cutoff for Conley-HAC standard errors.

None
conley_kernel ('uniform', 'bartlett')
"uniform"
unit_order sequence

Explicit row/column order of W.

None
normalize_W bool

Row-normalise weights before constructing WD.

True
event_study bool

Estimate direct and spillover event-study paths by relative time.

False
event_window tuple(int, int)

Inclusive event-time window. Defaults to observed support capped at [-5, 5] when event_study=True.

None
event_base int

Omitted relative-time period.

-1

Returns:

Type Description
SpatialDiDResult

Direct, spillover, total effects with diagnostics, plotting, and export helpers.

spatial_iv

spatial_iv(data: DataFrame, y: str, endog: Sequence[str], exog: Sequence[str], W, instruments: Optional[Sequence[str]] = None, include_WY: bool = True, alpha: float = 0.05) -> SpatialIVResult

Spatial 2SLS estimator.

Parameters:

Name Type Description Default
data DataFrame
required
y str
required
endog sequence of str

Names of additional endogenous regressors.

required
exog sequence of str

Names of exogenous regressors (included instruments).

required
W ndarray or :class:`spatial.W`

Spatial weights matrix (n_obs x n_obs). Should be row-normalised.

required
instruments sequence of str

Extra excluded instruments beyond the spatial lags of X.

None
include_WY bool

If True, include WY as a regressor (spatial autoregressive coefficient ρ).

True
alpha float
0.05

Returns:

Type Description
SpatialIVResult