statspai.spatial¶
spatial ¶
Spatial econometrics — StatsPAI's answer to R's spatialreg + spdep + mgwr
and Python's PySAL.
Three layers, flat-imported for convenience:
Weights
W, queen_weights, rook_weights, knn_weights,
distance_band, kernel_weights, block_weights.
Exploratory spatial data analysis (ESDA)
moran, moran_local, geary, getis_ord_g, getis_ord_local,
join_counts, moran_plot, lisa_cluster_map.
Regression
- ML estimators:
sar,sem,sdm,slx,sac - Diagnostics:
lm_tests,moran_residuals - Effects:
impacts(LeSage-Pace direct / indirect / total)
All regression estimators accept a :class:W object, a scipy.sparse
matrix, or an (n, n) ndarray for the weights argument.
Examples:
>>> import statspai as sp
>>> w = sp.knn_weights(coords, k=6); w.transform = "R"
>>> moran = sp.moran(df["y"], w)
>>> result = sp.sar(w, data=df, formula='y ~ x1 + x2')
>>> eff = sp.impacts(result)
>>> lms = sp.lm_tests("y ~ x1 + x2", df, w)
SpatialModel ¶
Unified spatial regression estimator.
Uses concentrated ML: profile out β and σ², optimise over ρ (or λ) in one dimension using eigenvalues of W.
W ¶
Spatial weights matrix with CSR-sparse backing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
neighbors
|
dict[int, list[int]]
|
Mapping observation id -> list of neighbour ids. |
required |
weights
|
dict[int, list[float]]
|
Matching weight values. If |
None
|
id_order
|
sequence
|
Explicit ordering of observation ids. Defaults to |
None
|
SpatialDiDResult
dataclass
¶
Result object for :func:spatial_did.
The object intentionally exposes both a spatial-DiD-specific surface
(direct_effect, spillover_effect, total_effect) and a
broom/modelsummary-compatible surface (tidy(), glance(),
params, std_errors).
plot ¶
Plot coefficient, exposure, or spatial event-study diagnostics.
sar ¶
sar(W: ArrayOrW, data: DataFrame, formula: str, row_normalize: bool = True, alpha: float = 0.05) -> EconometricResults
Spatial Autoregressive (Lag) Model: Y = ρWY + Xβ + ε.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
W
|
ndarray, scipy.sparse matrix, or statspai.spatial.weights.W
|
|
required |
data
|
see legacy implementation.
|
|
required |
formula
|
see legacy implementation.
|
|
required |
row_normalize
|
see legacy implementation.
|
|
required |
alpha
|
see legacy implementation.
|
|
required |
sem ¶
sem(W: ArrayOrW, data: DataFrame, formula: str, row_normalize: bool = True, alpha: float = 0.05) -> EconometricResults
Spatial Error Model: Y = Xβ + u, u = λWu + ε.
sdm ¶
sdm(W: ArrayOrW, data: DataFrame, formula: str, row_normalize: bool = True, alpha: float = 0.05) -> EconometricResults
Spatial Durbin Model: Y = ρWY + Xβ + WXθ + ε.
slx ¶
slx(W: ArrayOrW, data: DataFrame, formula: str, row_normalize: bool = True, alpha: float = 0.05) -> EconometricResults
Spatial Lag of X (SLX) model: Y = Xβ + WXθ + ε.
No autoregressive term. Estimated by ordinary least squares on the design matrix augmented with spatially-lagged covariates (skip the constant).
sac ¶
sac(W: ArrayOrW, data: DataFrame, formula: str, row_normalize: bool = True, alpha: float = 0.05) -> EconometricResults
SAC / SARAR: combined spatial lag + spatial error model.
Jointly estimates the autoregressive coefficient ρ (on the dependent variable) and the spatial-error coefficient λ by concentrated maximum likelihood, profiling out (β, σ²) at each (ρ, λ) candidate.
sar_gmm ¶
sar_gmm(W, data: DataFrame, formula: str, row_normalize: bool = True, robust: Optional[str] = None, w_lags: int = 1) -> EconometricResults
Kelejian-Prucha (1998) 2SLS for SAR with spatial-lag instruments.
Instruments: [X, W X, …, W^w_lags X] (dropping constant duplicates).
w_lags=1 (default) matches spreg.GM_Lag default.
Endogenous regressor: W Y.
sem_gmm ¶
sem_gmm(W, data: DataFrame, formula: str, row_normalize: bool = True, robust: Optional[str] = None) -> EconometricResults
Kelejian-Prucha (1999) GMM for the spatial-error parameter λ.
Stage 1 — OLS on y = Xβ + u ⇒ residuals u.
Stage 2 — minimise sum of squared KP moment residuals over (λ, σ²).
Stage 3 — GLS with (I - λ̂ W) → final β̂.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
robust
|
(None, 'het')
|
When |
None
|
sarar_gmm ¶
sarar_gmm(W, data: DataFrame, formula: str, row_normalize: bool = True, robust: Optional[str] = None) -> EconometricResults
Combined GMM: Kelejian-Prucha SAR 2SLS then SEM GMM on residuals.
Equivalent to spreg.GM_Combo (or GM_Combo_Het with robust='het').
lm_tests ¶
lm_tests(formula: str, data: DataFrame, W, row_normalize: bool = True) -> Dict[str, Tuple[float, float]]
Anselin (1988) Lagrange multiplier battery.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
formula
|
str
|
|
required |
data
|
DataFrame
|
|
required |
W
|
ndarray, scipy.sparse, or ``statspai.spatial.weights.W``
|
|
required |
row_normalize
|
bool
|
|
True
|
Returns:
| Type | Description |
|---|---|
dict
|
|
moran_residuals ¶
Moran's I applied to regression residuals (quick LM-err companion).
impacts ¶
Compute direct / indirect / total impacts + simulated SEs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
result
|
EconometricResults
|
Output of |
required |
n_sim
|
int
|
Number of Monte-Carlo draws for simulated SEs. |
1000
|
seed
|
int
|
RNG seed. |
None
|
Returns:
| Type | Description |
|---|---|
DataFrame with one row per non-constant regressor and columns
|
|
``Direct``, ``SE_Direct``, ``Indirect``, ``SE_Indirect``, ``Total``,
|
|
``SE_Total``.
|
|
gwr_bandwidth ¶
gwr_bandwidth(coords, y, X, kernel: KernelName = 'bisquare', fixed: bool = False, criterion: Criterion = 'AICc', bw_min: float = None, bw_max: float = None, add_constant: bool = True, tol: float = 0.001) -> float
Select GWR bandwidth via golden-section search.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
criterion
|
('AICc', 'AIC', 'BIC', 'CV')
|
Default AICc matches mgwr. |
"AICc"
|
fixed
|
bool
|
If False (default), the bandwidth is a nearest-neighbour count (integer-valued; rounded at each candidate). |
False
|
bw_min
|
float
|
Override the default search bounds. |
None
|
bw_max
|
float
|
Override the default search bounds. |
None
|
spatial_panel ¶
spatial_panel(data: DataFrame, formula: str, entity: str, time: str, W, model: ModelKind = 'sar', effects: EffectKind = 'fe', row_normalize: bool = True) -> SpatialPanelResult
Fit a spatial panel model by concentrated ML (Elhorst 2014).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
Long-format panel; must be balanced. |
required |
formula
|
str
|
|
required |
entity
|
str
|
Column names identifying entity and time dimensions. |
required |
time
|
str
|
Column names identifying entity and time dimensions. |
required |
W
|
sparse matrix, ndarray, or :class:`statspai.spatial.weights.W`
|
N × N spatial weights matrix aligned with |
required |
model
|
('sar', 'sem', 'sdm')
|
|
"sar"
|
effects
|
('fe', 'twoways')
|
|
"fe"
|
queen_weights ¶
queen_weights(gdf) -> W
Queen-contiguity spatial weights from polygon geometries.
Two areal units are queen-contiguous if their geometries share at least
one boundary point (an edge or a single vertex). Requires geopandas.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
gdf
|
GeoDataFrame
|
Layer of polygon geometries (accessed via |
required |
Returns:
| Type | Description |
|---|---|
W
|
Spatial weights object whose neighbour lists encode queen contiguity. |
Raises:
| Type | Description |
|---|---|
ImportError
|
If |
Examples:
>>> import statspai as sp
>>> import geopandas as gpd
>>> gdf = gpd.read_file("counties.shp")
>>> w = sp.queen_weights(gdf)
See Also
rook_weights : Edge-only contiguity (excludes shared vertices).
rook_weights ¶
rook_weights(gdf) -> W
Rook-contiguity spatial weights from polygon geometries.
Two areal units are rook-contiguous if their geometries share a boundary
segment of positive length (a shared edge); units touching only at a
single vertex are not neighbours. Requires geopandas.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
gdf
|
GeoDataFrame
|
Layer of polygon geometries. |
required |
Returns:
| Type | Description |
|---|---|
W
|
Spatial weights object whose neighbour lists encode rook contiguity. |
Raises:
| Type | Description |
|---|---|
ImportError
|
If |
Examples:
>>> import statspai as sp
>>> import geopandas as gpd
>>> w = sp.rook_weights(gpd.read_file("counties.shp"))
See Also
queen_weights : Contiguity that also counts shared vertices.
knn_weights ¶
knn_weights(coords: ndarray, k: int = 5) -> W
k-nearest-neighbour spatial weights from point coordinates.
Each unit is connected to its k closest units by Euclidean distance
(the unit itself is excluded). The resulting graph is generally asymmetric
— being someone's nearest neighbour is not reciprocal.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
coords
|
ndarray
|
|
required |
k
|
int
|
Number of nearest neighbours per unit. Must satisfy |
5
|
Returns:
| Type | Description |
|---|---|
W
|
Spatial weights object with exactly |
Raises:
| Type | Description |
|---|---|
ValueError
|
If |
Examples:
>>> import statspai as sp
>>> import numpy as np
>>> coords = np.random.default_rng(0).random((30, 2))
>>> w = sp.knn_weights(coords, k=4)
>>> len(w.neighbors[0])
4
See Also
distance_band : Connect all units within a fixed radius. kernel_weights : Continuous distance-decay weights.
distance_band ¶
distance_band(coords: ndarray, threshold: float, binary: bool = True) -> W
Distance-band spatial weights — connect units within a fixed radius.
Every unit is connected to all other units within Euclidean distance
threshold. Weights are either binary (1 for every neighbour) or
inverse-distance (1/d).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
coords
|
ndarray
|
|
required |
threshold
|
float
|
Distance band radius. Units within this distance become neighbours. Choose at least the maximum nearest-neighbour distance to avoid islands (units with no neighbours). |
required |
binary
|
bool
|
If True all neighbour weights are |
True
|
Returns:
| Type | Description |
|---|---|
W
|
Spatial weights object; neighbour counts vary by local density. |
Examples:
>>> import statspai as sp
>>> import numpy as np
>>> coords = np.random.default_rng(0).random((30, 2))
>>> w = sp.distance_band(coords, threshold=0.3, binary=False)
>>> w.n
30
See Also
knn_weights : Fixed neighbour count instead of fixed radius.
kernel_weights ¶
kernel_weights(coords: ndarray, bandwidth: float, kernel: Literal['gaussian', 'bisquare', 'triangular'] = 'gaussian', fixed: bool = True) -> W
Kernel (distance-decay) spatial weights from point coordinates.
Assigns each neighbour a continuous weight that decays with distance
according to a kernel function. Supports a fixed bandwidth (same radius
everywhere) or an adaptive bandwidth (the bandwidth-th nearest
neighbour distance, varying with local density).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
coords
|
ndarray
|
|
required |
bandwidth
|
float
|
With |
required |
kernel
|
('gaussian', 'bisquare', 'triangular')
|
Kernel shape. |
"gaussian"
|
fixed
|
bool
|
Fixed (True) versus adaptive (False) bandwidth. |
True
|
Returns:
| Type | Description |
|---|---|
W
|
Spatial weights object with continuous kernel weights. |
Raises:
| Type | Description |
|---|---|
ValueError
|
If |
Examples:
>>> import statspai as sp
>>> import numpy as np
>>> coords = np.random.default_rng(0).random((30, 2))
>>> w = sp.kernel_weights(coords, bandwidth=0.4, kernel="bisquare")
>>> w.n
30
See Also
distance_band : Binary / inverse-distance fixed-radius weights. knn_weights : Uniform k-nearest-neighbour weights.
block_weights ¶
block_weights(regimes) -> W
Block (regime) spatial weights — full connectivity within each group.
Builds a weights object in which every unit is a neighbour of every other unit sharing the same regime label, and of no unit outside it. Useful for encoding membership in a region, cluster, or treatment block as a spatial structure.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
regimes
|
array_like
|
Length- |
required |
Returns:
| Type | Description |
|---|---|
W
|
Spatial weights object with block-diagonal connectivity. |
Examples:
moran_local ¶
moran_local(y, w: W, permutations: int = 999, seed: Optional[int] = None)
Local Moran's I (LISA) — per-location spatial association.
Decomposes global Moran's I into a contribution I_i for each spatial
unit, identifying local clusters (high-high / low-low) and spatial
outliers (high-low / low-high). Each I_i gets a conditional
permutation pseudo p-value.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
y
|
array_like
|
Variable of interest, length |
required |
w
|
W
|
Spatial weights object; |
required |
permutations
|
int
|
Number of conditional permutations for the per-location pseudo
p-values. |
999
|
seed
|
int
|
Seed for the permutation RNG for reproducibility. |
None
|
Returns:
| Type | Description |
|---|---|
dict
|
|
Raises:
| Type | Description |
|---|---|
ValueError
|
If |
Examples:
>>> import statspai as sp
>>> import numpy as np
>>> coords = np.random.default_rng(0).random((40, 2))
>>> w = sp.knn_weights(coords, k=4)
>>> y = np.random.default_rng(2).standard_normal(40)
>>> out = sp.moran_local(y, w, permutations=499, seed=0)
>>> out["Is"].shape
(40,)
See Also
moran : Global Moran's I.
References
Anselin, L. (1995). Local indicators of spatial association---LISA. Geographical Analysis, 27(2), 93-115. [@anselin1995local]
getis_ord_g ¶
getis_ord_g(y, w: W, permutations: int = 999, seed: Optional[int] = None) -> SpatialStatistic
Global Getis-Ord G — overall concentration of high (or low) values.
The global G statistic summarises whether high values of y are
globally concentrated among neighbouring units. It requires non-negative
y. Larger-than-expected G indicates clustering of high values; smaller
indicates clustering of low values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
y
|
array_like
|
Non-negative variable of interest, length |
required |
w
|
W
|
Spatial weights object defining the neighbourhood structure. |
required |
permutations
|
int
|
Number of random permutations for the pseudo p-value, empirical
variance and z-score. |
999
|
seed
|
int
|
Seed for the permutation RNG for reproducibility. |
None
|
Returns:
| Type | Description |
|---|---|
SpatialStatistic
|
Result with |
Raises:
| Type | Description |
|---|---|
ValueError
|
If any element of |
Examples:
>>> import statspai as sp
>>> import numpy as np
>>> coords = np.random.default_rng(0).random((50, 2))
>>> w = sp.knn_weights(coords, k=5)
>>> y = np.random.default_rng(4).random(50) # non-negative
>>> res = sp.getis_ord_g(y, w, seed=0)
>>> res.value > 0
True
See Also
getis_ord_local : Local Gi / Gi* hotspot statistics.
References
Getis, A. and Ord, J. K. (1992). The analysis of spatial association by use of distance statistics. Geographical Analysis, 24(3), 189-206. [@getis1992analysis]
getis_ord_local ¶
getis_ord_local(y, w: W, star: bool = True, permutations: int = 999, seed: Optional[int] = None)
Local Getis-Ord Gi / Gi* — hotspot and coldspot detection.
Computes a standardised local statistic for each unit measuring whether it
is surrounded by high values (a hotspot, large positive z) or low values
(a coldspot, large negative z). With star=True the focal unit is
included in its own neighbourhood (Gi*); with star=False it is
excluded (Gi).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
y
|
array_like
|
Variable of interest, length |
required |
w
|
W
|
Spatial weights object; densified internally to an |
required |
star
|
bool
|
If True compute Gi* (self-included); otherwise Gi (self-excluded). |
True
|
permutations
|
int
|
Accepted for API symmetry; the current implementation returns the analytic standardised statistic. |
999
|
seed
|
int
|
Seed placeholder for API symmetry. |
None
|
Returns:
| Type | Description |
|---|---|
dict
|
|
Examples:
>>> import statspai as sp
>>> import numpy as np
>>> coords = np.random.default_rng(0).random((40, 2))
>>> w = sp.knn_weights(coords, k=4)
>>> y = np.random.default_rng(5).random(40)
>>> out = sp.getis_ord_local(y, w, star=True)
>>> out["Gs"].shape
(40,)
See Also
getis_ord_g : Global G statistic. moran_local : LISA, a cross-product local statistic.
References
Getis, A. and Ord, J. K. (1992). The analysis of spatial association by use of distance statistics. Geographical Analysis, 24(3), 189-206. [@getis1992analysis]
moran_plot ¶
moran_plot(y, w: W, ax=None)
Moran scatter: z vs spatial lag Wz. Slope of OLS line = Moran's I.
lisa_cluster_map ¶
lisa_cluster_map(y, w: W, gdf, ax=None, p_threshold: float = 0.05)
Classify each observation HH/LL/HL/LH/Not significant and colour on a GDF.
spatial_did ¶
spatial_did(data: DataFrame, y: str, treat: str, unit: str, time: str, W, covariates: Optional[Sequence[str]] = None, cluster: Optional[str] = None, alpha: float = 0.05, *, se_type: str = 'cluster', coords: Optional[Tuple[str, str]] = None, lat: Optional[str] = None, lon: Optional[str] = None, distance_matrix: Optional[ndarray] = None, conley_cutoff: Optional[float] = None, conley_kernel: str = 'bartlett', unit_order: Optional[Sequence[Any]] = None, normalize_W: bool = True, event_study: bool = False, event_window: Optional[Tuple[int, int]] = None, event_base: int = -1) -> SpatialDiDResult
Spatial DiD with direct and spillover treatment effects.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
Long panel with one row per unit-period. |
required |
y
|
str
|
Outcome, treatment, unit-id, and time-id columns. |
required |
treat
|
str
|
Outcome, treatment, unit-id, and time-id columns. |
required |
unit
|
str
|
Outcome, treatment, unit-id, and time-id columns. |
required |
time
|
str
|
Outcome, treatment, unit-id, and time-id columns. |
required |
W
|
ndarray, sparse matrix, or :class:`statspai.spatial.W`
|
Spatial weights matrix over units. If a StatsPAI |
required |
covariates
|
sequence of str
|
Additional controls included after absorbing unit and time effects. |
None
|
cluster
|
str
|
Cluster variable for |
None
|
alpha
|
float
|
Significance level for confidence intervals. |
0.05
|
se_type
|
('cluster', 'robust', 'conley')
|
Covariance estimator. |
"cluster"
|
coords
|
tuple(str, str)
|
Convenience pair |
None
|
lat
|
str
|
Latitude/longitude columns in degrees. |
None
|
lon
|
str
|
Latitude/longitude columns in degrees. |
None
|
distance_matrix
|
ndarray
|
Unit-by-unit distances in the same units as |
None
|
conley_cutoff
|
float
|
Spatial cutoff for Conley-HAC standard errors. |
None
|
conley_kernel
|
('uniform', 'bartlett')
|
|
"uniform"
|
unit_order
|
sequence
|
Explicit row/column order of |
None
|
normalize_W
|
bool
|
Row-normalise weights before constructing |
True
|
event_study
|
bool
|
Estimate direct and spillover event-study paths by relative time. |
False
|
event_window
|
tuple(int, int)
|
Inclusive event-time window. Defaults to observed support capped at
[-5, 5] when |
None
|
event_base
|
int
|
Omitted relative-time period. |
-1
|
Returns:
| Type | Description |
|---|---|
SpatialDiDResult
|
Direct, spillover, total effects with diagnostics, plotting, and export helpers. |
spatial_iv ¶
spatial_iv(data: DataFrame, y: str, endog: Sequence[str], exog: Sequence[str], W, instruments: Optional[Sequence[str]] = None, include_WY: bool = True, alpha: float = 0.05) -> SpatialIVResult
Spatial 2SLS estimator.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
data
|
DataFrame
|
|
required |
y
|
str
|
|
required |
endog
|
sequence of str
|
Names of additional endogenous regressors. |
required |
exog
|
sequence of str
|
Names of exogenous regressors (included instruments). |
required |
W
|
ndarray or :class:`spatial.W`
|
Spatial weights matrix (n_obs x n_obs). Should be row-normalised. |
required |
instruments
|
sequence of str
|
Extra excluded instruments beyond the spatial lags of X. |
None
|
include_WY
|
bool
|
If True, include WY as a regressor (spatial autoregressive coefficient ρ). |
True
|
alpha
|
float
|
|
0.05
|
Returns:
| Type | Description |
|---|---|
SpatialIVResult
|
|