Testing for Second-Order Stationarity in Python

Testing for second-order stationarity in Python requires verifying that a spatial random field maintains a constant mean, constant variance, and a covariance structure that depends exclusively on separation distance (lag), not absolute coordinates. In practice, you validate this by combining exploratory spatial data analysis (ESDA) with empirical variogram estimation, trend removal, and statistical homogeneity tests using numpy, scipy, and geostatistical libraries like gstools. If the empirical variogram reaches a stable sill, residuals show no spatially correlated drift, and variance remains homogeneous across distance bins, the process satisfies second-order stationarity and is mathematically safe for ordinary kriging or spatial regression.

The Three Mathematical Conditions

Second-order (weak) stationarity sits between intrinsic stationarity and strict stationarity. It imposes three explicit constraints that must hold across your study area:

  1. Constant mean: E[Z(x)] = μ for all locations x
  2. Constant variance: Var[Z(x)] = σ² for all x
  3. Lag-dependent covariance: Cov[Z(x), Z(x+h)] = C(h), meaning spatial correlation decays or stabilizes purely as a function of distance h

Violations typically appear as global trends (e.g., elevation-driven temperature gradients), directional anisotropy, or heteroscedasticity. Mastering these constraints is essential when navigating the broader Stationarity & Trend Analysis workflow, as stationarity dictates which interpolation and inference methods remain valid.

Environment & Dependencies

Component Minimum Version Notes
Python 3.9+ 3.10+ recommended for scipy performance optimizations
NumPy 1.24+ Required for vectorized lag calculations
SciPy 1.10+ Statistical tests (levene, shapiro)
GeoPandas 0.12+ Spatial I/O and coordinate handling
GSTools / scikit-gstat 1.5+ / 1.4+ C-extensions may require build-essential (Linux) or Xcode CLI tools (macOS)

For Windows users, prefer conda install -c conda-forge gstools to avoid MSVC compilation errors. The implementation below relies on pure numpy/scipy for maximum portability, with optional integration points for GSTools noted in the comments.

Complete Python Implementation

The following script removes linear drift, computes an empirical semivariogram, bins distances, and tests variance homogeneity using Levene’s test. It returns a structured assessment of second-order stationarity.

python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import levene
import warnings

def test_second_order_stationarity(coords, values, n_bins=10, alpha=0.05):
    """
    Tests second-order stationarity for spatial data.

    Parameters:
        coords (np.ndarray): (N, 2) array of x, y coordinates
        values (np.ndarray): (N,) array of observed values
        n_bins (int): Number of distance bins for variogram estimation
        alpha (float): Significance level for homogeneity tests

    Returns:
        dict: Stationarity assessment with test statistics and pass/fail flags
    """
    coords = np.asarray(coords)
    values = np.asarray(values)

    if coords.shape[0] != values.shape[0]:
        raise ValueError("coords and values must have the same number of observations.")
    if coords.shape[1] != 2:
        raise ValueError("coords must be 2D (N, 2).")

    # 1. Remove linear trend (fit plane: z = ax + by + c)
    X = np.column_stack((coords, np.ones(len(coords))))
    coeffs, _, _, _ = np.linalg.lstsq(X, values, rcond=None)
    trend = X @ coeffs
    residuals = values - trend

    # 2. Compute pairwise distances and empirical semivariogram
    dist_matrix = squareform(pdist(coords))
    diff_matrix = squareform(pdist(residuals.reshape(-1, 1), metric='sqeuclidean')) * 0.5

    # Flatten upper triangle
    dist_flat = dist_matrix[np.triu_indices_from(dist_matrix, k=1)]
    gamma_flat = diff_matrix[np.triu_indices_from(diff_matrix, k=1)]

    # 3. Bin distances
    max_dist = np.max(dist_flat)
    bin_edges = np.linspace(0, max_dist, n_bins + 1)
    bin_indices = np.digitize(dist_flat, bin_edges) - 1
    bin_indices = np.clip(bin_indices, 0, n_bins - 1)

    # Group semivariances by bin
    bin_variances = []
    for b in range(n_bins):
        mask = bin_indices == b
        if np.sum(mask) > 1:
            bin_variances.append(gamma_flat[mask])
        else:
            bin_variances.append(np.array([np.nan]))

    # 4. Test variance homogeneity across bins (Levene's test)
    valid_bins = [v for v in bin_variances if not np.isnan(v).all() and len(v) > 1]
    if len(valid_bins) >= 2:
        _, levene_p = levene(*valid_bins, center='median')
        variance_homogeneous = levene_p > alpha
    else:
        levene_p = np.nan
        variance_homogeneous = False

    # 5. Check for sill stabilization (coefficient of variation < 0.15 in last 3 bins)
    last_bins = [v for v in valid_bins[-3:] if len(v) > 1]
    if last_bins:
        sill_means = [np.mean(v) for v in last_bins]
        sill_cv = np.std(sill_means) / np.mean(sill_means) if np.mean(sill_means) > 0 else 1.0
        sill_reached = sill_cv < 0.15
    else:
        sill_reached = False

    # 6. Residual mean check (should be ~0 after detrending)
    mean_residual = np.mean(residuals)
    mean_stationary = abs(mean_residual) < (np.std(values) * 0.05)

    is_stationary = variance_homogeneous and sill_reached and mean_stationary

    return {
        "is_stationary": is_stationary,
        "mean_residual": float(mean_residual),
        "mean_stationary": bool(mean_stationary),
        "levene_p_value": float(levene_p),
        "variance_homogeneous": bool(variance_homogeneous),
        "sill_cv": float(sill_cv),
        "sill_reached": bool(sill_reached),
        "n_valid_bins": len(valid_bins)
    }

Interpreting Results & Troubleshooting

The function returns a dictionary with three primary pass/fail flags:

  • mean_stationary: Confirms the detrended residuals center near zero.
  • variance_homogeneous: Uses Levene’s test to ensure variance doesn’t systematically change across distance bands. See the official SciPy Levene documentation for test assumptions.
  • sill_reached: Validates that the empirical semivariogram stabilizes rather than growing indefinitely.

If is_stationary returns False, diagnose the failure:

  • High levene_p failure: Variance increases with distance. Apply a variance-stabilizing transformation (e.g., Box-Cox or log) or switch to universal kriging.
  • sill_reached failure: The process exhibits long-range dependence or an unremoved trend. Increase the search radius, add polynomial detrending, or model anisotropy explicitly.
  • mean_stationary failure: The linear trend removal was insufficient. Consider higher-order polynomial fitting or spatial filtering.

Once stationarity is confirmed, you can safely proceed to model fitting and spatial prediction. For advanced workflows involving anisotropy detection or automated variogram modeling, consult the Core Concepts of Spatial Statistics & Geostatistics reference guide. Always validate model assumptions with cross-validation before deploying spatial predictions in production environments.