Step-by-Step Ordinary Kriging with PyKrige
Ordinary kriging estimates unknown values at unsampled locations by minimizing estimation variance under the assumption of a constant but unknown mean across the study area. As a foundational technique within Kriging, Interpolation & Surface Generation Techniques, it remains the default choice for environmental monitoring, soil mapping, and urban sensor networks. With PyKrige, the workflow reduces to four deterministic steps: prepare coordinate/value arrays, fit a variogram model, instantiate the OrdinaryKriging class, and execute interpolation on a target grid. The library handles covariance matrix construction, Lagrange multiplier calculation, and kriging variance estimation natively. This guide provides a production-ready pipeline for Step-by-Step Ordinary Kriging with PyKrige, including explicit validation, fallback paths, and scaling considerations.
Data Preparation & Coordinate Requirements
The pipeline requires strictly formatted 1D NumPy arrays for coordinates and measurements. All distance calculations default to Euclidean space. If your data uses geographic coordinates (latitude/longitude), project them to a metric coordinate reference system (CRS) like UTM before execution. Geographic coordinates violate Euclidean distance assumptions and will produce distorted spatial weights. Use pyproj or geopandas for transformation, adhering to OGC Coordinate Reference System Standards for consistent spatial referencing.
This constant-mean assumption distinguishes ordinary kriging from trend-based methods covered in Ordinary & Universal Kriging. Always verify stationarity before proceeding; if a clear directional trend exists, detrend the data or switch to universal kriging.
Implementation Code
The following block demonstrates a complete, validated interpolation pipeline. It includes explicit variogram configuration, grid execution, and post-processing checks.
import numpy as np
from pykrige.ok import OrdinaryKriging
import matplotlib.pyplot as plt
# 1. Load or generate spatial data (x, y, z must be 1D arrays)
np.random.seed(42)
n = 120
x = np.random.uniform(0, 100, n)
y = np.random.uniform(0, 100, n)
z = 3.0 * np.sin(x/8) * np.cos(y/12) + np.random.normal(0, 0.4, n)
# 2. Define target interpolation grid
gridx = np.linspace(0, 100, 150)
gridy = np.linspace(0, 100, 150)
# 3. Initialize Ordinary Kriging with explicit variogram configuration
ok = OrdinaryKriging(
x, y, z,
variogram_model='spherical',
variogram_parameters=[1.2, 18.0, 0.05], # [sill, range, nugget]
nlags=20,
weight=True,
verbose=False,
coordinates_type='euclidean'
)
# 4. Execute grid interpolation
z_interp, ss = ok.execute('grid', gridx, gridy)
# 5. Validate output dimensions and uncertainty bounds
print(f"Grid shape: {z_interp.shape}")
print(f"Variance range: [{ss.min():.4f}, {ss.max():.4f}]")
print(f"NaN count: {np.isnan(z_interp).sum()}")
Variogram Configuration & Model Selection
The variogram defines spatial autocorrelation structure and directly controls prediction accuracy. PyKrige supports spherical, exponential, gaussian, linear, and hole-effect models. While variogram_model='auto' is convenient, it frequently yields non-physical parameters when data exhibits clustering, anisotropy, or measurement error. For reproducible deployments, manually specify variogram_parameters using empirical semivariogram analysis or domain knowledge. The standard parameter order is [sill, range, nugget].
Auto-fitting relies on weighted least squares optimization, which can be unstable with sparse or heavily clustered datasets. Consult the PyKrige Official Documentation for advanced fitting routines and anisotropy scaling. When in doubt, start with a spherical model, as it provides a realistic cutoff at the range parameter and avoids the infinite tail behavior of exponential/gaussian models.
Execution & Variance Interpretation
The execute() method returns a tuple: (z_interp, ss), where z_interp is the predicted surface and ss is the kriging variance. Variance values near zero indicate proximity to sampled points, while elevated values flag extrapolation zones or data-sparse regions. Always verify that z_interp contains no NaN values before downstream analysis. Boundary artifacts often appear when the grid extends beyond the convex hull of input points; mask these regions using np.isnan() or apply a buffer zone.
Kriging variance is independent of the observed values and depends solely on the spatial configuration of samples and the chosen variogram. This makes it a reliable metric for uncertainty quantification, but it does not account for model misspecification. Pair variance maps with cross-validation to detect systematic bias.
Production Validation & Scaling
Deploying kriging at scale requires explicit validation and computational safeguards:
- Cross-Validation: Use leave-one-out (LOOCV) or 5-fold splits to compare predicted vs. observed values. Track RMSE, MAE, and standardized prediction errors. A mean standardized error near zero and variance near one indicate an unbiased, well-calibrated model.
- Anisotropy: Directional spatial dependence requires
anisotropy_scalingandanisotropy_angleparameters. Ignoring anisotropy in elongated geological formations or wind-driven pollutant dispersion biases predictions and inflates variance. - Computational Scaling: Exact kriging scales as O(N³) due to covariance matrix inversion. For datasets >5,000 points, consider moving-window kriging, block kriging, or sparse approximations. PyKrige supports
exact=Falsefor faster, approximate solutions when memory constraints arise. - Nugget Effects: High nugget-to-sill ratios (>0.3) indicate strong micro-scale variability or measurement noise. In these cases, kriging will smooth aggressively. Validate against independent holdout data before accepting smoothed outputs.
Summary
Ordinary kriging remains the most robust spatial interpolator when stationarity holds and variogram parameters are carefully calibrated. By enforcing strict array formatting, manually configuring variogram models, and validating outputs against kriging variance, teams can deploy reliable interpolation pipelines for environmental, geological, and urban analytics. For complex trends or non-stationary fields, transition to universal or indicator kriging while maintaining the same validation rigor.