This tutorial shows how to compute an Ordinary Kriging prediction directly from the OK equations.
This is an adaptation of a previous tutorial that used the sp
package to represent spatial objects in R. Here we use the newer sf
package that implements Simple Features 1 representation of spatial objects.
The prediction by Ordinary Kriging at an unknown point \(x_0\) is computed as a weighted average \(\hat{x}_o = \sum_i \lambda_i x_i\) of known points \(\hat{x}_i\), \(i = 1 \ldots n\).
These weights \(\lambda_i\) are dervied by solving the Ordinary Kriging equations: \(\mathbf{A} \lambda = b\), solved as \(\lambda = \mathbf{A}^{-1} b\). The kriging prediction variances are computed as \(b^T \lambda\), where\(A, \lambda, b\) are defined from the semivariances \(\gamma\) and the LaGrange multiplier \(\psi\) as:
\[\begin{eqnarray*} \mathbf{A} & = & \left[ \begin{array}{*5{c}} \gamma(\mathbf{x}_1,\mathbf{x}_1) & \gamma(\mathbf{x}_1,\mathbf{x}_2) & \cdots & \gamma(\mathbf{x}_1,\mathbf{x}_N) & 1 \\ \gamma(\mathbf{x}_2,\mathbf{x}_1) & \gamma(\mathbf{x}_2,\mathbf{x}_2) & \cdots & \gamma(\mathbf{x}_2,\mathbf{x}_N) & 1 \\ \vdots & \vdots & \cdots & \vdots & \vdots \\ \gamma(\mathbf{x}_N,\mathbf{x}_1) & \gamma(\mathbf{x}_N,\mathbf{x}_2) & \cdots & \gamma(\mathbf{x}_N,\mathbf{x}_N) & 1 \\ 1 & 1 & \cdots & 1 & 0 \\ \end{array} \right] ; \mathbf{\lambda} & = & \left[ \begin{array}{c} \lambda_1 \\ \lambda_2 \\ \vdots \\ \lambda_N \\ \psi \end{array} \right] ; \mathbf{b} & = & \left[ \begin{array}{c} \gamma(\mathbf{x}_1,\mathbf{x}_0) \\ \gamma(\mathbf{x}_2,\mathbf{x}_0) \\ \vdots \\ \gamma(\mathbf{x}_N,\mathbf{x}_0) \\ 1 \end{array} \right] \end{eqnarray*}\]
We will build these matrices and solve explicitly for one prediction point.
Load the Meuse example dataset from the sp
package and convert to an sf
object, with both coordinates and attributes. Note that all functions in the sf
package have the prefix st_
, which stands for “space + time”; this prefix prevents namespace conflicts with functions from other loaded packages.
require(sf)
## Loading required package: sf
## Linking to GEOS 3.7.2, GDAL 2.4.2, PROJ 5.2.0
data(meuse, package="sp")
class(meuse)
## [1] "data.frame"
str(meuse)
## 'data.frame': 155 obs. of 14 variables:
## $ x : num 181072 181025 181165 181298 181307 ...
## $ y : num 333611 333558 333537 333484 333330 ...
## $ cadmium: num 11.7 8.6 6.5 2.6 2.8 3 3.2 2.8 2.4 1.6 ...
## $ copper : num 85 81 68 81 48 61 31 29 37 24 ...
## $ lead : num 299 277 199 116 117 137 132 150 133 80 ...
## $ zinc : num 1022 1141 640 257 269 ...
## $ elev : num 7.91 6.98 7.8 7.66 7.48 ...
## $ dist : num 0.00136 0.01222 0.10303 0.19009 0.27709 ...
## $ om : num 13.6 14 13 8 8.7 7.8 9.2 9.5 10.6 6.3 ...
## $ ffreq : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
## $ soil : Factor w/ 3 levels "1","2","3": 1 1 1 2 2 2 2 1 1 2 ...
## $ lime : Factor w/ 2 levels "0","1": 2 2 2 1 1 1 1 1 1 1 ...
## $ landuse: Factor w/ 15 levels "Aa","Ab","Ag",..: 4 4 4 11 4 11 4 2 2 15 ...
## $ dist.m : num 50 30 150 270 380 470 240 120 240 420 ...
meuse.sf <- st_as_sf(meuse, coords=c("x", "y"))
class(meuse.sf)
## [1] "sf" "data.frame"
str(meuse.sf)
## Classes 'sf' and 'data.frame': 155 obs. of 13 variables:
## $ cadmium : num 11.7 8.6 6.5 2.6 2.8 3 3.2 2.8 2.4 1.6 ...
## $ copper : num 85 81 68 81 48 61 31 29 37 24 ...
## $ lead : num 299 277 199 116 117 137 132 150 133 80 ...
## $ zinc : num 1022 1141 640 257 269 ...
## $ elev : num 7.91 6.98 7.8 7.66 7.48 ...
## $ dist : num 0.00136 0.01222 0.10303 0.19009 0.27709 ...
## $ om : num 13.6 14 13 8 8.7 7.8 9.2 9.5 10.6 6.3 ...
## $ ffreq : Factor w/ 3 levels "1","2","3": 1 1 1 1 1 1 1 1 1 1 ...
## $ soil : Factor w/ 3 levels "1","2","3": 1 1 1 2 2 2 2 1 1 2 ...
## $ lime : Factor w/ 2 levels "0","1": 2 2 2 1 1 1 1 1 1 1 ...
## $ landuse : Factor w/ 15 levels "Aa","Ab","Ag",..: 4 4 4 11 4 11 4 2 2 15 ...
## $ dist.m : num 50 30 150 270 380 470 240 120 240 420 ...
## $ geometry:sfc_POINT of length 155; first list element: 'XY' num 181072 333611
## - attr(*, "sf_column")= chr "geometry"
## - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA ...
## ..- attr(*, "names")= chr "cadmium" "copper" "lead" "zinc" ...
st_crs(meuse.sf)
## Coordinate Reference System: NA
st_crs(meuse.sf) = 28992 # the EPSG code, see ?meuse
st_crs(meuse.sf)
## Coordinate Reference System:
## User input: EPSG:28992
## wkt:
## PROJCS["Amersfoort / RD New",
## GEOGCS["Amersfoort",
## DATUM["Amersfoort",
## SPHEROID["Bessel 1841",6377397.155,299.1528128,
## AUTHORITY["EPSG","7004"]],
## TOWGS84[565.2369,50.0087,465.658,-0.406857,0.350733,-1.87035,4.0812],
## AUTHORITY["EPSG","6289"]],
## PRIMEM["Greenwich",0,
## AUTHORITY["EPSG","8901"]],
## UNIT["degree",0.0174532925199433,
## AUTHORITY["EPSG","9122"]],
## AUTHORITY["EPSG","4289"]],
## PROJECTION["Oblique_Stereographic"],
## PARAMETER["latitude_of_origin",52.15616055555555],
## PARAMETER["central_meridian",5.38763888888889],
## PARAMETER["scale_factor",0.9999079],
## PARAMETER["false_easting",155000],
## PARAMETER["false_northing",463000],
## UNIT["metre",1,
## AUTHORITY["EPSG","9001"]],
## AXIS["X",EAST],
## AXIS["Y",NORTH],
## AUTHORITY["EPSG","28992"]]
Notice that the converted object has the original data.frame
, except that the two fields representing coördinates are now in a newly-created field geometry
, of class sfc_POINT
. The Coördinate Reference System (CRS) must be supplied from the metadata, since no where in the data.frame
can this be indicated. Although not needed for this example, we assign anyway, using information from ?meuse
. Notice the Well-known text (WKT) format of the CRS specification.
Place a point to predict somewhere in the middle of the bounding box, and convert it to an sf
object. Note it has no attributes, we only need its coordinates. The sfg
“Simple Features Geometry” object must be upgraded to an sfc
“Simple feature columns” object in order to specify the CRS.
print(bb <- st_bbox(meuse.sf))
## xmin ymin xmax ymax
## 178605 329714 181390 333611
x0 <- st_point(c(median(bb$xmin, bb$xmax),
median(bb$ymin, bb$ymax)))
print(x0)
## POINT (178605 329714)
class(x0)
## [1] "XY" "POINT" "sfg"
x0 <- st_sfc(x0)
st_crs(x0) <- st_crs(meuse.sf)
print(x0)
## Geometry set for 1 feature
## geometry type: POINT
## dimension: XY
## bbox: xmin: 178605 ymin: 329714 xmax: 178605 ymax: 329714
## CRS: EPSG:28992
## POINT (178605 329714)
class(x0)
## [1] "sfc_POINT" "sfc"
str(x0)
## sfc_POINT of length 1; first list element: 'XY' num [1:2] 178605 329714
Now the prediction and known points are both sf
ojects with CRS.
Build a matrix of the distances between known points with the st_distance
function:
dim(st_coordinates(meuse.sf))
## [1] 155 2
dm.A <- st_distance(meuse.sf)
dim(dm.A)
## [1] 155 155
dm.A[1:5, 1:5]
## Units: [m]
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.00000 70.83784 118.8486 259.2393 366.3141
## [2,] 70.83784 0.00000 141.5662 282.8516 362.6403
## [3,] 118.84864 141.56624 0.0000 143.1712 251.0239
## [4,] 259.23927 282.85155 143.1712 0.0000 154.2628
## [5,] 366.31407 362.64032 251.0239 154.2628 0.0000
Notice the diagonals are 0, this is the distance of a point to itself.
Convert these to semivariances, using a fitted variogram model, as the upper-left of the \(\mathbf{A}\) matrix, using geostatistics functions from the gstat
package, which can understand sf
objects.
require(gstat)
## Loading required package: gstat
meuse.sf$logZn <- log10(meuse.sf$zinc)
v <- variogram(logZn ~ 1, loc=meuse.sf, cutoff=1300, width=90)
(vmf <- fit.variogram(v, vgm(psill=0.12, model="Sph", range=900, nugget=0.01)))
## model psill range
## 1 Nug 0.01004124 0.0000
## 2 Sph 0.11525701 967.2639
str(A <- variogramLine(object=vmf, dist_vector=dm.A))
## num [1:155, 1:155] 0 0.0227 0.0312 0.0553 0.0724 ...
A[1:5, 1:5]
## [,1] [,2] [,3] [,4] [,5]
## [1,] 0.00000000 0.02267992 0.03117694 0.05526735 0.07238485
## [2,] 0.02267992 0.00000000 0.03516364 0.05915613 0.07182145
## [3,] 0.03117694 0.03516364 0.00000000 0.03544430 0.05390113
## [4,] 0.05526735 0.05915613 0.03544430 0.00000000 0.03737988
## [5,] 0.07238485 0.07182145 0.05390113 0.03737988 0.00000000
Extend the \(\mathbf{A}\) matrix with the column and row of 1’s (constraints), bottom-right entry is 0:
dim(A)
## [1] 155 155
A <- cbind(A, 1)
dim(A)
## [1] 155 156
A <- rbind(A, 1)
dim(A)
## [1] 156 156
n <- dim(A)[1]
A[n,n] <- 0
Build the \(b\) vector from the distances from the target point to the known points.
Note the use of the drop
function to remove the 2nd dimension from the matrix returned by st_distance
, in this case there is only one column, because all distances are computed to a single point.
(st_crs(meuse.sf) == st_crs(x0)) # CRS must be the same for `st_distance` between two objects
## [1] TRUE
dm <- drop(st_distance(meuse.sf, x0)) # column vectors, st_distance arguments are rows, columns
length(dm)
## [1] 155
Convert to semivariances, and add a last element (1) needed for the kriging equations.
b <- c(variogramLine(vmf, dist_vector=dm)$gamma, 1)
length(b)
## [1] 156
head(b); tail(b)
## [1] 0.1252982 0.1252982 0.1252982 0.1252982 0.1252982 0.1252982
## [1] 0.1168317 0.1173227 0.1092399 0.1252982 0.1252982 1.0000000
Solve for weights and LaGrange multiplier. Note solve
with a single matrix argument inverts the matrix, and %*%
is the matrix multiplication operator. The drop
function removes matrix dimensions with only one entry, here it will be the column, and so converts the \(\lambda\) matrix into a vector.
lambda <- drop(solve(A) %*% b)
head(lambda)
## [1] 0.0121348932 0.0058718103 0.0020816606 0.0094320593 0.0006989918
## [6] 0.0112053146
The LaGrange multiplier is the last element in the solution:
tail(lambda, 1) # LaGrange multiplier
## [1] 0.003690692
Now that we have the \(\lambda\) vector, we use it to solve for the prediction and its variance. Here drop
will convert the results to scalars:
# prediction
(ok.pred <- drop(lambda[1:n-1] %*% meuse.sf$logZn))
## [1] 2.796016
# variance
(ok.pred.var <- drop(t(b) %*% lambda))
## [1] 0.07574819
krige
functionCheck that we get the same results with the krige
function. For this, the locations
and newdata
arguments must be of class sf
, i.e., with the geometry as a special field in a data.frame
. So, we upgrade x0
with the st_sf
function:
class(meuse.sf)
## [1] "sf" "data.frame"
class(x0)
## [1] "sfc_POINT" "sfc"
x0.sf <- st_sf(x0)
class(x0.sf)
## [1] "sf" "data.frame"
str(x0.sf)
## Classes 'sf' and 'data.frame': 1 obs. of 1 variable:
## $ x0:sfc_POINT of length 1; first list element: 'XY' num 178605 329714
## - attr(*, "sf_column")= chr "x0"
## - attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..:
## ..- attr(*, "names")= chr
(k.pt <- krige(logZn ~1, loc = meuse.sf, newdata = x0.sf, model = vmf))
## [using ordinary kriging]
## Simple feature collection with 1 feature and 2 fields
## geometry type: POINT
## dimension: XY
## bbox: xmin: 178605 ymin: 329714 xmax: 178605 ymax: 329714
## CRS: EPSG:28992
## var1.pred var1.var geometry
## 1 2.796016 0.07574819 POINT (178605 329714)
As promised!
Questions? Comments? mailto:d.g.rossiter@cornell.edu?subject=Comments_on_KrigingFromScratch_sf