Raster Mastery Issue 3
USDA Natural Resources Conservation Service sent this bulletin at 03/05/2021 08:30 AM EST

Sampling Techniques: When a Grid Can Be So Much More
Sampling is a key component of digital soil mapping projects. It’s also a topic that is often discussed amongst practitioners…for hours upon hours actually. Numerous tools are at our disposal when designing a sampling scheme, but we often don’t consider grid sampling as a first-line strategy for sampling design. In this issue of Raster Mastery, we ask…should I grid sample? If so, is there an optimal way to generate grid samples? What if I want to incorporate covariate data? We have so many options to consider! Read on as we explore the options in R (insert super enthusiastic R emoji here).
First, let’s set up our working environment and bring in some data.
# load libraries
library(raster); library(rgdal); library(clhs); library(spcosa); library(LICORS); library(parallel)
# load project area shapefile
poly <- readOGR("C:/rwork/coweeta/watershed.shp")
# load dem
dem <- raster("C:/rwork/coweeta/chldem10m.tif")
# create slope and aspect rasters
slp <- terrain(dem, opt = "slope")
aspt <- terrain(dem, opt = "aspect")
# create hillshade for visualization
hill <- hillShade(slp, aspt, angle = 40, direction = 270)
Regular Grid Sampling
Traditional grid sampling can be useful for a detailed investigation. However, getting an exact number of points to fall within a regular grid can be very difficult. The following creates a regular grid of 115 points over the DEM for our project location.
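A minimal sketch of this step using `sp::spsample` on the watershed polygon loaded earlier (the seed value is illustrative). Note that with `type = "regular"`, the `n` argument is treated as approximate, which is exactly the difficulty mentioned above:

```r
# regular grid of ~115 points over the project area
# note: spsample(type = "regular") treats n as approximate --
# the count actually returned often differs from the request
set.seed(42)
reg_pts <- spsample(poly, n = 115, type = "regular")
length(reg_pts)  # frequently not exactly 115
```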
Spatial Coverage Sampling
This process uses k-means clustering to generate points based on the geometry of the raster data. A smarter grid, for sure.
# convert raster to data.frame
grd <- as.data.frame(dem, xy = TRUE)
grd <- na.omit(grd) # remove NAs
gridded(grd) <- ~ x + y # convert to grid format
# stratify the grid into 115 compact strata
sc_grd <- stratify(grd, nStrata = 115, nTry = 10)
# pull out sample points and save to shp
samp_pattern <- spsample(sc_grd)
samp_pts <- as(samp_pattern, "data.frame")
coords <- samp_pts[c("x", "y")]
prj <- dem@crs
samp_pts <- SpatialPointsDataFrame(coords = coords, data = samp_pts, proj4string = prj)
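The comment above mentions saving the points to a shapefile. A hedged sketch of that export, mirroring the `writeOGR` call used for the FSCS points later in this issue (the layer name here is illustrative):

```r
# save the spatial coverage sample points to a shapefile
# (layer name "scs115_pts" is an assumption, not from the original script)
writeOGR(samp_pts, dsn = ".", layer = "scs115_pts",
         driver = "ESRI Shapefile", overwrite_layer = TRUE)
```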
Feature Space Coverage Sampling
The spatial coverage sampling technique uses k-means to minimize the distance between sample locations and the XY coordinates of the raster cells. FSCS builds on this technique by applying k-means in feature space to find the optimal coverage spanned by the environmental covariates.
First, let’s set up the covariate data.
# set directory to covariates
setwd("C:/rwork/coweeta/cov/cov")
# read in raster layer names and create a list
rlist <- list.files(getwd(), pattern = "\\.tif$", full.names = FALSE)
# create raster stack
rstk <- stack(rlist)
# create a grid of points (n = gsize) to represent the raster stack;
# this can be done many ways -- sampleRegular is one method
gsize <- 10000
samp_reg <- sampleRegular(rstk, size = gsize, sp = TRUE)
samp_regdf <- na.omit(as.data.frame(samp_reg))
Stack ’em up! To run FSCS, we need to convert the raster stack to a data frame in R. For spatially large rasters and for raster stacks with numerous covariates, the conversion is a time-consuming process. That might be welcome if you enjoy the art of slow coffee, but it’s not great for productivity. We recommend you slam that coffee and create a regular grid of points to represent the covariate data. You can set the size of the grid with the gsize parameter.
Next, let’s run the FSCS algorithm… in parallel to be even more productive.
# parallel implementation of FSCS
nw <- detectCores(logical = FALSE) - 1
cl <- makeCluster(nw)
clusterSetRNGStream(cl, iseed = 1234)
set.seed(88)
# parallelize over the "nstart" argument
nstart <- 100
# create vector of length "nw" where sum(nstartv) == nstart
nstartv <- rep(ceiling(nstart / nw), nw)
results <- clusterApply(cl, nstartv,
                        function(n, x) LICORS::kmeanspp(x, 115, nstart = n, iter.max = 1000),
                        samp_regdf[, 1:4])
# pick the best result (lowest total within-cluster sum of squares)
i <- sapply(results, function(result) result$tot.withinss)
result <- results[[which.min(i)]]
stopCluster(cl)
The key parameters for the kmeanspp function in the parallel implementation (code block above) of the FSCS algorithm are:
nstart = 100: number of random starts
sampNo = 115: number of clusters whose centroids represent sampling locations
iter.max = 1000: maximum number of iterations
samp_regdf[, 1:4]: columns in our data frame representing the covariates
Finally, let’s pull out the FSCS points.
# pull out the points
fscs <- as.data.frame(result$inicial.centers)
cov <- names(fscs)
fscs.pts <- merge(samp_regdf, fscs, by = cov)
# projection information
prj <- rstk@crs
coords <- fscs.pts[c("x", "y")]
# convert to sp
fscs.pts <- SpatialPointsDataFrame(coords = coords, data = fscs.pts, proj4string = prj)
# save pts to shpfile
writeOGR(fscs.pts, dsn = ".", layer = "fscs135.pts", driver = "ESRI Shapefile", overwrite_layer = TRUE)