---
title: "Processing Large Rasters using Tiling and Parallelization: An R + SAGA GIS + GRASS GIS (tutorial)"
author: "Hengl, T. (tom.hengl@opengeohub.org)"
output:
  github_document:
    toc: true
bibliography: ./tex/gis_refs.bib
csl: ./tex/apa.csl
---
| <a href="https://github.com/thengl"><img src="https://avatars0.githubusercontent.com/u/640722?s=460&v=4" height="100" alt="Tomislav Hengl"></a> |
|---|
___
<a href="https://creativecommons.org/licenses/by-sa/4.0/" target="_blank"><img src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" alt=""></a>
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Introduction
Processing large spatial data in a programming environment such as R is not trivial. Even with powerful computing infrastructure, it takes careful programming to process large spatial data efficiently. The two main reasons why R has not traditionally been recommended as a programming environment for large data are: (a) R does not handle large objects well in the working memory, and (b) for many existing functions parallelization is not implemented automatically but has to be added through additional programming. This tutorial demonstrates how to use R to read, process and create large spatial (raster) data sets. In principle, both examples follow the same systematic approach:
1. prepare a function to run in parallel,
2. tile the object and estimate the processing time,
3. run the function using all cores,
4. build a virtual mosaic and final image using GDAL,
5. avoid loading any large data into RAM.
In addition to R, we also provide some code examples of how to use SAGA GIS, GRASS GIS and GDAL in parallel. For more information, see also these [lecture notes](./tex/Processing_large_rasters_R.pdf). Packages used in this tutorial include:
```{r, echo=FALSE}
list.of.packages <- c("plyr", "parallel", "landmap", "ranger", "raster",
"rgdal", "rgrass7", "snowfall", "lidR", "knitr", "tmap")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages, dependencies = TRUE)
```
Note: the processing below is demonstrated using relatively large objects and a computer with 8 cores running Ubuntu 16.04. To install the software used in this tutorial and teach yourself the initial steps, consider reading [this document](https://envirometrix.github.io/PredictiveSoilMapping/soil-covs-chapter.html).
## Deriving differences in land cover using large GeoTIFFs
Land cover maps are often distributed as raster images showing the distribution of 5–20 classes [@kirches2014land]. Here we use two of ESA's land cover maps for Indonesia (Kalimantan island), for the years 2000 and 2015, obtained from the [ESA's land cover project website](https://www.esa-landcover-cci.org/?q=node/158):
```{r}
library(rgdal)
library(raster)
GDALinfo("./data/Indonesia_ESA_lcc_300m_2000.tif")
```
This image is about 6000 by 6000 pixels in size, hence not huge (for comparison, a land cover map of the whole world at 300 m resolution contains over a billion pixels), but it can still be processed more efficiently if we use tiling and parallelization.
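As a quick sanity check before choosing a tile size, we can estimate the in-memory footprint of the full image; a minimal sketch, assuming both layers are read at once and values occupy 8 bytes each as R doubles:
```{r, eval=FALSE}
## rough in-memory footprint of the full image pair if read at once:
obj <- GDALinfo("./data/Indonesia_ESA_lcc_300m_2000.tif")
n.pixels <- as.numeric(obj[["rows"]]) * as.numeric(obj[["columns"]])
## two layers (2000 and 2015) stored as 8-byte doubles:
n.pixels * 2 * 8 / 1e9  ## size in GB
```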
We are interested in deriving the difference in land cover between the two periods, 2000 and 2015. First, we write a function that detects differences:
```{r}
make_LC_tiles <- function(i, tile.tbl,
                          out.path="./tiled",
                          lc1="./data/Indonesia_ESA_lcc_300m_2000.tif",
                          lc2="./data/Indonesia_ESA_lcc_300m_2015.tif",
                          leg.lcc){
  out.tif = paste0(out.path, "/T_", tile.tbl[i,"ID"], ".tif")
  if(!file.exists(out.tif)){
    ## read only the pixels falling within the i-th tile:
    m <- readGDAL(lc1, offset=unlist(tile.tbl[i,c("offset.y","offset.x")]),
                  region.dim=unlist(tile.tbl[i,c("region.dim.y","region.dim.x")]),
                  output.dim=unlist(tile.tbl[i,c("region.dim.y","region.dim.x")]),
                  silent = TRUE)
    m@data[,2] <- readGDAL(lc2, offset=unlist(tile.tbl[i,c("offset.y","offset.x")]),
                  region.dim=unlist(tile.tbl[i,c("region.dim.y","region.dim.x")]),
                  output.dim=unlist(tile.tbl[i,c("region.dim.y","region.dim.x")]),
                  silent = TRUE)$band1
    names(m) <- c("LC2000","LC2015")
    m <- as(m, "SpatialPixelsDataFrame")
    ## focus only on pixels that show land cover change:
    sel <- !m$LC2000==m$LC2015
    if(sum(sel)>0){
      m <- m[sel,]
      ## encode each change as "A_B" and look up its unique legend value:
      m$v <- paste(m$LC2000, m$LC2015, sep="_")
      m$i <- plyr::join(data.frame(NAME=m$v), leg.lcc, type="left")$Value
      writeGDAL(m["i"], out.tif, type="Int16",
                options="COMPRESS=DEFLATE", mvFlag=-32768)
    }
  }
}
```
This function can be run for each element `i`, i.e. for smaller blocks, and hence can be run in parallel. The function looks for pixels where land cover has changed, and then assigns a unique number that identifies the change from land cover class `A` to land cover class `B` (hence class `A-B`). We also need a combined legend for all combinations of land cover classes, which we can prepare using:
```{r}
leg <- read.csv("./data/ESA_landcover_legend.csv")
str(leg)
comb.leg <- expand.grid(leg$Value, leg$Value)
comb.leg$lcc <- paste(comb.leg$Var1, comb.leg$Var2, sep="_")
```
This gives almost 1,400 combinations:
```{r}
leg.lcc <- data.frame(Value=1:nrow(comb.leg), NAME=comb.leg$lcc)
head(leg.lcc)
```
Next, we prepare a tiling system to run processing in parallel:
```{r}
library(raster)
if(!require("landmap")){ devtools::install_github("envirometrix/landmap") }
library(landmap)
## check whether the maps match perfectly to the same grid:
x <- raster::stack(paste0("./data/Indonesia_ESA_lcc_300m_", c(2000, 2015), ".tif"))
## OK!
obj <- GDALinfo("./data/Indonesia_ESA_lcc_300m_2000.tif")
## tile to 50km blocks:
tile.lst <- getSpatialTiles(obj, block.x=.5, return.SpatialPolygons=TRUE)
tile.tbl <- getSpatialTiles(obj, block.x=.5, return.SpatialPolygons=FALSE)
tile.tbl$ID <- as.character(1:nrow(tile.tbl))
head(tile.tbl)
tile.pol <- SpatialPolygonsDataFrame(tile.lst, tile.tbl, match.ID = FALSE)
if(!file.exists("./data/tiling_50km_Indonesia.shp")){
writeOGR(tile.pol, "./data/tiling_50km_Indonesia.shp", "tiling_50km_Indonesia", driver="ESRI Shapefile")
}
```
This gives a total of 550 tiles:
```{r plot-tiles, echo=TRUE, fig.width=5, fig.cap="Tiling system based on the 50 km by 50 km tiles."}
te <- as.vector(raster::extent(x))
if("tmap" %in% rownames(installed.packages()) == TRUE){
library(tmap)
data("World")
tm_shape(World, xlim=te[c(1,2)], ylim=te[c(3,4)], projection="EPSG:4326") +
tm_polygons() +
tm_shape(as(tile.lst, "SpatialLines")) + tm_lines()
} else {
knitr::include_graphics("./README_files/figure-markdown_github/plot-tiles-1.png")
}
```
Note that the size of the tiles needs to be planned carefully so that each tile can still be loaded into memory. If an HPC system has more cores, then the average tile size needs to be smaller, otherwise RAM might still be a bottleneck for fully parallelized computing.
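As a back-of-the-envelope check, we can estimate peak RAM use if every core holds one two-layer tile in memory at the same time; a minimal sketch, again assuming 8-byte doubles:
```{r, eval=FALSE}
## peak RAM (GB) if all cores process one two-layer tile simultaneously:
n.cores <- parallel::detectCores()
pixels.per.tile <- max(tile.tbl$region.dim.x * tile.tbl$region.dim.y)
n.cores * pixels.per.tile * 2 * 8 / 1e9
```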
We can visualize a single tile just to check that the image has been subset correctly:
```{r plot-tile-lcc, echo=TRUE, fig.width=5, fig.cap="Single tile loaded into memory and plotted."}
## plot tile number 124:
i = 124
m <- readGDAL("./data/Indonesia_ESA_lcc_300m_2000.tif",
offset=unlist(tile.tbl[i,c("offset.y","offset.x")]),
region.dim=unlist(tile.tbl[i,c("region.dim.y","region.dim.x")]),
output.dim=unlist(tile.tbl[i,c("region.dim.y","region.dim.x")]))
plot(raster(m), legend=FALSE, col=rgb(leg$R/255, leg$G/255, leg$B/255))
```
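Before launching the full parallel run, it can also help to time a few tiles sequentially and extrapolate the total processing time; a minimal sketch, assuming roughly linear scaling across tiles and cores:
```{r, eval=FALSE}
## time five tiles sequentially, then extrapolate to all tiles/cores:
t0 <- system.time( lapply(1:5, function(x){
  make_LC_tiles(x, tile.tbl, leg.lcc=leg.lcc) }) )
## rough total estimate in minutes:
t0[["elapsed"]] * nrow(tile.tbl) / 5 / parallel::detectCores() / 60
```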
We can further use the snowfall package to compute all land cover changes and save them to disk:
```{r}
library(snowfall)
sfInit(parallel=TRUE, cpus=parallel::detectCores())
sfExport("make_LC_tiles", "tile.tbl", "leg.lcc")
sfLibrary(rgdal)
sfLibrary(plyr)
out.lst <- sfClusterApplyLB(1:nrow(tile.tbl),
function(x){ make_LC_tiles(x, tile.tbl, leg.lcc=leg.lcc) })
sfStop()
## takes a few seconds depending on the number of cores
```
```{r htop-8-cores, echo=FALSE, fig.cap="Fully parallelized computing using 8 cores. Displayed using htop software.", out.width="100%"}
knitr::include_graphics("./tex/htop_8_cores.png")
```
This shows that the script has generated some 295 tiles in total. Note that, if all land cover classes within a tile are unchanged, then no GeoTIFF is generated, so the total number of output tiles is much smaller than the number of tiles produced by the `getSpatialTiles` function:
```{r}
t.lst <- list.files("./tiled", pattern=".tif", full.names=TRUE)
str(t.lst)
```
From the list of files we can build a mosaic using GDAL and save it to disk [@mitchell2014geospatial]:
```{r}
out.tmp <- "./data/t_list.txt"
vrt.tmp <- "./data/indonesia.vrt"
cat(t.lst, sep="\n", file=out.tmp)
system(paste0('gdalbuildvrt -input_file_list ', out.tmp, ' ', vrt.tmp))
system(paste0('gdalwarp ', vrt.tmp,
' \"./data/Indonesia_ESA_lcc_300m_change.tif\" ',
'-ot \"Int16\" -dstnodata \"-32767\" -co \"BIGTIFF=YES\" ',
'-multi -wm 2000 -co \"COMPRESS=DEFLATE\" -overwrite ',
'-r \"near\" -wo \"NUM_THREADS=ALL_CPUS\"'))
```
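Since the mosaicking calls the GDAL command-line tools via `system()`, it is worth checking that they are on the system path; empty strings below would indicate they are not:
```{r, eval=FALSE}
## optional sanity check that the GDAL binaries are on the PATH:
Sys.which(c("gdalbuildvrt", "gdalwarp"))
```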
```{r}
raster("./data/Indonesia_ESA_lcc_300m_change.tif")
```
```{r qgis-indonesia, echo=FALSE, fig.cap="Land cover class changes (2000 to 2015) for Kalimantan.", out.width="100%"}
knitr::include_graphics("./tex/Indonesia_ESA_lcc_300m_change_preview.jpg")
```
Note that, to properly optimize computing, one might have to go through several iterations of improving the function and the tiling system. Also, when updating some results, it is a good idea to re-generate only the tiles that need to be updated, which can again be specified in the function. Note also that, in the code above, at no stage do we read the large images into R; we only use R to program functions, run the computing and build mosaics / large objects.
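Because `make_LC_tiles` skips any tile whose output GeoTIFF already exists, an incremental update only requires deleting the stale tiles before re-running the cluster call above; a minimal sketch, where the tile IDs are purely illustrative:
```{r, eval=FALSE}
## remove only the tiles that need re-processing; all remaining tiles
## are skipped by the file.exists() check inside make_LC_tiles():
update.ids <- c("124", "125")  ## hypothetical tile IDs to refresh
file.remove(paste0("./tiled/T_", update.ids, ".tif"))
```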
## DEM analysis using tiling and parallelization
In the next example we use multisource Digital Elevation Data to construct the most accurate land surface model of an area, and then use R, GDAL, SAGA GIS and GRASS GIS to derive some DEM derivatives and classify landforms using the 30 m resolution land surface model (DLSM). The study area is a 100 by 140 km area in the vicinity of Boulder, CO. The two DEM data sources are:
* 1/3rd arc-second (10 m) National Elevation Dataset (NED) [produced by USGS](https://catalog.data.gov/dataset/national-elevation-dataset-ned-1-3-arc-second-downloadable-data-collection-national-geospatial);
* 1 arc-second (30 m) global ALOS AW3D30 Digital Surface Model [produced and distributed by the Japan Aerospace Exploration Agency](http://www.eorc.jaxa.jp/ALOS/en/aw3d30/index.htm);
Both elevation rasters can be considered to represent a Land Surface Model, i.e. the height of the land surface or altitude. The NED data set is probably more accurate / more reliable, but ALOS AW3D30 is available globally and is based primarily on remote sensing data. We would like to produce a merged product, i.e. the best combination of the two. For this we can use LiDAR-derived samples of the land surface (as these are an order of magnitude more detailed and more accurate) as reference points representing the whole study area.
We have downloaded some 1,400 LAS files from the opentopography.org website and converted the point clouds to 30 m resolution grids estimating the land surface altitude (using the [lidR](https://github.com/Jean-Romain/lidR) package):
```{r, eval=FALSE}
laz2grid <- function(file, res=30, out.dir, prj){
  out.file = paste0(out.dir, gsub(".laz", ".tif", basename(file)))
  if(!file.exists(out.file)){
    require(lidR); require(raster); require(sp)
    lidar = lidR::readLAS(file)
    ## interpolate a terrain model from the points classified as ground:
    dtm = lidR::grid_terrain(lidar, res=res, method = "knnidw")
    dtm.r = as.raster(dtm)
    raster::projection(dtm.r) <- paste(lidar@crs)
    ## reproject to the target CRS at 30 m resolution and write to disk:
    dtm.s = raster::projectRaster(dtm.r, crs=prj, res=30)
    raster::writeRaster(dtm.s, filename=out.file, options="COMPRESS=DEFLATE", overwrite=TRUE)
  }
}
```
where the function `lidR::grid_terrain` generates a land surface model using only the LiDAR points classified as ground.
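To process all ~1,400 LAS files, the same function can be dispatched in parallel; a minimal sketch, assuming the files sit in a hypothetical `./LAS/` folder and that `prj` holds the target projection string (Boulder lies in UTM zone 13):
```{r, eval=FALSE}
## dispatch laz2grid over all LAS files; the folder names and the CRS
## string below are illustrative assumptions:
prj <- "+proj=utm +zone=13 +datum=WGS84 +units=m +no_defs"
las.files <- list.files("./LAS", pattern=".laz", full.names=TRUE)
out <- parallel::mclapply(las.files, laz2grid, out.dir="./LAS_tifs/",
                          prj=prj, mc.cores=parallel::detectCores())
```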
In addition to the two elevation products, we can also add radar images of the area (ALOS PALSAR products) and NDVI images based on cloud-free Landsat images to account for the canopy and urban areas. To estimate the best combined Land Surface Model we hence use the following formula:
```
mDLSM = f (NED, AW3D30, NDVI, HH/HV)
```
where `NED` is the National Elevation Dataset, `AW3D30` is the ALOS digital surface model, `NDVI` is the Landsat-based NDVI and `HH/HV` are the PALSAR-2 radar images at 20 m resolution [@shimada2014new], bands HH (−27.7 ± 5.3 dB) and HV (−35.8 ± 3.0 dB). NDVI images can be used to represent the canopy height (the higher the NDVI, the higher the chance that the canopy is high) and the PALSAR-2 radar bands can be used to represent surface roughness, built-up objects and bare rock areas.
We start by loading the training / reference points, which have been derived from the [GLAS/ICESat mission](https://doi.org/10.5067/ICESAT/GLAS/DATA109) and from the LiDAR point cloud data (downloaded from the OpenTopography.org project and imported into R using the `laz2grid` function from above):
```{r}
pnt.ices <- read.csv("./data/icesat_1533635398523.csv")
pnt.ices <- pnt.ices[,1:3]
names(pnt.ices) <- c("x","y","mDLSM")
pnt.ices$type <- "ICESAT"
coordinates(pnt.ices) <- ~ x+y
proj4string(pnt.ices) <- "+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs"
pnt.lid <- sampleRandom(raster("./data/Boulder_LiDAR_30m.tif"), size=4e4, sp=TRUE)
names(pnt.lid) <- "mDLSM"
pnt.lid$type <- "LiDAR"
pnt.ices <- spTransform(pnt.ices, pnt.lid@proj4string)
pnt.all <- rbind(pnt.ices, pnt.lid)
## filter out values outside the range:
pnt.all <- pnt.all[pnt.all$mDLSM<5000&pnt.all$mDLSM>1000,]
## 48,649 points
```
Here the `icesat_1533635398523.csv` file has been obtained from [openaltimetry.org](http://openaltimetry.org/data/icesat/) and `Boulder_LiDAR_30m.tif` is a gridded file derived from the LiDAR point clouds for a part of the study area.
Next we can overlay all points over the layers of interest and create a regression matrix. We can run this in parallel to speed up processing:
```{r}
library(parallel)
library(raster)
cov.lst = c("Boulder_AW3D30s_30m_v1802.tif",
"Boulder_NED_30m.tif", "Boulder_Landsat_red_30m_2014.tif",
"Boulder_Landsat_NIR_30m_2014.tif",
"Boulder_HH_30m.tif", "Boulder_HV_30m.tif")
xB <- raster::stack(paste0("./data/", cov.lst))
## all covariates can be stacked!
if(.Platform$OS.type == "windows"){
ov.cov <- lapply(cov.lst, function(i){
raster::extract(raster(paste0("./data/", i)), pnt.all) })
} else {
ov.cov <- parallel::mclapply(cov.lst, function(i){
raster::extract(raster(paste0("./data/", i)), pnt.all) },
mc.cores = parallel::detectCores())
}
ov.cov <- data.frame(ov.cov)
names(ov.cov) = cov.lst
```
This gives the following regression matrix:
```{r}
rm.DLSM <- cbind(as.data.frame(pnt.all), ov.cov)
## NDVI = (NIR - red) / (NIR + red); columns 8 and 7 hold the Landsat NIR and red bands:
rm.DLSM$NDVI <- (rm.DLSM[,8] - rm.DLSM[,7]) / (rm.DLSM[,8] + rm.DLSM[,7]) * 100
str(rm.DLSM)
```
Next we can fit a Random Forest model that can be used to generate a combined estimate of the land surface elevation:
```{r}
library(ranger)
fm.DLSM <- mDLSM ~ Boulder_AW3D30s_30m_v1802.tif + Boulder_NED_30m.tif +
Boulder_HH_30m.tif + Boulder_HV_30m.tif + NDVI
sel.na <- complete.cases(rm.DLSM[,all.vars(fm.DLSM)])
summary(sel.na)
m.DLSM <- ranger(fm.DLSM, rm.DLSM[sel.na,], num.trees=85, importance="impurity")
m.DLSM
```
This is a highly accurate model, with an R-square above 0.99 (but an RMSE of 30 m!), in which the most important covariates are the NED and AW3D30 elevation maps:
```{r}
xl1.P <- as.list(ranger::importance(m.DLSM))
print(t(data.frame(xl1.P[order(unlist(xl1.P), decreasing=TRUE)])))
```
which was expected. Note that AW3D30s seems to come somewhat closer to the training points. To produce a combined mDLSM we run the fitted model over all pixels of interest. To speed up the prediction, we first prepare a tiling system for this area:
```{r}
objB <- GDALinfo("./data/Boulder_AW3D30s_30m_v1802.tif")
library(landmap)
## tile to 10km blocks:
tileB.lst <- getSpatialTiles(objB, block.x=1e4, return.SpatialPolygons=TRUE)
tileB.tbl <- getSpatialTiles(objB, block.x=1e4, return.SpatialPolygons=FALSE)
tileB.tbl$ID <- as.character(1:nrow(tileB.tbl))
```
Next, we prepare a function that creates tiled objects and that we can then use to predict values in parallel:
```{r}
predict_mDLSM <- function(m.DLSM, i, tileB.tbl, cov.lst, in.path="./data/", out.path="./tiledB/"){
  out.tif = paste0(out.path, "T_", tileB.tbl[i,"ID"], ".tif")
  if(!file.exists(out.tif)){
    covs.files <- paste0(in.path, cov.lst)
    ## read the first covariate to set up the grid topology:
    newdata <- readGDAL(covs.files[1], offset=unlist(tileB.tbl[i,c("offset.y","offset.x")]),
                        region.dim=unlist(tileB.tbl[i,c("region.dim.y","region.dim.x")]),
                        output.dim=unlist(tileB.tbl[i,c("region.dim.y","region.dim.x")]),
                        silent = TRUE)
    for(j in 2:length(cov.lst)){
      newdata@data[,j] <- readGDAL(covs.files[j],
                        offset=unlist(tileB.tbl[i,c("offset.y","offset.x")]),
                        region.dim=unlist(tileB.tbl[i,c("region.dim.y","region.dim.x")]),
                        output.dim=unlist(tileB.tbl[i,c("region.dim.y","region.dim.x")]),
                        silent = TRUE)$band1
    }
    names(newdata) <- basename(covs.files)
    ## NDVI from the Landsat NIR (column 4) and red (column 3) bands:
    newdata$NDVI <- (newdata@data[,4] - newdata@data[,3]) /
      (newdata@data[,4] + newdata@data[,3]) * 100
    newdata$NDVI <- ifelse(is.na(newdata$NDVI), 0, newdata$NDVI)
    sel.pr <- complete.cases(newdata@data)
    out <- predict(m.DLSM, newdata@data[sel.pr,])
    g <- as(newdata[1], "SpatialPixelsDataFrame")
    g[sel.pr,"m"] = out$predictions
    writeGDAL(g["m"], out.tif, type="Int16",
              options="COMPRESS=DEFLATE", mvFlag=-32768)
  }
}
```
Now we can predict the values of mDLSM at locations by using:
```{r}
library(snowfall)
sfInit(parallel=TRUE, cpus=parallel::detectCores())
sfExport("predict_mDLSM", "m.DLSM", "tileB.tbl", "cov.lst")
sfLibrary(rgdal)
sfLibrary(ranger)
outB.lst <- sfClusterApplyLB(1:nrow(tileB.tbl),
function(x){ predict_mDLSM(m.DLSM, x, tileB.tbl, cov.lst) })
sfStop()
## takes a few minutes
```
This produces a merged digital land surface model that can be further used for spatial modeling:
```{r}
if(!file.exists("./data/Boulder_mDLSM_30m.tif")){
tB.lst <- list.files("./tiledB", pattern=".tif", full.names=TRUE)
outB.tmp <- "./data/b_list.txt"
vrtB.tmp <- "./data/boulder.vrt"
cat(tB.lst, sep="\n", file=outB.tmp)
system(paste0('gdalbuildvrt -input_file_list ', outB.tmp, ' ', vrtB.tmp))
system(paste0('gdalwarp ', vrtB.tmp,
' \"./data/Boulder_mDLSM_30m.tif\" ',
'-ot \"Int16\" -dstnodata \"-32767\" -co \"BIGTIFF=YES\" ',
'-multi -wm 2000 -co \"COMPRESS=DEFLATE\" -overwrite ',
'-r \"near\" -wo \"NUM_THREADS=ALL_CPUS\"'))
}
```
```{r comparison-shading, echo=FALSE, fig.cap="Comparison of the original AW3D30 vs the predicted Land Surface Model (mDLSM). Fitting a model to predict land surface model seems to solve the problem of artifacts / striping effects, but then it can introduce local artifacts in areas of higher vegetation and under-represented by training points.", out.width="100%"}
knitr::include_graphics("./tex/comprison_shading.png")
```
The resulting product seems to be satisfactory, except on the eastern side of the study area (plains), where RF seems to create many artificial spikes. RF is known to be very sensitive to extrapolation, so in this case it seems that the plain areas were systematically under-represented and require additional training points. Note also that the ranger package provides an option to derive the uncertainty of the predicted elevations in terms of lower and upper quantiles (read more in @Hengl2018RFsp), so we could in theory run Monte Carlo simulations or similar.
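For example, prediction intervals can be obtained by refitting the model as a quantile regression forest; a minimal sketch, where the 95% interval and the five example points are illustrative:
```{r, eval=FALSE}
## refit with quantile regression enabled, then request prediction
## quantiles instead of mean predictions:
m.DLSMq <- ranger(fm.DLSM, rm.DLSM[sel.na,], num.trees=85, quantreg=TRUE)
pr.q <- predict(m.DLSMq, rm.DLSM[sel.na,][1:5,], type="quantiles",
                quantiles=c(0.025, 0.975))
pr.q$predictions
```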
We can next use the newly produced mDLSM to derive a number of DEM derivatives. SAGA GIS [@gmd-8-1991-2015] allows us to run processing in parallel, hence there is no need to tile. For example, to derive a slope map we can use:
```{r, eval=FALSE}
tmp = tempfile(fileext = ".sdat")
system(paste0('gdal_translate ./data/Boulder_mDLSM_30m.tif -of \"SAGA\" ', tmp))
#system(paste0('saga_cmd ta_hydrology 15 -DEM=\"',
# gsub(".sdat", ".sgrd", tmp), '\" -TWI=\"',
# gsub(".sdat", "_twi.sgrd", tmp), '\"') )
system(paste0('saga_cmd ta_morphometry 0 -ELEVATION=\"',
gsub(".sdat", ".sgrd", tmp), '\" -SLOPE=\"',
gsub(".sdat", "_slope.sgrd", tmp),
'\" -C_PROF=\"',
gsub(".sdat", "_cprof.sgrd", tmp), '\"') )
system(paste0('gdal_translate ', gsub(".sdat", "_slope.sdat", tmp),
' ./data/Boulder_mDLSM_slope_30m.tif -scale ',
' -ot "Byte" -co \"COMPRESS=DEFLATE\"'))
```
SAGA automatically recognizes the number of cores available and is thus highly suitable for processing large rasters on massive infrastructure.
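If you need to cap the number of threads SAGA uses (e.g. on a shared server), `saga_cmd` accepts a cores option; a minimal sketch (the exact flag syntax can vary between SAGA versions, see `saga_cmd --help`):
```{r, eval=FALSE}
## limit SAGA to 4 cores when deriving slope:
system(paste0('saga_cmd --cores=4 ta_morphometry 0 -ELEVATION=\"',
       gsub(".sdat", ".sgrd", tmp), '\" -SLOPE=\"',
       gsub(".sdat", "_slope.sgrd", tmp), '\"'))
```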
To derive landform classes using the classification system of @JASIEWICZ2013147, we can also use GRASS GIS, which is possible directly from R thanks to the rgrass7 package:
```{r, eval=FALSE}
library(rgrass7)
rname <- "./data/Boulder_mDLSM_30m.tif"
# Set GRASS environment and database location
loc <- initGRASS("/usr/lib/grass74", home="/data/tmp/",
gisDbase="GRASS_TEMP", override=TRUE)
execGRASS("r.in.gdal", flags="o", parameters=list(input=rname, output="mDLSM"))
execGRASS("g.region", parameters=list(raster="mDLSM"))
execGRASS("r.geomorphon", parameters=list(elevation="mDLSM", forms="mDLSMg"))
#plot(readRAST("mDLSMg"))
execGRASS("r.out.gdal", parameters=list(input="mDLSMg",
output="./data/Boulder_mDLSMg_30m.tif",
type="Byte", createopt="COMPRESS=DEFLATE"))
## clean-up
unlink("./GRASS_TEMP", recursive = TRUE)
unset.GIS_LOCK()
unlink_.gislock()
remove_GISRC()
```
GRASS 7 is also fairly efficient at processing large rasters [@neteler2015grass], even though parallelization still needs to be implemented via tiling. In fact, GRASS GIS can be regarded as an environment that provides applications (commands/modules), which means that, by extending the code described above, one should be able to run several GRASS commands in parallel.
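A sketch of that idea is shown below: each worker initializes its own temporary GRASS database so that the parallel sessions do not collide. This is an untested pattern, shown only to illustrate the approach, applied here to the DEM tiles produced earlier:
```{r, eval=FALSE}
## run r.geomorphon over the DEM tiles in parallel; every worker gets
## its own gisDbase so the GRASS sessions do not interfere:
library(parallel)
tifs <- list.files("./tiledB", pattern=".tif", full.names=TRUE)
res <- mclapply(seq_along(tifs), function(i){
  library(rgrass7)
  initGRASS("/usr/lib/grass74", home=tempdir(),
            gisDbase=file.path(tempdir(), paste0("GRASS_", i)), override=TRUE)
  execGRASS("r.in.gdal", flags="o",
            parameters=list(input=tifs[i], output="dem"))
  execGRASS("g.region", parameters=list(raster="dem"))
  execGRASS("r.geomorphon", parameters=list(elevation="dem", forms="demg"))
  execGRASS("r.out.gdal", parameters=list(input="demg",
            output=gsub(".tif", "_forms.tif", tifs[i]),
            type="Byte", createopt="COMPRESS=DEFLATE"))
}, mc.cores=detectCores())
```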
## References