---
title: "R workflow for forest mapping and inference from ABA prediction models"
author: "Jean-Matthieu Monnet"
date: "`r Sys.Date()`"
output:
  html_document: default
  pdf_document: default
papersize: a4
bibliography: "../bib/bibliography.bib"
---

```{r setup, include=FALSE}
# erase all
cat("\014")
rm(list = ls())
# knit options
knitr::opts_chunk$set(echo = TRUE)
# Set so that long lines in R will be wrapped:
knitr::opts_chunk$set(tidy.opts = list(width.cutoff = 80), tidy = TRUE)
knitr::opts_chunk$set(fig.align = "center")
```

---
The code below presents a workflow to map forest parameters using calibrated prediction models (see [previous tutorial](https://gitlab.irstea.fr/jean-matthieu.monnet/lidartree_tutorials/-/blob/master/area-based.2.model.calibration.Rmd)) and wall-to-wall coverage of the region of interest with airborne laser scanning. This tutorial is the last step of the so-called area-based approach. It is based on functions from `R` packages `lidaRtRee` and `lidR`.

Licence: GNU GPLv3 / [Source page](https://gitlab.irstea.fr/jean-matthieu.monnet/lidartree_tutorials/-/blob/master/R/area-based.3.mapping.and.inference.Rmd)

Many thanks to Pascal Obstétar for checking code and improvement suggestions.

# Load data

## ABA prediction models

ABA prediction models calibrated in the previous tutorial can be loaded from the R archive "data/aba.model/output/models.rda". Three models are available: 

* a single model calibrated with the whole field dataset,
* a stratified model which is the combination of two models: one calibrated with field plots located in private forests, and the other with plots in public forests.

```{r loadModels, include = TRUE}
load(file = "../data/aba.model/output/models.rda")
```

## Airborne laser scanning data

For wall-to-wall mapping, the ALS metrics that are included in the models have to be computed for each element of a partition of the ALS acquisition. The area of interest is usually divided in square pixels which surface is similar to the one of field plots used for model calibration.

The workflow does not yet use the catalog processing capabilities of the package `lidR`. The batch processing of an ALS wall-to-wall thus cover relies on parallel processing of ALS point clouds organized in rectangular tiles, non-overlapping tiles. It is best if their borders correspond to multiple values of the pixel size.
For example purpose, two adjacent ALS tiles are provided, in two versions: 

* one with altitude values, in the folder "../data/aba.model/ALS/tiles.laz", these will be used for terrain statistics computations,
* one with (normalized) height values, in the folder "../data/aba.model/ALS/tiles.norm.laz", these will be used for tree detection and point metrics computation.

Files are too large to be hosted on the gitlab repository, they can be downloaded as a [zip file](https://drive.google.com/u/0/uc?export=download&id=1ripo-PLZ8_IjE7rAQ2RECj-fjg1UpC5i) from Google drive, and should be extracted in the folder "data/aba.model/ALS/" before proceeding with the processing. Files can be automatically downloaded thanks to the `googledrive` package with the following code, this requires authenticating yourself and authorizing the package to deal on your behalf with Google Drive.

```{r downloadGoogledrive, include = TRUE, eval = FALSE}
# set temporary file
temp <- tempfile(fileext = ".zip")
# download file from google drive
dl <- googledrive::drive_download(googledrive::as_id("1ripo-PLZ8_IjE7rAQ2RECj-fjg1UpC5i"), 
                     path = temp, 
                     overwrite = TRUE)
# unzip to folder
out <- unzip(temp, exdir = "../data/aba.model/ALS/")
# remove temporary file
unlink(temp)
```

```{r loadALS, include = TRUE, warning = FALSE}
# build catalogs
cata_height <- lidR::readALSLAScatalog("../data/aba.model/ALS/tiles.norm.laz/")
cata_altitude <- lidR::readALSLAScatalog("../data/aba.model/ALS/tiles.laz/")
lidR::opt_progress(cata_altitude) <- lidR::opt_progress(cata_height) <- FALSE
# set projection info
lidR::projection(cata_height) <- lidR::projection(cata_altitude) <- 2154
cata_height
```

## GIS data

In order to apply the corresponding models in the case of stratum specific calibration, a GIS layer defining the spatial extent of each category is required. The layer "Public4Montagnes" in the folder "data/aba.model/GIS" corresponds to public forests (Forêts publiques, ONF Paris, 2019). All remaining areas will be considered as private-owned. In order to exclude all non-forest areas from model application, another GIS layer will be used. It is the forest cover as defined by the [BD Forêt](https://inventaire-forestier.ign.fr/spip.php?article646) of IGN. Both layers are available under the [open licence Etalab 2.0](https://www.etalab.gouv.fr/licence-ouverte-open-licence).

```{r stratumExtraction, include=TRUE, message=FALSE, warning=FALSE, fig.width = 4, fig.height = 6}
# load GIS layer of public forests
public <- sf::st_read("../data/aba.model/GIS/Public4Montagnes.shp", stringsAsFactors = TRUE, quiet = TRUE)
# load GIS layer of forest mask
forest <- sf::st_read("../data/aba.model/GIS/ForestMask.shp", stringsAsFactors = TRUE, quiet = TRUE)
# set coordinate system
sf::st_crs(public) <- sf::st_crs(forest) <- 2154
```

# Forest parameters mapping

## ALS metrics mapping

First the ALS metrics used as independent variables in the ABA prediction models have to be mapped on the whole area. For this the SAME functions used for computing the metrics from the plot-extent cloud metrics should be used. Metrics are computed in each cell of the area of interest divided in square pixels. Pixel surface should be of the same size as calibration plots. Using round values is advisable in case other remote sensing data are considered for use.

```{r ALSMetrics, include=TRUE, message=FALSE, warning=FALSE, fig.width = 4, fig.height = 6}
# resolution for metrics map
resolution <- 25
```

The following paragraph show how to compute metrics for a 300 x 300 m^2^ subsample of the first ALS tile.

```{r loadTile, include=TRUE, message=FALSE, warning=FALSE, fig.width = 12, fig.height = 6}
# load ALS tile
point_cloud_height <- lidR::clip_rectangle(cata_height, 899600, 6447600, 899900, 6447900)
point_cloud_ground <- lidR::filter_ground(lidR::clip_rectangle(cata_altitude, 899600, 6447600, 899900, 6447900))
```

### Point cloud metrics

For computation of point cloud metrics, the same function used in `lidaRtRee::clouds_metrics` is now supplied to `lidR::grid_metrics`. Some metrics are displayed hereafter.

```{r pointMetrics, include=TRUE, message=FALSE, warning=FALSE, fig.width = 12, fig.height = 4}
# compute point metrics
metrics_points <- lidR::grid_metrics(point_cloud_height, aba_point_metrics_fun, res = resolution)
```

```{r plotpointMetrics, include=TRUE, echo=FALSE, message=FALSE, warning=FALSE, fig.width = 9, fig.height = 3}
par(mfrow = c(1, 3))
for (i in c("zmean", "zsd", "imean"))
{
  raster::plot(metrics_points[[i]], main = i)
}
```

### Tree metrics

For tree metrics, three steps are required:

* computation of the canopy height model, segmentation and extraction of trees;
* summary statistics of tree features by target pixels (values calculated based on trees which apices are inside the pixel);
* additional statistics about the percentage of cover and the mean canopy height (values calculated based on segmented tree crowns which are inside the target pixel).

It is very important that tree segmentation parameters are the same as in the calibration steps, and that the function for aggregating tree attributes into raster metrics is the also the same as in the calibration step (except for the area which might differ slightly between the field plots and target cells).

```{r treeMetrics, include=TRUE, message=FALSE, warning=FALSE, fig.width = 6, fig.height = 6}
# compute chm
chm <- lidaRtRee::points2DSM(point_cloud_height, res = aba_res_chm)
# replace NA, low and high values
chm[is.na(chm) | chm < 0 | chm > 60] <- 0
# tree top detection (same parameters as used by clouds_tree_metrics in calibration step -> default parameters)
segms <- lidaRtRee::tree_segmentation(chm)
# extraction to data.frame
trees <- lidaRtRee::tree_extraction(segms$filled_dem, segms$local_maxima, segms$segments_id)
# compute raster metrics
metrics_trees <- lidaRtRee::raster_metrics(trees[, -1],
  res = resolution,
  fun = function(x) {
    lidaRtRee::std_tree_metrics(x, resolution^2 / 10000)
  }, output = "raster"
)
# remove NAs and NaNs
metrics_trees[!is.finite(metrics_trees)] <- 0
#
# compute canopy cover in trees and canopy mean height in trees
# in region of interest, because it is not in previous step.
r_treechm <- segms$filled_dem
# set chm to 0 in non segment area
r_treechm[segms$segments_id == 0] <- NA
# compute raster metrics
dummy <- lidaRtRee::raster_metrics(r_treechm,
  res = resolution,
  fun = function(x) {
    c(
      sum(!is.na(x$filled_dem) / (resolution / aba_res_chm)^2),
      mean(x$filled_dem, na.rm = T)
    )
  },
  output = "raster"
)
names(dummy) <- c("TreeCanop_cover_in_plot", "TreeCanopy_meanH_in_plot")
metrics_trees <- raster::addLayer(metrics_trees, dummy)
```

The detected trees are plotted below, with the CHM as background.

```{r plotTrees, include=TRUE, echo=FALSE, message=FALSE, warning=FALSE, fig.width = 6, fig.height = 6}
par(mfrow = c(1, 1))
raster::plot(chm, main = "CHM and detected trees")
sp::plot(trees, cex = trees$h / 60, add = TRUE)
```

Some metrics maps are displayed below.

```{r plottreeMetrics, include=TRUE, echo=FALSE, message=FALSE, warning=FALSE, fig.width = 9, fig.height = 3}
par(mfrow = c(1, 3))
for (i in c("Tree_meanH", "Tree_sdH", "Tree_density"))
{
  raster::plot(metrics_trees[[i]], main = i)
}
```

### Terrain metrics

Terrain metrics can be computed from the altitude values of ground points. A wrapping function is required to pass `lidaRtRee::terrain_points_metrics` to `lidR::grid_metrics`.

```{r terrainMetrics, include=TRUE, message=FALSE, warning=FALSE}
# compute terrain metrics
f <- function(x, y, z) {
  as.list(lidaRtRee::terrain_points_metrics(data.frame(x, y, z)))
}
metrics_terrain <- lidR::grid_metrics(point_cloud_ground, ~ f(X, Y, Z), res = resolution)
```

```{r plotterrainMetrics, include=TRUE, echo=FALSE, message=FALSE, warning=FALSE, fig.width = 9, fig.height = 3}
par(mfrow = c(1, 3))
raster::plot(metrics_terrain[["altitude"]], main = "altitude")
raster::plot(metrics_terrain$azimut_gr, main = "azimut (gr)", breaks = seq(from = 0, to = 400, length.out = 81), col = rainbow(81))
raster::plot(metrics_terrain$slope_gr, main = "Slope (gr)")
```

### Batch processing of tiles

For the batch processing of tiles, parallel processing with packages `foreach` and `doFuture` is used. The processing capabilities of `lidR` are not yet used. 

```{r setupCluster, include=TRUE, message=FALSE, warning=FALSE, fig.width = 12, fig.height = 4}
library(foreach)
# create parallel frontend, specify to use two parallel sessions
doFuture::registerDoFuture()
future::plan("multisession", workers = 2L)
```

A buffering procedure is designed to handle the border effects when detecting trees at tile edges. A 10 m buffer is enough for tree metrics. One can also specify points classes to use, or apply a threshold to high points. Some parameters specified in the previous paragraphs are also required for the batch processing (output map resolution, chm resolution for tree segmentation, functions for metrics computation).

```{r paramCluster, include=TRUE, message=FALSE, warning=FALSE, fig.width = 12, fig.height = 4}
# processing parameters
b_size <- 10
# height threshold (m) for high points removal (points above this threshold are considered
# as outliers)
h_points <- 60
# points classes to retain for analysis (vegetation + ground)
# ground should be 2
class_points <- c(0, 1, 2, 3, 4, 5)
```

This first chunk of code computes tree and point metrics from the normalized ALS tiles.

```{r batchProcessing, include=TRUE, message=FALSE, warning=FALSE, fig.width = 12, fig.height = 4}
# processing by tile
metrics <- foreach::foreach(i = 1:nrow(cata_height), .errorhandling = "remove") %dopar% {
  # tile extent
  b_box <- as.numeric(cata_height@data[i, c("Min.X", "Min.Y", "Max.X", "Max.Y")])
  # load tile extent plus buffer
  a <- try(lidR::clip_rectangle(
    cata_height, b_box[1] - b_size, b_box[2] - b_size, b_box[3] + b_size,
    b_box[4] + b_size
  ))
  #
  # check if points are successfully loaded
  if (class(a) == "try-error") {
    return(NULL)
  }
  # add 'buffer' flag to points in buffer with TRUE value in this new attribute
  a <- lidR::add_attribute(
    a,
    a$X < b_box[1] | a$Y < b_box[2] | a$X >= b_box[3] | a$Y >= b_box[4],
    "buffer"
  )
  # remove unwanted point classes, and points higher than height threshold
  a <- lidR::filter_poi(a, is.element(Classification, class_points) & Z <= h_points)
  # check number of remaining points
  if (length(a) == 0) {
    return(NULL)
  }
  # set negative heights to 0
  a@data$Z[a@data$Z < 0] <- 0
  #
  # compute chm
  chm <- lidaRtRee::points2DSM(a, res = aba_res_chm)
  # replace NA, low and high values by 0
  chm[is.na(chm) | chm < 0 | chm > h_points] <- 0
  #
  # compute tree metrics
  # tree top detection (default parameters)
  segms <- lidaRtRee::tree_segmentation(chm)
  # extraction to data.frame
  trees <- lidaRtRee::tree_extraction(segms$filled_dem, segms$local_maxima, segms$segments_id)
  # remove trees outside of tile
  trees <- trees[trees$x >= b_box[1] & trees$x < b_box[3] & trees$y >= b_box[2] & trees$y < b_box[4], ]
  # compute raster metrics
  metrics_trees <- lidaRtRee::raster_metrics(trees[, -1],
    res = resolution,
    fun = function(x) {
      lidaRtRee::std_tree_metrics(x, resolution^2 / 10000)
    },
    output = "raster"
  )
  # compute canopy cover in trees and canopy mean height in trees
  # in region of interest, because it is not in previous step.
  r_treechm <- segms$filled_dem
  # set chm to NA in non segment area
  r_treechm[segms$segments_id == 0] <- NA
  # compute raster metrics
  metrics_trees2 <- lidaRtRee::raster_metrics(
    raster::crop(r_treechm, raster::extent(b_box[1], b_box[3], b_box[2], b_box[4])),
    res = resolution,
    fun = function(x) {
      c(
        sum(!is.na(x$filled_dem)) / (resolution / aba_res_chm)^2,
        mean(x$filled_dem, na.rm = T)
      )
    },
    output = "raster"
  )
  names(metrics_trees2) <- c("TreeCanopy_cover_in_plot", "TreeCanopy_meanH_in_plot")
  #
  # compute 1D height metrics
  # remove buffer points
  a <- lidR::filter_poi(a, buffer == 0)
  #
  if (length(a) == 0) {
    return(NULL)
  }
  # all points metrics
  metrics_points <- lidR::grid_metrics(a, aba_point_metrics_fun, res = resolution)
  #
  # extend / crop to match metrics_points
  metrics_trees <- raster::extend(metrics_trees, metrics_points, values = NA)
  metrics_trees2 <- raster::extend(metrics_trees2, metrics_points, values = NA)
  metrics_trees <- raster::crop(metrics_trees, metrics_points)
  metrics_trees2 <- raster::crop(metrics_trees2, metrics_points)
  # merge rasterstacks
  metrics <- metrics_points
  metrics <- raster::addLayer(metrics, metrics_trees)
  metrics <- raster::addLayer(metrics, metrics_trees2)
  return(metrics)
}
# mosaic rasters
names_metrics <- names(metrics[[1]])
metrics_map <- do.call(raster::merge, metrics)
names(metrics_map) <- names_metrics
```


The second chunk of code computes terrain metrics from the ALS tiles with altitude values.

```{r batchProcessingTerrain, include=TRUE, message=FALSE, warning=FALSE}
# function to compute terrain metrics to input to grid_metrics
f <- function(x, y, z) {
  as.list(lidaRtRee::terrain_points_metrics(data.frame(x, y, z)))
}
#
metrics_terrain <- foreach::foreach(i = 1:nrow(cata_altitude), .errorhandling = "remove") %dopar% {
  # tile extent
  b_box <- as.numeric(cata_altitude@data[i, c("Min.X", "Min.Y", "Max.X", "Max.Y")])
  # load tile extent plus buffer
  a <- try(lidR::clip_rectangle(
    cata_altitude, b_box[1], b_box[2], b_box[3],
    b_box[4]
  ))
  # check if points are successfully loaded
  if (class(a) == "try-error") {
    return(NULL)
  }
  # keep only ground points
  a <- lidR::filter_ground(a)
  lidR::grid_metrics(a, ~ f(X, Y, Z), res = resolution)
}
# mosaic rasters
names_metrics <- names(metrics_terrain[[1]])
metrics_terrain <- do.call(raster::merge, metrics_terrain)
names(metrics_terrain) <- names_metrics
```

Finally all metrics are combined in a single object.

```{r allMetrics, include=TRUE, message=FALSE, warning=FALSE}
metrics_terrain <- raster::extend(metrics_terrain, metrics_map, values = NA)
metrics_terrain <- raster::crop(metrics_terrain, metrics_map)
metrics_terrain <- raster::dropLayer(metrics_terrain, "adjR2.plane")
# merge rasterstacks
metrics_map <- raster::addLayer(metrics_map, metrics_terrain)
```

If the metrics maps are to be displayed in external software, the tif format can be used to produce one single with all bands. Meanwhile band names are not retained, so it is useful to save them in a separated R archive.

```{r saveMetrics, include=TRUE, message=FALSE, warning=FALSE}
metrics_names <- names(metrics_map)
save(metrics_names, file = "../data/aba.model/output/metrics_names.rda")
raster::writeRaster(metrics_map, file = "../data/aba.model/output/metrics_map.tif", overwrite = TRUE)
save(metrics_map, file = "../data/aba.model/output/metrics_map.rda")
```

## Forest parameters mapping

### Single (not stratified) model

Forest parameters are then mapped using the mapped metrics and the previously calibrated model with the function `lidaRtRee::aba_predict`.

```{r mapForest, include=TRUE, message=FALSE, warning=FALSE, fig.width = 5.1, fig.height = 3}
prediction_map <- lidaRtRee::aba_predict(model_aba, metrics_map)
raster::plot(prediction_map, main = "prediction map")
```

### Stratified models

In case strata-specific models have been calibrated, it is necessary to add a layer containing the strata information in the metrics raster. For this, the existing vector layer of public forests is rasterized and added to the metrics raster. The levels in the layer metadata are filled, both in the ID (code: numeric) and stratum (label: character) fields.

```{r rasterizeStrat, include=TRUE, message=FALSE, warning=FALSE}
# rasterize layer of public forest using value = 1 in polygons
metrics_map$stratum <- raster::rasterize(public, metrics_map, field = 1)
# set to 0 in non-public forests areas
metrics_map$stratum[is.na(metrics_map$stratum)] <- 0
# label levels in the raster metadata
levels(metrics_map$stratum) <- data.frame(ID = c(0, 1), stratum = c("private", "public"))
# crop public vector
public_cropped <- sf::st_crop(public, raster::extent(metrics_map))
```

Then the function `lidaRtRee::aba_predict` is used to produce the map. If the input is an object containing the stratified model and if the metrics map contains the strata layer, then the model directly maps the output by retaining the corresponding model in each strata.

```{r mapForestStratified, include=TRUE, message=FALSE, warning=FALSE}
# map with stratified model
prediction_map_mixed <- lidaRtRee::aba_predict(model_aba_stratified_mixed, metrics_map, stratum = "stratum")
# for checking purposes
# extracted "private" stratum model from combined model
model_private <- list(model = model_aba_stratified_mixed$model$private, stats = model_aba_stratified_mixed$stats["private", ])
# produce corresponding map
prediction_map_private <- lidaRtRee::aba_predict(model_private, metrics_map)
# extracted "public" stratum model from combined model
model_public <- list(model = model_aba_stratified_mixed$model$public, stats = model_aba_stratified_mixed$stats["public", ])
# produce corresponding map
prediction_map_public <- lidaRtRee::aba_predict(model_public, metrics_map)
```

```{r plotmapForestStratified, include=TRUE, echo=FALSE, message=FALSE, warning=FALSE, fig.width = 9, fig.height = 5}
# plot all maps
par(mfrow = c(2, 2))
# plot raster and vector
raster::plot(metrics_map$stratum, main = "Ownership", legend = FALSE, col = c("green", "blue"), xaxt = "n", yaxt = "n")
legend("bottom", legend = c("private", "public"), fill = c("green", "blue"))
limits <- range(c(raster::values(prediction_map_private), raster::values(prediction_map_public)))
plot(public_cropped, col = NA, add = TRUE)
raster::plot(prediction_map_mixed, main = "Stratified model", zlim = limits, xaxt = "n", yaxt = "n")
plot(public_cropped, col = NA, add = TRUE)
raster::plot(prediction_map_private, main = "Private model", zlim = limits, xaxt = "n", yaxt = "n", legend = FALSE)
raster::plot(prediction_map_public, main = "Public model", zlim = limits, xaxt = "n", yaxt = "n", legend = FALSE)
```

### Forest mask and thresholds

To avoid applying models in non-forest areas and to remove the extremes values that may have been predicted due to outliers in the ALS and metrics values, the function `lidaRtRee::clean_raster` can be applied:

* it applies a lower and upper threshold to map values,
* it sets to 0 the NA values in the map,
* it then multiplies the map with a mask where non-forest cells should have an NA value for those to propagate in the prediction map.

The first step is to rasterize the forest mask. In the current example, all the area is forested so we will used the polygon with ID 27 as mask.

```{r cleanMap, include=TRUE, message=FALSE, warning=FALSE}
# rasterize the forest mask
forest_mask <- raster::rasterize(forest, metrics_map, field = "FID")
# set values to 1 in polygon with ID 27, 0 outside
raster::values(forest_mask) <- ifelse(raster::values(forest_mask) == 27, 1, NA)
# crop forest vector
forest_cropped <- sf::st_crop(forest, raster::extent(metrics_map))
# clean raster
# threshold values and set to NA outside of forest
prediction_map_mixed_clean <- lidaRtRee::clean_raster(prediction_map_mixed, c(0, 80), forest_mask)
```

```{r plotcleanMap, include=TRUE, echo=FALSE, message=FALSE, warning=FALSE, fig.width = 12, fig.height = 2.5}
par(mfrow = c(1, 3))
raster::plot(forest_mask, main = "Forest mask")
plot(sf::st_geometry(forest_cropped), add = TRUE)
raster::plot(prediction_map_mixed, main = "Stratified predictions map")
raster::plot(prediction_map_mixed_clean, main = "Masked map")
plot(sf::st_geometry(forest_cropped), add = TRUE)
```

# Inference

Work in progress...