---
title: "R workflow for tree segmentation from ALS data"
author: "Jean-Matthieu Monnet"
date: "`r Sys.Date()`"
output:
  html_document: default
  pdf_document: default
papersize: a4
bibliography: "../bib/bibliography.bib"
---

```{r setup, include=FALSE}
# erase all
cat("\014")
rm(list = ls())
# knit options
knitr::opts_chunk$set(echo = TRUE)
# Set so that long lines in R will be wrapped:
knitr::opts_chunk$set(tidy.opts = list(width.cutoff = 80), tidy = TRUE)
knitr::opts_chunk$set(fig.align = "center")
# for display of rgl in html
knitr::knit_hooks$set(webgl = rgl::hook_webgl)
# output to html
html <- TRUE
```

---
The code below presents a tree segmentation workflow from Airborne Laser Scanning (lidar remote sensing) data. The workflow is based on functions from `R` packages [lidaRtRee](https://cran.r-project.org/package=lidaRtRee) (tested with version `r packageVersion("lidaRtRee")`) and [lidR](https://cran.r-project.org/package=lidR) (tested with version `r packageVersion("lidR")`), and it includes the following steps:

* treetop detection, 
+ crown segmentation,
+ accuracy assessment with field inventory, 
+ species classification

Steps 1 and 3 are documented in [@Monnet10; @Monnet11c]. The detection performance of this algorithm was evaluated in a benchmark [@Eysn15].

Licence: GNU GPLv3 / [source page](https://gitlab.irstea.fr/jean-matthieu.monnet/lidartree_tutorials/-/blob/master/R/tree.detection.Rmd) 

Many thanks to Pascal Obstétar for checking code and improvement suggestions.

## Material
### Field inventory

The field inventory corresponds to a 50 m x 50 m plot located in the Chablais mountain (France) [@Monnet11c, pp. 34]. All trees with a diameter at breast height above 7.5 cm are inventoried. The data is available in package `lidaRtRee`.

```{r loadTreeInventory, include = TRUE}
# load dataset from package (default)
data(tree_inventory_chablais3, package = "lidaRtRee")
```
Otherwise you can load your own data provided positions and heights are measured.
```{r prepareTreeInventory, eval=FALSE}
# import field inventory
fichier <- "chablais3_listeR.csv"
tree.inventory <- read.csv(file = fichier, sep = ";", header = F, stringsAsFactors = TRUE)
names(tree_inventory_chablais3) <- c("x", "y", "d", "h", "n", "s", "e", "t")
# save as rda for later access
# save(tree_inventory_chablais3,file="tree_inventory_chablais3.rda")
```
 Attributes are:

* `x` easting coordinate in Lambert 93
+ `y` northing coordinate in Lambert 93
+ `d` diameter at breast height (cm)
+ `h` tree height (m)
+ `n` tree number
+ `s` species abreviated as GESP (GEnus SPecies)
+ `e` appearance (0: missing or lying, 1: normal, 2: broken treetop, 3: dead with branches, 4: snag)
+ `t` tilted (0: no, 1: yes)
```{r displaytreeinventory, echo=FALSE}
head(tree_inventory_chablais3, n = 3L)
```

Function `plot_tree_inventory` is designed to plot forest inventory data. Main species are European beech (*Fagus sylvatica*), Norway spruce (*Picea abies*) and silver fir (*Abies alba*).

```{r plotTreeInventory, include = TRUE, out.width = '50%', fig.dim=c(6.5, 5.5)}
# display inventoried trees
lidaRtRee::plot_tree_inventory(tree_inventory_chablais3
  [, c("x", "y")],
  tree_inventory_chablais3$h,
  species = as.character(tree_inventory_chablais3$s)
)
```
The `ggplot2` package also provides nice outputs.

```{r ggplot2, include = TRUE, out.width = '50%', fig.dim=c(5.5, 5.5)}
# use table of species of package lidaRtRee to always use the same color for a given species
plot.species <- lidaRtRee::species_color()[levels(tree_inventory_chablais3$s), "col"]
library(ggplot2)
ggplot(tree_inventory_chablais3, aes(x = x, y = y, group = s)) +
  geom_point(aes(color = s, size = d)) +
  coord_sf(datum = 2154) +
  scale_color_manual(values = plot.species) +
  scale_radius(name = "Diameter") +
  geom_text(aes(label = n, size = 20), hjust = 0, vjust = 1) +
  labs(color = "Species") # titre de la légende
```

We define the region of interest (ROI) to crop ALS data to corresponding extent before further processing. ROI is set on the extent of tree inventory, plus a 10 meter buffer.

```{r roi, include = TRUE}
# buffer to apply around ROI (meters)
ROI_buffer <- 10
# ROI limits
ROI_range <- data.frame(round(apply(tree_inventory_chablais3[, c("x", "y")], 2, range)))
```

### Airborne Laser Scanning data

In this tutorial, ALS data available in the `lidaRtRee` package is used. 

```{r loadALS, include = TRUE, message = FALSE}
# load data in package lidaRtRee (default)
LASfile <- system.file("extdata", "las_chablais3.laz", package="lidaRtRee")
las_chablais3 <- lidR::readLAS(LASfile)
```

Otherwise, ALS data is loaded with functions of package `lidR`. First a catalog of files containing ALS data is built. Then points located inside our ROI are loaded.

```{r prepareALS, eval=FALSE, message = FALSE, warning = FALSE}
# directory for laz files
lazdir <- "../data/tree.detection"
# build catalog of files
# specifying ALS data
cata <- lidR::readALSLAScatalog(lazdir)
# set coordinate system
lidR::projection(cata) <- 2154
# extract points in ROI plus additional buffer
las_chablais3 <- lidR::clip_rectangle(
  cata,
  ROI_range$x[1] - ROI_buffer - 5,
  ROI_range$y[1] - ROI_buffer - 5,
  ROI_range$x[2] + ROI_buffer + 5,
  ROI_range$y[2] + ROI_buffer + 5
)
# save as rda for easier access:
# save(las_chablais3, file="las_chablais3.rda", compress = "bzip2")
```

## Data preparation
### Digital Elevation Models

From the ALS point cloud, digital elevation models are computed [@Monnet11c, pp. 43-46]. The Digital Terrain Model (DTM) represents the ground surface, it is computed by bilinear interpolation of points classified as ground. The Digital Surface Model (DSM) represents the canopy surface, it is computed by retaining in each raster's pixel the value of the highest point included in that pixel. The Canopy Height Model (CHM) is the normalized height of the canopy. It is computed by subtracting the DTM to the DSM.

```{r computeDEMs, include=TRUE, warning = FALSE}
# define extent and resolution of raster
output_raster <- raster::raster(resolution = 0.5,
                                xmn = ROI_range$x[1] - ROI_buffer,
                                xmx = ROI_range$x[2] + ROI_buffer,
                                ymn = ROI_range$y[1] - ROI_buffer,
                                ymx = ROI_range$y[2] + ROI_buffer
                                )
# terrain model computed from points classified as ground
dtm <- lidR::grid_terrain(las_chablais3, output_raster, lidR::tin())
# dtm_lidaRtRee <- lidaRtRee::points2DTM(lidR::filter_ground(las_chablais3),
# res = 0.5, ROI_range$x[1] - ROI_buffer,
# ROI_range$x[2] + ROI_buffer, ROI_range$y[1] - ROI_buffer,
# ROI_range$y[2] + ROI_buffer
# )
# surface model
dsm <- lidR::grid_canopy(lidR::clip_roi(las_chablais3, raster::extent(output_raster)), output_raster, lidR::p2r())
# dsm_lidaRtRee <- lidaRtRee::points2DSM(las_chablais3,
#   res = 0.5, dtm@extent@xmin,
#   dtm@extent@xmax, dtm@extent@ymin, dtm@extent@ymax
# )
# canopy height model
chm <- dsm - dtm
```

```{r plotDEMs, echo=FALSE, fig.width = 12, fig.height = 4.5, out.width='100%'}
par(mfrow = c(1, 3))
# display DTM
raster::plot(dtm, main = "DTM")
# display DSM
raster::plot(dsm, main = "DSM")
# display CHM
raster::plot(chm, main = "CHM")
```

### Visual comparison of field inventory and ALS data

A plot mask is computed from the inventoried positions, using a height-dependent buffer. Indeed, tree tops are not necessarily located verticaly above the trunk positions. In order to compare detected tree tops with inventoried trunks, buffers have to be applied to trunk positions to account for the non-verticality of trees.

```{r computeMask, include=TRUE, message=FALSE}
# plot mask computation based on inventoried positions
# convex hull of plot
mask_chull <- lidaRtRee::raster_chull_mask(tree_inventory_chablais3[, c("x", "y")], dsm)
# union of buffers around trees
mask_tree <- lidaRtRee::raster_xy_mask(
  tree_inventory_chablais3[, c("x", "y")],
  2.1 + 0.14 * tree_inventory_chablais3$h, dsm
)
# union of convexHull and tree buffers
mask_plot <- max(mask_chull, mask_tree)
# vectorize plot mask
mask_plot_v <- raster::rasterToPolygons(mask_plot, function(x) (x == 1), dissolve = T)
```
Displaying inventoried trees on the CHM shows a pretty good agreement of crowns visible in the CHM with trunk locations and sizes.

```{r plotPlot, include = TRUE, out.width = '70%', fig.dim=c(6.5, 4.5), warnings=FALSE}
# display CHM
raster::plot(chm,
  col = gray(seq(0, 1, 1 / 255)),
  main = "Canopy Height Model and tree positions"
)
# add inventoried trees
lidaRtRee::plot_tree_inventory(tree_inventory_chablais3[, c("x", "y")],
  tree_inventory_chablais3$h,
  species = as.character(tree_inventory_chablais3$s), add = TRUE
)
# display plot mask
raster::plot(mask_plot_v, border = "red", add = TRUE)
```

## Tree delineation
### Segmentation
Tree segmentation is performed on the Canopy Height Model by using a general function (`lidaRtRee::tree_segmentation`) which consists in the following steps:

* image pre-processing (non-linear filtering and smoothing for noise removal),
+ local maxima filtering and selection for apex (local maximum) detection,
+ image segmentation with a watershed algorithm for crown delineation.

The first two steps are documented in @Monnet11c, pp. 47-52.
```{r plotSegm, include=TRUE, fig.width = 12, fig.height = 4.5, out.width='100%'}
# tree detection (default settings), applied on canopy height model
segms <- lidaRtRee::tree_segmentation(chm)
#
par(mfrow = c(1, 3))
# display pre-processed chm
raster::plot(segms$smoothed_dem, main = "Pre-processed CHM")
# display selected local maxima
raster::plot(segms$local_maxima, main = "Selected local maxima")
# display segments, except ground segment
dummy <- segms$segments_id
dummy[dummy == 0] <- NA
raster::plot(dummy %% 8, main = "Segments (random colors)", col = rainbow(8), legend = FALSE)
```

### Extraction of apices positions and attributes
A `data.frame` of detected apices located in the plot mask is then extracted with the function `tree_extraction.` Segments can be converted from raster to polygons but the operation is quite slow. Attributes are :

* `id`: apex id
+ `x`: easting coordinate of apex
+ `y`: northing coordinate of apex
+ `h`: height of apex
+ `dom_radius`: distance of apex to nearest higher pixel of CHM
+ `s`: crown surface
+ `v`: crown volume
+ `sp`: crown surface inside plot
+ `vp`: crown volume inside plot


```{r plotSegmTrees, include=TRUE, out.width = '50%', fig.dim=c(4.5, 4.5)}
# tree extraction only inside plot mask for subsequent comparison
apices <- lidaRtRee::tree_extraction(
  segms$filled_dem,
  segms$local_maxima,
  segms$segments_id, mask_plot
)
head(apices, n = 3L)
# convert segments from raster to polygons
segments_v <- raster::rasterToPolygons(segms$segments_id, dissolve = T)
#
# display initial image
raster::plot(chm, col = gray(seq(0, 1, 1 / 255)), main = "CHM and detected positions")
# display segments border
sp::plot(segments_v, border = "white", add = T)
# display plot mask
sp::plot(mask_plot_v, border = "red", add = T)
# display detected apices
graphics::points(apices$x, apices$y, col = "blue", cex = apices$h / 20, pch = 2)
```

## Detection evaluation
### Tree matching

To assess detection accuracy, reference (field) trees should be linked to detected apices. Despite the possibility of error, automated matching has the advantage of making the comparison of results from different algorithms and settings reproducible and objective. The algorithm presented below is based on the 3D distance between detected treetops and inventory positions and heights [@Monnet10]. It returns an object with the pairs of matched reference trees and detected apices. The function `lidaRtRee::plot_matched` then plots the results.

```{r plotMached, include=TRUE, out.width = '70%', fig.dim=c(6.5, 4.5)}
# match detected apices with field trees based on relative distance of apices
matched <- lidaRtRee::tree_matching(
  tree_inventory_chablais3[, c("x", "y", "h")],
  cbind(apices@coords, apices$h)
)
# display matching results
lidaRtRee::plot_matched(
  tree_inventory_chablais3[, c("x", "y", "h")],
  cbind(apices@coords, apices$h), matched, chm, mask_plot_v
)
```

### Detection accuracy

```{r detectionStats, include=FALSE}
# height histogram of detections
detection_stats <- lidaRtRee::hist_detection(
  tree_inventory_chablais3[, c("x", "y", "h")],
  cbind(apices@coords, apices$h), matched
)
```

Detection accuracy is evaluated thanks to the number of correct detections (`r detection_stats$true_detections`), false detections (`r detection_stats$false_detections`) and omissions (`r detection_stats$omissions`). In heterogeneous stands, detection accuracy is height-dependent, it is informative to display those categories on a height histogram.

```{r plotDetection, include=TRUE, out.width = '40%', fig.dim=c(4.5, 4.5)}
# height histogram of detections
detection_stats <- lidaRtRee::hist_detection(
  tree_inventory_chablais3[, c("x", "y", "h")],
  cbind(apices@coords, apices$h), matched
)
```

### Height estimation accuracy

```{r heightRegression, include=FALSE}
height_reg <- lidaRtRee::height_regression(tree_inventory_chablais3[, c("x", "y", "h")],
  cbind(apices@coords, apices$h),
  matched,
  species = tree_inventory_chablais3$s
)
```

For true detections, estimated heights can be compared to field-measured heights. Here, the root mean square error is `r round(height_reg$stats$rmse,1)`m, while the bias is `r round(height_reg$stats$bias,1)`m. The linear regression is displayed hereafter.

```{r plotRegression, include=TRUE, out.width = '40%', fig.dim=c(4.5, 4.5)}
# linear regression between reference height and estimated height
height_reg <- lidaRtRee::height_regression(tree_inventory_chablais3[, c("x", "y", "h")],
  cbind(apices@coords, apices$h),
  matched,
  species = tree_inventory_chablais3$s
)
```

## Species classification

The previous steps also output a segmentation of the CHM, i.e. each detected apex is associated to a crown segment in 2D. Next we aim at checking for relationships between the point cloud contained in the segment, and the species of the largest field tree contained in the segment. The processing steps are:

* calculate statistical indices describing the point cloud for each segment,
* extract the highest reference tree from each segment,
* exploratory analysis of the relationship between the indices and the species.

### Points in segments

Before computation of point cloud metrics in each segment, the whole point cloud is normalized to convert point altitude to height above ground. Points are then labeled with the id of the segment they belong to. A list of LAS objects corresponding to the segments is then created. 

```{r pointsSegments, include=TRUE, warning=FALSE, message=FALSE}
# normalize point cloud
lasn <- lidR::normalize_height(las_chablais3, lidR::tin())
# add segment id in LAS object
lasn <- lidR::merge_spatial(lasn, segms$segments_id, "seg_id")
# put all seg_id values in ordered list
liste_seg_id <- sort(unique(lasn$seg_id))
# set names of list equal to values
names(liste_seg_id) <- liste_seg_id
# extract point cloud for each segment id in a list
las_l <- lapply(liste_seg_id, function(x) {lidR::filter_poi(lasn, seg_id == x)})
```

### Metrics computation

Basic point cloud metrics are computed in each segment (maximum, mean, standard deviation of height, mean and standard deviation of intensity), with the function `lidaRtRee::clouds_metrics` which applies a function to all point clouds in a list, by passing it to `lidR::cloud_metrics`. Please refer to the help of this last function for the expected syntax of the user-defined function to compute metrics.
```{r metrics, include=TRUE, eval=TRUE}
# compute basic las metrics in each segment
metrics <- lidaRtRee::clouds_metrics(las_l, func = ~ list(
  maxZ = max(Z), meanZ = mean(Z),
  sdZ = sd(Z), meanI = mean(Intensity),
  sdI = sd(Intensity)
))
# add segment id attribute
metrics$seg_id <- row.names(metrics)
head(metrics, n = 3L)
```

### Merge with reference trees and detected apices

Computed metrics are merged with reference trees and detected apices, thanks to the segment id.
```{r mergeMetrics, include=TRUE, eval=TRUE}
# associate each reference tree with the segment that contains its trunk.
tree_inventory_chablais3$seg_id <- raster::extract(
  segms$segments_id,
  tree_inventory_chablais3[, c("x", "y")]
)
# create new data.frame by merging metrics and inventoried trees based on segment id
tree_metrics <- base::merge(tree_inventory_chablais3, metrics)
# remove non-tree segment
tree_metrics <- tree_metrics[tree_metrics$seg_id != 0, ]
# add metrics to extracted apices data.frame
apices <- base::merge(apices, metrics, by.x = "id", by.y = "seg_id", all.x = T)
```

Plotting the reference trees with the mean intensity of lidar points in the segments shows that when they are dominant, broadleaved trees tend to have higher mean intensity than coniferous trees.

```{r Intensity, include=TRUE, out.width = '70%', fig.dim=c(6.5, 4.5)}
# create raster of segment' mean intensity
r_mean_intensity_segm <- segms$segments_id
# match segment id with id in metrics data.frame
dummy <- match(raster::values(r_mean_intensity_segm), apices$id)
# replace segment id with corresponding mean intensity
raster::values(r_mean_intensity_segm) <- apices$meanI[dummy]
# display tree inventory with mean intensity in segment background
raster::plot(r_mean_intensity_segm, main = "Mean intensity in segment")
# display tree inventory
lidaRtRee::plot_tree_inventory(tree_inventory_chablais3[, c("x", "y")],
  tree_inventory_chablais3$h,
  species = as.character(tree_inventory_chablais3$s), add = T
)
```

### Exploratory analysis

A boxplot of mean intensity in segments per species shows that mean intensity distribution is different between species. The analysis can be run by considering only the highest inventoried trees in each segment, as smaller trees are likely to be suppressed and have smaller contribution to the point cloud. 

```{r metricsByHighestTree, include=FALSE, eval=TRUE}
# create new variable orderer by segment id then decreasing height
tree_metrics_h <- tree_metrics[order(tree_metrics$seg_id, -tree_metrics$h), ]
i <- 2
# leave only first (highest) tree per segment id
while (i < nrow(tree_metrics_h)) {
  if (tree_metrics_h$seg_id[i] == tree_metrics_h$seg_id[i + 1]) {
    tree_metrics_h <- tree_metrics_h[-(i + 1), ]
  } else {
    i <- i + 1
  }
}
tree_metrics_h$s <- factor(tree_metrics_h$s)
```

```{r boxplot1, include=TRUE, out.width = '80%', fig.dim=c(9.5, 5.5)}
par(mfrow = c(1, 2))
boxplot(meanI ~ s,
  data = tree_metrics[, c("s", "maxZ", "meanZ", "sdZ", "meanI", "sdI")],
  ylab = "Mean intensity in segment", xlab = "Specie",
  main = "All inventoried trees", las = 2, varwidth = TRUE
)
boxplot(meanI ~ s,
  data = tree_metrics_h, ylab = "Mean intensity in segment",
  xlab = "Specie", main = "Highest inventoried tree in segment", las = 2,
  varwidth = TRUE
)
```

```{r boxplot2, include=FALSE, out.width = '100%', fig.dim=c(9.5, 7.5)}
par(mfrow = c(2,3))
boxplot(maxZ ~ s,
  data = tree_metrics_h, ylab = "Max height in segment",
  xlab = "Specie", main = "Max height by species", las = 2,
  varwidth = TRUE
)
boxplot(meanZ ~ s,
  data = tree_metrics_h, ylab = "Mean height in segment",
  xlab = "Specie", main = "Mean height by species", las = 2,
  varwidth = TRUE
)
boxplot(sdZ ~ s,
  data = tree_metrics_h, ylab = "Sd(height) in segment",
  xlab = "Specie", main = "Sd(height) by species", las = 2,
  varwidth = TRUE
)
boxplot(meanI ~ s,
  data = tree_metrics_h, ylab = "Mean intensity in segment",
  xlab = "Specie", main = "Mean intensity by species", las = 2,
  varwidth = TRUE
)
boxplot(sdI ~ s,
  data = tree_metrics_h, ylab = "Sd(intensity) in segment",
  xlab = "Specie", main = "Sd(intensity) by species", las = 2,
  varwidth = TRUE
)
```

A linear discriminant analysis shows that it might be possible to discriminate between spruce, fir and the deciduous species, thanks to a combination of height and intensity variables.

```{r AFD, include=TRUE, out.width = '60%', fig.dim=c(6, 6)}
# principal component analysis
pca <-
  ade4::dudi.pca(tree_metrics_h[, c("maxZ", "meanZ", "sdZ", "meanI", "sdI")], scannf = F)
# linear discriminant analysis
lda <- ade4::discrimin(pca, tree_metrics_h$s, scannf = F, nf = 2)
plot(lda)
```

The following line creates a new factor variable `Groups` with three groups: the species PIAB, ABAL and all others together.

```{r AFD.groups, include=TRUE, out.width = '60%', fig.dim=c(6, 6)}
tree_metrics_h$Groups <- tree_metrics_h$s
levels(tree_metrics_h$Groups)[!is.element(levels(tree_metrics_h$Groups), c("ABAL", "PIAB"))] <-
  "Other"
```

A linear discriminant analysis is performed on this new variable.
```{r AFD.groups.update, include=TRUE, out.width = '60%', fig.dim=c(6, 6)}
lda <-
  ade4::discrimin(pca, tree_metrics_h$Groups, scannf = F, nf = 2)
plot(lda)
```

### Classification

The function `MASS::lda` is used for prediction purposes. First, the cross-validation option (`CV = TRUE`) is chosen in order to produce a confusion matrix in a prediction case.
```{r AFD.MASS, include=TRUE, out.width = '60%', fig.dim=c(6, 6)}
lda_MASS <-
  MASS::lda(tree_metrics_h[, c("maxZ", "meanZ", "sdZ", "meanI", "sdI")],
            tree_metrics_h$Groups, CV = TRUE)
# confusion matrix
matrix_confusion <- table(tree_metrics_h$Groups, lda_MASS$class)
matrix_confusion
# percentage of good classification
round(sum(diag(matrix_confusion)) / sum(matrix_confusion) * 100, 1)
# confidence interval of the percentage
binom.test(sum(diag(matrix_confusion)), sum(matrix_confusion))$conf.int
```

Then the model is fitted and applied to all segments of the map.
```{r prediction.MASS, include=TRUE, fig.dim=c(9, 9)}
# build model
lda_MASS <-
  MASS::lda(tree_metrics_h[, c("maxZ", "meanZ", "sdZ", "meanI", "sdI")],
            tree_metrics_h$Groups)
# apply model to metrics
metrics$predicted_s <-
  predict(lda_MASS, metrics[, c("maxZ", "meanZ", "sdZ", "meanI", "sdI")])$class
# apply model to trees
tree_metrics_h$predicted_s <- predict(lda_MASS,
                                      tree_metrics_h[, c("maxZ", "meanZ", "sdZ", "meanI", "sdI")])$class
```

To display the classification results, an image of the classified segments is created and the highest field trees in each segment are displayed on top of it. Detentions are correct when:

* purple dots are on purple segments (correct ABAL classification),
* blue dots are on blue segments (correct PIAB classification),
* other colors are on green segments (correct "others" classification).

```{r display.prediction.MASS, include=TRUE, fig.dim=c(7, 7)}
# create image of predicted species
species <- segms$segments_id
# replace segment id by predicted species in image
raster::values(species) <-
  metrics$predicted_s[match(raster::values(segms$segments_id), metrics$seg_id)]
# remove ground segment
species[segms$segments_id == 0] <- NA
# convert to factor raster
species <- raster::as.factor(species)
# build raster attribute table rat
rat <- raster::levels(species)[[1]]
rat$Species <- levels(metrics$predicted_s)
levels(species)[[1]] <- rat
# retrieve reference colors
target_col <- lidaRtRee::species_color()[rat$Species, "col"]
# set NA color to green
target_col[is.na(target_col)] <- "green"
# display results
raster::plot(species, col = target_col, legend = FALSE)
legend("bottomright", legend=rat$Species, fill=target_col)
sp::plot(segments_v, add = TRUE, border = "white")
lidaRtRee::plot_tree_inventory(tree_metrics_h
  [, c("x", "y")],
  tree_metrics_h$h,
  bg = lidaRtRee::species_color()[as.character(tree_metrics_h$s), "col"],
  col = "black",
  pch = 21, add = TRUE
)
```
```{r saveGIS, include=FALSE, warning=FALSE, message=FALSE}
# # extract all apices
# apices_all <- lidaRtRee::tree_extraction(segms$filled_dem, segms$local_maxima, segms$segments_id)
# # save outputs
# raster::writeRaster(chm, file = "./data/output/chm.tif", overwrite = TRUE)
# raster::writeRaster(species, file = "./data/output/r_species.tif", overwrite = TRUE)
# raster::writeRaster(segms$segments_id, file = "./data/output/r_segments.tif", overwrite = TRUE)
# rgdal::writeOGR(segments_v, "./data/output/", "v_segments", driver = "ESRI Shapefile", overwrite = TRUE)
# write.csv(apices_all, "./data/output/apices.csv", row.names = FALSE)
# write.csv(tree_metrics_h, "./data/output/correct_detections.csv", row.names = FALSE)
# write.csv(tree_inventory_chablais3, "./data/output/tree_inventory_chablais.csv", row.names = FALSE)
```
```{r test, include=FALSE, fig.dim=c(9, 9)}
# add attributes of apices to polygons by merging
# !!!! for some reason the segment_id is messed up during the process
# segments_v@data <- merge(segments_v@data, metrics, by.x = "segments_id", by.y = "seg_id")
# remove ground segment
# segments_v <- segments_v[segments_v$segments_id != 0,]
```

## Display point cloud

The point cloud can be displayed colored by segment, with poles at the location of inventoried trees.

```{r displayPointCloud, include=TRUE, eval=html, webgl=TRUE, fig.width=6, fig.height=6, warning=FALSE}
rgl::par3d(mouseMode = "trackball") # parameters for interaction with mouse
# select segment points and offset them to avoid truncated coordinates in 3d plot
points.seg <- lasn@data[which(lasn@data$seg_id != 0), c("X", "Y", "Z", "seg_id")]
points.seg$X <- points.seg$X - 974300
points.seg$Y <- points.seg$Y - 6581600
# plot point cloud
rgl::plot3d(points.seg[, c("X", "Y", "Z")], col = points.seg$seg_id %% 10 + 1, aspect = FALSE)
#
# add inventoried trees
tree_inventory_chablais3$z <- 0
for (i in 1:nrow(tree_inventory_chablais3))
{
  rgl::lines3d(
    rbind(
      tree_inventory_chablais3$x[i] - 974300,
      tree_inventory_chablais3$x[i] - 974300
    ),
    rbind(
      tree_inventory_chablais3$y[i] - 6581600,
      tree_inventory_chablais3$y[i] - 6581600
    ),
    rbind(
      tree_inventory_chablais3$z[i],
      tree_inventory_chablais3$z[i] + tree_inventory_chablais3$h[i]
    )
  )
}
```

```{r plotTreeModel1, include=FALSE, eval=FALSE}
# Inventoried trees

# Using package rLiDAR, a 3d view of the field inventory can be displayed

# shape different between coniferous / deciduous
rgl::rgl.open()
rgl::rgl.bg(color = "white")
for (i in 1:nrow(tree_inventory_chablais3))
{
  if (is.na(tree_inventory_chablais3$h[i]) |
      is.na(tree_inventory_chablais3$d[i])) {
    next
  }
  if (!is.element(
    as.character(tree_inventory_chablais3$s[i]),
    c("ABAL", "PIAB", "TABA")
  )) {
    rLiDAR::LiDARForestStand(
      crownshape = "halfellipsoid",
      CL = 0.6 * tree_inventory_chablais3$h[i],
      CW = tree_inventory_chablais3$h[i] / 4,
      HCB = 0.4 * tree_inventory_chablais3$h[i],
      dbh = tree_inventory_chablais3$d[i] / 50,
      resolution = "high",
      X = tree_inventory_chablais3$x[i],
      Y = tree_inventory_chablais3$y[i],
      mesh = F
    )
  } else {
    rLiDAR::LiDARForestStand(
      crownshape = "cone",
      CL = 0.5 * tree_inventory_chablais3$h[i],
      CW = tree_inventory_chablais3$h[i] / 4,
      HCB = 0.5 * tree_inventory_chablais3$h[i],
      dbh = tree_inventory_chablais3$d[i] / 50,
      resolution = "high",
      X = tree_inventory_chablais3$x[i],
      Y = tree_inventory_chablais3$y[i],
      mesh = F,
      stemcolor = "burlywood4",
      crowncolor = "darkgreen"
    )
  }
}
```


```{r plotTreeModel2, include=FALSE, eval=FALSE}
# virtual apices from detection
# shape different based on mean lidar intensity value (threshold 55)
library(rLiDAR)
rgl::rgl.open()
rgl::rgl.bg(color = "white")
for (i in 1:nrow(apices))
{
  if (apices$meanI[i] > 55) {
    rLiDAR::LiDARForestStand(
      crownshape = "halfellipsoid",
      CL = 0.6 * apices$h[i],
      CW = sqrt(4 * apices$s[i] / pi),
      HCB = 0.4 * apices$h[i],
      dbh = apices$h[i] / 50,
      resolution = "high",
      X = apices$x[i],
      Y = apices$y[i],
      mesh = F
    )
  } else {
    rLiDAR::LiDARForestStand(
      crownshape = "cone",
      CL = 0.5 * apices$h[i],
      CW = sqrt(4 * apices$s[i] / pi),
      HCB = 0.5 * apices$h[i],
      dbh = apices$h[i] / 50,
      resolution = "high",
      X = apices$x[i],
      Y = apices$y[i],
      mesh = F,
      stemcolor = "burlywood4",
      crowncolor = "darkgreen"
    )
  }
}
```

## Batch processing

The following code exemplifies how to process numerous LAS files and extract apices for the whole area with parallel processing. The processing runs faster if data is provided as chunks to the segmentation algorithm, and results are then aggregated, rather than running on the full coverage of the data. 
In order to avoid border effects, chunks are provided to the algorithm as overlapping tiles. Results are cropped to prevent the same tree from appearing twice in the final results. Tile size and buffer size are important parameters :

- tile size is a trade-off between the number of chunks to process and the amount of RAM required to process a single tile ;
- buffer size is a trade-off between redundant processing of the overlap area, and assuring that a tree is which treetop is located at the border of a tile has its full crown within the buffer size.

Tiles can be processed with parallel computing, within limits of the cluster's RAM and number of cores.

The steps for processing a batch of las/laz files are :

- build catalog of files
- provide tiling parameters
- provide segmentation parameters
- provide output parameters
- set cluster options for parallel processing
- compute the X and Y coordinates of tiles
- parallelize the processing with the package `future`, by sending to the parallel sessions the coordinates of tiles to process, and a function with the instructions to proceed :
    - load tile corresponding to the coordinates
    - compute CHM
    - segment and extract apices
- aggregate list of results

Be aware that in case tree segments are vectorized, some obtained polygons might overlap. The segmentation algorithm might not be deterministic and borders are sometimes not consistent when adjacent polygons are pasted from different tiles.

```{r batch, include=TRUE, out.width = '100%', fig.dim=c(8, 8), warning = FALSE}
rm(list = ls())
# BUILD CATALOG OF FILES
# folder containing the files
lazdir <- "../data/forest.structure.metrics"
# build catalog
cata <- lidR::readALSLAScatalog(lazdir)
# disable display of catalog processing
lidR::opt_progress(cata) <- FALSE
# set coordinate system
lidR::projection(cata) <- 2154
# option to read only xyzc attributes (coordinates, intensity, echo order and classification) from height files
lidR::opt_select(cata) <- "xyz"
#
# BATCH PROCESSING PARAMETERS
# tile size to split area into chunks to process
# trade-off between RAM capacity VS total number of tiles to process
tile_size <- 70 # here 70 for example purpose with small area
# buffer size: around each tile to avoid border effects on segmentation results
# trade-off between making sure a whole tree crown is processed in case its top
# is on the border VS duplicate processing
buffer_size <- 10 # 5 m is minimum, 10 is probably better depending on tree size
#
# TREE SEGMENTATION PARAMETERS
# set raster resolution
resolution <- 1
#
# OUTPUT PARAMETERS
# option to vectorize crowns (set to FALSE if too slow)
out_vectorize_apices <- TRUE
# output canopy height models ? (set to FALSE if too much RAM used)
out_chm <- TRUE
# save chms on disk
out_save_chm <- FALSE
#
# CLUSTER PARAMETERS
# specify to use two parallel sessions
future::plan("multisession", workers = 2L)
# remove warning when using random numbers in parallel sessions
options(future.rng.onMisuse = "ignore")
#
# COORDINATES OF TILES
# low left corner
x <- seq(
  from = floor(cata@bbox["x", "min"] / tile_size) * tile_size,
  by = tile_size,
  to = ceiling(cata@bbox["x", "max"] / tile_size - 1) * tile_size
) # [1:2]
length_x <- length(x)
y <- seq(
  from = floor(cata@bbox["y", "min"] / tile_size) * tile_size,
  by = tile_size,
  to = ceiling(cata@bbox["y", "max"] / tile_size - 1) * tile_size
) # [1:2]
length_y <- length(y)
# repeat coordinates for tiling while area
x <- as.list(rep(x, length_y))
y <- as.list(rep(y, each = length_x))
#
# PARALLEL PROCESSING
# function takes coordinates of tile as arguments
resultats <- future.apply::future_mapply(
  # i and j are the coordinated of the low left corner of the tile to process
  FUN = function(i, j) {
    # initialize output
    output <- list()
    output$name <- paste0(i, "_", j)
    # extract tile plus buffer
    las_tile <- lidR::clip_rectangle(
      cata,
      i - buffer_size,
      j - buffer_size,
      i + tile_size + buffer_size,
      j + tile_size + buffer_size
    )
    # check if points are present
    if (nrow(las_tile@data) > 0) {
      # normalization if required
      # las_tile <- lidR::normalize_height(las_tile, lidR::tin())
      # in this example LAS tiles are already normalized
      # compute canopy height model
      chm <- lidaRtRee::points2DSM(las_tile)
      # check all chm is not NA
      if (!all(is.na(raster::values(chm)))) {
        # output chm if asked for
        if (out_chm) {
          output$chm <- raster::crop(chm, raster::extent(
            i, i + tile_size,
            j, j + tile_size
          ))
        }
        # save on disk
        if (out_save_chm) {
          raster::writeRaster(raster::crop(
            chm,
            raster::extent(
              i, i + tile_size,
              j, j + tile_size
            )
          ),
          file = paste0("chm_", i, "_", j, ".tif"),
          overwrite = TRUE
          )
        }
        #
        # tree detection
        segms <- lidaRtRee::tree_segmentation(chm)
        # tree extraction
        apices <- lidaRtRee::tree_extraction(
          segms$filled_dem,
          segms$local_maxima,
          segms$segments_id
        )
        # remove apices in buffer area
        apices <- apices[apices$x >= i & apices$x < i + tile_size &
          apices$y >= j & apices$y < j + tile_size, ]
        # add tile id to apices to avoid duplicates in final file
        apices$tile <- paste0(i, "_", j)
        # convert to vectors if option is TRUE
        if (out_vectorize_apices) {
          # vectorize
          apices_v <- raster::rasterToPolygons(segms$segments_id, dissolve = T)
          # remove polygons which treetop is in buffer
          apices_v <- apices_v[is.element(apices_v$segments_id, apices$id), ]
          # names(apices_v) <- "id"
          # add attributes
          # errors when using sp::merge so using sp::over even if it is probably slower
          # merge(apices_v@data, apices, all.x = TRUE)
          apices_v@data <- cbind(apices_v@data, sp::over(apices_v, apices))
          # save in list
          output$apices_v <- apices_v
        }
        # save apices in list
        output$apices <- apices
      } # end of raster is not all NAs check
    } # end of nrow LAS check
    output
  }, x, y, SIMPLIFY = FALSE
) # function applied to the lists of coordinates (x and y)
#
# RESULTS AGGREGATION
# extract results from nested list into separate lists and then bind data
id <- unlist(lapply(resultats, function(x) x[["name"]]))
#
# apices
apices <- lapply(resultats, function(x) x[["apices"]])
# remove NULL elements
apices[sapply(apices, is.null)] <- NULL
# bind remaining elements
apices <- do.call(rbind, apices)
#
# chm
if (out_chm) {
  chm <- lapply(resultats, function(x) x[["chm"]])
  # merge chm
  # no names in list otherwise do.call returns an error
  chm_all <- do.call(raster::merge, chm)
  names(chm) <- id
}
# apices_v
if (out_vectorize_apices) {
  apices_v <- lapply(resultats, function(x) x[["apices_v"]])
  # remove NULL elements
  apices_v[sapply(apices_v, is.null)] <- NULL
  apices_v <- do.call(rbind, apices_v)
  # 1-pixel overlapping in apices_v might be present because image segmentation
  # is not fully identical in overlap areas of adjacent tiles.
}
```

The following image displays the results for the whole area.

```{r batch.plot, include=TRUE, out.width = '90%', fig.dim=c(8.5, 5.5)}
# threshold outsiders in chm
chm_all[chm_all > 40] <- 40
chm_all[chm_all < 0] <- 0
# display chm
raster::plot(chm_all,
  main = "Canopy Height Model and segments"
)
# display segments border
sp::plot(apices_v, border = "white", add = T)
# add apices
sp::plot(apices, cex = apices$h / 40, add = TRUE, pch = 2)
```

The following lines save outputs to files.

```{r batch.export, eval = FALSE}
# merged chm
raster::writeRaster(chm_all, file = "chm.tif", overwrite = TRUE)
# apices
sf::st_write(sf::st_as_sf(apices), 'apices_points.gpkg', 'apices_points', delete_dsn = TRUE)
# vectorized apices
if (out_vectorize_apices) sf::st_write(sf::st_as_sf(apices_v), 'v_apices_points.gpkg', 'v_apices_points', delete_dsn = T)
```

## References