Commit 28cdb6f0 authored by Jean-Matthieu Monnet

Initial commit aba model calibration

@@ -26,7 +26,7 @@ Required `R` libraries: `ggplot2`, `sf`, `ggmap`
### Tree-level inventory
96 plots of 15 m radius have been inventoried in the Quatre Montagnes area (Vercors Mountain, France). Plots are aggregated in clusters of 4. Data are provided in the folder "data/aba.model/field":
+ one file per plot for tree information ("Verc-CLUSTERID-PLOTID-ArbresTerrain.csv")
- one file per cluster for plot center location ("Verc-CLUSTERID-PiquetsTerrain.csv")
@@ -504,6 +504,8 @@ sf::st_crs(public) <- 2154
plots.sf <- sf::st_join(plots.sf, public)
# rename column and levels
plots.sf$stratum <- factor(ifelse(is.na(plots.sf$FID), "private", "public"))
# also add to data.frame
plots <- merge(plots, sf::st_drop_geometry(plots.sf[, c("plotId", "stratum")]))
# remove column
plots.sf$FID <- NULL
```
---
title: "R workflow for ABA prediction model calibration"
author: "Jean-Matthieu Monnet"
date: "`r Sys.Date()`"
output:
  pdf_document: default
  html_document: default
papersize: a4
bibliography: "./bib/bibliography.bib"
---
```{r setup, include=FALSE}
library(knitr)
knitr::opts_chunk$set(echo = TRUE)
# Set so that long lines in R will be wrapped:
knitr::opts_chunk$set(tidy.opts = list(width.cutoff = 80), tidy = TRUE)
knitr::opts_chunk$set(fig.align = "center")
```
The code below presents a workflow to calibrate prediction models for the estimation of forest parameters from ALS-derived metrics, using the area-based approach (ABA). The workflow is based on functions from the `R` packages `lidaRtRee` and `lidR`.
Licence: CC-BY / [Source page](https://gitlab.irstea.fr/jean-matthieu.monnet/lidartree_tutorials/-/blob/master/area-based.2.model.calibration.Rmd)
# Load data
The "Quatre Montagnes" dataset from France, prepared as described in the [data preparation tutorial](https://gitlab.irstea.fr/jean-matthieu.monnet/lidartree_tutorials/-/blob/master/area-based.1.data.preparation.Rmd), is loaded from the R archive files located in the folder "data/aba.model/output".
## Field data
The file "plots.rda" contains the field data, organized as a data.frame named `plots`. For use in the workflow, the data.frame should contain at least two fields: `plotId` (a unique plot identifier) and a forest stand parameter; each line corresponds to one field plot. A factor variable is required to calibrate stratified models, and plot coordinates are required for subsequent inference computations.
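For readers adapting the workflow to their own field data, a minimal sketch of the expected structure of `plots` is given below (plot identifiers, coordinates and values are purely illustrative, not taken from the dataset).
```{r plotsStructureSketch, eval = FALSE}
# hypothetical example of the minimal structure expected for the plots data.frame
plots <- data.frame(
  plotId = c("Plot-01", "Plot-02"),          # unique plot identifier
  X = c(915000, 915200),                     # plot center coordinates (e.g. Lambert 93)
  Y = c(6445000, 6445200),
  stratum = factor(c("public", "private")),  # factor used for stratified models
  G.m2.ha = c(32.5, 41.0)                    # forest stand parameter of interest
)
```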
The provided data set includes one categorical variable (`stratum`, corresponding to forest ownership), XY coordinates, and three forest stand parameters:
* basal area in m^2^/ha (`G.m2.ha`),
+ stem density per ha (`N.ha`),
+ mean diameter at breast height in cm (`D.mean.cm`).
Scatterplots of stand parameters are presented below, colored by ownership (green for public forest, blue otherwise).
```{r loadFieldData, include=TRUE, message = FALSE, warning = FALSE}
# load plot-level data
load(file = "./data/aba.model/output/plots.rda")
summary(plots)
# display forest variables
plot(plots[, c("G.m2.ha", "N.ha", "D.mean.cm")],
     col = ifelse(plots$stratum == "public", "green", "blue"))
```
## ALS data
Normalized ALS point clouds extracted over each plot, as well as terrain statistics previously computed from the ALS ground points, can be prepared according to the [data preparation tutorial](https://gitlab.irstea.fr/jean-matthieu.monnet/lidartree_tutorials/-/blob/master/area-based.1.data.preparation.Rmd). Point clouds corresponding to each field plot are organized in a list of LAS objects from the `lidR` package. The metadata of one LAS object are displayed below.
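As a hedged sketch, such a list can for instance be obtained by clipping a normalized `LAScatalog` around the plot centers; the path below is an assumption and the data preparation tutorial remains the reference.
```{r clipSketch, eval = FALSE}
# illustrative sketch: extract one normalized point cloud per plot from a catalog,
# then name the list elements with the plot identifiers
ctg <- lidR::readLAScatalog("path/to/normalized/las/")  # hypothetical path
llas.height <- lidR::clip_circle(ctg, plots$X, plots$Y, radius = 15)
names(llas.height) <- plots$plotId
```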
```{r loadALSData, include=TRUE, message = FALSE, warning = FALSE}
# list of LAS objects: normalized point clouds inside plot extent
load("./data/aba.model/output/llas.height.rda")
# display one point cloud: lidR::plot(llas.height[[1]])
llas.height[[1]]
```
The first lines of the terrain statistics are displayed hereafter.
```{r loadTerrainStats, echo = FALSE}
# terrain statistics previously computed with (non-normalized) ground points inside each plot extent
load("./data/aba.model/output/terrain.stats.rda")
head(terrain.stats[, 1:3], n = 3)
```
The following lines ensure that the plots are ordered in the same way in the three data objects.
```{r orderData, include=TRUE}
llas.height <- llas.height[plots$plotId]
terrain.stats <- terrain.stats[plots$plotId,]
```
# ALS metrics computation
Two types of metrics can be computed:
* Point cloud metrics are directly computed from the point cloud or from the derived surface model on the whole plot extent. These are the metrics generally used in the area-based approach.
+ Tree metrics are computed from the characteristics of trees detected in the point cloud (or in the derived surface model). They are more CPU-intensive to compute and require ALS data with higher density, but in some cases they allow a slight improvement in model prediction accuracy.
## Point cloud metrics
Point cloud metrics are computed with the function `lidaRtRee::cloudMetrics`, which applies the `lidR::cloud_metrics` function to all point clouds in the list. Default computed metrics are those proposed by the function [`lidR::stdmetrics`](https://github.com/Jean-Romain/lidR/wiki/stdmetrics). Additional metrics are available with the function `lidaRtRee::ABAmodelMetrics`.
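A user-defined formula can also be supplied; the sketch below (assuming the same formula interface as used in the next chunk) computes a few simple height metrics with arbitrary names.
```{r customCloudMetricsSketch, eval = FALSE}
# sketch: user-defined metrics applied to every point cloud of the list
custom.metrics <- lidaRtRee::cloudMetrics(
  llas.height,
  ~list(zmean = mean(Z), zsd = sd(Z), p.above.2m = mean(Z > 2))
)
head(custom.metrics)
```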
```{r computeMetrics, include=TRUE}
# compute point cloud metrics
point.metrics <- lidaRtRee::cloudMetrics(llas.height, ~lidaRtRee::ABAmodelMetrics(Z, Intensity, ReturnNumber, Classification, 2))
round(head(point.metrics[, 1:8], n = 3), 2)
```
## Tree metrics
Tree metrics rely on a preliminary detection of trees, which is performed with the `lidaRtRee::treeSegmentation` function. For more details, please refer to the [tree detection tutorial](https://gitlab.irstea.fr/jean-matthieu.monnet/lidartree_tutorials/-/blob/master/tree.detection.Rmd). Tree segmentation requires point clouds or canopy height models with an additional buffer in order to avoid border effects when computing tree characteristics. Once trees are detected, metrics are derived with the function `lidaRtRee::stdTreeMetrics`. A user-specific function can be supplied to compute other metrics from the features of detected trees. The plot radius has to be specified, as it is required to exclude trees detected outside of the plot and to compute the plot surface. Tree segmentation is not relevant when the point cloud density is too low, typically below five points per m^2^. The function first computes a canopy height model, whose default resolution is 0.5 m; this should be set to 1 m for low point densities.
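The sketch below illustrates what a user-specific function passed to `cloudTreeMetrics` could look like; the column name `h` (tree height) is an assumption used for illustration and should be adapted to the actual columns of the detected-trees data.frame.
```{r customTreeMetricsSketch, eval = FALSE}
# sketch of a user-specific function usable as the `func` argument;
# `x` is the data.frame of trees detected in one plot
my.tree.metrics <- function(x, area.ha) {
  data.frame(
    tree.density = nrow(x) / area.ha, # number of detected trees per ha
    tree.h.mean = mean(x$h),          # mean height of detected trees ("h" is assumed)
    tree.h.max = max(x$h)             # height of the tallest detected tree
  )
}
```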
```{r computeTreeMetrics, include=TRUE}
# resolution of canopy height model (m)
resCHM <- 0.5
# specify plot radius to exclude trees located outside plots
plot.radius <- 15
# compute tree metrics
tree.metrics <- lidaRtRee::cloudTreeMetrics(llas.height, plots[, c("X", "Y")],
                                            plot.radius, res = resCHM,
                                            func = function(x) {
                                              lidaRtRee::stdTreeMetrics(x, area.ha = pi * plot.radius^2 / 10000)
                                            })
round(head(tree.metrics[, 1:5], n = 3), 2)
```
## Other metrics
In case terrain metrics have been computed from the cloud of ground points only, they can also be added as candidate variables, as can other environmental variables which might be relevant for modeling.
```{r bindMetrics, include=TRUE}
metrics <- cbind(point.metrics[plots$plotId, ],
                 tree.metrics[plots$plotId, ],
                 terrain.stats[plots$plotId, 1:3])
```
# Model calibration
## Calibration for a single variable
Once a dependent variable (the forest parameter of interest) has been chosen, the function `lidaRtRee::ABAmodel` is used to select the linear regression model that yields the highest adjusted-R^2^ with a given number of independent variables, while checking linear model assumptions. A Box-Cox transformation of the dependent variable can be applied to normalize its distribution, or a log transformation of all variables (parameter `transform`). Model details and cross-validation statistics are available from the returned object.
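For reference, the sketch below only illustrates the Box-Cox transformation itself; the optimal lambda is estimated internally by the function and the value used here is arbitrary.
```{r boxcoxSketch, eval = FALSE}
# Box-Cox transformation of the dependent variable: (y^lambda - 1) / lambda,
# with log(y) as the limiting case when lambda = 0
lambda <- 0.3 # arbitrary value for illustration
y <- plots$G.m2.ha
y.bc <- if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
hist(y.bc, main = "Box-Cox transformed basal area")
```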
```{r modelCalibration, include=TRUE, message = FALSE, warning = FALSE}
variable <- "G.m2.ha"
# no subsample in this case
subsample <- 1:nrow(plots)
# model calibration
model.ABA <- lidaRtRee::ABAmodel(plots[subsample, variable], metrics[subsample, ], transform = "boxcox", nmax = 4, xy = plots[subsample, c("X", "Y")])
# rename outputs with variable name
row.names(model.ABA$stats) <- names(model.ABA$model) <- variable
# display selected linear regression model
model.ABA$model[[variable]]
# display calibration and validation statistics
model.ABA$stats
```
The function computes values predicted in leave-one-out cross-validation, by using the same combination of independent variables and fitting the regression coefficients with all observations except one. Predicted values can be plotted against field values with the function `lidaRtRee::ABAmodelPlot`. It is also informative to check the correlation of prediction errors with other forest or environmental variables.
In this example, only tree metrics are selected in the basal area prediction model. The model seems to fail to predict large values. The prediction errors are positively correlated with basal area because large values are under-estimated.
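The sketch below illustrates the leave-one-out principle with a fixed, hand-picked formula; `zmean` and `zsd` are standard `lidR::stdmetrics` names used only for illustration, and the internal handling of variable transformation by `ABAmodel` is not reproduced.
```{r looSketch, eval = FALSE}
# sketch of leave-one-out cross-validation for a fixed linear model
d <- cbind(plots[, c("plotId", "G.m2.ha")], point.metrics[plots$plotId, ])
loo.pred <- sapply(seq_len(nrow(d)), function(i) {
  fit <- lm(G.m2.ha ~ zmean + zsd, data = d[-i, ]) # illustrative formula
  predict(fit, newdata = d[i, ])
})
sqrt(mean((loo.pred - d$G.m2.ha)^2)) # cross-validation RMSE
```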
```{r modelPlot, include=TRUE}
# check correlation between errors and other variables
round(cor(cbind(model.ABA$values$residual, plots[subsample, c("G.m2.ha","N.ha","D.mean.cm")], terrain.stats[subsample, 1:3])), 2)[1,]
# significance of correlation value
cor.test(model.ABA$values$residual, plots[subsample, variable])
# plot predicted VS field values
par(mfrow = c(1, 2))
lidaRtRee::ABAmodelPlot(model.ABA, variable)
plot(plots[subsample, c("G.m2.ha")], model.ABA$values$residual, ylab = "Prediction errors", xlab = "Field values")
abline(h = 0, lty = 2)
```
In case only point cloud metrics are used as potential inputs, the errors are hardly better distributed. Coloring points by ownership shows that plots located in private forests have the largest basal area values which tend to be under-estimated.
```{r point.metricsOnly, include=TRUE}
model.ABA.point.metrics <- lidaRtRee::ABAmodel(plots[subsample,variable], point.metrics[subsample,], transform="boxcox", nmax=4, xy = plots[subsample, c("X", "Y")])
# rename outputs
row.names(model.ABA.point.metrics$stats) <- names(model.ABA.point.metrics$model) <- variable
# model.ABA.point.metrics$model[[variable]]
model.ABA.point.metrics$stats
# cor.test(model.ABA.point.metrics$values$residual, plots[subsample, variable])
par(mfrow=c(1,2))
# plot predicted VS field values
lidaRtRee::ABAmodelPlot(model.ABA.point.metrics, variable,
col = ifelse(plots$stratum == "public", "green", "blue"))
legend("topleft", c("public", "private"), col = c("green", "blue"), pch = 1)
plot(plots[subsample, c("G.m2.ha")],
model.ABA.point.metrics$values$residual,
ylab = "Prediction errors", xlab = "Field values",
col = ifelse(plots$stratum == "public", "green", "blue"))
abline(h = 0, lty = 2)
```
## Calibration for several variables
The following code calibrates models for several forest parameters. In case different transformations have to be performed on the parameters, models have to be calibrated one by one.
```{r multipleModels, include=TRUE, warning = FALSE, message = FALSE}
models.ABA <- list()
for (i in c("G.m2.ha", "D.mean.cm", "N.ha"))
{
models.ABA[[i]] <- lidaRtRee::ABAmodel(plots[,i], metrics, transform="boxcox", nmax=4, xy = plots[, c("X", "Y")])
}
# bind model stats in a data.frame
model.stats <- do.call(rbind, lapply(models.ABA, function(x){x[["stats"]]}))
```
The obtained models are presented below. The table columns correspond to:
* `n` number of plots,
* `metrics` selected in the model,
* `adj-R2.%` adjusted R-squared of the fitted model (%),
* `CV-R2.%` coefficient of determination of values predicted in cross-validation (CV) VS field values (%),
* `CV-RMSE.%` coefficient of variation of the Root Mean Square Error of prediction in CV (%),
* `CV-RMSE` Root Mean Square Error of prediction in CV.
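As a sketch, the last two statistics can be recomputed from the cross-validation residuals returned by the function; the `residual` column is the one used earlier in this tutorial, and the coefficient of variation is assumed to be the RMSE divided by the mean field value.
```{r cvStatsSketch, eval = FALSE}
# recompute CV-RMSE and CV-RMSE.% for one model from its residuals (illustrative)
res <- models.ABA[["G.m2.ha"]]$values$residual
rmse <- sqrt(mean(res^2))
100 * rmse / mean(plots$G.m2.ha) # coefficient of variation of the RMSE, in %
```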
```{r multipleModelsTable, echo = FALSE, fig.width = 12, fig.height = 4.5}
# prepare output for report
table.output <- cbind(model.stats[, c("n", "formula")],
round(model.stats[, c("adjR2", "looR2", "cvrmse")]*100, 1),
data.frame(rmse = round(model.stats[, "rmse"], 1)))
names(table.output) <- c("n", "metrics", "adj-R2.%", "CV-R2.%", "CV-RMSE.%", "CV-RMSE")
knitr::kable(table.output)
par(mfrow = c(1,3))
for (i in names(models.ABA))
{
lidaRtRee::ABAmodelPlot(models.ABA[[i]], i)
}
rm(models.ABA, model.stats)
```
# Stratified models
## Motivation
When calibrating a statistical relationship between forest stand parameters, which are usually derived from diameter measurements, and ALS metrics, one relies on the hypothesis that the interaction of laser pulses with the leaf and branch structure is constant over the whole area. However, differences can be expected due to variations in acquisition settings (flight parameters, scanner model), in forests (stand structure and composition) or in topography (slope). Better models might be obtained by calibrating stratum-specific relationships, provided each stratum is more homogeneous regarding the laser / vegetation interaction. A trade-off has to be achieved between within-stratum homogeneity and the number of plots available for calibration in each stratum. A minimum number of plots is approximately 50, while 100 would be recommended. In this example we hypothesize that ownership reflects both structure and composition differences in forest stands.
## Calibration of stratum-specific models
Stratum-specific models are computed and stored in a list during a `for` loop. The function `lidaRtRee::ABAmodelCombineStrata` then combines the list of models corresponding to each stratum to compute aggregated statistics for all plots, making it easier to compare stratified with non-stratified models.
In this example, the model for "private" yields a large error on the plot "Verc-C5-1", which considerably lowers the accuracy of the stratified approach.
```{r stratifiedmodelCalibration, include=TRUE, warning = FALSE}
# stratification variable
strat <- "stratum"
#
# create list of models
model.ABA.stratified <- list()
# calibrate each stratum model
for (i in levels(plots[, strat]))
{
subsample <- which(plots[, strat] == i)
if (length(subsample) > 0)
{
model.ABA.stratified[[i]] <- lidaRtRee::ABAmodel(plots[subsample, variable], metrics[subsample, ], transform = "boxcox", nmax = 4, xy = plots[subsample, c("X", "Y")])
}
}
# backup list of models for later use
model.ABA.stratified.boxcox <- model.ABA.stratified
# combine list of models into single object
model.ABA.stratified <- lidaRtRee::ABAmodelCombineStrata(model.ABA.stratified, plots$plotId)
# model.ABA.stratified$stats
```
```{r stratifiedmodelTable, echo=FALSE, fig.height = 4.5, fig.width = 8}
# bind model stats in a data.frame for comparison
model.stats <- rbind(model.ABA$stats, model.ABA.stratified$stats)
row.names(model.stats)[1] <- "NOT.STRATIFIED"
# prepare output for report
table.output <- cbind(model.stats[, c("n", "formula")],
round(model.stats[, c("adjR2", "looR2", "cvrmse")]*100, 1),
data.frame(rmse = round(model.stats[, "rmse"], 1)))
names(table.output) <- c("n", "metrics", "adj-R2.%", "CV-R2.%", "CV-RMSE.%", "CV-RMSE")
knitr::kable(table.output)
par(mfrow=c(1,2))
lidaRtRee::ABAmodelPlot(model.ABA, paste0(variable, ", not stratified"))
lidaRtRee::ABAmodelPlot(model.ABA.stratified, paste0(variable, ", stratified"))
```
## Stratified models with stratum-specific variable transformations
In case one wants to apply different variable transformations, or use different subsets of ALS metrics depending on the stratum, the following example can be used. First, models using only the point cloud metrics are calibrated without transformation of the data. The statistics for all plots are then calculated by combining the following stratum-specific models:
* public ownership, all metrics, Box-Cox transformation of basal area values (calibrated in the previous paragraph),
+ private ownership, only point cloud metrics, no data transformation.
```{r stratifiedmodelCalibrationTransformation, include=TRUE, warning = FALSE}
# create list of models for no transformation
model.ABA.stratified.none <- list()
# calibrate each stratum model
for (i in levels(plots[,strat]))
{
subsample <- which(plots[,strat]==i)
if (length(subsample)>0)
{
model.ABA.stratified.none[[i]] <- lidaRtRee::ABAmodel(plots[subsample, variable], point.metrics[subsample,], transform="none", xy = plots[subsample,c("X", "Y")])
}
}
# combine list of models into single object
model.ABA.stratified.mixed <- lidaRtRee::ABAmodelCombineStrata(list(private = model.ABA.stratified.none[["private"]], public = model.ABA.stratified.boxcox[["public"]]), plots$plotId)
# bind model stats in a data.frame for comparison
model.stats <- rbind(model.ABA$stats, model.ABA.stratified.mixed$stats)
row.names(model.stats)[1] <- "NOT.STRATIFIED"
```
```{r stratifiedmodelCalibrationTransformationTable, echo = FALSE, fig.height = 4.5, fig.width = 8}
# prepare output for report
table.output <- cbind(model.stats[, c("n", "formula", "transform")],
round(model.stats[, c("adjR2", "looR2", "cvrmse")]*100, 1),
data.frame(rmse = round(model.stats[, "rmse"], 1)))
names(table.output) <- c("n", "metrics", "transform", "adj-R2.%", "CV-R2.%", "CV-RMSE.%", "CV-RMSE")
knitr::kable(table.output)
# graphics
par(mfrow=c(1,2))
lidaRtRee::ABAmodelPlot(model.ABA, paste0(variable, ", not stratified"))
lidaRtRee::ABAmodelPlot(model.ABA.stratified.mixed, paste0(variable, ", stratified"))
```
# Save data before next tutorial
The following lines save the data required for the [area-based mapping step](https://gitlab.irstea.fr/jean-matthieu.monnet/lidartree_tutorials/-/blob/master/area-based.3.mapping.Rmd).
```{r saveModels, include=TRUE}
# IN PROGRESS
```
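The content of the chunk above is still to be defined; a minimal sketch is given below, assuming the mapping tutorial only needs the calibrated model objects (the file name and the selection of objects are assumptions).
```{r saveModelsSketch, eval = FALSE}
# hypothetical example: save the calibrated models for the mapping step
save(model.ABA, model.ABA.stratified, model.ABA.stratified.mixed,
     file = "./data/aba.model/output/models.aba.rda")
```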