- Load data
- Field data
- ALS data
- ALS metrics computation
- Point cloud metrics
- Tree metrics
- Other metrics
- Model calibration
- Calibration for a single variable
- Calibration for several variables
- Stratified models
- Motivation
- Calibration of stratum-specific models
- Stratified models with stratum-specific variable transformations
- Save data before next tutorial
title: "R workflow for ABA prediction model calibration"
author: "Jean-Matthieu Monnet"
date: "`r Sys.Date()`"
output:
  html_document: default
  pdf_document: default
papersize: a4
bibliography: "../bib/bibliography.bib"
knitr::opts_chunk$set(echo = TRUE)
# Set so that long lines in R will be wrapped:
knitr::opts_chunk$set(tidy.opts=list(width.cutoff=80),tidy=TRUE)
knitr::opts_chunk$set(fig.align = "center")
The code below presents a workflow to calibrate prediction models for the estimation of forest parameters from ALS-derived metrics, using the area-based approach (ABA). The workflow is based on functions from the R packages lidaRtRee and lidR.
Licence: GNU GPLv3 / Source page
Load data
The "Quatre Montagnes" dataset from France, prepared as described in the data preparation tutorial, is loaded from the R archive files located in the folder "data/aba.model/output".
Field data
The file "plots.rda" contains the field data, organized as a data.frame named plots. For subsequent use in the workflow, the data.frame should contain at least two fields: plotId (unique plot identifier) and a forest stand parameter. Each row of the data.frame corresponds to a field plot. A factor variable is required to calibrate stratified models. Plot coordinates are required for subsequent inference computations.
The provided data set includes one categorical variable, stratum, which corresponds to forest ownership, as well as XY coordinates and three forest stand parameters:
- basal area in m^2^/ha (G.m2.ha),
- stem density in /ha (N.ha),
- mean diameter at breast height in cm (D.mean.cm).
Scatterplots of stand parameters are presented below, colored by ownership (green for public forest, blue otherwise).
# load plot-level data
load(file="../data/aba.model/output/plots.rda")
summary(plots)
# display forest variables
plot(plots[,c("G.m2.ha", "N.ha", "D.mean.cm")],
col = ifelse(plots$stratum == "public", "green", "blue"))
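The structural requirements described in this section can be checked before going further. The sketch below is only an illustration on a small made-up data.frame (toy.plots and its values are hypothetical, not the Quatre Montagnes data); it assumes the column names used in this tutorial.

```r
# toy data.frame mimicking the structure expected for `plots` (values are made up)
toy.plots <- data.frame(
  plotId = c("A1", "A2", "B1"),
  X = c(900100, 900350, 901020),
  Y = c(6450200, 6450480, 6451100),
  stratum = factor(c("public", "private", "public")),
  G.m2.ha = c(32.1, 45.7, 28.4),
  stringsAsFactors = FALSE
)
# columns used by the rest of the workflow
required <- c("plotId", "X", "Y", "stratum")
missing.cols <- setdiff(required, names(toy.plots))
# plot identifiers must be unique, the stratification variable must be a factor
checks <- c(
  all.columns.present = length(missing.cols) == 0,
  unique.ids = !anyDuplicated(toy.plots$plotId),
  stratum.is.factor = is.factor(toy.plots$stratum)
)
print(checks)
```

The same three checks could be run on the real plots object once it is loaded.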
ALS data
Normalized ALS point clouds extracted over each plot, as well as terrain statistics previously computed from the ALS ground points, can also be prepared according to the data preparation tutorial. Point clouds corresponding to each field plot are organized in a list of LAS objects. The metadata of one LAS object are displayed below.
# list of LAS objects: normalized point clouds inside plot extent
load("../data/aba.model/output/llas.height.rda")
# display one point cloud
# lidR::plot(llas.height[[1]])
llas.height[[1]]
The first lines of the terrain statistics are displayed hereafter.
# terrain statistics previously computed with (non-normalized) ground points inside each plot extent
load("../data/aba.model/output/metrics.terrain.rda")
head(metrics.terrain[, 1:3], n=3)
The following lines ensure that the plots are ordered in the same way in the three data objects.
llas.height <- llas.height[plots$plotId]
metrics.terrain <- metrics.terrain[plots$plotId,]
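Reordering by identifier works as intended only if the objects are named with the plot identifiers and plotId is a character vector, which is assumed here. The sketch below uses made-up objects (toy.ids and toy.list are hypothetical) to show the check, and the pitfall of indexing with a factor, in which case R uses the integer codes instead of the labels.

```r
# made-up identifiers, deliberately not in alphabetical order
toy.ids <- c("P3", "P1", "P2")
# named list standing in for the list of point clouds, stored in another order
toy.list <- list(P2 = "cloud 2", P3 = "cloud 3", P1 = "cloud 1")
# indexing by character: elements are matched by name
by.name <- toy.list[toy.ids]
# indexing by factor: integer codes are used, names are ignored
by.factor <- toy.list[factor(toy.ids)]
# check whether the reordered lists follow the identifiers
identical(names(by.name), toy.ids)
identical(names(by.factor), toy.ids)
```

If plotId happens to be stored as a factor, wrapping it in as.character() before indexing avoids the problem.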
ALS metrics computation
Two types of metrics can be computed.
- Point cloud metrics are directly computed from the point cloud or from the derived surface model on the whole plot extent. These are the metrics generally used in the area-based approach.
- Tree metrics are computed from the characteristics of trees detected in the point cloud (or in the derived surface model). They are more CPU-intensive to compute and require ALS data with higher density, but in some cases they allow a slight improvement in model prediction accuracy.
Point cloud metrics
Point cloud metrics are computed with the function lidaRtRee::cloudMetrics, which applies lidR::cloud_metrics to all point clouds in the list. Default computed metrics are those proposed by the function lidR::stdmetrics. Additional metrics are available with the function lidaRtRee::ABAmodelMetrics.
# define function for later use
aba.pointMetricsFUN <- ~lidaRtRee::ABAmodelMetrics(Z, Intensity, ReturnNumber, Classification, 2)
# apply function on each point cloud in list
metrics.points <- lidaRtRee::cloudMetrics(llas.height, aba.pointMetricsFUN)
round(head(metrics.points[, 1:8], n = 3),2)
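To give an idea of what point cloud metrics look like, here is a minimal base R sketch that computes a few height statistics from a simulated vector of normalized heights. It is not the implementation of lidaRtRee::ABAmodelMetrics: the helper toy.metrics, the simulated heights, and the reading of the value 2 in the call above as a height threshold in meters are all assumptions made for illustration.

```r
set.seed(1)
# simulated normalized heights (m): low returns and canopy returns
z <- c(runif(200, 0, 0.5), rnorm(800, mean = 18, sd = 4))
z <- pmax(z, 0)
# hypothetical helper: a few height metrics for returns above a threshold hmin
toy.metrics <- function(z, hmin = 2) {
  zv <- z[z >= hmin]
  c(
    zmean = mean(zv),
    zsd = sd(zv),
    zq50 = unname(quantile(zv, 0.5)),
    zq95 = unname(quantile(zv, 0.95)),
    # proportion of returns above the threshold (cover-like index)
    pz.above = length(zv) / length(z)
  )
}
m <- toy.metrics(z, hmin = 2)
round(m, 2)
```

Each plot yields one such vector, so applying the helper to every point cloud gives one row of metrics per plot.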
Tree metrics
Tree metrics rely on a preliminary detection of trees, which is performed with the lidaRtRee::treeSegmentation function. For more details, please refer to the tree detection tutorial. Tree segmentation requires point clouds or canopy height models with an additional buffer in order to avoid border effects when computing tree characteristics. Once trees are detected, metrics are derived with the function lidaRtRee::stdTreeMetrics. A user-defined function can be specified to compute other metrics from the features of detected trees. The plot radius has to be specified, as it is required to exclude trees detected outside of the plot and to compute the plot surface. Tree segmentation is not relevant when the point cloud density is too low, typically below five points per m^2^. The function first computes a canopy height model whose default resolution is 0.5 m, but this should be set to 1 m with low point densities.
# resolution of canopy height model (m)
aba.resCHM <- 0.5
# specify plot radius to exclude trees located outside plots
plot.radius <- 15
# compute tree metrics
metrics.tree <- lidaRtRee::cloudTreeMetrics(llas.height, plots[, c("X", "Y")],
plot.radius, res = aba.resCHM,
func=function(x)
{
lidaRtRee::stdTreeMetrics(x,
area.ha=pi*plot.radius^2/10000)
})
round(head(metrics.tree[, 1:5], n = 3), 2)
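The area.ha argument above converts the plot surface from m^2^ to hectares. The short calculation below makes the numbers explicit for a 15 m radius, and shows how a tree count on the plot scales to a per-hectare density; the count of 40 trees is a made-up value.

```r
# plot radius (m), as in the tutorial
plot.radius <- 15
# plot area in m2 and in ha (1 ha = 10000 m2)
area.m2 <- pi * plot.radius^2
area.ha <- area.m2 / 10000
# made-up number of trees detected inside the plot
n.trees <- 40
# stem density per hectare
density.ha <- n.trees / area.ha
round(c(area.m2 = area.m2, area.ha = area.ha, density.ha = density.ha), 3)
```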
Other metrics
In case terrain metrics have been computed from the cloud of ground points only, they can also be added as variables, and so can other environmental variables which might be relevant in modeling.
metrics <- cbind(metrics.points[plots$plotId, ],
metrics.tree[plots$plotId, ],
metrics.terrain[plots$plotId, 1:3])
Model calibration
Calibration for a single variable
Once a dependent variable (forest parameter of interest) has been chosen, the function lidaRtRee::ABAmodel is used to select the linear regression model that yields the highest adjusted R^2^ with a defined number of independent variables, while checking linear model assumptions. Depending on the parameter transform, a Box-Cox transformation of the dependent variable can be applied to normalize its distribution, or a log transformation of all variables. Model details and cross-validation statistics are available from the returned object.
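As a rough illustration of the Box-Cox transformation mentioned above, the sketch below implements the textbook formula and its inverse in base R. The function names boxcox.transform and boxcox.inverse, the basal area values and the value of lambda are assumptions made for the example; how lidaRtRee::ABAmodel estimates lambda and back-transforms predictions is not covered here.

```r
# Box-Cox transformation for strictly positive values
boxcox.transform <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}
# inverse transformation, back to the original scale
boxcox.inverse <- function(z, lambda) {
  if (abs(lambda) < 1e-8) exp(z) else (lambda * z + 1)^(1 / lambda)
}
# made-up basal area values (m2/ha)
g <- c(12.5, 20.1, 33.8, 47.2, 61.0)
# lambda = 0.5 is an arbitrary value chosen for illustration
z <- boxcox.transform(g, lambda = 0.5)
g.back <- boxcox.inverse(z, lambda = 0.5)
# the round trip should recover the original values
all.equal(g, g.back)
```

With lambda equal to zero the transformation reduces to the logarithm. Note that back-transforming a prediction made on the transformed scale does not, in general, give an unbiased prediction on the original scale.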
variable <- "G.m2.ha"
# no subsample in this case
subsample <- 1:nrow(plots)
# model calibration
model.ABA <- lidaRtRee::ABAmodel(plots[subsample,variable], metrics[subsample,], transform="boxcox", nmax=4, xy = plots[subsample, c("X", "Y")])
# renames outputs with variable name
row.names(model.ABA$stats) <- variable
# display selected linear regression model
model.ABA$model
# display calibration and validation statistics
model.ABA$stats
The function computes values predicted in leave-one-out cross-validation, by using the same combination of independent variables and refitting the regression coefficients with all observations except one. Predicted values can be plotted against field values with the function lidaRtRee::ABAmodelPlot. It is also informative to check the correlation of prediction errors with other forest or environmental variables.
In this example, only tree metrics are selected in the basal area prediction model. The model seems to fail to predict large values. The prediction errors are positively correlated with basal area because large values are under-estimated.
# check correlation between errors and other variables
round(cor(cbind(model.ABA$values$residual, plots[subsample, c("G.m2.ha","N.ha","D.mean.cm")], metrics.terrain[subsample, 1:3])), 2)[1,]
# significance of correlation value
cor.test(model.ABA$values$residual, plots[subsample, variable])
# plot predicted VS field values
par(mfrow=c(1,2))
lidaRtRee::ABAmodelPlot(model.ABA, main = variable)
plot(plots[subsample, c("G.m2.ha")], model.ABA$values$residual, ylab = "Prediction errors", xlab = "Field values")
abline(h = 0, lty = 2)
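The leave-one-out procedure described in this section can be sketched with base R on simulated data. This is an illustration of the general technique with stats::lm, not the code of lidaRtRee::ABAmodel; the variables x1, x2 and y are simulated, and errors are taken as observed minus predicted, a sign convention assumed here because it matches the positive correlation reported above.

```r
set.seed(42)
n <- 40
# simulated predictors and response (not real data)
d <- data.frame(x1 = runif(n, 5, 30), x2 = runif(n, 0, 1))
d$y <- 2 + 1.5 * d$x1 + 10 * d$x2 + rnorm(n, sd = 3)
# leave-one-out: refit the same formula without plot i, then predict plot i
pred.loo <- vapply(seq_len(n), function(i) {
  fit <- lm(y ~ x1 + x2, data = d[-i, ])
  unname(predict(fit, newdata = d[i, ]))
}, numeric(1))
# prediction errors (observed minus predicted, assumed convention)
err <- d$y - pred.loo
# for linear models, the same errors can be derived from a single fit
fit.all <- lm(y ~ x1 + x2, data = d)
err.shortcut <- residuals(fit.all) / (1 - hatvalues(fit.all))
all.equal(err, unname(err.shortcut))
```

The shortcut based on hat values avoids refitting the model n times, which matters when many candidate models are compared.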
In case only point cloud metrics are used as potential inputs, the distribution of errors hardly improves. Coloring points by ownership shows that plots located in private forests have the largest basal area values, which tend to be under-estimated.
model.ABA.metrics.points <- lidaRtRee::ABAmodel(plots[subsample,variable], metrics.points[subsample,], transform="boxcox", nmax=4, xy = plots[subsample, c("X", "Y")])
# renames outputs
row.names(model.ABA.metrics.points$stats) <- names(model.ABA.metrics.points$model) <- variable
# model.ABA.metrics.points$model[[variable]]
model.ABA.metrics.points$stats
# cor.test(model.ABA.metrics.points$values$residual, plots[subsample, variable])
par(mfrow=c(1,2))
# plot predicted VS field values
lidaRtRee::ABAmodelPlot(model.ABA.metrics.points, main = variable,
col = ifelse(plots$stratum == "public", "green", "blue"))
legend("topleft", c("public", "private"), col = c("green", "blue"), pch = 1)
plot(plots[subsample, c("G.m2.ha")],
model.ABA.metrics.points$values$residual,
ylab = "Prediction errors", xlab = "Field values",
col = ifelse(plots$stratum == "public", "green", "blue"))
abline(h = 0, lty = 2)
Calibration for several variables
The following code calibrates models for several forest parameters. In case different transformations have to be performed on the parameters, models have to be calibrated one by one.
models.ABA <- list()
for (i in c("G.m2.ha", "D.mean.cm", "N.ha"))
{
models.ABA[[i]] <- lidaRtRee::ABAmodel(plots[,i], metrics, transform="boxcox", nmax=4, xy = plots[, c("X", "Y")])
}
# bind model stats in a data.frame
model.stats <- do.call(rbind, lapply(models.ABA, function(x){x[["stats"]]}))
The obtained models are presented below. The table columns correspond to:
- n: number of plots,
- metrics: metrics selected in the model,
- adj-R2.%: adjusted R-squared of fitted model (%),
- CV-R2.%: coefficient of determination of values predicted in cross-validation (CV) VS field values (%),
- CV-RMSE.%: coefficient of variation of the Root Mean Square Errors of prediction in CV (%),
- CV-RMSE: Root Mean Square Error of prediction in CV.
# prepare output for report
table.output <- cbind(model.stats[, c("n", "formula")],
round(model.stats[, c("adjR2", "looR2", "cvrmse")]*100, 1),
data.frame(rmse = round(model.stats[, "rmse"], 1)))
names(table.output) <- c("n", "metrics", "adj-R2.%", "CV-R2.%", "CV-RMSE.%", "CV-RMSE")
knitr::kable(table.output)
#
par(mfrow = c(1,3))
for (i in names(models.ABA))
{
lidaRtRee::ABAmodelPlot(models.ABA[[i]], main = i)
}
rm(models.ABA, model.stats)
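The statistics reported in the table can be reproduced by hand from a vector of field values and a vector of cross-validation predictions. The helper cv.stats below is hypothetical and the values are made up; in particular, computing the coefficient of determination as a squared Pearson correlation and dividing the RMSE by the mean observed value are assumptions, and lidaRtRee may use different definitions.

```r
# hypothetical helper computing validation statistics from two vectors
cv.stats <- function(observed, predicted) {
  rmse <- sqrt(mean((predicted - observed)^2))
  c(
    # squared Pearson correlation between predicted and observed values
    r2 = cor(observed, predicted)^2,
    rmse = rmse,
    # RMSE relative to the mean observed value
    cvrmse = rmse / mean(observed)
  )
}
# made-up field values and cross-validation predictions
obs <- c(20, 30, 40, 50)
pred <- c(22, 28, 41, 47)
s <- cv.stats(obs, pred)
# express r2 and cvrmse in percent, keep rmse in the unit of the variable
round(s * c(100, 1, 100), 1)
```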
Stratified models
Motivation
When calibrating a statistical relationship between forest stand parameters, which are usually derived from diameter measurements, and ALS metrics, one relies on the hypothesis that the interaction of laser pulses with the structure of leaves and branches is constant over the whole area. However, differences can be expected due to variations in acquisition settings (flight parameters, scanner model), in forests (stand structure and composition) or in topography (slope). Better models might be obtained when calibrating stratum-specific relationships, provided each stratum is more homogeneous regarding the laser / vegetation interaction. A trade-off has to be achieved between the within-stratum homogeneity and the number of plots available for calibration in each stratum. The minimum number of plots per stratum is approximately 50, while 100 is recommended. In this example we hypothesize that ownership reflects both structure and composition differences in forest stands.
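Before stratifying, it can help to count the plots available in each stratum and compare the counts with the thresholds quoted above. The sketch below does this on a made-up factor (the counts of 95 and 35 plots are hypothetical); with the tutorial data, the factor would be plots$stratum.

```r
# made-up stratification factor for 130 plots
stratum <- factor(rep(c("public", "private"), times = c(95, 35)))
# number of plots per stratum
n.per.stratum <- table(stratum)
# thresholds quoted in the text
n.min <- 50
n.recommended <- 100
res <- data.frame(
  stratum = names(n.per.stratum),
  n = as.integer(n.per.stratum),
  below.minimum = as.integer(n.per.stratum) < n.min,
  below.recommended = as.integer(n.per.stratum) < n.recommended
)
print(res)
```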
Calibration of stratum-specific models
Stratum-specific models are computed and stored in a list during a for loop. The function lidaRtRee::ABAmodelCombineStrata then combines the list of models corresponding to each stratum to compute aggregated statistics for all plots, making it easier to compare stratified with non-stratified models.
In this example, the model for "private" yields a large error on the plot "Verc-C5-1", which considerably lowers the accuracy of the stratified approach.
# stratification variable
strat <- "stratum"
# create list of models
model.ABA.stratified <- list()
# calibrate each stratum model
for (i in levels(plots[, strat]))
{
subsample <- which(plots[,strat]==i)
if (length(subsample)>0)
{
model.ABA.stratified[[i]] <- lidaRtRee::ABAmodel(plots[subsample, variable], metrics[subsample,], transform="boxcox", nmax=4, xy = plots[subsample,c("X", "Y")])
}
}
# backup list of models for later use
model.ABA.stratified.boxcox <- model.ABA.stratified
# combine list of models into single object
model.ABA.stratified <- lidaRtRee::ABAmodelCombineStrata(model.ABA.stratified, plots$plotId)
# model.ABA.stratified$stats
# bind model stats in a data.frame for comparison
model.stats <- rbind(model.ABA$stats, model.ABA.stratified$stats)
row.names(model.stats)[1] <- "NOT.STRATIFIED"
# prepare output for report
table.output <- cbind(model.stats[, c("n", "formula")],
round(model.stats[, c("adjR2", "looR2", "cvrmse")]*100, 1),
data.frame(rmse = round(model.stats[, "rmse"], 1)))
names(table.output) <- c("n", "metrics", "adj-R2.%", "CV-R2.%", "CV-RMSE.%", "CV-RMSE")
knitr::kable(table.output)
par(mfrow=c(1,2))
lidaRtRee::ABAmodelPlot(model.ABA, main = paste0(variable, ", not stratified"))
lidaRtRee::ABAmodelPlot(model.ABA.stratified, main = paste0(variable, ", stratified"))
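The idea of aggregated statistics can be sketched in base R: cross-validation predictions from all strata are stacked, and a single statistic is computed over all plots. This is not the code of lidaRtRee::ABAmodelCombineStrata, whose internals are not described here; the data.frames cv.public and cv.private and their values are made up.

```r
# made-up cross-validation results for two strata
cv.public <- data.frame(plotId = c("a", "b", "c"),
                        observed = c(30, 42, 25), predicted = c(33, 40, 27))
cv.private <- data.frame(plotId = c("d", "e"),
                         observed = c(55, 61), predicted = c(48, 57))
rmse <- function(d) sqrt(mean((d$predicted - d$observed)^2))
# stratum-level errors
rmse.strata <- c(public = rmse(cv.public), private = rmse(cv.private))
# pooled error: stack the predictions of all strata, then compute one statistic
cv.all <- rbind(cv.public, cv.private)
rmse.pooled <- rmse(cv.all)
# the pooled RMSE equals the square root of the plot-weighted mean of squared RMSEs
n <- c(nrow(cv.public), nrow(cv.private))
rmse.check <- sqrt(sum(n * rmse.strata^2) / sum(n))
all.equal(rmse.pooled, rmse.check)
```

Because squared errors are pooled, a single plot with a very large error in one stratum can dominate the aggregated RMSE, which is consistent with the effect of plot "Verc-C5-1" noted above.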
Stratified models with stratum-specific variable transformations
In case one wants to apply different variable transformations, or use different subsets of ALS metrics depending on the strata, the following example can be used. First, models using only the point cloud metrics are calibrated without transformation of the data. The statistics for all plots are then calculated by combining the following stratum-specific models:
- public ownership, all metrics, Box-Cox transformation of basal area values (calibrated in the previous paragraph),
- private ownership, only point cloud metrics, no data transformation.
# create list of models for no transformation
model.ABA.stratified.none <- list()
# calibrate each stratum model
for (i in levels(plots[,strat]))
{
subsample <- which(plots[,strat]==i)
if (length(subsample)>0)
{
model.ABA.stratified.none[[i]] <- lidaRtRee::ABAmodel(plots[subsample, variable], metrics.points[subsample,], transform="none", xy = plots[subsample,c("X", "Y")])
}
}
# combine list of models into single object
model.ABA.stratified.mixed <- lidaRtRee::ABAmodelCombineStrata(list(private = model.ABA.stratified.none[["private"]], public = model.ABA.stratified.boxcox[["public"]]), plots$plotId)
# bind model stats in a data.frame for comparison
model.stats <- rbind(model.ABA$stats, model.ABA.stratified.mixed$stats)
row.names(model.stats)[1] <- "NOT.STRATIFIED"
# prepare output for report
table.output <- cbind(model.stats[, c("n", "formula", "transform")],
round(model.stats[, c("adjR2", "looR2", "cvrmse")]*100, 1),
data.frame(rmse = round(model.stats[, "rmse"], 1)))
names(table.output) <- c("n", "metrics", "transform", "adj-R2.%", "CV-R2.%", "CV-RMSE.%", "CV-RMSE")
knitr::kable(table.output)
# graphics
par(mfrow=c(1,2))
lidaRtRee::ABAmodelPlot(model.ABA, main = paste0(variable, ", not stratified"))
lidaRtRee::ABAmodelPlot(model.ABA.stratified.mixed, main = paste0(variable, ", stratified"))
Save data before next tutorial
The following lines save the data required for the area-based mapping step.
save(model.ABA.stratified.mixed, model.ABA, aba.pointMetricsFUN, aba.resCHM, file = "../data/aba.model/output/models.rda")