Updated area-based after correction of point metrics (buffer removal before computation).

601fe438 · Monnet Jean-Matthieu · 9781a728 · 601fe438 · 601fe438 · 601fe438
Commit 601fe438 authored 3 years ago by Monnet Jean-Matthieu

Showing changes with 134 additions and 169 deletions
--- a/R/area-based.2.model.calibration.Rmd
+++ b/R/area-based.2.model.calibration.Rmd
@@ -29,7 +29,7 @@ Many thanks to Pascal Obstétar for checking code and improvement suggestions.

 # Load data

-The "Quatre Montagnes" dataset from France, prepared as described in the [data preparation tutorial](https://gitlab.irstea.fr/jean-matthieu.monnet/lidartree_tutorials/-/blob/master/R/area-based.1.data.preparation.Rmd) is loaded from the R archive files located in the folder "data/aba.model/output".
+The "Quatre Montagnes" dataset from France, prepared as described in the [data preparation tutorial](https://gitlab.irstea.fr/jean-matthieu.monnet/lidartree_tutorials/-/blob/master/R/area-based.1.data.preparation.Rmd) is loaded from the `R` archive files located in the folder "data/aba.model/output".

 ## Field data

@@ -88,7 +88,7 @@ Two types of vegetation metrics can be computed.

 ## Point cloud metrics

-Point cloud metrics are computed with the function `lidaRtRee::clouds_metrics`, which applies the `lidR::cloud_metrics` to all point clouds in a list. Default computed metrics are those proposed by the function [`lidR::stdmetrics`](https://github.com/Jean-Romain/lidR/wiki/stdmetrics). Additional metrics are available with the function `lidaRtRee::aba_metrics`. The buffer points, which are located outside of the plot extent inventoried on the field, should be removed before computing those metrics
+Point cloud metrics are computed with the function `lidaRtRee::clouds_metrics`, which applies the function `lidR::cloud_metrics` to all point clouds in a list. Default computed metrics are those proposed by the function [`lidR::stdmetrics`](https://github.com/Jean-Romain/lidR/wiki/stdmetrics). Additional metrics are available with the function `lidaRtRee::aba_metrics`. The buffer points, which are located outside of the plot extent inventoried on the field, should be removed before computing those metrics.

 ```{r computeMetrics, include=TRUE}
 # define function for later use
@@ -159,37 +159,27 @@ model_aba$stats

 The function computes values predicted in leave-one-out cross-validation, by using the same combination of dependent variables and fitting the regression coefficients with all observations except one. Predicted values can be plotted against field values with the function `lidaRtRee::aba_plot`. It is also informative to check the correlation of prediction errors with other forest or environmental variables.

-In this example, only tree metrics are selected in the basal area prediction model. The model seems to fail to predict large values. The prediction errors are positively correlated with basal area because large values are under-estimated.
+The model seems to fail to predict large values, and the prediction errors are positively correlated with basal area.

-```{r modelPlot, include=TRUE, fig.height = 4.5, fig.width = 8}
+```{r modelCorrelation, include=TRUE, fig.height = 4.5, fig.width = 8}
 # check correlation between errors and other variables
 round(cor(cbind(model_aba$values$residual, plots[subsample, c("G_m2_ha", "N_ha", "D_mean_cm")], metrics_terrain[subsample, 1:3])), 2)[1, ]
 # significance of correlation value
 cor.test(model_aba$values$residual, plots[subsample, variable])
-# plot predicted  VS field values
-par(mfrow = c(1, 2))
-lidaRtRee::aba_plot(model_aba, main = variable)
-plot(plots[subsample, c("G_m2_ha")], model_aba$values$residual, ylab = "Prediction errors", xlab = "Field values")
-abline(h = 0, lty = 2)
 ```
-In case only point cloud metrics are used as potential inputs, the errors are hardly better distributed. Coloring points by ownership shows that plots located in private forests have the largest basal area values which tend to be under-estimated.
-
-```{r metrics_pointsOnly, include=TRUE, fig.height = 4.5, fig.width = 8}
-model_aba_metrics_points <- lidaRtRee::aba_build_model(plots[subsample, variable], metrics_points[subsample, ], transform = "boxcox", nmax = 4, xy = plots[subsample, c("X", "Y")])
-# renames outputs
-row.names(model_aba_metrics_points$stats) <- names(model_aba_metrics_points$model) <- variable
-# model_aba_metrics_points$model[[variable]]
-model_aba_metrics_points$stats
-# cor.test(model_aba_metrics_points$values$residual, plots[subsample, variable])
+
+Coloring points by ownership shows that plots located in private forests have the largest basal area values.
+
+```{r modelPlot, include=TRUE, fig.height = 4.5, fig.width = 8}
 par(mfrow = c(1, 2))
 # plot predicted  VS field values
-lidaRtRee::aba_plot(model_aba_metrics_points,
+lidaRtRee::aba_plot(model_aba,
  main = variable,
  col = ifelse(plots$stratum == "public", "green", "blue")
 )
 legend("topleft", c("public", "private"), col = c("green", "blue"), pch = 1)
 plot(plots[subsample, c("G_m2_ha")],
-  model_aba_metrics_points$values$residual,
+  model_aba$values$residual,
  ylab = "Prediction errors", xlab = "Field values",
  col = ifelse(plots$stratum == "public", "green", "blue")
 )
@@ -220,6 +210,8 @@ The obtained models are presented below. The table columns correspond to:
 * `CV-RMSE.%` coefficient of variation of the Root Mean Square Errors of prediction in CV (%),
 * `CV-RMSE` Root Mean Square Error of prediction in CV.

+The two largest (outlier) values of mean diameter are underestimated by the model, which greatly decreases the accuracy statistics. This might be explained by the fact that when trees reach maturity, diameter growth continues while height growth almost stops. As the ALS point cloud mostly contains height information, there is some signal saturation for high mean diameter values. It might also be the case for high biomass values.
+
 ```{r multipleModelsTable, echo = FALSE, fig.width = 12, fig.height = 4.5}
 # prepare output for report
 table_output <- cbind(
@@ -242,14 +234,12 @@ rm(models_aba, model_stats)

 ## Motivation

-When calibrating a statistical relationship between forest stand parameters, which are usually derived from diameter measurements, and ALS metrics, one relies on the hypothesis that the interaction of laser pulses with the leaves and branches structure is constant on the whole area. However, differences can be expected either due to variations in acquisition settings (flight parameters, scanner model), in forests (stand structure and composition) or in topography (slope). Better models might be obtained when calibrating stratum-specific relationships, provided each stratum is more homogeneous regarding the laser / vegetation interaction. A trade-off has to be achieved between the within-strata homogeneity and the number of available plots for calibration in each stratum. A minimum number of plots is approximately 50, while 100 would be recommended. In this example we hypothesize that ownership reflects both structure and composition differences in forest stands.
+When calibrating a statistical relationship between forest stand parameters, which are usually derived from diameter measurements, and ALS metrics, one relies on the hypothesis that the interaction of laser pulses with the leaves and branches structure is constant on the whole area. However, differences can be expected either due to variations in acquisition settings (flight parameters, scanner model), in forests (stand structure and composition) or in topography (slope). Better models might be obtained when calibrating stratum-specific relationships, provided each stratum is more homogeneous regarding the laser interaction with the vegetation. A trade-off has to be achieved between the within-strata homogeneity and the number of available plots for calibration in each stratum. A minimum number of plots is approximately 50, while 100 would be recommended. In this example we hypothesize that ownership reflects both structure and composition differences in forest stands.

 ## Calibration of stratum-specific models

 Stratum-specific models are computed and stored in a list during a `for` loop. The function `lidaRtRee::aba_combine_strata` then combines the list of models corresponding to each stratum to compute aggregated statistics for all plots, making it easier to compare stratified with non-stratified models.

-In this example, the model for "private" ownership yields a large error on the plot "Verc-C5-1", which considerably lowers the accuracy of the stratified approach.
-
 ```{r stratifiedmodelCalibration, include=TRUE, warning = FALSE}
 # stratification variable
 strat <- "stratum"
@@ -335,6 +325,9 @@ The following lines save the data required for the [area-based mapping step](htt
 save(model_aba_stratified_mixed, model_aba, aba_point_metrics_fun, aba_res_chm,
  file = "../data/aba.model/output/models.rda"
 )
+```
+
+```{r saveForlidaRtRee, include=FALSE, eval=FALSE}
 # save data for lidaRtRee package
 # quatre_montagnes <- cbind(plots, metrics)
 # save(quatre_montagnes, file = "quatre_montagnes.rda")

--- a/data/aba.model/output/models.rda
+++ b/data/aba.model/output/models.rda
--- a/export/area-based.2.model.calibration.html
+++ b/export/area-based.2.model.calibration.html
--- a/export/area-based.2.model.calibration.pdf
+++ b/export/area-based.2.model.calibration.pdf
--- a/export/area-based.3.mapping.and.inference.html
+++ b/export/area-based.3.mapping.and.inference.html
--- a/export/area-based.3.mapping.and.inference.pdf
+++ b/export/area-based.3.mapping.and.inference.pdf
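
The calibration tutorial changed in this commit describes leave-one-out cross-validation: each plot is predicted by a model fitted on all other plots, and the prediction errors are then checked for correlation with the field values. The following is a minimal base `R` sketch of that idea on simulated data. It is not the implementation of `lidaRtRee::aba_build_model`: the variable names (`zmean`, `zsd`, `field`), the fixed linear model, and the sign convention for the residuals (field minus predicted) are assumptions made for illustration only.

```r
# Minimal sketch of leave-one-out cross-validation (LOO-CV) in base R.
# Simulated data: names and coefficients are hypothetical, not from the tutorial.
set.seed(1)
n <- 40
dat <- data.frame(
  zmean = runif(n, 5, 30), # hypothetical height metric
  zsd = runif(n, 1, 8)     # hypothetical height dispersion metric
)
# simulated "field" values with random noise
dat$field <- 2 + 1.5 * dat$zmean + 0.5 * dat$zsd + rnorm(n, sd = 3)

# predict each plot with a model fitted on all other plots
loo_pred <- vapply(seq_len(n), function(i) {
  fit <- lm(field ~ zmean + zsd, data = dat[-i, ])
  unname(predict(fit, newdata = dat[i, , drop = FALSE]))
}, numeric(1))

# prediction errors (assumed convention: field minus predicted)
residual <- dat$field - loo_pred

# accuracy statistics computed from the cross-validated predictions
cv_rmse <- sqrt(mean(residual^2))
cv_rmse_pct <- 100 * cv_rmse / mean(dat$field)

# correlation of prediction errors with field values, and its significance
cor.test(residual, dat$field)
```

With this sign convention, a positive correlation between `residual` and `field` would indicate that large field values tend to be under-estimated.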