---
title: "R workflow for ALS data pre-processing"
author: "Jean-Matthieu Monnet"
date: "`r Sys.Date()`"
output:
html_document: default
pdf_document: default
papersize: a4
bibliography: "../bib/bibliography.bib"
---
```{r setup, include=FALSE}
# erase all
cat("\014")
rm(list = ls())
# knit options
knitr::opts_chunk$set(echo = TRUE)
# Set so that long lines in R will be wrapped:
knitr::opts_chunk$set(tidy.opts = list(width.cutoff = 80), tidy = TRUE)
knitr::opts_chunk$set(fig.align = "center")
# for display of rgl in html
knitr::knit_hooks$set(webgl = rgl::hook_webgl)
# output to html
html <- TRUE
```
---
```{r, include=FALSE}
options(width = 60)
local({
  hook_output <- knitr::knit_hooks$get("output")
  knitr::knit_hooks$set(output = function(x, options) {
    if (!is.null(options$max.height)) {
      options$attr.output <- c(
        options$attr.output,
        sprintf('style="max-height: %s;"', options$max.height)
      )
    }
    hook_output(x, options)
  })
})
```
The `R` code below is designed for checking and pre-processing Airborne Laser Scanning (ALS, or lidar remote sensing) data. The workflow is based on functions from the `lidR` package and includes the following steps:

* check the content of one file,
* create images and statistics from multiple files,
* compute digital surface models.

Licence: GNU GPLv3 / [source page](https://gitlab.irstea.fr/jean-matthieu.monnet/lidartree_tutorials/-/blob/master/R/ALS_data_preprocessing.Rmd)
# ALS data
Files required for the tutorial are too large to be hosted on the gitlab repository. They can be downloaded as a [zip file](https://drive.google.com/u/0/uc?export=download&id=1ripo-PLZ8_IjE7rAQ2RECj-fjg1UpC5i) from Google Drive and should be extracted into the folder "data/aba.model/ALS/" before proceeding. Files can also be downloaded automatically with the `googledrive` package and the following code; this requires authenticating yourself and authorizing the package to act on your behalf with Google Drive.
```{r downloadGoogledrive, include = TRUE, eval = FALSE}
# set temporary file
temp <- tempfile(fileext = ".zip")
# download file from google drive
dl <- googledrive::drive_download(googledrive::as_id("1ripo-PLZ8_IjE7rAQ2RECj-fjg1UpC5i"),
path = temp,
overwrite = TRUE)
# unzip to folder
out <- unzip(temp, exdir = "../data/aba.model/ALS/")
# remove temporary file
unlink(temp)
```
# Data checking
## Single file
### Load file
ALS data are usually stored in files that respect the [ASPRS LAS specifications](https://www.asprs.org/divisions-committees/lidar-division/laser-las-file-format-exchange-activities). The latest version of the LAS format is 1.4. LAS files contain:
* metadata regarding the acquisition,
* basic attributes of each recorded point,
* additional attributes for each point (optional),
* waveforms (optional).
A single LAS file can be loaded in R with the `readLAS` function, which returns an object of class LAS. Some checks are automatically performed when the file is read. Function options allow skipping certain attributes when loading, which can save time and memory. The object contains several slots, including:
* `header`: metadata read from the file,
* `data`: point cloud attributes,
* `bbox`: the bounding box of the data,
* `proj4string`: projection information.
It is advisable to fill in the projection information if it is missing, as spatial objects computed from the point cloud will inherit it.
```{r ALS.load, include = TRUE}
# read file
point_cloud <- lidR::readLAS("../data/aba.model/ALS/tiles.laz/899500_6447500.laz")
# set projection info (epsg code of Lambert 93)
lidR::projection(point_cloud) <- 2154
# summary of object content
point_cloud
```
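As a sketch of the loading options mentioned above (the attribute string and the filter used here are only examples), the `select` argument of `readLAS` restricts which attributes are loaded, while the `filter` argument discards points at read time:
```{r ALS.load.select, eval = FALSE}
# load only coordinates, intensity, return numbers and classification
point_cloud_light <- lidR::readLAS(
  "../data/aba.model/ALS/tiles.laz/899500_6447500.laz",
  select = "xyzirnc"
)
# keep only points classified as ground (class 2) at read time
point_cloud_ground <- lidR::readLAS(
  "../data/aba.model/ALS/tiles.laz/899500_6447500.laz",
  filter = "-keep_class 2"
)
```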
### Metadata
The `header` slot contains information about the file version, data creation, presence of optional attributes, some point statistics, and offset/scale factors used internally to compute coordinates.
```{r ALS.header, eval=TRUE, max.height='200px'}
point_cloud@header
```
### Basic attributes
The `data` slot contains the point attributes, described below.
**`X`, `Y`, `Z`**: coordinates. The point cloud can be displayed with `lidR::plot`:
```{r ALS.coordinates, eval=FALSE}
lidR::plot(point_cloud)
```
**`gpstime`**: time of emission of the pulse associated with the echo. Provided the precision is sufficient, it makes it possible to retrieve echoes originating from the same pulse.
```{r ALS.gps, eval=TRUE}
# example of points sharing the same gps time
point_cloud@data[point_cloud$gpstime == point_cloud$gpstime[38], 1:7]
```
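As a quick consistency check (a minimal sketch, assuming the `gpstime` precision is indeed sufficient to separate pulses), the number of unique `gpstime` values should be close to the number of first returns:
```{r ALS.pulses, eval = FALSE}
# number of pulses estimated from unique emission times
length(unique(point_cloud$gpstime))
# number of pulses estimated from first returns
sum(point_cloud$ReturnNumber == 1)
```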
**`Intensity`**: amplitude of the signal peak associated with the echo. It is usually the raw value recorded by the receiver. In the case of a diffuse, circular target entirely covered by the laser beam, the relationship between the transmitted ($P_t$) and received ($P_r$) laser power is [@BALTSAVIAS1999199]:
$$
P_r = \rho \frac{M^2 D_r^2 D_{tar}^2}{4 R^2 (R \gamma + D)^2} P_t
$$
with:
* $\frac{\rho}{\pi}$ a bidirectional Lambertian reflection function
* $M$ the atmospheric transmission
* $D$ the aperture diameter of the laser emitter
* $D_r$ diameter of receiving optics
* $D_{tar}$ diameter of target object
* $R$ range (distance between object and sensor)
* $\gamma$ laser beam divergence
Assuming $D \ll R\gamma$, it can be simplified to:
$$
P_r = \rho D_{tar}^2 \times \frac{1}{R^4} \times M^2 \times \frac{P_t D_r^2}{\gamma^2} \times \frac{1}{4}
$$
If one considers that the power transmitted by the emitter $P_t$ is constant and that the distance to the target $R$ does not vary during the acquisition, then the amplitude is proportional to a combination of the geometric and radiometric properties of the target, $\rho D_{tar}^2$. The case of multiple echoes from a single pulse is more complex.
If the aircraft trajectory is available, it is possible to correct amplitude values for the range ($R$) effect. One must however be aware that the resulting distribution of amplitude values might not be homogeneous, because the signal-to-noise ratio is lower in areas with longer range.
```{r ALS.intensity, eval=TRUE}
hist(point_cloud@data$Intensity, xlab = "Raw value", main = "Histogram of Intensity")
```
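As an illustration only (not part of the original workflow), such a correction typically multiplies the raw amplitude by a power of the ratio between the actual range and a reference range. The exponent depends on the target model (2 for extended targets, up to 4 for small targets according to the simplified equation above), and the `range_to_sensor` vector is a hypothetical input assumed to have been computed beforehand from the trajectory:
```{r ALS.intensity.norm, eval = FALSE}
# hypothetical range correction, assuming a vector `range_to_sensor`
# (one value per point) computed beforehand from the aircraft trajectory
r_ref <- 1000 # reference range in meters (arbitrary choice)
exponent <- 2 # 2 for extended targets, up to 4 for small targets
intensity_norm <- point_cloud$Intensity * (range_to_sensor / r_ref)^exponent
```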
**Echo position within returns from the same pulse**: `ReturnNumber` is the order of arrival of the echo among the `NumberOfReturns` echoes associated with a given pulse. Echoes are sometimes referred to as:
* single if `NumberOfReturns = ReturnNumber = 1`
* first if `ReturnNumber = 1`
* last if `NumberOfReturns = ReturnNumber`
* intermediate if `1 < ReturnNumber < NumberOfReturns`
The contingency table of `ReturnNumber` and `NumberOfReturns` should have no values below the diagonal, and approximately the same values in each column.
```{r ALS.return, eval=TRUE}
table(point_cloud@data$ReturnNumber, point_cloud@data$NumberOfReturns)
```
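The first condition can be checked directly (a minimal sketch):
```{r ALS.return.check, eval = FALSE}
# number of inconsistent points (values below the diagonal of the table)
sum(point_cloud$ReturnNumber > point_cloud$NumberOfReturns)
# proportion of single echoes
mean(point_cloud$NumberOfReturns == 1)
```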
**`ScanDirectionFlag`** and **`EdgeOfFlightline`** indicate the scanning direction and whether the scanner approached the edge of the scan lines.
```{r ALS.scanedge, eval=TRUE}
table(point_cloud@data$ScanDirectionFlag, point_cloud@data$EdgeOfFlightline)
```
**`Classification`** is usually obtained by an automated analysis of the point cloud geometry (e.g. the Terrascan software uses the adaptive TIN algorithm of @Axelsson00), followed by manual validation and editing. The categories available in the classification vary between datasets, the minimum being to identify a set of ground points. Classes are coded by numbers which should respect the ASPRS LAS specifications:
* 0 Created, never classified
* 1 Unclassified
* 2 Ground
* 3 Low Vegetation
* 4 Medium Vegetation
* 5 High Vegetation
* 6 Building
* 7 Low Point (noise)
* 8 Model Key-point (mass point)
* 9 Water.
Some specific values might be added by the data provider during processing or for specific purposes. In this file, some points were left with a classification value of 12.
```{r ALS.Classification, eval=TRUE}
table(point_cloud@data$Classification)
```
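For subsequent processing it is often convenient to extract the points of a given class, e.g. with `lidR::filter_poi` (a minimal sketch):
```{r ALS.classification.filter, eval = FALSE}
# extract points classified as ground
ground_points <- lidR::filter_poi(point_cloud, Classification == 2)
```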
**Flags** `Synthetic`, `Keypoint`, and `Withheld` indicate points which had a particular role in the classification process.
**`ScanAngleRank`** is the scan angle associated with the pulse. The origin of the values might correspond to the horizontal or to the vertical. In most cases it is the scan angle relative to the laser scanner, but sometimes values relative to nadir are given. In this file, values range from 60 to 120: the scan range is ±30 degrees on both sides of the scanner, 90 being the value when pulses are emitted downwards.
```{r ALS.scanangle, eval=TRUE}
hist(point_cloud@data$ScanAngleRank, main = "Histogram of scan angles", xlab = "Angle (degrees)")
```
**`UserData`** is a field to store additional information, but its capacity is limited. In the latest versions of the LAS format, additional attributes can be added.
**`PointSourceID`** is usually a value indicating the flight strip number.
```{r ALS.pointSource, eval=TRUE}
table(point_cloud@data$PointSourceID)
# boxplot(ScanAngleRank ~ PointSourceID, data = point_cloud@data)
```
### Select and display a specific area
The `lidR` package has functions to extract an area of interest from a LAS object or from a set of LAS files. The `lidR::plot` function can then be used to display the point cloud, colored by various attributes.
```{r extractPointCloud, include=TRUE, eval=TRUE}
# extract 15 meter radius disk at center of data
selection <- lidR::clip_circle(point_cloud, 899750, 6447750, 15)
```
```{r displayPointCloud, include=TRUE, eval=html, webgl=TRUE, fig.width=6, fig.height=6, warning=FALSE}
rgl::par3d(mouseMode = "trackball") # parameters for interaction with mouse
# colored by height (default)
lidR::plot(selection)
# lidR::plot(selection, color = "Intensity")
# lidR::plot(selection, color = "Classification")
# lidR::plot(selection, color = "ReturnNumber")
```
## Multiple files
### Build catalog
ALS data from an acquisition are usually delivered in tiles, i.e. files whose extents correspond to a division of the acquisition area into non-overlapping rectangles. The `lidR` package contains a catalog engine which makes it possible to handle a set of files simultaneously by referencing them in a catalog object. Functions can then be applied to the whole dataset, or to subsets extracted by location or attributes.
```{r catalog.load, include = TRUE, out.width = '50%', fig.dim=c(3.5, 3), max.height='200px'}
# build catalog from folder
cata <- lidR::catalog("../data/aba.model/ALS/tiles.laz/")
# set projection
lidR::projection(cata) <- 2154
cata
# information about individual files in the data slot
cata@data
# plot extent of files
lidR::plot(cata)
```
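A more thorough inspection of the files (header consistency, coordinate reference system, duplicated points...) can be performed with `lidR::las_check`, which also works on single LAS objects:
```{r catalog.check, eval = FALSE}
# deep inspection of the catalog
lidR::las_check(cata)
```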
### Checking specifications
For checking purposes one might want to map the following features, in order to estimate the potential of the data or to evaluate the acquisition against its technical specifications:
* pulse density,
* point density,
* ground point density,
* strip overlap.
The spatial resolution of the map should be consistent with the announced specifications, and reach a trade-off between the amount of spatial detail and the ability to evaluate the whole dataset at a glance.
The `lidR::grid_metrics` function is an efficient way to derive statistics maps from the point cloud: statistics are computed for all cells at the given resolution, based on the points they contain. Point density patterns are usually linked to variations in aircraft speed or strip overlap. In this case, the density variation observed between the tiles is probably due to errors in processing.
```{r catalog.grid_metrics, include = TRUE, out.width = '50%', fig.dim=c(6.5, 3.5)}
# resolution
res <- 5
# build function to compute desired statistics
f <-
function(rn, c, pid) { # function takes 3 inputs (ReturnNumber, Classification and PointSourceID attributes)
list(
# number of points
npoints = length(rn),
# number of first points (proxy for pulse number)
npulses = sum(rn == 1),
# number of points of class ground
nground = sum(c == 2),
# number of unique values of strips
nstrip = length(unique(pid))
)
}
# apply the function to the catalog, indicating which attributes to use, and output resolution
metrics <- lidR::grid_metrics(cata, ~f(ReturnNumber, Classification, PointSourceID), res = res)
# convert to density
for (i in c("npoints", "npulses", "nground")) {
metrics[[i]] <- metrics[[i]] / res ^ 2
}
raster::plot(metrics)
# percentage of 5 m cells with no ground points
p_no_ground <- round(sum(raster::values(metrics$nground) == 0) / length(raster::values(metrics$nground)) * 100, 1)
# percentage of 5 m cells with less than 5 pulses / m2
p_pulse_5 <- round(sum(raster::values(metrics$npulses) < 5) / length(raster::values(metrics$npulses)) * 100, 1)
```
From those maps, summary statistics can be derived, e.g. the percentage of cells with no ground points (`r p_no_ground` %). Checking that the spatial distribution of pulses is homogeneous can also be informative.
```{r catalog.maps}
hist(raster::values(metrics$npulses), main = "Histogram of pulse density", xlab = "Pulse density /m2", ylab = "Number of 5m cells")
raster::plot(metrics$npulses < 5, main = "Areas with pulse density < 5 /m2")
```
# Data pre-processing
## Point cloud normalization
Normalization of a point cloud refers to the operation that consists in replacing the altitude coordinate stored in the `Z` attribute by the height above ground. This is an important prerequisite for analyzing the 3D structure of the vegetation from the point cloud.
A pre-existing digital terrain model (DTM) can be used for normalization, or it can be computed on the fly for this purpose. Obtaining relevant height values requires a DTM with sufficient resolution, or the presence of enough ground points in the data. To make sure that correct values are obtained at the border of the region of interest, it is advisable to add a buffer area before normalization, as sketched below.
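A minimal sketch of this buffering strategy (the radii are arbitrary): extract the region of interest with an extra buffer, normalize, then discard the buffer points.
```{r normalization.buffer, eval = FALSE}
# extract a 20 m disk: 15 m region of interest plus 5 m buffer
roi_buffered <- lidR::clip_circle(point_cloud, 899750, 6447750, 20)
# normalize using a TIN computed from the ground points
roi_buffered_n <- lidR::normalize_height(roi_buffered, lidR::tin())
# keep only the points inside the 15 m region of interest
roi_n <- lidR::filter_poi(
  roi_buffered_n,
  (X - 899750)^2 + (Y - 6447750)^2 <= 15^2
)
```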
The next chunk normalizes the point cloud using a TIN constructed from the points classified as ground (class 2). The height values are stored in the `Z` attribute, while the original altitude values are copied into a new attribute called `Zref`. When the LAS object is written to a LAS file, this field is lost unless the file version allows writing additional fields.
```{r normalization.LAS, include = TRUE, fig.dim=c(8, 4)}
# normalize with the tin algorithm: a surface is computed from the ground points
selection_n <- lidR::normalize_height(selection, lidR::tin())
# lidR::plot(selection_n)
par(mfrow=c(1,2))
hist(selection$Z, main = "Histogram of altitude values", xlab = "Altitude (m)")
hist(selection_n$Z, main = "Histogram of height values", xlab = "Height (m)")
```
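Since the altitude values are kept in the `Zref` attribute, the normalization can be reverted with `lidR::unnormalize_height`:
```{r normalization.undo, eval = FALSE}
# restore the original altitude values stored in the Zref attribute
selection_unnorm <- lidR::unnormalize_height(selection_n)
```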
## Computation of digital elevation models
Digital elevation models (DEMs) are raster images where each cell value corresponds to the height or altitude of the Earth surface. They are easier to display and use than point clouds, but most of the information on the 3D structure of vegetation is lost during computation. Three main types of DEMs can be computed from ALS point clouds.
### Digital terrain model (DTM)
It represents the altitude of the bare Earth surface and is computed using points classified as ground. For a better representation of certain topographical features, additional points or lines might be added. The function `lidR::grid_terrain` proposes different algorithms for the computation, and works with both catalogs and LAS objects.
```{r dem.terrain, include = TRUE, message = FALSE}
# create dtm at 0.5 m resolution with TIN algorithm
dtm <- lidR::grid_terrain(cata, res = 0.5, lidR::tin())
dtm
```
Functions from the `raster` package are available to derive topographical rasters (slope, aspect, hillshade, contour lines) from the DTM.
```{r dem.terrain.derived, include = TRUE, message = FALSE}
dtm_slope <- raster::terrain(dtm)
dtm_aspect <- raster::terrain(dtm, opt = "aspect")
dtm_hillshade <- raster::hillShade(dtm_slope, dtm_aspect)
raster::plot(dtm_hillshade, col = gray(seq(from = 0, to = 1, by = 0.01)), main = "Hillshade")
```
### Digital surface model (DSM)
It represents the altitude of the Earth surface, including natural and man-made objects, and is computed using all points. The function `lidR::grid_canopy` proposes different algorithms for this purpose, and works with both catalogs and LAS objects. A straightforward way to compute the DSM is to retain the value of the highest point in each pixel. In this case, choosing a resolution adequate for the point density should limit the proportion of empty cells (checked in the sketch below).
```{r dem.surface, include = TRUE, message = FALSE}
# create dsm at 0.5 m resolution with highest point per pixel algorithm
dsm <- lidR::grid_canopy(cata, res = 0.5, lidR::p2r())
dsm
```
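A sketch for checking and reducing the proportion of empty cells (the `subcircle` value is an arbitrary choice): the `p2r` algorithm can replace each point by a small disk, which fills isolated gaps.
```{r dem.surface.gaps, eval = FALSE}
# proportion of empty cells in the DSM
mean(is.na(raster::values(dsm)))
# replace each point by a 15 cm radius disk to reduce empty cells
dsm_subcircle <- lidR::grid_canopy(cata, res = 0.5, lidR::p2r(subcircle = 0.15))
```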
### Canopy Height Model (CHM)
It represents the height of objects above the ground. It can be computed either by subtracting the DTM from the DSM, or by using the normalized point cloud as input to the DSM computation workflow (slightly more precise than the previous option).
```{r dem.chm, include = TRUE, message = FALSE}
# create chm from dsm and dtm
chm <- dsm - dtm
chm
# create chm from normalized point cloud
cata_norm <- lidR::catalog("../data/aba.model/ALS/tiles.norm.laz/")
# set projection
lidR::projection(cata_norm) <- 2154
chm_norm <- lidR::grid_canopy(cata_norm, res = 0.5, lidR::p2r())
chm_norm
# boxplot(raster::values(chm - chm_norm))
```
If high outlier points are present in the dataset, it is sometimes useful to apply a threshold to the CHM values.
```{r dem.chm.threshold, include = TRUE, message = FALSE}
threshold <- 40
chm[chm > threshold] <- threshold
raster::plot(chm, main = "Canopy Height Model (m)")
```
## Batch processing of files
To process multiple files and save outputs, the catalog engine can be used by specifying the output options.
```{r normalization.files, eval = FALSE, warning = FALSE}
# use parallelisation for faster processing
library(foreach)
# create parallel frontend, specify to use two parallel sessions
doFuture::registerDoFuture()
future::plan("multisession", workers = 2L)
#
# output resolution
res <- 0.5
# process by file
lidR::opt_chunk_size(cata) <- 0
#
# buffer size for DTM computation and normalization
lidR::opt_chunk_buffer(cata) <- 10
# DTM output file template
lidR::opt_output_files(cata) <- "dtm_{ORIGINALFILENAME}"
# create DTM
dtm <- lidR::grid_terrain(cata, res = res, lidR::tin())
#
# laz compression
lidR::opt_laz_compression(cata) <- TRUE
# LAZ output file template
lidR::opt_output_files(cata) <- "norm_{ORIGINALFILENAME}"
# create normalized files
cata_norm <- lidR::normalize_height(cata, lidR::tin())
#
# buffer size for dsm
lidR::opt_chunk_buffer(cata) <- 0
# DSM output file template
lidR::opt_output_files(cata) <- "dsm_{ORIGINALFILENAME}"
# create DSM
dsm <- lidR::grid_canopy(cata, res = res, lidR::p2r())
#
# create CHM from normalized files
# process by file
lidR::opt_chunk_size(cata_norm) <- 0
# buffer size
lidR::opt_chunk_buffer(cata_norm) <- 0
# CHM output file template
lidR::opt_output_files(cata_norm) <- "chm_{ORIGINALFILENAME}"
# create CHM
chm <- lidR::grid_canopy(cata_norm, res = res, lidR::p2r())
```
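Once processing is finished, the parallel workers can be released by switching back to a sequential plan:
```{r parallel.cleanup, eval = FALSE}
# return to sequential processing and release the workers
future::plan("sequential")
```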
# References
@article{BALTSAVIAS1999199,
  title = {Airborne laser scanning: basic relations and formulas},
  author = {E. P. Baltsavias},
  journal = {ISPRS Journal of Photogrammetry and Remote Sensing},
  volume = {54},
  number = {2},
  pages = {199--214},
  year = {1999},
  issn = {0924-2716},
  doi = {10.1016/S0924-2716(99)00015-5},
  url = {https://www.sciencedirect.com/science/article/pii/S0924271699000155},
  keywords = {Airborne laser scanning, Terminology, Basic relations, Formulas, 3D accuracy analysis},
  abstract = {An overview of basic relations and formulas concerning airborne laser scanning is given. They are divided into two main parts, the first treating lasers and laser ranging, and the second one referring to airborne laser scanning. A separate discussion is devoted to the accuracy of 3D positioning and the factors influencing it. Examples are given for most relations, using typical values for ALS and assuming an airplane platform. The relations refer mostly to pulse lasers, but CW lasers are also treated. Different scan patterns, especially parallel lines, are treated. Due to the complexity of the relations, some formulas represent approximations or are based on assumptions like constant flying speed, vertical scan, etc.}
}
@inproceedings{Axelsson00,
  title = {{DEM} generation from laser scanner data using adaptive {TIN} models},
  author = {Axelsson, P.},
  booktitle = {XIXth ISPRS Congress, IAPRS},
  volume = {XXXIII},
  pages = {110--117},
  year = {2000}
}