ALS_data_preprocessing.Rmd 19.8 KB
Newer Older
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
---
title: "R workflow for ALS data pre-processing"
author: "Jean-Matthieu Monnet"
date: "`r Sys.Date()`"
output:
  html_document: default
  pdf_document: default
papersize: a4
bibliography: "../bib/bibliography.bib"
---

```{r setup, include=FALSE}
# erase all
cat("\014")
rm(list = ls())
# knit options
knitr::opts_chunk$set(echo = TRUE)
# Set so that long lines in R will be wrapped:
knitr::opts_chunk$set(tidy.opts = list(width.cutoff = 80), tidy = TRUE)
knitr::opts_chunk$set(fig.align = "center")
# for display of rgl in html
knitr::knit_hooks$set(webgl = rgl::hook_webgl)
# output to html
html <- TRUE
```

---

```{r, include=FALSE}
options(width = 60)
local({
  hook_output <- knitr::knit_hooks$get('output')
  knitr::knit_hooks$set(output = function(x, options) {
    if (!is.null(options$max.height)) options$attr.output <- c(
      options$attr.output,
      sprintf('style="max-height: %s;"', options$max.height)
    )
    hook_output(x, options)
  })
})
```

43
The `R` code below is designed for checking and pre-processing Airborne Laser Scanning (ALS, or lidar remote sensing) data. The workflow is based on functions of the package [lidR](https://cran.r-project.org/package=lidR) (tested with version `r packageVersion("lidR")`), and includes the following steps:
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80

* check the content of one file, 
+ create images and statistics from multiple files,
+ compute digital surface models.

Licence: GNU GPLv3 / [source page](https://gitlab.irstea.fr/jean-matthieu.monnet/lidartree_tutorials/-/blob/master/R/ALS_data_preprocessing.Rmd) 

# ALS data

Files required for the tutorial are too large to be hosted on the gitlab repository, they can be downloaded as a [zip file](https://drive.google.com/u/0/uc?export=download&id=1ripo-PLZ8_IjE7rAQ2RECj-fjg1UpC5i) from Google drive, and should be extracted in the folder "data/aba.model/ALS/" before proceeding with the processing. Files can be automatically downloaded thanks to the `googledrive` package with the following code, this requires authenticating yourself and authorizing the package to deal on your behalf with Google Drive.

```{r downloadGoogledrive, include = TRUE, eval = FALSE}
# set temporary file
temp <- tempfile(fileext = ".zip")
# download file from google drive
dl <- googledrive::drive_download(googledrive::as_id("1ripo-PLZ8_IjE7rAQ2RECj-fjg1UpC5i"),
                     path = temp,
                     overwrite = TRUE)
# unzip to folder
out <- unzip(temp, exdir = "../data/aba.model/ALS/")
# remove temporary file
unlink(temp)
```

# Data checking

## Single file

### Load file

ALS data are usually stored in files respecting the [ASPRS LAS specifications](https://www.asprs.org/divisions-committees/lidar-division/laser-las-file-format-exchange-activities). Last version of LAS file format is 1.4. LAS files contain:

* metadata regarding the acquisition,
* basic attributes of each recorded point,
* additional attributes for each point (optional),
* waveforms (optional).

81
A single LAS file can be loaded in R with the `lidR::readLAS` function, which returns an object of class LAS. Some checks are automatically performed when the file is read. Function options allow to skip loading certain attributes, which might help saving time and memory. The object contains several slots, including:
82
83

* `header`: metadata read from the file,
Monnet Jean-Matthieu's avatar
WIP 1    
Monnet Jean-Matthieu committed
84
85
* `data`: point cloud attributes.
* `crs`: projection information.
86

Monnet Jean-Matthieu's avatar
Monnet Jean-Matthieu committed
87
The bounding box of the data can be accessed with the function `sf::st_bbox`.
Monnet Jean-Matthieu's avatar
WIP 1    
Monnet Jean-Matthieu committed
88

Monnet Jean-Matthieu's avatar
Monnet Jean-Matthieu committed
89
90
91
It is advisable to fill the projection information if missing, as spatial objects computed from the point cloud will inherit the information. 

A warning is issued when loading the following example file because unexpected values are present in the scan angle field.
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110

```{r ALS.load, include = TRUE}
# read file
point_cloud <- lidR::readLAS("../data/aba.model/ALS/tiles.laz/899500_6447500.laz")
# set projection info (epsg code of Lambert 93)
lidR::projection(point_cloud) <- 2154
# summary of object content
point_cloud
```

### Metadata

The `header` slot contains information about the file version, data creation, presence of optional attributes, some point statistics, and offset/scale factors used internally to compute coordinates.

```{r ALS.header, eval=TRUE, max.height='200px'}
point_cloud@header
```
### Basic attributes

111
The `data` slot contains point attributes. Some attributes may be present or not, or have different names depending on the format version of the LAS file. The main attributes (coordinates, intensity, classification, return number) are always present.
112
113
114
115

**`X`, `Y`, `Z`**: coordinates. The point cloud can be displayed with `lidR::plot`
```{r ALS.coordinates, eval=FALSE}
lidR::plot(point_cloud)
116
117
118
# with lidR 4.0.1 and rgl >= 0.108 the point cloud is not centered on bounding box
# you might consider installing an older version of rgl
# install.packages("https://cran.r-project.org/src/contrib/Archive/rgl/rgl_0.107.14.tar.gz")
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
```

**`gpstime`**: time of emission of the pulse associated to the echo. Provided the precision is sufficient, it allows to retrieve echoes originating from the same pulse.

```{r ALS.gps, eval=TRUE}
# exemple of points with same gps time
point_cloud@data[point_cloud$gpstime==point_cloud$gpstime[38],1:7]
```
**`Intensity`**: amplitude of signal peak associated to the echo. It is usually the raw value recorded by the receiver. In case of a diffuse, circular target entirely covered by the laser beam, the relationship between the transmitted ($P_t$) and received ($P_r$) laser power is [@BALTSAVIAS1999199]:

$$
P_r = \rho \frac{M^2 {D_r}^2 {D_{tar}^2}}{4 R^2 (R\gamma + D)^2} P_t
$$
with:

* $\frac{\rho}{\pi}$ a bidirectional Lambertian reflection function
* $M$ the atmospheric transmission
* $D$ the aperture diameter of the laser emitter
* $D_r$ diameter of receiving optics
* $D_{tar}$ diameter of target object
* $R$ range (distance between object and sensor)
* $\gamma$ laser beam divergence

Assuming $D \ll R\gamma$, it can be simplified to:

$$
P_r = \rho D_{tar}^2 \times \frac{1}{R^4} \times M^2 \times \frac{P_t D_r^2}{\gamma^2} \times \frac{1}{4}
$$

If one considers that the power transmitted by the emitter $P_t$ is constant and that the distance to target $R$ does not vary during the acquisition, then the amplitude is proportional to a combination of the geometric and radiometric properties of the target $\rho D_{tar}^2$. The case of multiple echoes for a single pulse is more complex.

In case the aircraft trajectory is available, it is possible to normalize amplitude values from the range ($R$) effect. One must however be aware that the resulting distribution of amplitude values might not be homogeneous because in areas with longer range the signal to noise ratio is lower.

Monnet Jean-Matthieu's avatar
Monnet Jean-Matthieu committed
152
```{r ALS.intensity, eval=TRUE, fig.width=4}
153
hist(point_cloud$Intensity, xlab = "Raw value", main = "Histogram of Intensity")
154
155
156
157
```

**Echo position within returns from the same pulse**: `ReturnNumber` is the order of arrival of the echo among the `NumberOfReturns` echoes associated to a given pulse. Echoes are sometimes referred to as:

Monnet Jean-Matthieu's avatar
Monnet Jean-Matthieu committed
158
159
160
161
* *single* if `NumberOfReturns = ReturnNumber = 1`
* *first* if `ReturnNumber = 1`
* *last* if `NumberOfReturns = ReturnNumber`
* *intermediate* if `1 < ReturnNumber < NumberOfReturns`
162
163
164
165

The contingency table of `ReturnNumber` and `NumberOfReturns` should have no values below the diagonal, and approximately the same values in each column.

```{r ALS.return, eval=TRUE}
166
table(point_cloud$ReturnNumber, point_cloud$NumberOfReturns)
167
168
169
170
171
```

**`ScanDirectionFlag`** and **`EdgeOfFlightline`** indicate the scanning direction and if the scanner approached the edge of scan lines.

```{r ALS.scanedga, eval=TRUE}
172
table(point_cloud$ScanDirectionFlag, point_cloud$EdgeOfFlightline)
173
174
```

175
**`Classification`** is usually obtained by an automated analysis of the point cloud geometry (e.g. Terrascan software uses the TIN adaptive algorithm by @Axelsson00) followed by manual validation and edition. The categories of the classification depend on the specifications, the minimum is to identify a set of ground points. Classes are defined by numbers which should respect the ASPRS LAS specifications:
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190

* 0 Created, never classified 
* 1 Unclassified
* 2 Ground 
* 3 Low Vegetation 
* 4 Medium Vegetation 
* 5 High Vegetation 
* 6 Building 
* 7 Low Point (noise) 
* 8 Model Key-point (mass point) 
* 9 Water.

Some specific fields might be added by the data provider during processing or for specific cases. In this file, some points where left with a classification value of 12.

```{r ALS.Classification, eval=TRUE}
191
table(point_cloud$Classification)
192
193
194
195
196
197
```

**Flags** `Synthetic`, `Keypoint`, and `Withheld` indicate points which had a particular role in the classification process.

**`ScanAngleRank`** is the scan angle associated to the pulse. Origin of values might correspond to the horizontal or vertical. In most cases it is the scan angle relative to the laser scanner, but sometimes values relative to nadir are indicated. In the case of this file values are from 60 to 120: the scan range is ± 30 degrees on both sides of the scanner, with 90 the value when pulses are emitted downwards.

Monnet Jean-Matthieu's avatar
Monnet Jean-Matthieu committed
198
```{r ALS.scanangle, eval=TRUE, fig.width=4}
199
hist(point_cloud$ScanAngleRank, main = "Histogram of scan angles", xlab = "Angle (degrees)")
200
```
201
**`UserData`** is a field to store additional information, but it is limited in capacity. In the latest version of LAS files, additional attributes can be added.
202
203
204
205

**`PointSourceID`** is usually a value indicating the flight strip number.

```{r ALS.pointSource, eval=TRUE}
206
table(point_cloud$PointSourceID)
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
# boxplot(ScanAngleRank ~ PointSourceID, data = point_cloud@data)
```

### Select and display a specific area

The `lidR` package has functions to extract an area of interest from a LAS object or a set of LAS files. Then the `lidR::plot` function can be used to display the point cloud, colored by different attributes.

```{r extractPointCloud, include=TRUE, eval=TRUE}
# extract 15 meter radius disk at center of data
selection <- lidR::clip_circle(point_cloud, 899750, 6447750, 15)
```

```{r displayPointCloud, include=TRUE, eval=html, webgl=TRUE, fig.width=6, fig.height=6, warning=FALSE}
rgl::par3d(mouseMode = "trackball") # parameters for interaction with mouse
# colored by height (default)
lidR::plot(selection)
# lidR::plot(selection, color = "Intensity")
# lidR::plot(selection, color = "Classification")
# lidR::plot(selection, color = "ReturnNumber")
```

## Multiple files

### Build catalog

232
ALS data from an acquisition are usually delivered in tiles, i.e. files which extents correspond to a division of the acquisition area in non-overlapping rectangles. The `lidR` package provides a catalog engine which enables to consider a set of files simultaneously by referencing them in a catalog object. Functions can then be applied to the whole dataset, or after extraction of subsets based on location or attributes. Point density patterns are usually linked to variations in aircraft speed, changes in land cover type, or strip overlap.
233

234
```{r catalog.load, include = TRUE, fig.dim=c(5, 3), max.height='200px'}
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
# build catalog from folder
cata <- lidR::catalog("../data/aba.model/ALS/tiles.laz/")
# set projection
lidR::projection(cata) <- 2154
cata
# information about individual files in the data slot
cata@data
# plot extent of files
lidR::plot(cata)
```

### Checking specifications

For checking purposes one might be interested to map the following features, in order to estimate the potential of the data or to evaluate the acquisition with respect to the technical specifications:

* pulse density,
* point density,
* ground point density,
* strip overlap.

The spatial resolution of the map should be relevant with respect to the announced specifications, and reach a trade-off between the amount of spatial detail and the possibility to evaluate the whole dataset at a glance.

257
The `lidR::pixel_metrics` function is very efficient to derive statistics maps from the point cloud. Point statistics are computed for all pixels at the given resolution, based on points contained in the cells (the function `lidR::rasterize_density` is specifically designed to compute point density). In this example the density variation observed between the tiles is due to the fact that points from a given flight strip where written in the eastern tile but not in the western one.
258

Monnet Jean-Matthieu's avatar
Monnet Jean-Matthieu committed
259
```{r catalog.grid_metrics, include = TRUE, fig.width=9, fig.height = 5, warning = FALSE}
260
261
262
263
# resolution
res <- 5
# build function to compute desired statistics
f <-
264
  function(rn, c, pid) { # function takes 3 inputs (ReturnNumber, Classification and PointId attributes)
265
266
267
268
    list(npoints = length(rn), # number of points
         npulses = sum(rn == 1), # number of first points (proxy for pulse number)
         nground = sum(c == 2), # number of points of class ground
         nstrip = length(unique(pid))) # number of unique values of strips
269
270
  }
# apply the function to the catalog, indicating which attributes to use, and output resolution
Monnet Jean-Matthieu's avatar
WIP 1    
Monnet Jean-Matthieu committed
271
metrics <- lidR::pixel_metrics(cata, ~f(ReturnNumber, Classification, PointSourceID), res = res)
272
# convert number to density in first three layers
273
274
275
for (i in c("npoints", "npulses", "nground")) {
  metrics[[i]] <- metrics[[i]] / res ^ 2
}
Monnet Jean-Matthieu's avatar
WIP 1    
Monnet Jean-Matthieu committed
276
terra::plot(metrics)
277
# percentage of 5 m cells with no ground points
Monnet Jean-Matthieu's avatar
WIP 1    
Monnet Jean-Matthieu committed
278
p_no_ground <- round(sum(terra::values(metrics$nground)==0) / length(terra::values(metrics$nground)) * 100, 1)
279
# percentage of 5 m cells with less than 5 pulses / m2
Monnet Jean-Matthieu's avatar
WIP 1    
Monnet Jean-Matthieu committed
280
p_pulse_5 <- round(sum(terra::values(metrics$npulses)<10) / length(terra::values(metrics$npulses)) * 100, 1)
281
282
283
284
```

From those maps summary statistics can be derived, e.g. the percentage of cells with no ground points (`r p_no_ground`). Checking that the spatial distribution of pulses is homogeneous can be informative.

Monnet Jean-Matthieu's avatar
Monnet Jean-Matthieu committed
285
286
287
288
```{r catalog.maps, fig.width = 9, fig.height = 3}
par(mfrow = c(1,1))
layout(matrix(c(1,2,2), nrow = 1, ncol = 3, byrow = TRUE))
# par(fig=c(0,3,0,10)/10)
Monnet Jean-Matthieu's avatar
WIP 1    
Monnet Jean-Matthieu committed
289
hist(terra::values(metrics$npulses), main = "Histogram of pulse density", xlab = "Pulse density /m2", ylab = "Number of 5m cells")
Monnet Jean-Matthieu's avatar
Monnet Jean-Matthieu committed
290
# par(fig=c(3,10,0,10)/10)
Monnet Jean-Matthieu's avatar
WIP 1    
Monnet Jean-Matthieu committed
291
terra::plot(metrics$npulses < 5, main = "Areas with pulse density < 5 /m2")
292
293
294
295
296
297
298
299
300
301

```

# Data pre-processing

## Point cloud normalization

Normalization of a point cloud refers to the operation that consists in replacing the altitude coordinate by the height to the ground in the `Z` attribute. This is an important pre-requisite for the analysis of the vegetation 3D structure from the point cloud.
A pre-existing digital terrain model (DTM) can be used for normalizing, or it can be computed on-the-fly for this purpose. Obtaining relevant height values requires a DTM with sufficient resolution or the presence of enough ground points inside the data. To make sure that correct values are obtained at the border of the region of interest, it is advisable to add a buffer area before normalization.

302
The next line normalizes the point cloud using a TIN constructed from the points classified as ground (class 2). The height values are stored in the `Z` attribute. The altitude value is copied into a new attribute called `Zref`. When the LAS object is written to a LAS file, this field is lost, unless the file version allows the writing of additional fields.
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320

```{r normalization.LAS, include = TRUE, fig.dim=c(8, 4)}
# normalize with tin algorithm for computed of surface from ground points
selection_n <- lidR::normalize_height(selection, lidR::tin())
# lidR::plot(selection_n)
par(mfrow=c(1,2))
hist(selection$Z, main = "Histogram of altitude values", xlab = "Altitude (m)")
hist(selection_n$Z, main = "Histogram of height values", xlab = "Height (m)")
```



## Computation of digital elevation models

Digital elevation models (DEMs) are raster images where each cell value corresponds to the height or altitude of the earth surface. They are are easier to display and use than point clouds but the information of the 3D structure of vegetation is mostly lost during computation. Three main types of DEMs can be computed from ALS point clouds.

### Digital terrain model (DTM)

Monnet Jean-Matthieu's avatar
Monnet Jean-Matthieu committed
321
It represents the altitude of the bare earth surface. It is computed using points classified as ground. For better representation of certain topographical features, some additional points or lines might be added. The function `lidR::rasterize_terrain` proposes different algorithms for computation, and works either with catalogs or LAS objects.
322
323
324

```{r dem.terrain, include = TRUE, message = FALSE}
# create dtm at 0.5 m resolution with TIN algorithm
Monnet Jean-Matthieu's avatar
WIP 1    
Monnet Jean-Matthieu committed
325
dtm <- lidR::rasterize_terrain(cata, res = 0.5, lidR::tin())
326
327
328
dtm
```

Monnet Jean-Matthieu's avatar
WIP 1    
Monnet Jean-Matthieu committed
329
Functions from the `terra` package are available to derive topographical rasters (slope, aspect, hillshade, contour lines) from the DTM.
330

Monnet Jean-Matthieu's avatar
Monnet Jean-Matthieu committed
331
```{r dem.terrain.derived, include = TRUE, message = FALSE, fig.width = 6.3, fig.height = 3.2}
Monnet Jean-Matthieu's avatar
WIP 1    
Monnet Jean-Matthieu committed
332
333
334
335
dtm_slope <- terra::terrain(dtm, unit = "radians")
dtm_aspect <- terra::terrain(dtm, v = "aspect",  unit = "radians")
dtm_hillshade <- terra::shade(dtm_slope, dtm_aspect)
terra::plot(dtm_hillshade, col = gray(seq(from = 0, to = 1, by = 0.01)), main = "Hillshade")
336
337
338
339
```

### Digital surface model (DSM)

Monnet Jean-Matthieu's avatar
WIP 1    
Monnet Jean-Matthieu committed
340
It represents the altitude of the earth surface, including natural and man-made objects. It is computed using all points. The function `lidR::rasterize_canopy` proposes different algorithms for that purpose, and works either with catalogs or LAS objects. A straightforward way to compute the DSM is to retain the value of the highest point present in each pixel. In this case, choosing an adequate resolution with respect to the point density should limit the proportion of empty cells.
341
342
343

```{r dem.surface, include = TRUE, message = FALSE}
# create dsm at 0.5 m resolution with highest point per pixel algorithm
Monnet Jean-Matthieu's avatar
WIP 1    
Monnet Jean-Matthieu committed
344
dsm <- lidR::rasterize_canopy(cata, res = 0.5, lidR::p2r())
345
346
347
348
349
dsm
```

### Canopy Height Model (CHM)

350
It represents the height of objects of the earth surface. It can be computed either by subtracting the DTM to the DSM, or by using the normalized point cloud as input in the DSM computation workflow (slightly more precise than previous option).
351
352
353
354
355
356
357
358

```{r dem.chm, include = TRUE, message = FALSE}
# create chm from dsm and dtm
chm <- dsm - dtm
chm
# create chm from normalized point cloud
cata_norm <- lidR::catalog("../data/aba.model/ALS/tiles.norm.laz/")
# set projection
359
lidR::projection(cata_norm) <- 2154
Monnet Jean-Matthieu's avatar
WIP 1    
Monnet Jean-Matthieu committed
360
chm_norm <- lidR::rasterize_canopy(cata_norm, res = 0.5, lidR::p2r())
361
chm_norm
Monnet Jean-Matthieu's avatar
WIP 1    
Monnet Jean-Matthieu committed
362
# boxplot(terra::values(chm - chm_norm))
363
364
```

365
If high points are present in the point cloud, a threshold can be applied to the CHM values to remove outliers.
366

Monnet Jean-Matthieu's avatar
Monnet Jean-Matthieu committed
367
```{r dem.chm.threshold, include = TRUE, message = FALSE, fig.width = 6.3, fig.height = 3.2}
368
369
threshold <- 40
chm[chm > threshold] <- threshold
Monnet Jean-Matthieu's avatar
WIP 1    
Monnet Jean-Matthieu committed
370
terra::plot(chm, main = "Canopy Height Model (m)")
371
372
373
374
```

## Batch processing of files

375
To process multiple files and save outputs, specify the output options of the catalog engine.
376
377

```{r normalization.files, eval = FALSE, warning = FALSE}
378
# use parallelization for faster processing
379
380
381
382
383
384
385
386
387
388
389
390
391
# create parallel frontend, specify to use two parallel sessions
future::plan("multisession", workers = 2L)
#
# output resolution
res <- 0.5
# process by file
lidR::opt_chunk_size(cata) <- 0
#
# buffer size for DTM computation and normalization
lidR::opt_chunk_buffer(cata) <- 10
# DTM output file template
lidR::opt_output_files(cata) <- "dtm_{ORIGINALFILENAME}"
# create DTM
Monnet Jean-Matthieu's avatar
WIP 1    
Monnet Jean-Matthieu committed
392
dtm <- lidR::rasterize_terrain(cata, 0.5, lidR::tin())
393
394
395
396
397
398
399
400
401
402
403
404
405
#
# laz compression
lidR::opt_laz_compression(cata) <- TRUE
# LAZ output file template
lidR::opt_output_files(cata) <- "norm_{ORIGINALFILENAME}"
# create normalized files
cata.norm <- lidR::normalize_height(cata, lidR::tin())
#
# buffer size for dsm
lidR::opt_chunk_buffer(cata) <- 0
# DSM output file template
lidR::opt_output_files(cata) <- "dsm_{ORIGINALFILENAME}"
# create DSM
Monnet Jean-Matthieu's avatar
WIP 1    
Monnet Jean-Matthieu committed
406
dsm <- lidR::rasterize_canopy(cata, 0.5, lidR::p2r())
407
408
409
410
411
412
413
414
415
#
# create CHM from normalized files
# process by file
lidR::opt_chunk_size(cata_norm) <- 0
# buffer size
lidR::opt_chunk_buffer(cata_norm) <- 0
# CHM output file template
lidR::opt_output_files(cata_norm) <- "chm_{ORIGINALFILENAME}"
# create CHM
Monnet Jean-Matthieu's avatar
WIP 1    
Monnet Jean-Matthieu committed
416
chm <- lidR::rasterize_canopy(cata_norm, 0.5, lidR::p2r())
417
418
419
```

## References