tutorial.Rmd 14.8 KB
Newer Older
Florian de Boissieu's avatar
Florian de Boissieu committed
1
---
Florian de Boissieu's avatar
Florian de Boissieu committed
2
title: "Tutorial for biodivMapR"
Florian de Boissieu's avatar
Florian de Boissieu committed
3
4
5
6
7
8
9
10
11
12
13
author: "Jean-Baptiste Féret, Florian de Boissieu"
date: "`r Sys.Date()`"
output: 
  html_vignette:
    number_sections: true
vignette: >
  %\VignetteIndexEntry{Tutorial}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

14
15
16
17
18
19
20
21
22
```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval=FALSE
)
```


Florian de Boissieu's avatar
Florian de Boissieu committed
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
This tutorial aims at describing the processing workflow and giving the associated code to compute $\alpha$ and $\beta$ diversity maps on an extraction of Sentinel-2 image taken over Cameroun forest. The workflow is composed of the following steps:

* define the processing parameters

    * input / output files paths
    * output spatial resolution
    * computing options
    
* compute the $\alpha$ and $\beta$ diversity maps:
    * compute PCA and select best components
    * compute the diversity maps
    
* validate results comparing to field plots measurements



# Processing parameters

## Input / Output files
jbferet's avatar
jbferet committed
42
43
The input images are expected to be in ENVI HDR format, BIL interleaved. 
The functions `perform_radiometric_filtering` and `perform_PCA` start with a procedure checkin if the image format is as expected. if not, the functions will retrun a message explaining the problem and will stop the process.
Florian de Boissieu's avatar
Florian de Boissieu committed
44

jbferet's avatar
jbferet committed
45
If the format is not ENVI format BIL interleaved, the function `raster2BIL` allows conversion into proper format and returns updated `Input_Image_File`
Florian de Boissieu's avatar
Florian de Boissieu committed
46

jbferet's avatar
jbferet committed
47
48
49
50
51
Spectral bands should be defined if the image is multi or hyperspectral image. 

A mask can also be set to work on a selected part of the input image. The mask is expected to be a raster in the same format as the image (ENVI HDR), with values 0 = masked or 1 = selected. Only one band is required. If no mask is to be used set `Input_Mask_File = FALSE`.

The output directory defined with `Output_Dir` will contain all the results. For each image processed, a subdirectory will be automatically created after its name.
Florian de Boissieu's avatar
Florian de Boissieu committed
52

53
54

```{r Input / Output files}
jbferet's avatar
jbferet committed
55
Input_Image_File  = system.file('extdata', 'RASTER', 'S2A_T33NUD_20180104_Subset', package = 'biodivMapR')
Florian de Boissieu's avatar
Florian de Boissieu committed
56

jbferet's avatar
jbferet committed
57
58
59
60
# Input.Image.File  = raster2BIL(Raster.Path = Input.Image.File,
#                                        Sensor = 'SENTINEL_2A',
#                                        Convert.Integer = TRUE,
#                                        Output.Directory = '~/test')
Florian de Boissieu's avatar
Florian de Boissieu committed
61

jbferet's avatar
jbferet committed
62
63
64
Input_Mask_File   = FALSE

Output_Dir        = 'RESULTS'
Florian de Boissieu's avatar
Florian de Boissieu committed
65
66
```

67
68
The image provided with the package is a subset of tile T33NUD acquired by Sentinel-2A satellite over Cameroonese rainforest in January 4th, 2018. 
<img align="bottom" width="100%" height="100%" src="../man/figures/01_RGB_S2A_T33NUD_20180104_Subset.png">
69

Florian de Boissieu's avatar
Florian de Boissieu committed
70
## Spatial resolution
jbferet's avatar
jbferet committed
71
The algorithm estimates \alpha and \beta diversity within a window, that is also the output spatial resolution. It is defined in number of pixels of the input image with parameter `window_size`, e.g. `window_size = 10` meaning a window of 10x10 pixels. It will be the spatial resolution of the ouput rasters.
Florian de Boissieu's avatar
Florian de Boissieu committed
72
73

As a rule of thumb, spatial units between 0.25 and 4 ha usually match with ground data.
jbferet's avatar
jbferet committed
74
if `window_size` is too small, it will result in low number of pixels per spatial unit, hence limited range of variation of diversity in the image.
Florian de Boissieu's avatar
Florian de Boissieu committed
75

jbferet's avatar
jbferet committed
76
In this example, the spatial resolution of the input raster is 10m. Setting `window_size = 10` will result in diversity maps of spatial resolution 100m x 100m.
Florian de Boissieu's avatar
Florian de Boissieu committed
77

78
```{r Spatial resolution}
jbferet's avatar
jbferet committed
79
window_size = 10
Florian de Boissieu's avatar
Florian de Boissieu committed
80
81
82
83
84
```

## PCA filtering
If set to `TRUE`, a second filtering based on PCA outliers is processed.

85
```{r PCA filtering}
86
FilterPCA = FALSE
Florian de Boissieu's avatar
Florian de Boissieu committed
87
88
89
90
91
```

## Computing options
The use of computing ressources can be controled with the following parameters: 

jbferet's avatar
jbferet committed
92
93
* `nbCPU` controls the parallelisation of the processing: how many CPUs will be asssigned for multithreading, 
* `MaxRAM` controls the size in GB of the input image chunks processed by each thread (this does not correspond to the max amount of RAM allocated),
jbferet's avatar
jbferet committed
94
* `nbclusters` controls the number of clusters defined by k-means clustering for each repetition. Images showing moderate diversity (temperate forests) may need lower number of clusters (20), whereas highly diverse tropical forest require more (50). The larger the value the longer the computation time. Users are invited to experiment with different numbers of clusters in order to identify the snsitivity of divrsity patterns to this parameter.
Florian de Boissieu's avatar
Florian de Boissieu committed
95

96
```{r Computing options}
jbferet's avatar
jbferet committed
97
nbCPU         = 2
Florian de Boissieu's avatar
Florian de Boissieu committed
98
99
100
101
102
103
104
MaxRAM        = 0.5
nbclusters    = 50
```

# Main processing worflow
## Mask non vegetated / shaded / cloudy pixels

105
```{r Mask non vegetated / shaded / cloudy pixels}
jbferet's avatar
jbferet committed
106
107
108
NDVI_Thresh = 0.5
Blue_Thresh = 500
NIR_Thresh  = 1500
Florian de Boissieu's avatar
Florian de Boissieu committed
109
print("PERFORM RADIOMETRIC FILTERING")
jbferet's avatar
jbferet committed
110
111
112
Input_Mask_File = perform_radiometric_filtering(Input_Image_File, Input_Mask_File, Output_Dir,
                                            NDVI_Thresh = NDVI_Thresh, Blue_Thresh = Blue_Thresh,
                                            NIR_Thresh = NIR_Thresh)
Florian de Boissieu's avatar
Florian de Boissieu committed
113
114
115
```

## PCA
116
A pixel-based PCA is run on the input image across the spectral bands to select the most interesting spectral information relative to spectral diversity and remove shaded pixels, spatial noise and sensor artefacts. 
Florian de Boissieu's avatar
Florian de Boissieu committed
117

118
119
120
121
122
123
124
125
126
127
The resulting PCA raster are then stored in a binalry file in the results directory, which in our case corresponds to
`RESULTS/S2A_T33NUD_20180104_Subset/SPCA/PCA/OutputPCA_8_PCs`

This PCA raster file can be displayed using QGIS or any GIS / image processing software. Here, the PCs corresponding to our image look like this: 

<img align="bottom" width="100%" height="100%" src="../man/figures/02_PCs_comp.png">

This PCA band selection left to user judgement, who then writes the band to be kept in a `.txt` file located in the same directory as the PCA raster file. The file is automatically created and ready to edit with function `select_PCA_components`. Each selected band should be identified per line in this file. 
The main goal of PC selection is to discard PCs showing no relevant information corresponding to vegetation, or including artifacts possibly explained by sensor properties. It is somehow a subjective process, and we are currently working on automatic selection of these components.

jbferet's avatar
jbferet committed
128
For this example, PCA bands 1, 2, 4, 5, 6 and 8 can be kept if writing the following lines in file `Selected_Components.txt` opened for edition (do not forget carriage return after last value):
Florian de Boissieu's avatar
Florian de Boissieu committed
129
130
131
```
1
2
132
4
Florian de Boissieu's avatar
Florian de Boissieu committed
133
5
134
135
6
8
jbferet's avatar
jbferet committed
136

Florian de Boissieu's avatar
Florian de Boissieu committed
137
```
138
PC#3 and PC#7 were discarded as the main patterns observed for these components did not correspond to vegetation patterns. As a rule of thumb, between 2 and 6 selected PCs are usually sufficient to catch the main diversity patterns, but this selection strongly depends on the conditions of acquisition, the type of sensor, and the complexity of the vegetation being observed.
Florian de Boissieu's avatar
Florian de Boissieu committed
139
140
Here is the code to perform PCA and select PCA bands:

141
```{r PCA}
Florian de Boissieu's avatar
Florian de Boissieu committed
142
print("PERFORM PCA ON RASTER")
jbferet's avatar
jbferet committed
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
PCA_Output        = perform_PCA(Input_Image_File, Input_Mask_File, Output_Dir,
                               FilterPCA = TRUE, nbCPU = nbCPU,MaxRAM = MaxRAM)
# path for the PCA raster
PCA_Files         = PCA_Output$PCA_Files
# number of pixels used for each partition used for k-means clustering
Pix_Per_Partition = PCA_Output$Pix_Per_Partition
# number of partitions used for k-means clustering
nb_partitions     = PCA_Output$nb_partitions
# path for the updated mask
Input_Mask_File   = PCA_Output$MaskPath
# parameters of the PCA model
PCA_model         = PCA_Output$PCA_model
# definition of spectral bands to be excluded from the analysis
SpectralFilter    = PCA_Output$SpectralFilter

Florian de Boissieu's avatar
Florian de Boissieu committed
158
print("Select PCA components for diversity estimations")
jbferet's avatar
jbferet committed
159
select_PCA_components(Input_Image_File, Output_Dir, PCA_Files, File_Open = TRUE)
Florian de Boissieu's avatar
Florian de Boissieu committed
160
161
```

162

Florian de Boissieu's avatar
Florian de Boissieu committed
163
164
## $\alpha$ and $\beta$ diversity maps

jbferet's avatar
jbferet committed
165
The first step towards  \alpha and \beta diversity mapping corresponds to the computation of a `SpectralSpecies` map, which identifies the cluster ('spectral species') assigned to each pixel in the image, after k-means clustering is performed.
166
167
168


```{r Spectral species map}
Florian de Boissieu's avatar
Florian de Boissieu committed
169
print("MAP SPECTRAL SPECIES")
jbferet's avatar
jbferet committed
170
171
map_spectral_species(Input_Image_File, Output_Dir, PCA_Files,PCA_model, SpectralFilter, Input_Mask_File,
                     Pix_Per_Partition, nb_partitions, nbCPU=nbCPU, MaxRAM=MaxRAM)
172
173
174
175
176
```

SpectralSpecies is then stored in a raster file located here:

`RESULTS/S2A_T33NUD_20180104_Subset/SPCA/SpectralSpecies`
Florian de Boissieu's avatar
Florian de Boissieu committed
177

178
179
180
181
$\alpha$ and $\beta$ diversity maps, as well as validation, are based on this `SpectralSpecies` raster. 

The code to compute $\alpha$ and $\beta$ diversity maps from this file is as follows:
```{r alpha and beta diversity maps}
Florian de Boissieu's avatar
Florian de Boissieu committed
182
183
print("MAP ALPHA DIVERSITY")
# Index.Alpha   = c('Shannon','Simpson')
jbferet's avatar
jbferet committed
184
185
186
Index_Alpha   = c('Shannon')
map_alpha_div(Input_Image_File, Output_Dir, window_size,
              nbCPU=nbCPU, MaxRAM=MaxRAM, Index_Alpha = Index_Alpha)
Florian de Boissieu's avatar
Florian de Boissieu committed
187
188

print("MAP BETA DIVERSITY")
jbferet's avatar
jbferet committed
189
190
map_beta_div(Input_Image_File, Output_Dir, window_size, nb_partitions=nb_partitions,
             nbCPU=nbCPU, MaxRAM=MaxRAM)
Florian de Boissieu's avatar
Florian de Boissieu committed
191
192
```

193
194
195
196
197
198
199
200
201
202
203
204
$\alpha$ and $\beta$ diversity maps are then stored in raster files located here: 
`RESULTS/S2A_T33NUD_20180104_Subset/SPCA/ALPHA`
and here:
`RESULTS/S2A_T33NUD_20180104_Subset/SPCA/BETA`

Different rasters can be produced and users are invited to refer to the documentation for more options.

Here, processing our example leads to the following $\alpha$ and $\beta$ diversity maps

<img align="bottom" width="100%" height="100%" src="../man/figures/03_AlphaBeta.png">


205
# $\alpha$ and $\beta$ diversity indices from vector layer
206
The following code computes $\alpha$ and $\beta$ diversity from field plots and extracts the corresponding diversity indices from previouly computed `SpectralSpecies` raster in order to perform validation.
Florian de Boissieu's avatar
Florian de Boissieu committed
207

208
```{r alpha and beta diversity indices from vector layer}
Florian de Boissieu's avatar
Florian de Boissieu committed
209
210
211
212
213
214
215
# location of the spectral species raster needed for validation
TypePCA     = 'SPCA'
Dir.Raster  = file.path(Output.Dir,basename(Input.Image.File),TypePCA,'SpectralSpecies')
Name.Raster = 'SpectralSpecies'
Path.Raster = file.path(Dir.Raster,Name.Raster)

# location of the directory where shapefiles used for validation are saved
Florian de Boissieu's avatar
Florian de Boissieu committed
216
vect        = system.file('extdata', 'VECTOR', package = 'biodivMapR')
Florian de Boissieu's avatar
Florian de Boissieu committed
217
218
219
Shannon.All = list() # ??

# list vector data
jbferet's avatar
jbferet committed
220
Path.Vector         = list_shp(vect)
Florian de Boissieu's avatar
Florian de Boissieu committed
221
222
223
Name.Vector         = tools::file_path_sans_ext(basename(Path.Vector))

# get alpha and beta diversity indicators corresponding to shapefiles
224
Biodiv.Indicators           = diversity_from_plots(Raster = Path.Raster, Plots = Path.Vector,NbClusters = nbclusters)
Florian de Boissieu's avatar
Florian de Boissieu committed
225
226
227
228
229
230
231
# if no name
Biodiv.Indicators$Name.Plot = seq(1,length(Biodiv.Indicators$Shannon[[1]]),by = 1)
Shannon.RS                  = c(Biodiv.Indicators$Shannon)[[1]]
```

The tables are then written to tab-seperated files.

232
```{r Write validation}
Florian de Boissieu's avatar
Florian de Boissieu committed
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
# write RS indicators
####################################################
# write indicators for alpha diversity
Path.Results = file.path(Output.Dir, basename(Input.Image.File), TypePCA, 'VALIDATION')
dir.create(Path.Results, showWarnings = FALSE, recursive = TRUE)
ShannonIndexFile <- file.path(Path.Results, "ShannonIndex.tab")
write.table(Shannon.RS, file = ShannonIndexFile, sep = "\t", dec = ".", na = " ", 
            row.names = Biodiv.Indicators$Name.Plot, col.names= F, quote=FALSE)

Results =  data.frame(Name.Vector, Biodiv.Indicators$Richness, Biodiv.Indicators$Fisher,                                Biodiv.Indicators$Shannon, Biodiv.Indicators$Simpson)
names(Results)  = c("ID_Plot", "Species_Richness", "Fisher", "Shannon", "Simpson")
write.table(Results, file = paste(Path.Results,"AlphaDiversity.tab",sep=''), sep="\t", dec=".",               na=" ", row.names = F, col.names= T,quote=FALSE)

# write indicators for beta diversity
BC_mean = Biodiv.Indicators$BCdiss
colnames(BC_mean) = rownames(BC_mean) = Biodiv.Indicators$Name.Plot
write.table(BC_mean, file = paste(Path.Results,"BrayCurtis.csv",sep=''), sep="\t", dec=".", na=" ", row.names = F, col.names= T,quote=FALSE)

```
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317

These results can then be displayed according to the need for further analysis. Here, for the purpose of illustration, we provide a code in order to visualize the differences among field plots located in the image: we first perform a PCoA on the Bray Curtis dissimilarity matrix computed from the field plots:


```{r PCoA on Field Plots}
# apply ordination using PCoA (same as done for map_beta_div)
library(labdsv)
MatBCdist = as.dist(BC_mean, diag = FALSE, upper = FALSE)
BetaPCO   = pco(MatBCdist, k = 3)

```

The plots corresponding to forested areas with high, medium and low diversity, as well as low vegetation/degraded forest close tomain roads are distributed as follows: 

<img align="bottom" width="100%" height="100%" src="../man/figures/04_RGB_FieldLegend.png">

Here, we produce figures in order to locate the different types of vegetation in the PCoA space:

```{r plot PCoA & Shannon}
# very uglily assign vegetation type to polygons in shapefiles
nbSamples = c(6,4,7,7)
vg        = c('Forest high diversity', 'Forest low diversity', 'Forest medium diversity', 'low vegetation')
Type_Vegetation = c()
for (i in 1: length(nbSamples)){
  for (j in 1:nbSamples[i]){
    Type_Vegetation = c(Type_Vegetation,vg[i])
  }
}

# create data frame including alpha and beta diversity
library(ggplot2)
Results     =  data.frame('vgtype'=Type_Vegetation,'pco1'= BetaPCO$points[,1],'pco2'= BetaPCO$points[,2],'pco3' = BetaPCO$points[,3],'shannon'=Shannon.RS)

# plot field data in the PCoA space, with size corresponding to shannon index
ggplot(Results, aes(x=pco1, y=pco2, color=vgtype,size=shannon)) +
  geom_point(alpha=0.6) +
  scale_color_manual(values=c("#e6140a", "#e6d214", "#e68214", "#145ae6"))
filename = file.path(Path.Results,'BetaDiversity_PcoA1_vs_PcoA2.png')
ggsave(filename, plot = last_plot(), device = 'png', path = NULL,
       scale = 1, width = NA, height = NA, units = c("in", "cm", "mm"),
       dpi = 600, limitsize = TRUE)


ggplot(Results, aes(x=pco1, y=pco3, color=vgtype,size=shannon)) +
  geom_point(alpha=0.6) +
  scale_color_manual(values=c("#e6140a", "#e6d214", "#e68214", "#145ae6"))
filename = file.path(Path.Results,'BetaDiversity_PcoA1_vs_PcoA3.png')
ggsave(filename, plot = last_plot(), device = 'png', path = NULL,
       scale = 1, width = NA, height = NA, units = c("in", "cm", "mm"),
       dpi = 600, limitsize = TRUE)

ggplot(Results, aes(x=pco2, y=pco3, color=vgtype,size=shannon)) +
  geom_point(alpha=0.6) +
  scale_color_manual(values=c("#e6140a", "#e6d214", "#e68214", "#145ae6"))
filename = file.path(Path.Results,'BetaDiversity_PcoA2_vs_PcoA3.png')
ggsave(filename, plot = last_plot(), device = 'png', path = NULL,
       scale = 1, width = NA, height = NA, units = c("in", "cm", "mm"),
       dpi = 600, limitsize = TRUE)

```

The resulting figures are displayed here: 

<img align="bottom" width="100%" height="100%" src="../man/figures/BetaDiversity_PcoA1_vs_PcoA2.png">
<img align="bottom" width="100%" height="100%" src="../man/figures/BetaDiversity_PcoA1_vs_PcoA3.png">
<img align="bottom" width="100%" height="100%" src="../man/figures/BetaDiversity_PcoA2_vs_PcoA3.png">