tutorial.Rmd 13.8 KB
Newer Older
Florian de Boissieu's avatar
Florian de Boissieu committed
1
---
Florian de Boissieu's avatar
Florian de Boissieu committed
2
title: "Tutorial for biodivMapR"
Florian de Boissieu's avatar
Florian de Boissieu committed
3
4
5
6
7
8
9
10
11
12
13
author: "Jean-Baptiste Féret, Florian de Boissieu"
date: "`r Sys.Date()`"
output: 
  html_vignette:
    number_sections: true
vignette: >
  %\VignetteIndexEntry{Tutorial}
  %\VignetteEngine{knitr::rmarkdown}
  \usepackage[utf8]{inputenc}
---

14
15
16
17
18
19
20
21
22
```{r setup, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  eval=FALSE
)
```


Florian de Boissieu's avatar
Florian de Boissieu committed
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
This tutorial aims at describing the processing workflow and giving the associated code to compute $\alpha$ and $\beta$ diversity maps on an extraction of Sentinel-2 image taken over Cameroun forest. The workflow is composed of the following steps:

* define the processing parameters

    * input / output files paths
    * output spatial resolution
    * computing options
    
* compute the $\alpha$ and $\beta$ diversity maps:
    * compute PCA and select best components
    * compute the diversity maps
    
* validate results comparing to field plots measurements



# Processing parameters

## Input / Output files
jbferet's avatar
jbferet committed
42
43
The input images are expected to be in ENVI HDR format, BIL interleaved. To check if the flormat is good use fucntion `check_data`.
If not they should be converted with function `raster2BIL`.
Florian de Boissieu's avatar
Florian de Boissieu committed
44
45
46

A mask can also be set to work on a selected part of the input image. The mask is expected to be a raster in the same format as the image (ENVI HDR), with values 0 = masked or 1 = selected. If no mask is to be used set `Input.Mask.File = FALSE`.

47
The output directory defined with `Output.Dir` will contain all the results. For each image processed, a subdirectory will be automatically created after its name.
Florian de Boissieu's avatar
Florian de Boissieu committed
48

49
50

```{r Input / Output files}
Florian de Boissieu's avatar
Florian de Boissieu committed
51
Input.Image.File  = system.file('extdata', 'RASTER', 'S2A_T33NUD_20180104_Subset', package = 'biodivMapR')
52
check_data(Input.Image.File)
Florian de Boissieu's avatar
Florian de Boissieu committed
53

54
Input.Image.File  = raster2BIL(Raster.Path = Input.Image.File,
Florian de Boissieu's avatar
Florian de Boissieu committed
55
56
57
58
59
60
61
62
                                       Sensor = 'SENTINEL_2A',
                                       Convert.Integer = TRUE,
                                       Output.Directory = '~/test')
Input.Mask.File   = FALSE

Output.Dir        = 'RESULTS'
```

63
64
The image provided with the package is a subset of tile T33NUD acquired by Sentinel-2A satellite over Cameroonese rainforest in January 4th, 2018. 
<img align="bottom" width="100%" height="100%" src="../man/figures/01_RGB_S2A_T33NUD_20180104_Subset.png">
65

Florian de Boissieu's avatar
Florian de Boissieu committed
66
## Spatial resolution
jbferet's avatar
jbferet committed
67
The algorithm estimates \alpha and \beta diversity within a window, that is also the output spatial resolution. It is defined in number of pixel s of the input image with parameter `window_size`, e.g. `window_size = 10` meaning a window of 10x10 pixels. It will be the spatial resolution of the ouput rasters.
Florian de Boissieu's avatar
Florian de Boissieu committed
68
69

As a rule of thumb, spatial units between 0.25 and 4 ha usually match with ground data.
jbferet's avatar
jbferet committed
70
A window_size too small results in low number of pixels per spatial unit, hence limited range of variation of diversity in the image.
Florian de Boissieu's avatar
Florian de Boissieu committed
71

jbferet's avatar
jbferet committed
72
In this example, the spatial resolution of the input raster. Setting `window_size = 10` will result in diversity maps of spatial resolution 100x100m.
Florian de Boissieu's avatar
Florian de Boissieu committed
73

74
```{r Spatial resolution}
jbferet's avatar
jbferet committed
75
window_size = 10
Florian de Boissieu's avatar
Florian de Boissieu committed
76
77
78
79
80
```

## PCA filtering
If set to `TRUE`, a second filtering based on PCA outliers is processed.

81
```{r PCA filtering}
82
FilterPCA = FALSE
Florian de Boissieu's avatar
Florian de Boissieu committed
83
84
85
86
87
```

## Computing options
The use of computing ressources can be controled with the following parameters: 

jbferet's avatar
jbferet committed
88
89
90
* `nbCPU` controls the parallelisation of the processing: how many CPUs will be asssigned for multithreading, 
* `MaxRAM` controls the size in GB of the input image chunks processed by each thread (this does not correspond to the max amount of RAM allocated),
* `nbclusters` controls the number of clusters defined by k-means clustering for each repetition. Images showing moderate diversity (temperate forests) may need lower number of clusters (20), whereas highly diverse tropical forest require more (50). The larger the value the longer the computation time. 
Florian de Boissieu's avatar
Florian de Boissieu committed
91

92
```{r Computing options}
Florian de Boissieu's avatar
Florian de Boissieu committed
93
94
95
96
97
98
99
100
nbCPU         = 4
MaxRAM        = 0.5
nbclusters    = 50
```

# Main processing worflow
## Mask non vegetated / shaded / cloudy pixels

101
```{r Mask non vegetated / shaded / cloudy pixels}
Florian de Boissieu's avatar
Florian de Boissieu committed
102
103
104
105
NDVI.Thresh = 0.5
Blue.Thresh = 500
NIR.Thresh  = 1500
print("PERFORM RADIOMETRIC FILTERING")
jbferet's avatar
jbferet committed
106
ImPathShade = perform_radiometric_filtering(Input.Image.File, Input.Mask.File, Output.Dir,
Florian de Boissieu's avatar
Florian de Boissieu committed
107
108
109
110
111
                                            NDVI.Thresh = NDVI.Thresh, Blue.Thresh = Blue.Thresh,
                                            NIR.Thresh = NIR.Thresh)
```

## PCA
112
A pixel-based PCA is run on the input image across the spectral bands to select the most interesting spectral information relative to spectral diversity and remove shaded pixels, spatial noise and sensor artefacts. 
Florian de Boissieu's avatar
Florian de Boissieu committed
113

114
115
116
117
118
119
120
121
122
123
124
The resulting PCA raster are then stored in a binalry file in the results directory, which in our case corresponds to
`RESULTS/S2A_T33NUD_20180104_Subset/SPCA/PCA/OutputPCA_8_PCs`

This PCA raster file can be displayed using QGIS or any GIS / image processing software. Here, the PCs corresponding to our image look like this: 

<img align="bottom" width="100%" height="100%" src="../man/figures/02_PCs_comp.png">

This PCA band selection left to user judgement, who then writes the band to be kept in a `.txt` file located in the same directory as the PCA raster file. The file is automatically created and ready to edit with function `select_PCA_components`. Each selected band should be identified per line in this file. 
The main goal of PC selection is to discard PCs showing no relevant information corresponding to vegetation, or including artifacts possibly explained by sensor properties. It is somehow a subjective process, and we are currently working on automatic selection of these components.

For this example, PCA bands 1, 2, 4, 5, 6 and 8 can be kept if writing the following lines in file `selected_components.txt` opened for edition (do not forget carriage return after last value):
Florian de Boissieu's avatar
Florian de Boissieu committed
125
126
127
```
1
2
128
4
Florian de Boissieu's avatar
Florian de Boissieu committed
129
5
130
131
6
8
jbferet's avatar
jbferet committed
132

Florian de Boissieu's avatar
Florian de Boissieu committed
133
```
134
PC#3 and PC#7 were discarded as the main patterns observed for these components did not correspond to vegetation patterns. As a rule of thumb, between 2 and 6 selected PCs are usually sufficient to catch the main diversity patterns, but this selection strongly depends on the conditions of acquisition, the type of sensor, and the complexity of the vegetation being observed.
Florian de Boissieu's avatar
Florian de Boissieu committed
135
136
Here is the code to perform PCA and select PCA bands:

137
```{r PCA}
Florian de Boissieu's avatar
Florian de Boissieu committed
138
print("PERFORM PCA ON RASTER")
jbferet's avatar
jbferet committed
139
PCA.Files  = perform_PCA(Input.Image.File, ImPathShade, Output.Dir,
Florian de Boissieu's avatar
Florian de Boissieu committed
140
141
                               FilterPCA = TRUE, nbCPU = nbCPU, MaxRAM = MaxRAM)
print("Select PCA components for diversity estimations")
jbferet's avatar
jbferet committed
142
select_PCA_components(Input.Image.File, Output.Dir, PCA.Files, File.Open = TRUE)
Florian de Boissieu's avatar
Florian de Boissieu committed
143
144
```

145

Florian de Boissieu's avatar
Florian de Boissieu committed
146
147
## $\alpha$ and $\beta$ diversity maps

148
149
150
151
The first step towards  \alpha and \beta diversity mapping corresponds to the computation of a `SpectralSpecies` map, which identifies the centroid a the cluster ('spectral species') assigned to each pixel in the image, after k-means clustering is performed


```{r Spectral species map}
Florian de Boissieu's avatar
Florian de Boissieu committed
152
print("MAP SPECTRAL SPECIES")
jbferet's avatar
jbferet committed
153
map_spectral_species(Input.Image.File, Output.Dir, PCA.Files,
Florian de Boissieu's avatar
Florian de Boissieu committed
154
                     nbCPU = nbCPU, MaxRAM = MaxRAM)
155
156
157
158
159
```

SpectralSpecies is then stored in a raster file located here:

`RESULTS/S2A_T33NUD_20180104_Subset/SPCA/SpectralSpecies`
Florian de Boissieu's avatar
Florian de Boissieu committed
160

161
162
163
164
$\alpha$ and $\beta$ diversity maps, as well as validation, are based on this `SpectralSpecies` raster. 

The code to compute $\alpha$ and $\beta$ diversity maps from this file is as follows:
```{r alpha and beta diversity maps}
Florian de Boissieu's avatar
Florian de Boissieu committed
165
166
167
print("MAP ALPHA DIVERSITY")
# Index.Alpha   = c('Shannon','Simpson')
Index.Alpha   = c('Shannon')
jbferet's avatar
jbferet committed
168
map_alpha_div(Input.Image.File, Output.Dir, window_size,
Florian de Boissieu's avatar
Florian de Boissieu committed
169
170
171
                    nbCPU = nbCPU, MaxRAM = MaxRAM, Index.Alpha = Index.Alpha)

print("MAP BETA DIVERSITY")
jbferet's avatar
jbferet committed
172
map_beta_div(Input.Image.File, Output.Dir, window_size,
Florian de Boissieu's avatar
Florian de Boissieu committed
173
174
175
                   nbCPU = nbCPU, MaxRAM = MaxRAM)
```

176
177
178
179
180
181
182
183
184
185
186
187
$\alpha$ and $\beta$ diversity maps are then stored in raster files located here: 
`RESULTS/S2A_T33NUD_20180104_Subset/SPCA/ALPHA`
and here:
`RESULTS/S2A_T33NUD_20180104_Subset/SPCA/BETA`

Different rasters can be produced and users are invited to refer to the documentation for more options.

Here, processing our example leads to the following $\alpha$ and $\beta$ diversity maps

<img align="bottom" width="100%" height="100%" src="../man/figures/03_AlphaBeta.png">


188
# $\alpha$ and $\beta$ diversity indices from vector layer
189
The following code computes $\alpha$ and $\beta$ diversity from field plots and extracts the corresponding diversity indices from previouly computed `SpectralSpecies` raster in order to perform validation.
Florian de Boissieu's avatar
Florian de Boissieu committed
190

191
```{r alpha and beta diversity indices from vector layer}
Florian de Boissieu's avatar
Florian de Boissieu committed
192
193
194
195
196
197
198
# location of the spectral species raster needed for validation
TypePCA     = 'SPCA'
Dir.Raster  = file.path(Output.Dir,basename(Input.Image.File),TypePCA,'SpectralSpecies')
Name.Raster = 'SpectralSpecies'
Path.Raster = file.path(Dir.Raster,Name.Raster)

# location of the directory where shapefiles used for validation are saved
Florian de Boissieu's avatar
Florian de Boissieu committed
199
vect        = system.file('extdata', 'VECTOR', package = 'biodivMapR')
Florian de Boissieu's avatar
Florian de Boissieu committed
200
201
202
Shannon.All = list() # ??

# list vector data
203
Path.Vector         = list.shp(vect)
Florian de Boissieu's avatar
Florian de Boissieu committed
204
205
206
207
Name.Vector         = tools::file_path_sans_ext(basename(Path.Vector))

# read raster data including projection
RasterStack         = stack(Path.Raster)
208
Projection.Raster   = get_projection(Path.Raster,'raster')
Florian de Boissieu's avatar
Florian de Boissieu committed
209
210

# get alpha and beta diversity indicators corresponding to shapefiles
211
Biodiv.Indicators           = diversity_from_plots(Raster = Path.Raster, Plots = Path.Vector,NbClusters = nbclusters)
Florian de Boissieu's avatar
Florian de Boissieu committed
212
213
214
215
216
217
218
# if no name
Biodiv.Indicators$Name.Plot = seq(1,length(Biodiv.Indicators$Shannon[[1]]),by = 1)
Shannon.RS                  = c(Biodiv.Indicators$Shannon)[[1]]
```

The tables are then written to tab-seperated files.

219
```{r Write validation}
Florian de Boissieu's avatar
Florian de Boissieu committed
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
# write RS indicators
####################################################
# write indicators for alpha diversity
Path.Results = file.path(Output.Dir, basename(Input.Image.File), TypePCA, 'VALIDATION')
dir.create(Path.Results, showWarnings = FALSE, recursive = TRUE)
ShannonIndexFile <- file.path(Path.Results, "ShannonIndex.tab")
write.table(Shannon.RS, file = ShannonIndexFile, sep = "\t", dec = ".", na = " ", 
            row.names = Biodiv.Indicators$Name.Plot, col.names= F, quote=FALSE)

Results =  data.frame(Name.Vector, Biodiv.Indicators$Richness, Biodiv.Indicators$Fisher,                                Biodiv.Indicators$Shannon, Biodiv.Indicators$Simpson)
names(Results)  = c("ID_Plot", "Species_Richness", "Fisher", "Shannon", "Simpson")
write.table(Results, file = paste(Path.Results,"AlphaDiversity.tab",sep=''), sep="\t", dec=".",               na=" ", row.names = F, col.names= T,quote=FALSE)

# write indicators for beta diversity
BC_mean = Biodiv.Indicators$BCdiss
colnames(BC_mean) = rownames(BC_mean) = Biodiv.Indicators$Name.Plot
write.table(BC_mean, file = paste(Path.Results,"BrayCurtis.csv",sep=''), sep="\t", dec=".", na=" ", row.names = F, col.names= T,quote=FALSE)

```
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306

These results can then be displayed according to the need for further analysis. Here, for the purpose of illustration, we provide a code in order to visualize the differences among field plots located in the image: we first perform a PCoA on the Bray Curtis dissimilarity matrix computed from the field plots:


```{r PCoA on Field Plots}
# apply ordination using PCoA (same as done for map_beta_div)
library(labdsv)
MatBCdist = as.dist(BC_mean, diag = FALSE, upper = FALSE)
BetaPCO   = pco(MatBCdist, k = 3)

```

The plots corresponding to forested areas with high, medium and low diversity, as well as low vegetation/degraded forest close tomain roads are distributed as follows: 

<img align="bottom" width="100%" height="100%" src="../man/figures/04_RGB_FieldLegend.png">

Here, we produce figures in order to locate the different types of vegetation in the PCoA space:

```{r plot PCoA & Shannon}
# very uglily assign vegetation type to polygons in shapefiles
nbSamples = c(6,4,7,7)
vg        = c('Forest high diversity', 'Forest low diversity', 'Forest medium diversity', 'low vegetation')
Type_Vegetation = c()
for (i in 1: length(nbSamples)){
  for (j in 1:nbSamples[i]){
    Type_Vegetation = c(Type_Vegetation,vg[i])
  }
}

# create data frame including alpha and beta diversity
library(ggplot2)
Results     =  data.frame('vgtype'=Type_Vegetation,'pco1'= BetaPCO$points[,1],'pco2'= BetaPCO$points[,2],'pco3' = BetaPCO$points[,3],'shannon'=Shannon.RS)

# plot field data in the PCoA space, with size corresponding to shannon index
ggplot(Results, aes(x=pco1, y=pco2, color=vgtype,size=shannon)) +
  geom_point(alpha=0.6) +
  scale_color_manual(values=c("#e6140a", "#e6d214", "#e68214", "#145ae6"))
filename = file.path(Path.Results,'BetaDiversity_PcoA1_vs_PcoA2.png')
ggsave(filename, plot = last_plot(), device = 'png', path = NULL,
       scale = 1, width = NA, height = NA, units = c("in", "cm", "mm"),
       dpi = 600, limitsize = TRUE)


ggplot(Results, aes(x=pco1, y=pco3, color=vgtype,size=shannon)) +
  geom_point(alpha=0.6) +
  scale_color_manual(values=c("#e6140a", "#e6d214", "#e68214", "#145ae6"))
filename = file.path(Path.Results,'BetaDiversity_PcoA1_vs_PcoA3.png')
ggsave(filename, plot = last_plot(), device = 'png', path = NULL,
       scale = 1, width = NA, height = NA, units = c("in", "cm", "mm"),
       dpi = 600, limitsize = TRUE)

ggplot(Results, aes(x=pco2, y=pco3, color=vgtype,size=shannon)) +
  geom_point(alpha=0.6) +
  scale_color_manual(values=c("#e6140a", "#e6d214", "#e68214", "#145ae6"))
filename = file.path(Path.Results,'BetaDiversity_PcoA2_vs_PcoA3.png')
ggsave(filename, plot = last_plot(), device = 'png', path = NULL,
       scale = 1, width = NA, height = NA, units = c("in", "cm", "mm"),
       dpi = 600, limitsize = TRUE)

```

The resulting figures are displayed here: 

<img align="bottom" width="100%" height="100%" src="../man/figures/BetaDiversity_PcoA1_vs_PcoA2.png">
<img align="bottom" width="100%" height="100%" src="../man/figures/BetaDiversity_PcoA1_vs_PcoA3.png">
<img align="bottom" width="100%" height="100%" src="../man/figures/BetaDiversity_PcoA2_vs_PcoA3.png">