---
title: "Tutorial for biodivMapR"
author: "Jean-Baptiste Féret, Florian de Boissieu"
date: "`r Sys.Date()`"
output:
html_vignette:
number_sections: true
vignette: >
%\VignetteIndexEntry{Tutorial}
%\VignetteEngine{knitr::rmarkdown}
\usepackage[utf8]{inputenc}
---
```{r setup, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
eval=FALSE
)
```
This tutorial aims at describing the processing workflow and giving the associated code to compute $\alpha$ and $\beta$ diversity maps on an extraction of Sentinel-2 image taken over Cameroun forest. The workflow is composed of the following steps:
* define the processing parameters
* input / output files paths
* output spatial resolution
* computing options
* compute the $\alpha$ and $\beta$ diversity maps:
* compute PCA and select best components
* compute the diversity maps
* validate results comparing to field plots measurements
Below is the typical flow chart of the computation of diversity maps with __biodivMapR__ :
# Define processing parameters
## Input / Output files
The input images are expected to be in ENVI HDR format, BIL interleaved.
The functions `perform_radiometric_filtering` and `perform_PCA` start with a procedure checkin if the image format is as expected. if not, the functions will retrun a message explaining the problem and will stop the process.
If the format is not ENVI format BIL interleaved, the function `raster2BIL` allows conversion into proper format and returns updated `Input_Image_File`
Spectral bands should be defined if the image is multi or hyperspectral image.
A mask can also be set to work on a selected part of the input image. The mask is expected to be a raster in the same format as the image (ENVI HDR), with values 0 = masked or 1 = selected. Only one band is required. If no mask is to be used set `Input_Mask_File = FALSE`.
The output directory defined with `Output_Dir` will contain all the results. For each image processed, a subdirectory will be automatically created after its name.
```{r Input / Output files}
Input_Image_File = system.file('extdata', 'RASTER', 'S2A_T33NUD_20180104_Subset', package = 'biodivMapR')
# Input.Image.File = raster2BIL(Raster.Path = Input.Image.File,
# Sensor = 'SENTINEL_2A',
# Convert.Integer = TRUE,
# Output.Directory = '~/test')
Input_Mask_File = FALSE
Output_Dir = 'RESULTS'
```
The image provided with the package is a subset of tile T33NUD acquired by Sentinel-2A satellite over Cameroonese rainforest in January 4th, 2018.
## Spatial resolution
The algorithm estimates \alpha and \beta diversity within a window, that is also the output spatial resolution. It is defined in number of pixels of the input image with parameter `window_size`, e.g. `window_size = 10` meaning a window of 10x10 pixels. It will be the spatial resolution of the ouput rasters.
As a rule of thumb, spatial units between 0.25 and 4 ha usually match with ground data.
if `window_size` is too small, it will result in low number of pixels per spatial unit, hence limited range of variation of diversity in the image.
In this example, the spatial resolution of the input raster is 10m. Setting `window_size = 10` will result in diversity maps of spatial resolution 100m x 100m.
```{r Spatial resolution}
window_size = 10
```
## PCA filtering
If set to `TRUE`, a second filtering based on PCA outliers is processed.
```{r PCA filtering}
FilterPCA = FALSE
```
## Computing options
The use of computing ressources can be controled with the following parameters:
* `nbCPU` controls the parallelisation of the processing: how many CPUs will be asssigned for multithreading,
* `MaxRAM` controls the size in GB of the input image chunks processed by each thread (this does not correspond to the max amount of RAM allocated),
* `nbclusters` controls the number of clusters defined by k-means clustering for each repetition. Images showing moderate diversity (temperate forests) may need lower number of clusters (20), whereas highly diverse tropical forest require more (50). The larger the value the longer the computation time. Users are invited to experiment with different numbers of clusters in order to identify the snsitivity of divrsity patterns to this parameter.
```{r Computing options}
nbCPU = 2
MaxRAM = 0.5
nbclusters = 50
```
# Compute diversity maps
## Mask non vegetated / shaded / cloudy pixels
```{r Mask non vegetated / shaded / cloudy pixels}
NDVI_Thresh = 0.5
Blue_Thresh = 500
NIR_Thresh = 1500
print("PERFORM RADIOMETRIC FILTERING")
Input_Mask_File = perform_radiometric_filtering(Input_Image_File, Input_Mask_File, Output_Dir,
NDVI_Thresh = NDVI_Thresh, Blue_Thresh = Blue_Thresh,
NIR_Thresh = NIR_Thresh)
```
## PCA
A pixel-based PCA is run on the input image across the spectral bands to select the most interesting spectral information relative to spectral diversity and remove shaded pixels, spatial noise and sensor artefacts.
The resulting PCA raster are then stored in a binalry file in the results directory, which in our case corresponds to
`RESULTS/S2A_T33NUD_20180104_Subset/SPCA/PCA/OutputPCA_8_PCs`
This PCA raster file can be displayed using QGIS or any GIS / image processing software. Here, the PCs corresponding to our image look like this:
This PCA band selection left to user judgement, who then writes the band to be kept in a `.txt` file located in the same directory as the PCA raster file. The file is automatically created and ready to edit with function `select_PCA_components`. Each selected band should be identified per line in this file.
The main goal of PC selection is to discard PCs showing no relevant information corresponding to vegetation, or including artifacts possibly explained by sensor properties. It is somehow a subjective process, and we are currently working on automatic selection of these components.
For this example, PCA bands 1, 2, 5, 6 and 8 can be kept if writing the following lines in file `Selected_Components.txt` opened for edition (do not forget carriage return after last value):
```
1
2
5
6
8
```
PC#3 and PC#7 were discarded as the main patterns observed for these components did not correspond to vegetation patterns. As a rule of thumb, between 2 and 6 selected PCs are usually sufficient to catch the main diversity patterns, but this selection strongly depends on the conditions of acquisition, the type of sensor, and the complexity of the vegetation being observed.
Here is the code to perform PCA and select PCA bands:
```{r PCA}
print("PERFORM PCA ON RASTER")
PCA_Output = perform_PCA(Input_Image_File, Input_Mask_File, Output_Dir,
FilterPCA = TRUE, nbCPU = nbCPU,MaxRAM = MaxRAM)
# path for the PCA raster
PCA_Files = PCA_Output$PCA_Files
# number of pixels used for each partition used for k-means clustering
Pix_Per_Partition = PCA_Output$Pix_Per_Partition
# number of partitions used for k-means clustering
nb_partitions = PCA_Output$nb_partitions
# path for the updated mask
Input_Mask_File = PCA_Output$MaskPath
# parameters of the PCA model
PCA_model = PCA_Output$PCA_model
# definition of spectral bands to be excluded from the analysis
SpectralFilter = PCA_Output$SpectralFilter
print("Select PCA components for diversity estimations")
select_PCA_components(Input_Image_File, Output_Dir, PCA_Files, File_Open = TRUE)
```
## $\alpha$ and $\beta$ diversity maps
The first step towards \alpha and \beta diversity mapping corresponds to the computation of a `SpectralSpecies` map, which identifies the cluster ('spectral species') assigned to each pixel in the image, after k-means clustering is performed.
```{r Spectral species map}
print("MAP SPECTRAL SPECIES")
map_spectral_species(Input_Image_File, Output_Dir, PCA_Files, PCA_model, SpectralFilter, Input_Mask_File,
Pix_Per_Partition, nb_partitions, nbCPU=nbCPU, MaxRAM=MaxRAM,
nbclusters = nbclusters, TypePCA = TypePCA, CR = TRUE)
```
SpectralSpecies is then stored in a raster file located here:
`RESULTS/S2A_T33NUD_20180104_Subset/SPCA/SpectralSpecies`
$\alpha$ and $\beta$ diversity maps, as well as validation, are based on this `SpectralSpecies` raster.
The code to compute $\alpha$ and $\beta$ diversity maps from this file is as follows:
```{r alpha and beta diversity maps}
print("MAP ALPHA DIVERSITY")
# Index.Alpha = c('Shannon','Simpson')
Index_Alpha = c('Shannon')
map_alpha_div(Input_Image_File, Output_Dir, window_size, nbCPU=nbCPU,
MaxRAM=MaxRAM, Index_Alpha = Index_Alpha, nbclusters = nbclusters)
print("MAP BETA DIVERSITY")
map_beta_div(Input_Image_File, Output_Dir, window_size, nb_partitions=nb_partitions,
nbCPU=nbCPU, MaxRAM=MaxRAM, nbclusters = nbclusters)
```
$\alpha$ and $\beta$ diversity maps are then stored in raster files located here:
`RESULTS/S2A_T33NUD_20180104_Subset/SPCA/ALPHA`
and here:
`RESULTS/S2A_T33NUD_20180104_Subset/SPCA/BETA`
Different rasters can be produced and users are invited to refer to the documentation for more options.
Here, processing our example leads to the following $\alpha$ and $\beta$ diversity maps
# Compute diversity indices from a vector layer
The following code computes $\alpha$ and $\beta$ diversity from field plots and extracts the corresponding diversity indices from previouly computed `SpectralSpecies` raster in order to perform validation.
```{r alpha and beta diversity indices from vector layer}
# location of the spectral species raster needed for validation
TypePCA = 'SPCA'
Dir.Raster = file.path(Output.Dir,basename(Input.Image.File),TypePCA,'SpectralSpecies')
Name.Raster = 'SpectralSpecies'
Path.Raster = file.path(Dir.Raster,Name.Raster)
# location of the directory where shapefiles used for validation are saved
vect = system.file('extdata', 'VECTOR', package = 'biodivMapR')
Shannon.All = list() # ??
# list vector data
Path.Vector = list_shp(vect)
Name.Vector = tools::file_path_sans_ext(basename(Path.Vector))
# get alpha and beta diversity indicators corresponding to shapefiles
Biodiv.Indicators = diversity_from_plots(Raster = Path.Raster, Plots = Path.Vector,NbClusters = nbclusters)
# if no name
Biodiv.Indicators$Name.Plot = seq(1,length(Biodiv.Indicators$Shannon[[1]]),by = 1)
Shannon.RS = c(Biodiv.Indicators$Shannon)[[1]]
```
The tables are then written to tab-seperated files.
```{r Write validation}
# write RS indicators
####################################################
# write indicators for alpha diversity
Path.Results = file.path(Output.Dir, basename(Input.Image.File), TypePCA, 'VALIDATION')
dir.create(Path.Results, showWarnings = FALSE, recursive = TRUE)
ShannonIndexFile <- file.path(Path.Results, "ShannonIndex.tab")
write.table(Shannon.RS, file = ShannonIndexFile, sep = "\t", dec = ".", na = " ",
row.names = Biodiv.Indicators$Name.Plot, col.names= F, quote=FALSE)
Results = data.frame(Name.Vector, Biodiv.Indicators$Richness, Biodiv.Indicators$Fisher, Biodiv.Indicators$Shannon, Biodiv.Indicators$Simpson)
names(Results) = c("ID_Plot", "Species_Richness", "Fisher", "Shannon", "Simpson")
write.table(Results, file = paste(Path.Results,"AlphaDiversity.tab",sep=''), sep="\t", dec=".", na=" ", row.names = F, col.names= T,quote=FALSE)
# write indicators for beta diversity
BC_mean = Biodiv.Indicators$BCdiss
colnames(BC_mean) = rownames(BC_mean) = Biodiv.Indicators$Name.Plot
write.table(BC_mean, file = paste(Path.Results,"BrayCurtis.csv",sep=''), sep="\t", dec=".", na=" ", row.names = F, col.names= T,quote=FALSE)
```
These results can then be displayed according to the need for further analysis. Here, for the purpose of illustration, we provide a code in order to visualize the differences among field plots located in the image: we first perform a PCoA on the Bray Curtis dissimilarity matrix computed from the field plots:
```{r PCoA on Field Plots}
# apply ordination using PCoA (same as done for map_beta_div)
library(labdsv)
MatBCdist = as.dist(BC_mean, diag = FALSE, upper = FALSE)
BetaPCO = pco(MatBCdist, k = 3)
```
The plots corresponding to forested areas with high, medium and low diversity, as well as low vegetation/degraded forest close tomain roads are distributed as follows:
Here, we produce figures in order to locate the different types of vegetation in the PCoA space:
```{r plot PCoA & Shannon}
# very uglily assign vegetation type to polygons in shapefiles
nbSamples = c(6,4,7,7)
vg = c('Forest high diversity', 'Forest low diversity', 'Forest medium diversity', 'low vegetation')
Type_Vegetation = c()
for (i in 1: length(nbSamples)){
for (j in 1:nbSamples[i]){
Type_Vegetation = c(Type_Vegetation,vg[i])
}
}
# create data frame including alpha and beta diversity
library(ggplot2)
Results = data.frame('vgtype'=Type_Vegetation,'pco1'= BetaPCO$points[,1],'pco2'= BetaPCO$points[,2],'pco3' = BetaPCO$points[,3],'shannon'=Shannon.RS)
# plot field data in the PCoA space, with size corresponding to shannon index
ggplot(Results, aes(x=pco1, y=pco2, color=vgtype,size=shannon)) +
geom_point(alpha=0.6) +
scale_color_manual(values=c("#e6140a", "#e6d214", "#e68214", "#145ae6"))
filename = file.path(Path.Results,'BetaDiversity_PcoA1_vs_PcoA2.png')
ggsave(filename, plot = last_plot(), device = 'png', path = NULL,
scale = 1, width = 6, height = 4, units = "in",
dpi = 600, limitsize = TRUE)
ggplot(Results, aes(x=pco1, y=pco3, color=vgtype,size=shannon)) +
geom_point(alpha=0.6) +
scale_color_manual(values=c("#e6140a", "#e6d214", "#e68214", "#145ae6"))
filename = file.path(Path.Results,'BetaDiversity_PcoA1_vs_PcoA3.png')
ggsave(filename, plot = last_plot(), device = 'png', path = NULL,
scale = 1, width = 6, height = 4, units = "in",
dpi = 600, limitsize = TRUE)
ggplot(Results, aes(x=pco2, y=pco3, color=vgtype,size=shannon)) +
geom_point(alpha=0.6) +
scale_color_manual(values=c("#e6140a", "#e6d214", "#e68214", "#145ae6"))
filename = file.path(Path.Results,'BetaDiversity_PcoA2_vs_PcoA3.png')
ggsave(filename, plot = last_plot(), device = 'png', path = NULL,
scale = 1, width = 6, height = 4, units = "in",
dpi = 600, limitsize = TRUE)
```
The resulting figures are displayed here: