%% Cell type:markdown id:ba202a3a-c9bf-4df5-aa1e-688edc575137 tags:
 
# IV. Zonal statistics, spectral signatures and vegetation index
 
%% Cell type:markdown id:9e2184c7-c94a-4eec-906b-f36281433717 tags:
 
---
**Author(s):** Kenji Ose, Quentin Yeche, Dino Ienco - [UMR TETIS](https://umr-tetis.fr) / [INRAE](https://www.inrae.fr/)
 
---
 
%% Cell type:markdown id:2bd8ed9c-f1af-40d7-9be4-516846a2bca1 tags:
 
## 1. Introduction
 
Here we present two solutions for computing zonal statistics on a Sentinel-2 image using a polygon vector file. We will try two libraries: `xrspatial` and `rasterstats`.
 
%% Cell type:markdown id:4f4d36e3 tags:
 
### 1.1. Outline
 
In [section 4](#4-loading-the-polygon-vector-file) we will be loading vector data from a file and converting it to a raster so that we can use it to delimit zones.
 
In [section 5](#5-computing-of-zonal-statistics) we will be computing zonal statistics using two different methods, and getting the results back in the form of an xarray Dataset.
 
In [section 6](#6-spectral-signatures) we will plot spectral signatures.
 
Finally, [section 7](#7-band-math-and-vegetation-index) will cover calculating and displaying indices such as the NDVI.
 
%% Cell type:markdown id:0bb2c59d-1d5e-4d01-ab56-36e2736987b1 tags:
 
## 2. Library imports
 
As usual, we import all the required Python libraries. The two new ones in this notebook are `xrspatial` and `rasterstats` which are two libraries for computing zonal statistics.
 
%% Cell type:code id:493ef787 tags:
 
``` python
# this cell is only useful if you're using an environment like Google Colab or
# Microsoft Planetary Computer's servers
def test_import_or_install(imports):
    import importlib
    restart_needed = False
    for import_name, pip_name in imports:
        try:
            importlib.import_module(import_name, package=None)
        except ModuleNotFoundError:
            if not restart_needed:
                restart_needed = True
                print('\033[91m' + ("ONE OR MORE LIBRARIES HAVE TO BE INSTALLED, "
                      "PLEASE RESTART THE NOTEBOOK AFTER THIS CELL FINISHES EXECUTING "
                      "TO ENSURE PROPER FUNCTIONALITY") + '\x1b[0m')
            %pip install {pip_name}
    if restart_needed:
        print('\033[91m' + ("LIBRARIES HAVE BEEN INSTALLED. "
              "PLEASE RESTART THE NOTEBOOK NOW") + '\x1b[0m')

imports = [('pystac_client', 'pystac-client'),
           ('planetary_computer', 'planetary-computer'),
           ('stackstac', 'stackstac'),
           ('xrspatial', 'xarray-spatial'),
           ('rasterstats', 'rasterstats')]

test_import_or_install(imports)
```
 
%% Cell type:code id:8bfabb34-d174-45e0-aa3d-34f5ae585819 tags:
 
``` python
# STAC access
import pystac_client
import planetary_computer
 
# (geo)dataframes
import pandas as pd
import geopandas as gpd
 
# xarrays
import xarray as xr
 
from rasterio import features
 
# library for turning STAC objects into xarrays
import stackstac
 
# visualization
from matplotlib import pyplot as plt
 
# libraries for zonal stats
import xrspatial
import rasterstats
 
# miscellanous
import numpy as np
from IPython.display import display
```
 
%% Cell type:markdown id:f6558a2f-e365-4591-bc1a-5dfbb448fc44 tags:
 
## 3. Creating a `DataArray` from a STAC object
 
### 3.1. Getting a Sentinel-2 STAC Item
 
As a practical use case let's consider that we have identified the STAC Item we're interested in (see [this notebook](Joensuu_01-STAC.ipynb) for a refresher), and we also have an area of interest defined as a bounding box.
 
%% Cell type:code id:1dc9ac4a-eab8-4d38-89a6-27fe00d2a841 tags:
 
``` python
# Access to Planetary Computer API
root_catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)
 
item_id = 'S2A_MSIL2A_20201213T104441_R008_T31TEJ_20201214T083443'
stac_item = root_catalog.get_collection("sentinel-2-l2a").get_item(item_id)
```
 
%% Cell type:markdown id:a6b88951-0ebb-44a6-bae2-dee1f4e3c3bf tags:
 
### 3.2. Loading the Sentinel-2 image
 
We stack the item's assets, which are the spectral bands, into a `DataArray`, resampling to a spatial resolution of 10 m.
 
We also collect metadata that will be useful for the following processing steps.
 
%% Cell type:code id:6aad6b84-4a5c-4f74-aa0d-6c5db13a143b tags:
 
``` python
# bounding box expressed in Lat/Lon
aoi_bounds = (3.875107329166124, 43.48641456618909, 4.118824575734205, 43.71739887308995)
 
# bands of interest
bands_of_interest = ['B02','B03','B04','B05','B06','B07','B08','B11','B12']
 
FILL_VALUE = 2**16-1
 
ds = stackstac.stack(
    stac_item,
    assets=bands_of_interest,
    resolution=10,
    dtype="uint16",
    fill_value=FILL_VALUE,
    bounds_latlon=aoi_bounds,
).squeeze()
 
s2_ref_crs = ds.crs
s2_ref_trf = ds.transform
s2_ref_shape = (ds['y'].size, ds['x'].size)
 
print(f'- S2 CRS: {s2_ref_crs}')
print(f'- S2 affine transform: \n{s2_ref_trf}')
print(f'- S2 YX dimension: {s2_ref_shape}')
```
 
%% Output
 
- S2 CRS: epsg:32631
- S2 affine transform:
| 10.00, 0.00, 570490.00|
| 0.00,-10.00, 4841100.00|
| 0.00, 0.00, 1.00|
- S2 YX dimension: (2590, 1999)
 
/home/quentin/miniconda3/envs/beyond/lib/python3.11/site-packages/stackstac/prepare.py:364: UserWarning: The argument 'infer_datetime_format' is deprecated and will be removed in a future version. A strict version of it is now the default, see https://pandas.pydata.org/pdeps/0004-consistent-to-datetime-parsing.html. You can safely remove this argument.
times = pd.to_datetime(
 
%% Cell type:markdown id:694954ca-b4c3-4a6f-891e-1fc16c9dca44 tags:
 
## 4. Loading the polygon vector file
 
### 4.1. Conversion into data array
 
The vector file, named `sample.geojson`, contains vector data defining 13 rectangular zones which will be our zones of interest. The file has an attribute table with the following information:
- **fid**: unique ID [integer]
- **geometry**: coordinates of the entity's polygon [list]
- **landcover**: label [string]
 
%% Cell type:code id:145557a7 tags:
 
``` python
zones_df = gpd.read_file('sample.geojson')
print(f"GeoDataFrame uses {zones_df.crs}")
zones_df
```
 
%% Output
 
GeoDataFrame uses EPSG:4326
 
fid landcover geometry
0 1 urban01 POLYGON ((3.91165 43.57305, 3.91165 43.57490, ...
1 2 urban02 POLYGON ((3.93971 43.57152, 3.93971 43.57383, ...
2 3 urban03 POLYGON ((3.87511 43.60920, 3.87511 43.61179, ...
3 4 baresoil01 POLYGON ((3.93528 43.62391, 3.93528 43.62714, ...
4 5 baresoil02 POLYGON ((3.98472 43.64148, 3.98472 43.64420, ...
5 6 agri01 POLYGON ((4.04561 43.62706, 4.04561 43.62959, ...
6 7 agri02 POLYGON ((4.10558 43.65974, 4.10558 43.66209, ...
7 8 agri03 POLYGON ((4.11217 43.59764, 4.11217 43.60231, ...
8 9 water01 POLYGON ((4.01531 43.56791, 4.01531 43.58062, ...
9 10 water02 POLYGON ((3.98148 43.48641, 3.98148 43.52013, ...
10 11 forest01 POLYGON ((3.99250 43.69722, 3.99250 43.69882, ...
11 12 forest02 POLYGON ((4.01740 43.70491, 4.01740 43.70872, ...
12 13 forest03 POLYGON ((4.00686 43.71606, 4.00686 43.71740, ...
 
%% Cell type:markdown id:c3eaa66d tags:
 
The GeoDataFrame uses the CRS EPSG:4326. However, we previously saw that our Sentinel-2 data uses EPSG:32631. We need to reproject one of them to match the other. It is almost always preferable to reproject vector data rather than raster data: since vector data is essentially continuous, reprojecting it is lossless. Raster data, on the other hand, is discrete (pixels), so a reprojection typically entails interpolating new pixel values, which is undesirable.
 
Conveniently `geopandas` includes a `to_crs` method to reproject a GeoDataFrame to a new CRS.
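As a quick illustration of why reprojecting vectors is harmless, a point survives a round trip through the Sentinel-2 tile's UTM CRS essentially unchanged (a minimal sketch with a made-up coordinate, not taken from `sample.geojson`):

``` python
import geopandas as gpd
from shapely.geometry import Point

# a toy point in lat/lon, roughly inside the study area (made-up value)
gdf = gpd.GeoDataFrame(geometry=[Point(3.9, 43.6)], crs="EPSG:4326")

# project to the Sentinel-2 tile's CRS and back again
roundtrip = gdf.to_crs("EPSG:32631").to_crs("EPSG:4326")

# the coordinates come back essentially unchanged: no resampling or
# interpolation, just a reversible coordinate transformation
print(roundtrip.geometry[0].x, roundtrip.geometry[0].y)
```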
 
%% Cell type:code id:a418cab0-4f65-489d-905b-06d19c7ec451 tags:
 
``` python
zones_df = zones_df.to_crs(s2_ref_crs)
zones_df.crs
```
 
%% Output
 
<Projected CRS: EPSG:32631>
Name: WGS 84 / UTM zone 31N
Axis Info [cartesian]:
- E[east]: Easting (metre)
- N[north]: Northing (metre)
Area of Use:
- name: Between 0°E and 6°E, northern hemisphere between equator and 84°N, onshore and offshore. Algeria. Andorra. Belgium. Benin. Burkina Faso. Denmark - North Sea. France. Germany - North Sea. Ghana. Luxembourg. Mali. Netherlands. Niger. Nigeria. Norway. Spain. Togo. United Kingdom (UK) - North Sea.
- bounds: (0.0, 0.0, 6.0, 84.0)
Coordinate Operation:
- name: UTM zone 31N
- method: Transverse Mercator
Datum: World Geodetic System 1984 ensemble
- Ellipsoid: WGS 84
- Prime Meridian: Greenwich
 
%% Cell type:markdown id:1e5bd96b tags:
 
In order to compute zonal statistics, first we have to convert the vector zone data into a labeled raster with `rasterio.features`. Labels must be of integer type.
 
%% Cell type:code id:13043345 tags:
 
``` python
geometry = zones_df[['geometry', 'fid']].values.tolist()
 
# creating a raster from the vector data, with an overall shape identical to ds
# and using the same transform as ds
zones_mask = features.rasterize(geometry, out_shape=s2_ref_shape, fill=0, transform=s2_ref_trf)
# converting the numpy array into a DataArray so the types are consistent
zones_mask = xr.DataArray(zones_mask, dims=ds.isel(band=0).dims)
zones_mask
zones_mask
```
 
%% Output
 
<xarray.DataArray (y: 2590, x: 1999)>
array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]], dtype=uint8)
Dimensions without coordinates: y, x
 
%% Cell type:code id:d2504d24 tags:
 
``` python
ds.sel(band=['B04', 'B03', 'B02']).plot.imshow(robust=True)
plt.gca().set_aspect('equal')
plt.show()
```
 
%% Output
 
 
%% Cell type:markdown id:6b18ec15-7683-473a-b61c-e7adfd6e054e tags:
 
### 4.2. Displaying the labeled image
 
%% Cell type:code id:f8a06a63-e809-4707-9d4e-25328d79d009 tags:
 
``` python
zones_mask.plot.imshow(add_colorbar=False)
plt.gca().set_aspect('equal')
plt.show()
```
 
%% Output
 
 
%% Cell type:markdown id:3d220867-33ea-49c6-902e-6e6d8a823a1a tags:
 
## 5. Computing of zonal statistics
 
### 5.1. Solution 1: with `xrspatial`
 
Here, we compute statistics for each Sentinel-2 band and merge the results into the vector file's attribute table.

We will use the `zonal_stats` function of `xrspatial`. However, this function only handles one band at a time, so we create a function, named `s2_zonal_stats`, that calls `xrspatial.zonal_stats` once per band and aggregates the results into an xarray Dataset.
 
%% Cell type:code id:6cc24380 tags:
 
``` python
def s2_zonal_stats(ds, zones_mask, zones_df, pre_compute=True):
    # compute is needed at some point because both ds and zones_mask
    # must be numpy-backed DataArrays. It is faster to pre-compute
    # all bands at once, but requires more memory
    if pre_compute:
        ds = ds.compute()
    band_xarrays = []
    for band_name, single_band in ds.groupby('band'):
        # does nothing if pre_compute=True; otherwise turns
        # single_band into a numpy-backed DataArray
        single_band = single_band.compute()
        sign_spectral = xrspatial.zonal_stats(zones_mask,
                                              single_band,
                                              stats_funcs=['count', 'min', 'mean', 'max'],
                                              nodata_values=0)

        # merging the zonal stats with the zone id fid,
        # using how='right' because we want to discard any zone which isn't
        # in zones_df (in this case, the rest of the image, which
        # zonal_stats considers a zone of its own)
        sign_spectral = sign_spectral.merge(zones_df['fid'],
                                            how='right',
                                            left_on='zone',
                                            right_on='fid')

        sign_spectral = sign_spectral.drop('zone', axis=1).set_index('fid', drop=True)
        band_xarrays.append(sign_spectral.to_xarray())

    final = xr.concat(band_xarrays, 'band')
    final = final.assign_coords({'band': ('band', ds.band.data),
                                 'landcover': ('fid', zones_df['landcover'])})
    return final
```
 
%% Cell type:code id:fe128acd tags:
 
``` python
stats = s2_zonal_stats(ds, zones_mask, zones_df)
stats
```
 
%% Output
 
<xarray.Dataset>
Dimensions: (fid: 13, band: 9)
Coordinates:
* fid (fid) int64 1 2 3 4 5 6 7 8 9 10 11 12 13
* band (band) <U3 'B02' 'B03' 'B04' 'B05' 'B06' 'B07' 'B08' 'B11' 'B12'
landcover (fid) object 'urban01' 'urban02' ... 'forest02' 'forest03'
Data variables:
count (band, fid) float64 531.0 693.0 928.0 ... 294.0 1.435e+03 195.0
min (band, fid) float64 133.0 277.0 70.0 77.0 ... 400.0 233.0 626.0
mean (band, fid) float64 704.0 1.803e+03 650.6 ... 358.1 1.131e+03
max (band, fid) float64 1.64e+03 3.57e+03 ... 527.0 1.571e+03
 
%% Cell type:markdown id:e24456cd-2f73-42e1-be4d-c96df37bd39a tags:
 
### 5.2 Solution 2: with `rasterstats`
 
%% Cell type:markdown id:b4a354d0-4b1b-4912-84ed-29e1150d566d tags:
 
The main difference of `rasterstats`'s version of `zonal_stats` is that it can be applied directly to a STAC Item's assets together with the vector data: the `ds` and `zones_mask` DataArrays are not needed at all. It still computes statistics one band at a time, however, so as before we create a function, named `s2_zonal_stats2`, which calls `rasterstats.zonal_stats` for each band.
 
%% Cell type:code id:fac946b7 tags:
 
``` python
def s2_zonal_stats2(stac_item, band_names, zones_df):
    fids = zones_df['fid']
    band_xarrays = []
    for band_name in band_names:
        zs = rasterstats.zonal_stats(zones_df,
                                     stac_item.assets[band_name].href,
                                     stats="count min mean max median")
        sign_spectral = pd.concat([fids, pd.DataFrame(zs)], axis=1).set_index('fid', drop=True)
        band_xarrays.append(sign_spectral.to_xarray())

    final = xr.concat(band_xarrays, 'band')
    final = final.assign_coords({'band': ('band', band_names),
                                 'landcover': ('fid', zones_df['landcover'])})
    return final
```
 
%% Cell type:code id:74a328bc tags:
 
``` python
stats2 = s2_zonal_stats2(stac_item, bands_of_interest, zones_df)
stats2
```
 
%% Output
 
/home/quentin/miniconda3/envs/beyond/lib/python3.11/site-packages/rasterstats/io.py:328: NodataWarning: Setting nodata to -999; specify nodata explicitly
warnings.warn(
 
<xarray.Dataset>
Dimensions: (fid: 13, band: 9)
Coordinates:
* fid (fid) int64 1 2 3 4 5 6 7 8 9 10 11 12 13
* band (band) <U3 'B02' 'B03' 'B04' 'B05' 'B06' 'B07' 'B08' 'B11' 'B12'
landcover (fid) object 'urban01' 'urban02' ... 'forest02' 'forest03'
Data variables:
min (band, fid) float64 133.0 277.0 70.0 77.0 ... 400.0 233.0 626.0
max (band, fid) float64 1.64e+03 3.57e+03 ... 527.0 1.571e+03
mean (band, fid) float64 704.0 1.803e+03 650.6 ... 357.1 1.118e+03
count (band, fid) int64 531 693 928 1097 874 ... 4779 35318 72 347 49
median (band, fid) float64 674.0 1.676e+03 645.5 ... 356.0 1.141e+03
 
%% Cell type:markdown id:841dd808 tags:
 
### 5.3. Storing the results in a xarray Dataset
 
%% Cell type:markdown id:52a1537d tags:
 
The result returned by both versions of `s2_zonal_stats` may look like a DataArray, but it is in fact an xarray Dataset. A Dataset is simply a collection of DataArrays (called variables) which share the same coordinates and dimensions.

In our case all of the statistics (mean, count, median, etc.) share the same dimensions, band and fid (zone identifier), so they exist as different variables of the same Dataset.


It is very easy to select an individual variable and obtain a familiar DataArray.
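Conversely, when all the statistics are needed as one array, a Dataset can be stacked back into a single DataArray with `to_array` (a minimal sketch on a toy Dataset mimicking the shape of `stats`; the values are made up):

``` python
import numpy as np
import xarray as xr

# toy Dataset with two statistics sharing the same (band, fid) dimensions
toy = xr.Dataset(
    {
        "mean": (("band", "fid"), np.array([[700.0, 1800.0], [650.0, 1700.0]])),
        "count": (("band", "fid"), np.array([[531.0, 693.0], [531.0, 693.0]])),
    },
    coords={"band": ["B02", "B03"], "fid": [1, 2]},
)

# to_array stacks the variables along a new dimension,
# yielding a single DataArray of shape (stat, band, fid)
arr = toy.to_array(dim="stat")
print(arr.dims)   # ('stat', 'band', 'fid')
print(arr.shape)  # (2, 2, 2)
```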
 
%% Cell type:code id:63d60f3d tags:
 
``` python
# In this case the .count syntax would not work
# since count() is a built-in method of xarray.Dataset.
# In fact all of our current variable names happen to share their
# name with a method, so only the bracket syntax will work
stats['count']
```
 
%% Output
 
<xarray.DataArray 'count' (band: 9, fid: 13)>
array([[ 531., 693., 928., 1097., 874., 987., 570.,
2786., 19109., 141260., 294., 1435., 195.],
[ 531., 693., 928., 1097., 874., 987., 570.,
2786., 19109., 141260., 294., 1435., 195.],
[ 531., 693., 928., 1097., 874., 987., 570.,
2786., 19109., 141260., 294., 1435., 195.],
[ 531., 693., 928., 1097., 874., 987., 570.,
2786., 19109., 141260., 294., 1435., 195.],
[ 531., 693., 928., 1097., 874., 987., 570.,
2786., 19109., 141260., 294., 1435., 195.],
[ 531., 693., 928., 1097., 874., 987., 570.,
2786., 19109., 141260., 294., 1435., 195.],
[ 531., 693., 928., 1097., 874., 987., 570.,
2786., 19109., 141260., 294., 1435., 195.],
[ 531., 693., 928., 1097., 874., 987., 570.,
2786., 19109., 141260., 294., 1435., 195.],
[ 531., 693., 928., 1097., 874., 987., 570.,
2786., 19109., 141260., 294., 1435., 195.]])
Coordinates:
* fid (fid) int64 1 2 3 4 5 6 7 8 9 10 11 12 13
* band (band) <U3 'B02' 'B03' 'B04' 'B05' 'B06' 'B07' 'B08' 'B11' 'B12'
landcover (fid) object 'urban01' 'urban02' ... 'forest02' 'forest03'
 
%% Cell type:markdown id:deafcef3-7243-49cb-a1b2-2d4a1600ee86 tags:
 
## 6. Spectral signatures
 
Now that we have the summary statistics for several landcover types, we can plot their spectral signatures.
 
%% Cell type:code id:1aa56c5f tags:
 
``` python
stats2.sel(fid=(stats2.landcover=='forest03'))['mean'].plot()
stats2.sel(fid=(stats2.landcover=='forest03'))['max'].plot(c='gray')
stats2.sel(fid=(stats2.landcover=='forest03'))['min'].plot(c='gray')
# turning off the y axis label since it would wrongly say 'min' otherwise
plt.ylabel(None)
plt.show()
```
 
%% Output
 
 
%% Cell type:code id:452930ab tags:
 
``` python
# the parameter hue='fid' means that each
# value in the 'fid' dimension leads to a different line
# which means one coloured line per zone
stats2['mean'].plot.line(hue='fid')
plt.legend(stats2.landcover.values, loc='center left', bbox_to_anchor=(1, 0.5))
```
 
%% Output
 
<matplotlib.legend.Legend at 0x7fe7781e56d0>
 
 
%% Cell type:markdown id:3190f28d-dee1-4d0f-b467-7b59d8c505c6 tags:
 
## 7. Band math and vegetation index
 
%% Cell type:markdown id:754b62d9-b17a-4980-9de6-2173e0b68b0e tags:
 
### 7.1. NDVI principles
 
The NDVI (Normalized Difference Vegetation Index) is a vegetation index based on the difference between red and near infrared (nIR) values. Its formula is as follows:
 
$$NDVI = {nIR - Red \over nIR + Red}$$
 
This index exploits the distinctive spectral signature of vegetation, which shows a very marked peak in the near infrared and a lower reflectance in the red. The NDVI is very effective at detecting the presence of vegetation, and it can also be used to evaluate vegetation biomass and the intensity of photosynthetic activity.
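Before applying the formula to the image, it can be sanity-checked on a few illustrative reflectance values (made-up numbers, not taken from the scene):

``` python
import numpy as np

# illustrative reflectances for vegetation, bare soil and water (made-up values)
nir = np.array([3500.0, 1200.0, 300.0])
red = np.array([500.0, 1000.0, 400.0])

ndvi_check = (nir - red) / (nir + red)
print(ndvi_check.round(2))  # vegetation high, bare soil near 0, water negative
```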
 
%% Cell type:code id:15b0ec3b-9a51-4577-b394-58dd3e9f8c6d tags:
 
``` python
nir, red = ds.sel(band="B08").astype('float'), ds.sel(band="B04").astype('float')
ndvi = (nir-red)/(nir+red)
ndvi.name='NDVI'
```
 
%% Cell type:markdown id:55b17021-7174-4df5-bd70-b9c3758bb468 tags:
 
### 7.2. Plotting the NDVI
 
%% Cell type:code id:3547a038-8930-47fc-b904-5c7e408e94db tags:
 
``` python
plt.imshow(ndvi.squeeze(), cmap="RdYlGn")
plt.colorbar()
plt.title('NDVI')
plt.show()
```
 
%% Output
 
 
%% Cell type:code id:7b90664e-7658-44bc-b389-4c420e3daf86 tags:
 
``` python
ndvi.plot.hist()
plt.show()
```
 
%% Output
 
 
%% Cell type:markdown id:de7e0dea-c647-4692-b25a-7039d2d9d948 tags:
 
### 7.3. Using Jupyter widgets
 
It is possible to add elements (*slider*, *progress bar*, *checkbox*, *radio buttons*, etc.) to interact with the data visualization. To do this, load the `ipywidgets` library.
 
%% Cell type:code id:a3646df1-52b6-4957-aead-7b64b5010288 tags:
 
``` python
import ipywidgets as widgets
from ipywidgets import interact

def threshold(seuil):
    # binary mask: 1 where the NDVI exceeds the threshold ('seuil'), 0 elsewhere
    mask = np.where(ndvi.squeeze() > seuil, 1, 0)
    plt.imshow(mask, cmap="viridis", interpolation='nearest')
    plt.colorbar()
    plt.show()

interact(threshold, seuil=widgets.FloatSlider(min=-1, max=1, step=0.001))
```
 
%% Output
 
 
<function __main__.threshold(seuil)>