Forked from Cresson Remi / SimpleExtractionTools
Source project has a limited visibility.
03-Xarray.ipynb 2.72 MiB

In this notebook we will cover basic manipulation of multi-band imagery with Xarray. This is the last step before reaching the most complex use case of multi-band and multi-date data. Fortunately the libraries and concepts we cover here will transfer to times series.

In [243]:
# STAC access
import pystac_client
import planetary_computer

# dataframes
import pandas as pd

# xarrays
import xarray as xr

# library for turning STAC objects into xarrays
import stackstac

# visualization
from matplotlib import pyplot as plt

# miscellanous
import numpy as np
from IPython.display import display

As a practical use case let's consider that we have identified the STAC Item we're interested in (see this notebook for a refresher), and we also have an area of interest defined as a bounding box.

In [244]:
aoi_bounds = (3.875107329166124, 43.48641456618909, 4.118824575734205, 43.71739887308995)

# retrieving the relevant STAC Item
catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
    )

item = catalog.get_collection('sentinel-2-l2a').get_item("S2A_MSIL2A_20201213T104441_R008_T31TEJ_20201214T083443")
display(item)

assets = [asset.title for asset in item.assets.values()]
descriptions = pd.DataFrame(assets, columns=['Description'], index=pd.Series(item.assets.keys(), name='asset_key'))
descriptions
Out [244]:
(Click to sort ascending)
Description
(Click to sort ascending)
asset_key
(Click to sort ascending)
0Aerosol optical thickness (AOT)
1Band 1 - Coastal aerosol - 60m
2Band 2 - Blue - 10m
3Band 3 - Green - 10m
4Band 4 - Red - 10m
5Band 5 - Vegetation red edge 1 - 20m
6Band 6 - Vegetation red edge 2 - 20m
7Band 7 - Vegetation red edge 3 - 20m
8Band 8 - NIR - 10m
9Band 9 - Water vapor - 60m
10Band 11 - SWIR (1.6) - 20m
11Band 12 - SWIR (2.2) - 20m
12Band 8A - Vegetation red edge 4 - 20m
13Scene classfication map (SCL)
14Water vapour (WVP)
15True color image
16Thumbnail
17SAFE manifest
18Granule metadata
19INSPIRE metadata
20Product metadata
21Datastrip metadata
22TileJSON with default rendering
23Rendered preview

Creating an xarray from a STAC object

Now that we have our STAC Item we will use stackstac in order to create a multi-dimensional array for our multi-band data. This library is very powerful and will handle in a single call a lot of small inconveniences: resampling, data types, CRS, etc.

The function stackstac.stack has quite a few parameters, but most values can often be left default. Here are the ones we will be using here:

  • assets. We don't want all the assets in the Item, especially since some of them are just metadata and have no dimension. Let's select all the bands: B02 through B08, B11 and B12
  • resolution. As we can see in the table above, not all the bands have the same resolution. If we want them in a multi-dimensional array we need to choose a single resolution for all the bands. Here we choose 10 (for 10 meters), which means that the 20 and 60 meter bands will be resampled to 10 m. The resampling parameter can be added to specify which resampling method to use, but here the default value (nearest) is what we want.
  • dtype. By default the data type is float64. Sentinel-2 data is almost always distributed as 16 bit integers. Thus we will use uint16 which will save space and allow faster computations. See the note below for more details.
  • fill_value. This is the value to use if data is somehow mising. However the default value np.nan (numpy's version of NaN i.e. 'not a number') is only valid for floating point numbers, not for integers. Here we will use . See the note below for more details.
  • bounds_latlon and bounds. Only one of the two can be used. Here we use the former, which allows us to directly specify a bounding box as latitude and longitude. On the other hand bounds requires using the correct coordinate system as defined by the epsg parameter if specified, or the proj:epsg property on the STAC assets (and in this case all assets must have the same CRS) if the epsg parameter is left as default.
More on data types and fill values

Understanding data types for satellite imagery is not especially difficult. But as they rarely pose a problem it can be easy to overlook them. The following explanation will be focused on Sentinel-2 data. For other satellite images it can be very different so some attention is needed when transferring or writing code for this imagery. Moreover, even for Sentinel-2 data (for example) some things can differ depending on the data distributer (Theia, Planetary, etc.).

For the most part the Copernicus which manages the Sentinel mission offers 3 main products for Sentinel-2: Level-1B, Level-1C, and Level-2A. They correspond to increasingly processed and refined data. For the vast majority of uses and users, Level-2A is the right choice. As described here, Level-2A is reflectance data.

Reflectance is defined as the proportion of incident radiant energy that is reflected by a surface. It ranges from 0 (total absorption) to 1 (total reflectance). However for reasons of efficiency and convenience Sentinel-2 Level-2A is typically distributed as integers. The conversion is simply which results in integers values which range from to . This means that valid data will never fall between and . Thus we can safely choose as the fill value, and we should not be able to ever confuse valid and invalid data

In [245]:
bands = ['B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B11', 'B12']
FILL_VALUE = 2**16-1
array = stackstac.stack(
                    item,
                    assets = bands,
                    resolution=10,
                    dtype="uint16",
                    fill_value=FILL_VALUE,
                    bounds_latlon=aoi_bounds,
                    )

#equivalent to display(array)
#when it's the last line in a cell
array 
Out [245]:
/home/quentin/miniconda3/envs/beyond/lib/python3.11/site-packages/stackstac/prepare.py:363: UserWarning: The argument 'infer_datetime_format' is deprecated and will be removed in a future version. A strict version of it is now the default, see https://pandas.pydata.org/pdeps/0004-consistent-to-datetime-parsing.html. You can safely remove this argument.
  times = pd.to_datetime(

Xarray overview

Xarrays are multi-dimensional arrays. Some of the advantages over more traditonal arrays such as NumPy arrays include:

  • they allow to seamlessly manipulate data without having to hold it in memory all at once,
  • they allow for named dimensions,
  • they provide support for holding various metadata alongside the numeric data.

Just like pandas dataframes and NumPy arrays, xarray does not do in-place processing. This means that most operations on an xarray do not modify the xarray, they return a new modified array instead.

Note: It may seem wasteful and/or slow to create new objects instead of modifying them. It is sometimes the case. However Xarray (and NumPy before it) was developped with this in mind so it is well optimized for this. In fact it is much more of a stylistic choice (i.e. how the code is written) than a fundamental inability to modify objects.

Xarrays display very nicely as the last lign in cell (or with a IPython.display.display call). From this visualization we can already gather a lot of information about our data and xarrays in general.

The first table and accompanying cube visual essentially show the same information. First of all the table is separated into two columns: array and chunks. Arrays are divided into chunks for reasons of efficiency. If you need to perform a lot of calculations on big arrays, it can become necessary to choose an appropriate chunk size to allow for better computation

The rows of the table are fairly self-explanatory. The Bytes row gives the size of the array and a single chunk. It is important to note that even if the size is given, it does not mean that the data has been loaded. In fact at this point in the code the xarray is created but the data has not been downloaded. But the size can still be calculated from the dimensions of the array and the data type.

The visual on the right give a spatial interpetation to the table data. The rectangular cuboid on the right is a representation of the 2 spatial dimensions as well as the 9 bands we selected. We can see that the spatial dimensions are separated into 3 rows and 2 columns. These smaller sections are the chunks. By looking at the table we can see that the chunks have spatial dimensions 1024 x 1024. Given our array of dimensions 1999 x 2590, the number of chunks is coherent.

The smaller square at the top-left represents the extra time dimension (of size 1 in this case since this is single-date data).

Coordinates and dimensions

Note: The name Coordinate can be confusing, especially since it does not inherently relate in any way to spatial coordinates of any kind. Xarrays are used in fields other than GIS, but the terminology still applies. Feel free to understand "coordinates" as "metadata which may be aligned along the dimensions of the array".

If we look at the Coordinates section we can see a lot of different rows. Feel free to explore these coordinates. You can see more about a coordinate by clicking the grey icon (stacked disks) to the right.

Here 4 of the rows have names in bold: time, band, x and y. These coordinates are also dimensions. They are named dimensions, which means that we will be able to select data by directly specifying the dimension name. The dimensions contain values called labels, which permits selecting by these labels rather than using indices. For example in array the second dimension has the name band and 9 labels associated with it: B02, B03, B04, B05, B06, B07, B08, B11, and B12.

Most of the other coordinates can be seen as metadata gathered from the STAC Assets: the CRS, some information about the satellite, the cloud cover, full names for the bands... This is one of the advantage of xarray: we can keep the relevant metadata as we perform calculations on the underlying numeric values.

The second colum in the coordinate table can be useful. It indicates to which dimension the coordinate relates to. As an obvious example let's consider the title coordinate. This coordinate refers to the full human-readable names of the bands. The row has (band) in the second column. This means that the title coordinate contains an array of the same size as the band dimension, and the values it contains refer to the same position.

In the following cell we can see that the title coordinate is itself an xarray with a single dimension : band.

On the other hand, most of the metadata we have in Coordinates has nothing specified for the second column. It simply means that there are no dimensions associated with these properties. Indeed values like orbit or proj:epsg don't have any dimensional aspect, they're just properties of the whole array.

In [246]:
array.title
Out [246]:

Looking at the x and y dimensions

It is fairly obvious that the band dimension contains the labels for the bands: B02, B03 and so on. Similarly, the time dimension contains the date label (just one in this case). However the nature of the labels of the x and y dimensions is not necessarily obvious. These labels are the spatial coordinates for the pixels in the relevant CRS (in this case EPSG:32631, as specified by the epsg coordinate). By default the coordinates are for the top-left corner of the pixel, but it is possible to choose the center as the reference by using the xy_coords argument for stackstac.stack.

Note: Since we chose 10 meter as our common resolution, it makes sense that these geometric coordinates are incremented by 10 with every subsequent value.

In [247]:
display(array.x)
display(array.y)
Out [247]:

Here's how to understand the x and y dimension labels with an example. The third label of the dimension x is 570510, and the second label for y is 4841090. This means that the top-left corner of the pixel located at (3, 2) in the array (i.e. third column from the left, and second row from the top) maps to the geometric point with coordinates (570510, 4841090) in EPSG:32631.

Indexes and Attributes

The Indexes are simply the values associated with each dimension, available as a PandasIndex object. They can be useful in some situations, but those uses will not be covered here.

Finally the Attributes section contain some information about our imagery: the CRS, the affine transformation which links the spatial coordinates with row/column coordinates in the array, and the resolution. These Attributes would be used and modified if we were to reproject or resample our data.

Indexing and selecting

When deciding to filter and select data in an xarray there are two choices to make:

  1. Are we we referring to the dimension by its name or by its position?
  2. Are we using the dimension labels are simply the positional indices?

Since these two choices are independent, this leads to four different ways to select data in an xarray, summarized in the table below:

Dimension lookup Index lookup Syntax
Positional By integer array[:, 0]
Positional By label array.loc[:, 'B02'] or
array.loc[dict(band='B02')]
By name By integer array.isel(band=0)
By name By label array.sel(band='B02')
In [248]:
(
    (array[:,0]).equals(array.loc[:,'B02']) and
    (array[:,0]).equals(array.isel(band=0)) and
    (array[:,0]).equals(array.sel(band='B02'))
)
Out [248]:
True

In most cases, the sel method is the most convenient, so we will favor it. isel is useful in situations where the integer index has a meaning. For example if we want to manipulate our current array as if it were a regular array sel would be interesting to specify row/column values instead of spatial coordinates.

Note: The two methods which use brackets have one advantage compared to sel and isel: they allow assigning values from a selection.

In [249]:
array2 = array.copy(deep=True)
print(f"Value before assignment: {array2.loc[dict(band='B02', x=570490, y=4841100)].values}")
array2.loc[dict(band='B02', x=570490, y=4841100)] = 0
print(f"Value after assignment: {array2.loc[dict(band='B02', x=570490, y=4841100)].values}")

# If you uncomment the following line and try to execute, it will fail
# Indeed sel() and isel() are function calls, so they don't support assignment.
#array2.sel(band='B02', x=570490, y=4841100) = 0
Out [249]:
Value before assignment: [104]
Value after assignment: [0]

Single values

The simplest use case is to select with a single value per dimension. Note that it is possible to chain multiple selections

In [250]:
filtered = array.sel(band='B02').isel(x=658)
display(filtered)
filtered.dims
Out [250]:
Out [250]:
('time', 'y')

It is interesting to note that after selecting with a unique value for band and x, they still appear in coordinates but they are no longer dimensions.

Selecting multiple values

If we want to select a few values for a dimension, we can use a boolean mask with isin:

In [251]:
mask_x = array.band.isin(['B02', 'B04', 
                        'this label does not exist',
                        'but it doesn matter'])
display(mask_x)
Out [251]:
In [252]:
filtered = array.sel(band=mask_x)
display(filtered)
filtered.dims
Out [252]:
Out [252]:
('time', 'band', 'y', 'x')

This time the dimensions are kept as multiple values still exist for the dimension.

Using boolean masks for more arbitrary conditions

Boolean masks are very powerful as they allow performing very diverse filtering operations. The only issue is that in a single mask you must restrict yourself to a single dimension. But it is possible to use multiple masks at once

In [253]:
mask_x = (array.x%20 == 0) & (array.x%30 == 0) | (array.x < 578340)
mask_y = (array.y > 4840000)
array.sel(x=mask_x, y=mask_y)
Out [253]:

Select in a range and slices

Although it is possible to do it with a boolean mask of two conditions, selecting in a range (spatially or a range of dates) is a fairly typical use case, so more dedicated tools exist. We use Python slice objects. What follows is a small introduction to slicing if you're not familiar with that concept.

Refresher on slicing

In Python there are data types called sequences. Lists and arrays are examples of sequences. One characteristic of sequences is the ability to access elements in a variety of ways. The most basic way is to just access an element

In [254]:
a = list(chr(i+97) for i in range(10))
print(f"initial list: {a=}\n")
      
print("accessing with just an index")
print(f"{a[3]=}, {a[-1]=}, {a[-2]=}\n") # negative values start from the end
Out [254]:
initial list: a=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

accessing with just an index
a[3]='d', a[-1]='j', a[-2]='i'

But we can also use a very similar syntax to select and create sub-lists:

In [255]:
print("accessing a range")
print(f"{a[2:5]=}\n")

print("accessing a range and leaving some arguments blank")
print(f"{a[:5]=}") # equivalent to a[0:5]
print(f"{a[3:]=}\n") # equivalent to a[3:-1]

print("accessing with a step")
print(f"{a[::2]=}") # every two elements
print(f"{a[::-2]=}") # every two elements in reverse
Out [255]:
accessing a range
a[2:5]=['c', 'd', 'e']

accessing a range and leaving some arguments blank
a[:5]=['a', 'b', 'c', 'd', 'e']
a[3:]=['d', 'e', 'f', 'g', 'h', 'i', 'j']

accessing with a step
a[::2]=['a', 'c', 'e', 'g', 'i']
a[::-2]=['j', 'h', 'f', 'd', 'b']

This kind of slicing is fairly easy to understand. But sometimes writing all of this manually can be troublesome, especially if we want to slice multiple sequences in the same way, or if we want to consider a way to slice independently of a specific object. This is what slice objects are for. They allow to turn this x:y:z syntax into an object we can reuse. The syntax of slice is: slice(start, stop, step). Only stop is mandatory, start and stop are optional and will be replaced to None if not specified. Here are some example of slicing the same list with slice objects:

In [256]:
a = list(chr(i+97) for i in range(10))
print(f"initial list: {a=}\n")

print("accessing a range")
print(f"{a[slice(2,5)]=}\n")

print("accessing a range with just a stop value")
print(f"{a[slice(5)]=} \n") #equivalent to a[slice(0,5)]
print("accessing a range with just a start value")
print(f"{a[slice(3, None)]=}\n") #equivalent to a[slice(3,-1)]

print("accessing with a step")
print(f"{a[slice(None, None, 2)]=}") #every two elements
print(f"{a[slice(None, None, -2)]=}") #every two elements in reverse
Out [256]:
initial list: a=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

accessing a range
a[slice(2,5)]=['c', 'd', 'e']

accessing a range with just a stop value
a[slice(5)]=['a', 'b', 'c', 'd', 'e'] 

accessing a range with just a start value
a[slice(3, None)]=['d', 'e', 'f', 'g', 'h', 'i', 'j']

accessing with a step
a[slice(None, None, 2)]=['a', 'c', 'e', 'g', 'i']
a[slice(None, None, -2)]=['j', 'h', 'f', 'd', 'b']

Of course it makes little sense to write a[slice(x,y,z)] instead of a[x:y:z]. Here's a more typical if simplistic reason to use slice objects:

In [257]:
letters = list(chr(i+97) for i in range(10))
letter_numbers = list(i for i in range(10))
print(f"{letters=}")
print(f"{letter_numbers=}\n")

print("let's slice both lists with the same object")
s = slice(3,8,2)
print(f"{letters[s]=}")
print(f"{letter_numbers[s]=}")
Out [257]:
letters=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
letter_numbers=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

let's slice both lists with the same object
letters[s]=['d', 'f', 'h']
letter_numbers[s]=[3, 5, 7]

Using slice objects with sel or isel

Selecting with with a slice object is fairly straight-forward:

In [258]:
array.sel(x=slice(580000, 580100), y=slice(None, None, 500))
Out [258]:

However there are three important details to keep in mind.

  • Contrary to slicing with lists or NumPy arrays, both bounds are included. This means that slice(a,b) also includes b
  • When we are using sel, the slice arguments we use are values, not indices.
  • Despite the previous point, the selection is still operated on indices This means that when writing sel(x=slice(a,b)), the operation is equivalent to the following:
    1. Find the index with value a. Let's name it id_a
    2. Find the index with value b. Let's name it id_b
    3. If id_a > id_b then return nothing
    4. Else return all the values with indices between id_a and id_b (bounds included).

These quirks can often show when dealing with unsorted dimensions. What follows is a small example with an unsorted dimension x.

In [259]:
x = np.array([3, 1, 2, 7, 6, 4, 5, 0])  # Values for the first dimension (not sorted)
y = np.arange(3)  # Values for the second dimension


data = np.random.rand(len(x), len(y))  # Random data values

# Create the xarray dataset
ds = xr.DataArray(data, coords={'x': x, 'y': y}, dims=['x', 'y'])

ds
Out [259]:
In [260]:
display(ds.sel(x=slice(1,4)).x) #still includes 7 and 6 since indices are used
display(ds.sel(x=slice(5,7)).x) #nothing is selected since 5 comes after 7 in the order
ds.sel(x=slice(7,5)).x #this selects correctly
Out [260]:

This somewhat suprising behavior is part of the reason why using unsorted dimensions is usually discouraged. Similarly, dimensions don't necessarily need to have unique values, but this also leads to confusing behavior. Fortunately in remote sensing we rarely if ever encounter these situations.

A quick introduction to xarray.where

This section aims to show a few uses for Xarryay's where function. This function is a useful tool in many situations. It enables selecting data from arrays/xarrays based on a boolean condition. Its signature is the following xarray.where(cond, x, y):

  • cond is the boolean condition. It can be a scalar, an array or an xarray
  • x values to choose from where cond is True
  • y values to choose from where cond is False
In [261]:
a = np.arange(10)
print(a)
xr.where(a>5, 0, a)
Out [261]:
[0 1 2 3 4 5 6 7 8 9]
Out [261]:
array([0, 1, 2, 3, 4, 5, 0, 0, 0, 0])

This first exemple is trivial, but it captures the essence of the where function. The complexity lies in writing the right boolean condition, and understand the shapes of the objects involved.

When we created the xarray with stackstac.stack we specified a fill value to be used if data was missing. We can use where in combination with sum in order to check how many, if any, pixels were filled with that value.

In [262]:
xr.where(array == FILL_VALUE, 1, 0).sum().values
Out [262]:
array(0)

One last thing about where is that it allows use to create a modified array fairly easily

In [263]:
xr.where(array>5000, 5000, array) #useful for clipping values before creating visuals
array2 = xr.where(array.band=='B02', 0, array)
array2.sel(band='B02').values
Out [263]:
array([[[0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        ...,
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0],
        [0, 0, 0, ..., 0, 0, 0]]], dtype=uint16)

Accessing values, loading and compute

Xarray has a lazy approach to computing when it comes to data. This means that the multi-dimensional array is not loaded into memory unless it is expressly needed. In this notebook we've learned about dimensions and coordinates. We've also seen how to filter an xarray. However, as long as we work on the dimensions or coordinates, there was no need to access the data itself, i.e. the underlying multi-dimensional array.

In fact, in this notebook there are only 2 situations in which we've had to access the multi-dimensional array:

  1. When we accessed it directly by using the values method. The first time was in this cell
In [264]:
b = array.compute()
b
Out [264]:
In [265]:
bs = array.band.isin(['B02', 'B03', 'B04'])
ds_bands = array.sel(band=bs).isel(time=0).compute()
ds_bands
Out [265]:

Plotting

In [266]:
ds_bands.plot.hist(bins=30)
Out [266]:
(array([7.623534e+06, 4.882460e+06, 1.447367e+06, 4.906030e+05,
        2.281520e+05, 1.573650e+05, 1.362690e+05, 1.238380e+05,
        8.969200e+04, 6.750300e+04, 5.696000e+04, 6.432700e+04,
        7.658800e+04, 5.876900e+04, 2.349100e+04, 4.804000e+03,
        4.620000e+02, 2.100000e+01, 6.000000e+00, 3.000000e+00,
        7.000000e+00, 3.000000e+00, 1.000000e+00, 1.000000e+00,
        0.000000e+00, 1.000000e+00, 0.000000e+00, 1.000000e+00,
        1.000000e+00, 1.000000e+00]),
 array([1.00000000e+00, 5.35366667e+02, 1.06973333e+03, 1.60410000e+03,
        2.13846667e+03, 2.67283333e+03, 3.20720000e+03, 3.74156667e+03,
        4.27593333e+03, 4.81030000e+03, 5.34466667e+03, 5.87903333e+03,
        6.41340000e+03, 6.94776667e+03, 7.48213333e+03, 8.01650000e+03,
        8.55086667e+03, 9.08523333e+03, 9.61960000e+03, 1.01539667e+04,
        1.06883333e+04, 1.12227000e+04, 1.17570667e+04, 1.22914333e+04,
        1.28258000e+04, 1.33601667e+04, 1.38945333e+04, 1.44289000e+04,
        1.49632667e+04, 1.54976333e+04, 1.60320000e+04]),
 )

TODO see with Kenji about easier aspect ratios

In [267]:
aspect = ds_bands.x.shape[0]/ds_bands.y.shape[0]
aspect
Out [267]:
0.7718146718146718

explain slice, put link to documentation

In [268]:
ds_bands
ds_bands.plot.imshow(robust=True)
# list ordre inverse
ds_bands.isel(band=slice(None, None, -1)).plot.imshow(robust=True, aspect=aspect, size=10)
Out [268]:
In [269]:
array[0:3:-1] array[slice(0,3,-1)]
Out [269]:
  Cell In[269], line 1
    array[0:3:-1] array[slice(0,3,-1)]
                  ^
SyntaxError: invalid syntax
import xarray as xr
quantile = ds_bands.quantile(.95)
array.max(['x','y'])
ds_bands = xr.where(ds_bands
import xarray as xr

# Create a sample DataArray
data = xr.DataArray([[1, 2], [3, 4]], dims=('dim1', 'dim2'))

# Check the existing dimensions
print(data.dims)  # Output: ('dim1', 'dim2')

# Convert 'dim1' dimension into a coordinate
data_with_coordinate = data.set_index(dim1='dim1')

print(data_with_coordinate)
('dim1', 'dim2')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[23], line 10
      7 print(data.dims)  # Output: ('dim1', 'dim2')
      9 # Convert 'dim1' dimension into a coordinate
---> 10 data_with_coordinate = data.set_index(dim1='dim1')
     12 print(data_with_coordinate)

File ~/miniconda3/envs/beyond/lib/python3.11/site-packages/xarray/core/dataarray.py:2599, in DataArray.set_index(self, indexes, append, **indexes_kwargs)
   2539 def set_index(
   2540     self,
   2541     indexes: Mapping[Any, Hashable | Sequence[Hashable]] | None = None,
   2542     append: bool = False,
   2543     **indexes_kwargs: Hashable | Sequence[Hashable],
   2544 ) -> DataArray:
   2545     """Set DataArray (multi-)indexes using one or more existing
   2546     coordinates.
   2547 
   (...)
   2597     DataArray.set_xindex
   2598     """
-> 2599     ds = self._to_temp_dataset().set_index(indexes, append=append, **indexes_kwargs)
   2600     return self._from_temp_dataset(ds)

File ~/miniconda3/envs/beyond/lib/python3.11/site-packages/xarray/core/dataset.py:4155, in Dataset.set_index(self, indexes, append, **indexes_kwargs)
   4153 invalid_vars = set(var_names) - set(self._variables)
   4154 if invalid_vars:
-> 4155     raise ValueError(
   4156         ", ".join([str(v) for v in invalid_vars])
   4157         + " variable(s) do not exist"
   4158     )
   4160 all_var_names.update(var_names)
   4161 drop_variables.update(var_names)

ValueError: dim1 variable(s) do not exist
type(ds_bands)
xarray.core.dataarray.DataArray
display(quantile)
ds_bands.max(['x', 'y'])
ds_bands.plot.imshow(rgb='band')
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[36], line 1
----> 1 ds_bands.plot.imshow(rgb='band')

File ~/miniconda3/envs/beyond/lib/python3.11/site-packages/xarray/plot/accessor.py:430, in DataArrayPlotAccessor.imshow(self, *args, **kwargs)
    428 @functools.wraps(dataarray_plot.imshow)
    429 def imshow(self, *args, **kwargs) -> AxesImage:
--> 430     return dataarray_plot.imshow(self._da, *args, **kwargs)

File ~/miniconda3/envs/beyond/lib/python3.11/site-packages/xarray/plot/dataarray_plot.py:1520, in _plot2d.<locals>.newplotfunc(***failed resolving arguments***)
   1518     raise ValueError('The "rgb" keyword is only valid for imshow()')
   1519 elif rgb is not None and not imshow_rgb:
-> 1520     raise ValueError(
   1521         'The "rgb" keyword is only valid for imshow()'
   1522         "with a three-dimensional array (per facet)"
   1523     )
   1525 xlab, ylab = _infer_xy_labels(
   1526     darray=darray, x=x, y=y, imshow=imshow_rgb, rgb=rgb
   1527 )
   1529 xval = darray[xlab]

ValueError: The "rgb" keyword is only valid for imshow()with a three-dimensional array (per facet)
ds_bands.dims
('time', 'band', 'y', 'x')
from matplotlib import pyplot as plt
import numpy as np
values = values
mins = values.min(axis=(0,1))
maxs = values.max(axis=(0,1))

scaled_arr = (values - mins) / (maxs - mins)
plt.imshow(scaled_arr)


---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
Cell In[11], line 3
      1 from matplotlib import pyplot as plt
      2 import numpy as np
----> 3 values = values
      4 mins = values.min(axis=(0,1))
      5 maxs = values.max(axis=(0,1))

NameError: name 'values' is not defined