# STAC access
import pystac_client
import planetary_computer
# dataframes
import pandas as pd
# xarrays
import xarray as xr
# library for turning STAC objects into xarrays
import stackstac
# visualization
from matplotlib import pyplot as plt
# miscellanous
import numpy as np
from IPython.display import display
-
Cresson Remi authoreddeda95fb
In this notebook we will cover basic manipulation of multi-band imagery with Xarray. This is the last step before reaching the most complex use case of multi-band and multi-date data. Fortunately the libraries and concepts we cover here will transfer to times series.
As a practical use case let's consider that we have identified the STAC Item we're interested in (see this notebook for a refresher), and we also have an area of interest defined as a bounding box.
aoi_bounds = (3.875107329166124, 43.48641456618909, 4.118824575734205, 43.71739887308995)
# retrieving the relevant STAC Item
catalog = pystac_client.Client.open(
"https://planetarycomputer.microsoft.com/api/stac/v1",
modifier=planetary_computer.sign_inplace,
)
item = catalog.get_collection('sentinel-2-l2a').get_item("S2A_MSIL2A_20201213T104441_R008_T31TEJ_20201214T083443")
display(item)
assets = [asset.title for asset in item.assets.values()]
descriptions = pd.DataFrame(assets, columns=['Description'], index=pd.Series(item.assets.keys(), name='asset_key'))
descriptions
(Click to sort ascending) | Description (Click to sort ascending) | asset_key (Click to sort ascending) |
---|---|---|
0 | Aerosol optical thickness (AOT) | |
1 | Band 1 - Coastal aerosol - 60m | |
2 | Band 2 - Blue - 10m | |
3 | Band 3 - Green - 10m | |
4 | Band 4 - Red - 10m | |
5 | Band 5 - Vegetation red edge 1 - 20m | |
6 | Band 6 - Vegetation red edge 2 - 20m | |
7 | Band 7 - Vegetation red edge 3 - 20m | |
8 | Band 8 - NIR - 10m | |
9 | Band 9 - Water vapor - 60m | |
10 | Band 11 - SWIR (1.6) - 20m | |
11 | Band 12 - SWIR (2.2) - 20m | |
12 | Band 8A - Vegetation red edge 4 - 20m | |
13 | Scene classfication map (SCL) | |
14 | Water vapour (WVP) | |
15 | True color image | |
16 | Thumbnail | |
17 | SAFE manifest | |
18 | Granule metadata | |
19 | INSPIRE metadata | |
20 | Product metadata | |
21 | Datastrip metadata | |
22 | TileJSON with default rendering | |
23 | Rendered preview |
Creating an xarray from a STAC object
Now that we have our STAC Item we will use stackstac
in order to create a multi-dimensional array for our multi-band data. This library is very powerful and will handle in a single call a lot of small inconveniences: resampling, data types, CRS, etc.
The function stackstac.stack has quite a few parameters, but most values can often be left default. Here are the ones we will be using here:
assets
. We don't want all the assets in the Item, especially since some of them are just metadata and have no dimension. Let's select all the bands: B02 through B08, B11 and B12resolution
. As we can see in the table above, not all the bands have the same resolution. If we want them in a multi-dimensional array we need to choose a single resolution for all the bands. Here we choose 10 (for 10 meters), which means that the 20 and 60 meter bands will be resampled to 10 m. Theresampling
parameter can be added to specify which resampling method to use, but here the default value (nearest
) is what we want.dtype
. By default the data type isfloat64
. Sentinel-2 data is almost always distributed as 16 bit integers. Thus we will useuint16
which will save space and allow faster computations. See the note below for more details.fill_value
. This is the value to use if data is somehow mising. However the default valuenp.nan
(numpy's version of NaN i.e. 'not a number') is only valid for floating point numbers, not for integers. Here we will use . See the note below for more details.bounds_latlon
andbounds
. Only one of the two can be used. Here we use the former, which allows us to directly specify a bounding box as latitude and longitude. On the other handbounds
requires using the correct coordinate system as defined by theepsg
parameter if specified, or theproj:epsg
property on the STAC assets (and in this case all assets must have the same CRS) if theepsg
parameter is left as default.
Understanding data types for satellite imagery is not especially difficult. But as they rarely pose a problem it can be easy to overlook them. The following explanation will be focused on Sentinel-2 data. For other satellite images it can be very different so some attention is needed when transferring or writing code for this imagery. Moreover, even for Sentinel-2 data (for example) some things can differ depending on the data distributer (Theia, Planetary, etc.).
For the most part the Copernicus which manages the Sentinel mission offers 3 main products for Sentinel-2: Level-1B, Level-1C, and Level-2A. They correspond to increasingly processed and refined data. For the vast majority of uses and users, Level-2A is the right choice. As described here, Level-2A is reflectance data.
Reflectance is defined as the proportion of incident radiant energy that is reflected by a surface. It ranges from 0 (total absorption) to 1 (total reflectance). However for reasons of efficiency and convenience Sentinel-2 Level-2A is typically distributed as integers. The conversion is simply which results in integers values which range from to . This means that valid data will never fall between and . Thus we can safely choose as the fill value, and we should not be able to ever confuse valid and invalid data
bands = ['B02', 'B03', 'B04', 'B05', 'B06', 'B07', 'B08', 'B11', 'B12']
FILL_VALUE = 2**16-1
array = stackstac.stack(
item,
assets = bands,
resolution=10,
dtype="uint16",
fill_value=FILL_VALUE,
bounds_latlon=aoi_bounds,
)
#equivalent to display(array)
#when it's the last line in a cell
array
/home/quentin/miniconda3/envs/beyond/lib/python3.11/site-packages/stackstac/prepare.py:363: UserWarning: The argument 'infer_datetime_format' is deprecated and will be removed in a future version. A strict version of it is now the default, see https://pandas.pydata.org/pdeps/0004-consistent-to-datetime-parsing.html. You can safely remove this argument.
times = pd.to_datetime(
Xarray overview
Xarrays are multi-dimensional arrays. Some of the advantages over more traditonal arrays such as NumPy arrays include:
- they allow to seamlessly manipulate data without having to hold it in memory all at once,
- they allow for named dimensions,
- they provide support for holding various metadata alongside the numeric data.
Just like pandas dataframes and NumPy arrays, xarray does not do in-place processing. This means that most operations on an xarray do not modify the xarray, they return a new modified array instead.
Note: It may seem wasteful and/or slow to create new objects instead of modifying them. It is sometimes the case. However Xarray (and NumPy before it) was developped with this in mind so it is well optimized for this. In fact it is much more of a stylistic choice (i.e. how the code is written) than a fundamental inability to modify objects.
Xarrays display very nicely as the last lign in cell (or with a IPython.display.display
call). From this visualization we can already gather a lot of information about our data and xarrays in general.
The first table and accompanying cube visual essentially show the same information. First of all the table is separated into two columns: array and chunks. Arrays are divided into chunks for reasons of efficiency. If you need to perform a lot of calculations on big arrays, it can become necessary to choose an appropriate chunk size to allow for better computation
The rows of the table are fairly self-explanatory. The Bytes
row gives the size of the array and a single chunk. It is important to note that even if the size is given, it does not mean that the data has been loaded. In fact at this point in the code the xarray is created but the data has not been downloaded. But the size can still be calculated from the dimensions of the array and the data type.
The visual on the right give a spatial interpetation to the table data. The rectangular cuboid on the right is a representation of the 2 spatial dimensions as well as the 9 bands we selected. We can see that the spatial dimensions are separated into 3 rows and 2 columns. These smaller sections are the chunks. By looking at the table we can see that the chunks have spatial dimensions 1024 x 1024. Given our array of dimensions 1999 x 2590, the number of chunks is coherent.
The smaller square at the top-left represents the extra time dimension (of size 1 in this case since this is single-date data).
Coordinates and dimensions
Note: The name Coordinate can be confusing, especially since it does not inherently relate in any way to spatial coordinates of any kind. Xarrays are used in fields other than GIS, but the terminology still applies. Feel free to understand "coordinates" as "metadata which may be aligned along the dimensions of the array".
If we look at the Coordinates section we can see a lot of different rows. Feel free to explore these coordinates. You can see more about a coordinate by clicking the grey icon (stacked disks) to the right.
Here 4 of the rows have names in bold: time
, band
, x
and y
. These coordinates are also dimensions. They are named dimensions, which means that we will be able to select data by directly specifying the dimension name. The dimensions contain values called labels, which permits selecting by these labels rather than using indices. For example in array
the second dimension has the name band
and 9 labels associated with it: B02
, B03
, B04
, B05
, B06
, B07
, B08
, B11
, and B12
.
Most of the other coordinates can be seen as metadata gathered from the STAC Assets: the CRS, some information about the satellite, the cloud cover, full names for the bands... This is one of the advantage of xarray: we can keep the relevant metadata as we perform calculations on the underlying numeric values.
The second colum in the coordinate table can be useful. It indicates to which dimension the coordinate relates to. As an obvious example let's consider the title
coordinate. This coordinate refers to the full human-readable names of the bands. The row has (band)
in the second column. This means that the title
coordinate contains an array of the same size as the band
dimension, and the values it contains refer to the same position.
In the following cell we can see that the title
coordinate is itself an xarray with a single dimension : band
.
On the other hand, most of the metadata we have in Coordinates has nothing specified for the second column. It simply means that there are no dimensions associated with these properties. Indeed values like orbit
or proj:epsg
don't have any dimensional aspect, they're just properties of the whole array.
array.title
Looking at the x
and y
dimensions
It is fairly obvious that the band
dimension contains the labels for the bands: B02
, B03
and so on. Similarly, the time
dimension contains the date label (just one in this case). However the nature of the labels of the x
and y
dimensions is not necessarily obvious. These labels are the spatial coordinates for the pixels in the relevant CRS (in this case EPSG:32631, as specified by the epsg
coordinate). By default the coordinates are for the top-left corner of the pixel, but it is possible to choose the center as the reference by using the xy_coords
argument for stackstac.stack
.
Note: Since we chose 10 meter as our common resolution, it makes sense that these geometric coordinates are incremented by 10 with every subsequent value.
display(array.x)
display(array.y)
Here's how to understand the x
and y
dimension labels with an example. The third label of the dimension x
is 570510, and the second label for y
is 4841090. This means that the top-left corner of the pixel located at (3, 2) in the array (i.e. third column from the left, and second row from the top) maps to the geometric point with coordinates (570510, 4841090) in EPSG:32631.
Indexes and Attributes
The Indexes are simply the values associated with each dimension, available as a PandasIndex object. They can be useful in some situations, but those uses will not be covered here.
Finally the Attributes section contain some information about our imagery: the CRS, the affine transformation which links the spatial coordinates with row/column coordinates in the array, and the resolution. These Attributes would be used and modified if we were to reproject or resample our data.
Indexing and selecting
When deciding to filter and select data in an xarray there are two choices to make:
- Are we we referring to the dimension by its name or by its position?
- Are we using the dimension labels are simply the positional indices?
Since these two choices are independent, this leads to four different ways to select data in an xarray, summarized in the table below:
Dimension lookup | Index lookup | Syntax |
---|---|---|
Positional | By integer | array[:, 0] |
Positional | By label | array.loc[:, 'B02'] or array.loc[dict(band='B02')] |
By name | By integer | array.isel(band=0) |
By name | By label | array.sel(band='B02') |
(
(array[:,0]).equals(array.loc[:,'B02']) and
(array[:,0]).equals(array.isel(band=0)) and
(array[:,0]).equals(array.sel(band='B02'))
)
True
In most cases, the sel
method is the most convenient, so we will favor it. isel
is useful in situations where the integer index has a meaning. For example if we want to manipulate our current array as if it were a regular array sel
would be interesting to specify row/column values instead of spatial coordinates.
Note: The two methods which use brackets have one advantage compared to sel
and isel
: they allow assigning values from a selection.
array2 = array.copy(deep=True)
print(f"Value before assignment: {array2.loc[dict(band='B02', x=570490, y=4841100)].values}")
array2.loc[dict(band='B02', x=570490, y=4841100)] = 0
print(f"Value after assignment: {array2.loc[dict(band='B02', x=570490, y=4841100)].values}")
# If you uncomment the following line and try to execute, it will fail
# Indeed sel() and isel() are function calls, so they don't support assignment.
#array2.sel(band='B02', x=570490, y=4841100) = 0
Value before assignment: [104]
Value after assignment: [0]
Single values
The simplest use case is to select with a single value per dimension. Note that it is possible to chain multiple selections
filtered = array.sel(band='B02').isel(x=658)
display(filtered)
filtered.dims
('time', 'y')
It is interesting to note that after selecting with a unique value for band
and x
, they still appear in coordinates but they are no longer dimensions.
Selecting multiple values
If we want to select a few values for a dimension, we can use a boolean mask with isin
:
mask_x = array.band.isin(['B02', 'B04',
'this label does not exist',
'but it doesn matter'])
display(mask_x)
filtered = array.sel(band=mask_x)
display(filtered)
filtered.dims
('time', 'band', 'y', 'x')
This time the dimensions are kept as multiple values still exist for the dimension.
Using boolean masks for more arbitrary conditions
Boolean masks are very powerful as they allow performing very diverse filtering operations. The only issue is that in a single mask you must restrict yourself to a single dimension. But it is possible to use multiple masks at once
mask_x = (array.x%20 == 0) & (array.x%30 == 0) | (array.x < 578340)
mask_y = (array.y > 4840000)
array.sel(x=mask_x, y=mask_y)
Select in a range and slices
Although it is possible to do it with a boolean mask of two conditions, selecting in a range (spatially or a range of dates) is a fairly typical use case, so more dedicated tools exist. We use Python slice
objects. What follows is a small introduction to slicing if you're not familiar with that concept.
Refresher on slicing
In Python there are data types called sequences. Lists and arrays are examples of sequences. One characteristic of sequences is the ability to access elements in a variety of ways. The most basic way is to just access an element
a = list(chr(i+97) for i in range(10))
print(f"initial list: {a=}\n")
print("accessing with just an index")
print(f"{a[3]=}, {a[-1]=}, {a[-2]=}\n") # negative values start from the end
initial list: a=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
accessing with just an index
a[3]='d', a[-1]='j', a[-2]='i'
But we can also use a very similar syntax to select and create sub-lists:
print("accessing a range")
print(f"{a[2:5]=}\n")
print("accessing a range and leaving some arguments blank")
print(f"{a[:5]=}") # equivalent to a[0:5]
print(f"{a[3:]=}\n") # equivalent to a[3:-1]
print("accessing with a step")
print(f"{a[::2]=}") # every two elements
print(f"{a[::-2]=}") # every two elements in reverse
accessing a range
a[2:5]=['c', 'd', 'e']
accessing a range and leaving some arguments blank
a[:5]=['a', 'b', 'c', 'd', 'e']
a[3:]=['d', 'e', 'f', 'g', 'h', 'i', 'j']
accessing with a step
a[::2]=['a', 'c', 'e', 'g', 'i']
a[::-2]=['j', 'h', 'f', 'd', 'b']
This kind of slicing is fairly easy to understand. But sometimes writing all of this manually can be troublesome, especially if we want to slice multiple sequences in the same way, or if we want to consider a way to slice independently of a specific object. This is what slice
objects are for. They allow to turn this x:y:z
syntax into an object we can reuse. The syntax of slice is: slice(start, stop, step)
. Only stop
is mandatory, start
and stop
are optional and will be replaced to None
if not specified. Here are some example of slicing the same list with slice objects:
a = list(chr(i+97) for i in range(10))
print(f"initial list: {a=}\n")
print("accessing a range")
print(f"{a[slice(2,5)]=}\n")
print("accessing a range with just a stop value")
print(f"{a[slice(5)]=} \n") #equivalent to a[slice(0,5)]
print("accessing a range with just a start value")
print(f"{a[slice(3, None)]=}\n") #equivalent to a[slice(3,-1)]
print("accessing with a step")
print(f"{a[slice(None, None, 2)]=}") #every two elements
print(f"{a[slice(None, None, -2)]=}") #every two elements in reverse
initial list: a=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
accessing a range
a[slice(2,5)]=['c', 'd', 'e']
accessing a range with just a stop value
a[slice(5)]=['a', 'b', 'c', 'd', 'e']
accessing a range with just a start value
a[slice(3, None)]=['d', 'e', 'f', 'g', 'h', 'i', 'j']
accessing with a step
a[slice(None, None, 2)]=['a', 'c', 'e', 'g', 'i']
a[slice(None, None, -2)]=['j', 'h', 'f', 'd', 'b']
Of course it makes little sense to write a[slice(x,y,z)] instead of a[x:y:z]. Here's a more typical if simplistic reason to use slice
objects:
letters = list(chr(i+97) for i in range(10))
letter_numbers = list(i for i in range(10))
print(f"{letters=}")
print(f"{letter_numbers=}\n")
print("let's slice both lists with the same object")
s = slice(3,8,2)
print(f"{letters[s]=}")
print(f"{letter_numbers[s]=}")
letters=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
letter_numbers=[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
let's slice both lists with the same object
letters[s]=['d', 'f', 'h']
letter_numbers[s]=[3, 5, 7]
Using slice objects with sel
or isel
Selecting with with a slice object is fairly straight-forward:
array.sel(x=slice(580000, 580100), y=slice(None, None, 500))
However there are three important details to keep in mind.
- Contrary to slicing with lists or NumPy arrays, both bounds are included. This means that
slice(a,b)
also includesb
- When we are using
sel
, the slice arguments we use are values, not indices. - Despite the previous point, the selection is still operated on indices
This means that when writing
sel(x=slice(a,b))
, the operation is equivalent to the following:- Find the index with value
a
. Let's name itid_a
- Find the index with value
b
. Let's name itid_b
- If
id_a
>id_b
then return nothing - Else return all the values with indices between
id_a
andid_b
(bounds included).
- Find the index with value
These quirks can often show when dealing with unsorted dimensions. What follows is a small example with an unsorted dimension x
.
x = np.array([3, 1, 2, 7, 6, 4, 5, 0]) # Values for the first dimension (not sorted)
y = np.arange(3) # Values for the second dimension
data = np.random.rand(len(x), len(y)) # Random data values
# Create the xarray dataset
ds = xr.DataArray(data, coords={'x': x, 'y': y}, dims=['x', 'y'])
ds
display(ds.sel(x=slice(1,4)).x) #still includes 7 and 6 since indices are used
display(ds.sel(x=slice(5,7)).x) #nothing is selected since 5 comes after 7 in the order
ds.sel(x=slice(7,5)).x #this selects correctly
This somewhat suprising behavior is part of the reason why using unsorted dimensions is usually discouraged. Similarly, dimensions don't necessarily need to have unique values, but this also leads to confusing behavior. Fortunately in remote sensing we rarely if ever encounter these situations.
A quick introduction to xarray.where
This section aims to show a few uses for Xarryay's where
function. This function is a useful tool in many situations. It enables selecting data from arrays/xarrays based on a boolean condition. Its signature is the following xarray.where(cond, x, y)
:
cond
is the boolean condition. It can be a scalar, an array or an xarrayx
values to choose from wherecond
is Truey
values to choose from wherecond
is False
a = np.arange(10)
print(a)
xr.where(a>5, 0, a)
[0 1 2 3 4 5 6 7 8 9]
array([0, 1, 2, 3, 4, 5, 0, 0, 0, 0])
This first exemple is trivial, but it captures the essence of the where
function. The complexity lies in writing the right boolean condition, and understand the shapes of the objects involved.
When we created the xarray with stackstac.stack we specified a fill value to be used if data was missing. We can use where
in combination with sum
in order to check how many, if any, pixels were filled with that value.
xr.where(array == FILL_VALUE, 1, 0).sum().values
array(0)
One last thing about where
is that it allows use to create a modified array fairly easily
xr.where(array>5000, 5000, array) #useful for clipping values before creating visuals
array2 = xr.where(array.band=='B02', 0, array)
array2.sel(band='B02').values
array([[[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]]], dtype=uint16)
Accessing values, loading and compute
Xarray has a lazy approach to computing when it comes to data. This means that the multi-dimensional array is not loaded into memory unless it is expressly needed. In this notebook we've learned about dimensions and coordinates. We've also seen how to filter an xarray. However, as long as we work on the dimensions or coordinates, there was no need to access the data itself, i.e. the underlying multi-dimensional array.
In fact, in this notebook there are only 2 situations in which we've had to access the multi-dimensional array:
- When we accessed it directly by using the
values
method. The first time was in this cell
b = array.compute()
b
bs = array.band.isin(['B02', 'B03', 'B04'])
ds_bands = array.sel(band=bs).isel(time=0).compute()
ds_bands
Plotting
ds_bands.plot.hist(bins=30)
(array([7.623534e+06, 4.882460e+06, 1.447367e+06, 4.906030e+05,
2.281520e+05, 1.573650e+05, 1.362690e+05, 1.238380e+05,
8.969200e+04, 6.750300e+04, 5.696000e+04, 6.432700e+04,
7.658800e+04, 5.876900e+04, 2.349100e+04, 4.804000e+03,
4.620000e+02, 2.100000e+01, 6.000000e+00, 3.000000e+00,
7.000000e+00, 3.000000e+00, 1.000000e+00, 1.000000e+00,
0.000000e+00, 1.000000e+00, 0.000000e+00, 1.000000e+00,
1.000000e+00, 1.000000e+00]),
array([1.00000000e+00, 5.35366667e+02, 1.06973333e+03, 1.60410000e+03,
2.13846667e+03, 2.67283333e+03, 3.20720000e+03, 3.74156667e+03,
4.27593333e+03, 4.81030000e+03, 5.34466667e+03, 5.87903333e+03,
6.41340000e+03, 6.94776667e+03, 7.48213333e+03, 8.01650000e+03,
8.55086667e+03, 9.08523333e+03, 9.61960000e+03, 1.01539667e+04,
1.06883333e+04, 1.12227000e+04, 1.17570667e+04, 1.22914333e+04,
1.28258000e+04, 1.33601667e+04, 1.38945333e+04, 1.44289000e+04,
1.49632667e+04, 1.54976333e+04, 1.60320000e+04]),
)
TODO see with Kenji about easier aspect ratios
aspect = ds_bands.x.shape[0]/ds_bands.y.shape[0]
aspect
0.7718146718146718
explain slice, put link to documentation
ds_bands
ds_bands.plot.imshow(robust=True)
# list ordre inverse
ds_bands.isel(band=slice(None, None, -1)).plot.imshow(robust=True, aspect=aspect, size=10)
array[0:3:-1] array[slice(0,3,-1)]
Cell In[269], line 1
array[0:3:-1] array[slice(0,3,-1)]
^
SyntaxError: invalid syntax
import xarray as xr
quantile = ds_bands.quantile(.95)
array.max(['x','y'])
ds_bands = xr.where(ds_bands
import xarray as xr
# Create a sample DataArray
data = xr.DataArray([[1, 2], [3, 4]], dims=('dim1', 'dim2'))
# Check the existing dimensions
print(data.dims) # Output: ('dim1', 'dim2')
# Convert 'dim1' dimension into a coordinate
data_with_coordinate = data.set_index(dim1='dim1')
print(data_with_coordinate)
('dim1', 'dim2')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[23], line 10
7 print(data.dims) # Output: ('dim1', 'dim2')
9 # Convert 'dim1' dimension into a coordinate
---> 10 data_with_coordinate = data.set_index(dim1='dim1')
12 print(data_with_coordinate)
File ~/miniconda3/envs/beyond/lib/python3.11/site-packages/xarray/core/dataarray.py:2599, in DataArray.set_index(self, indexes, append, **indexes_kwargs)
2539 def set_index(
2540 self,
2541 indexes: Mapping[Any, Hashable | Sequence[Hashable]] | None = None,
2542 append: bool = False,
2543 **indexes_kwargs: Hashable | Sequence[Hashable],
2544 ) -> DataArray:
2545 """Set DataArray (multi-)indexes using one or more existing
2546 coordinates.
2547
(...)
2597 DataArray.set_xindex
2598 """
-> 2599 ds = self._to_temp_dataset().set_index(indexes, append=append, **indexes_kwargs)
2600 return self._from_temp_dataset(ds)
File ~/miniconda3/envs/beyond/lib/python3.11/site-packages/xarray/core/dataset.py:4155, in Dataset.set_index(self, indexes, append, **indexes_kwargs)
4153 invalid_vars = set(var_names) - set(self._variables)
4154 if invalid_vars:
-> 4155 raise ValueError(
4156 ", ".join([str(v) for v in invalid_vars])
4157 + " variable(s) do not exist"
4158 )
4160 all_var_names.update(var_names)
4161 drop_variables.update(var_names)
ValueError: dim1 variable(s) do not exist
type(ds_bands)
xarray.core.dataarray.DataArray
display(quantile)
ds_bands.max(['x', 'y'])
ds_bands.plot.imshow(rgb='band')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[36], line 1
----> 1 ds_bands.plot.imshow(rgb='band')
File ~/miniconda3/envs/beyond/lib/python3.11/site-packages/xarray/plot/accessor.py:430, in DataArrayPlotAccessor.imshow(self, *args, **kwargs)
428 @functools.wraps(dataarray_plot.imshow)
429 def imshow(self, *args, **kwargs) -> AxesImage:
--> 430 return dataarray_plot.imshow(self._da, *args, **kwargs)
File ~/miniconda3/envs/beyond/lib/python3.11/site-packages/xarray/plot/dataarray_plot.py:1520, in _plot2d.<locals>.newplotfunc(***failed resolving arguments***)
1518 raise ValueError('The "rgb" keyword is only valid for imshow()')
1519 elif rgb is not None and not imshow_rgb:
-> 1520 raise ValueError(
1521 'The "rgb" keyword is only valid for imshow()'
1522 "with a three-dimensional array (per facet)"
1523 )
1525 xlab, ylab = _infer_xy_labels(
1526 darray=darray, x=x, y=y, imshow=imshow_rgb, rgb=rgb
1527 )
1529 xval = darray[xlab]
ValueError: The "rgb" keyword is only valid for imshow()with a three-dimensional array (per facet)
ds_bands.dims
('time', 'band', 'y', 'x')
from matplotlib import pyplot as plt
import numpy as np
values = values
mins = values.min(axis=(0,1))
maxs = values.max(axis=(0,1))
scaled_arr = (values - mins) / (maxs - mins)
plt.imshow(scaled_arr)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[11], line 3
1 from matplotlib import pyplot as plt
2 import numpy as np
----> 3 values = values
4 mins = values.min(axis=(0,1))
5 maxs = values.max(axis=(0,1))
NameError: name 'values' is not defined