Xarray
Xarray library, common use-cases and how they can be implemented easily.
Xarray is a library for working with labelled multi-dimensional arrays. Xarray is built on top of NumPy and Pandas. Xarray introduces labels in the form of dimensions, coordinates and attributes on top of raw NumPy-like arrays, which allows for a more intuitive, more concise, and less error-prone developer experience. The package includes a large and growing library of domain-agnostic functions for advanced analytics and visualization with these data structures.
A good overview of the Xarray library and why it’s a perfect fit for N-dimensional data (such as Tilebox time series datasets) can be found in the official Why Xarray? documentation page.
The Tilebox Python client provides access to your satellite data in the form of an xarray.Dataset. This brings a number of benefits compared to custom, Tilebox-specific data structures.
An example dataset
To illustrate how Xarray works, this section uses a sample dataset of the kind that Tilebox datasets could return.
This is a simple dataset that was generated to showcase some common Xarray use-cases. If you want to follow along, you can download the dataset as a NetCDF file. The Reading and writing files section explains how to save and load Xarray datasets to and from NetCDF files.
Here is a breakdown of the preceding output:
- The satellite_data dataset contains different dimensions, coordinates and variables.
- The time dimension consists of 514 elements, which means there are 514 data points in the dataset.
- The time dimension coordinate contains datetime values: the time when the data was measured. The * mark shows that it’s a dimension coordinate. Dimension coordinates are used for label-based indexing and alignment, which means you can use the time to access individual data points in the dataset.
- The ingestion_time non-dimension coordinate contains datetime values: the time when the data was ingested into the Tilebox database. Non-dimension coordinates are variables that contain coordinate data but are not used for label-based indexing and alignment. They can even be multidimensional.
- The dataset contains 28 variables.
- The bands variable contains integers that tell you how many bands the data contains.
- The sun_elevation variable contains floating point values: the sun elevation when the data was measured.
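The structure described above can be reproduced on a small scale. The following is a minimal sketch, assuming a synthetic stand-in with only six time steps and three variables. All values are invented for illustration; this is not the actual Tilebox example dataset.

```python
import pandas as pd
import xarray as xr

# Synthetic stand-in (assumption): six time steps instead of 514, invented values.
time = pd.date_range("2022-05-01 11:28:28", periods=6, freq="D")

satellite_data = xr.Dataset(
    data_vars={
        "sun_elevation": ("time", [44.2, 47.5, 51.0, 88.3, 30.1, 62.4]),
        "cloud_cover": ("time", [0, 10, 0, 0, 0, 35]),
        "bands": ("time", [4, 4, 4, 4, 4, 4]),
    },
    coords={
        # Dimension coordinate: shares its name with the dimension and backs an index.
        "time": time,
        # Non-dimension coordinate: lives along "time" but is not used for indexing.
        "ingestion_time": ("time", time + pd.Timedelta(hours=2)),
    },
)

# The printed summary marks dimension coordinates with "*".
print(satellite_data)
```

Only time appears in satellite_data.indexes, which is what makes it usable for label-based selection.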
Check out the xarray terminology overview to deepen your understanding of datasets, dimensions, coordinates, and variables.
The examples below showcase some of the most common use-cases for Xarray. Because the data is already loaded into memory, no further API requests are required, so the sync and async clients behave identically in the examples below.
Accessing data in a dataset
Accessing by index
There are a couple of different ways to access data in a dataset. The Xarray documentation provides a great overview of all those methods.
You can access the sun_elevation variable:
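A minimal sketch of this access, assuming a small synthetic stand-in for the example dataset. The first value is the one quoted in the text; the remaining values are invented.

```python
import pandas as pd
import xarray as xr

# Synthetic stand-in (assumption): only the first value comes from the text.
satellite_data = xr.Dataset(
    {"sun_elevation": ("time", [44.19904463, 47.5, 51.0, 88.3])},
    coords={"time": pd.date_range("2022-05-01 11:28:28", periods=4, freq="D")},
)

# Attribute-style and dictionary-style access return the same DataArray.
sun_elevation = satellite_data.sun_elevation
same_variable = satellite_data["sun_elevation"]

# Positional indexing returns a zero-dimensional DataArray, not a plain float.
first_value = sun_elevation[0]
print(first_value)
```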
You can see in the preceding output that the first sun elevation value is 44.19904463, but the output is not just a plain float containing that value. Instead it’s an xarray.DataArray object, so that you can still access the coordinates belonging to that value. To get the plain Python object you can use the item() method:
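As a short sketch, again on a synthetic stand-in (the first value is from the text, the rest are invented), item() unwraps a zero-dimensional DataArray into a Python scalar:

```python
import pandas as pd
import xarray as xr

# Synthetic stand-in (assumption): only the first value comes from the text.
satellite_data = xr.Dataset(
    {"sun_elevation": ("time", [44.19904463, 47.5, 51.0, 88.3])},
    coords={"time": pd.date_range("2022-05-01 11:28:28", periods=4, freq="D")},
)

first_value = satellite_data.sun_elevation[0]

# item() converts the zero-dimensional DataArray into a plain Python float.
plain_value = first_value.item()
print(type(plain_value), plain_value)
```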
You can access coordinates in a similar manner. For datetime fields Xarray additionally offers a special dt (datetime) accessor, which you can use to format the time as a string:
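One way to do this, sketched on a synthetic stand-in with invented timestamps, is to format the whole time coordinate with dt.strftime and then pick a single element. The format string is just an example.

```python
import pandas as pd
import xarray as xr

# Synthetic stand-in (assumption): invented timestamps and values.
satellite_data = xr.Dataset(
    {"sun_elevation": ("time", [44.2, 47.5, 51.0, 88.3])},
    coords={"time": pd.date_range("2022-05-01 11:28:28", periods=4, freq="D")},
)

# dt.strftime formats every datetime value in the coordinate as a string.
formatted_times = satellite_data.time.dt.strftime("%Y-%m-%d %H:%M:%S")

# Pick the first formatted value and unwrap it into a Python string.
first_time = formatted_times[0].item()
print(first_time)
```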
Similarly, you can retrieve a whole dataset containing all variables and coordinates for a single data point in the example dataset. For this Xarray offers the isel method (it stands for index selection):
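A minimal sketch of isel on a synthetic stand-in (all values invented):

```python
import pandas as pd
import xarray as xr

# Synthetic stand-in (assumption): invented values.
satellite_data = xr.Dataset(
    {
        "sun_elevation": ("time", [44.2, 47.5, 51.0, 88.3]),
        "cloud_cover": ("time", [0, 10, 0, 0]),
    },
    coords={"time": pd.date_range("2022-05-01 11:28:28", periods=4, freq="D")},
)

# Select the first data point by its integer position along "time".
data_point = satellite_data.isel(time=0)

# The result is still a Dataset, but the "time" dimension has been dropped.
print(data_point)
```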
Subsets of data
You can also access subsets of the data. Here are a couple of ways you can retrieve the first 3 and last 3 sun elevations.
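The sketch below shows three equivalent approaches on a synthetic stand-in (values invented): plain slicing, isel with an explicit slice, and the head and tail methods.

```python
import pandas as pd
import xarray as xr

# Synthetic stand-in (assumption): invented values.
satellite_data = xr.Dataset(
    {"sun_elevation": ("time", [44.2, 47.5, 51.0, 88.3, 30.1, 62.4])},
    coords={"time": pd.date_range("2022-05-01 11:28:28", periods=6, freq="D")},
)
sun_elevation = satellite_data.sun_elevation

# Plain Python-style slicing on the DataArray.
first_three = sun_elevation[:3]
last_three = sun_elevation[-3:]

# The same selection with isel and an explicit slice object.
first_three_isel = sun_elevation.isel(time=slice(0, 3))
last_three_isel = sun_elevation.isel(time=slice(-3, None))

# head and tail take the number of elements per dimension.
first_three_head = sun_elevation.head(time=3)
last_three_tail = sun_elevation.tail(time=3)
```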
Filtering data
Xarray also offers a convenient way of filtering a dataset based on a condition.
For example, you can filter the dataset to only look at sun elevation values measured with a cloud cover of 0.
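One possible way to express this filter, sketched on a synthetic stand-in (values invented), is where with drop=True, which discards the entries that do not satisfy the condition:

```python
import pandas as pd
import xarray as xr

# Synthetic stand-in (assumption): invented values.
satellite_data = xr.Dataset(
    {
        "sun_elevation": ("time", [44.2, 47.5, 51.0, 88.3, 30.1, 62.4]),
        "cloud_cover": ("time", [0, 10, 0, 0, 0, 35]),
    },
    coords={"time": pd.date_range("2022-05-01 11:28:28", periods=6, freq="D")},
)

# Boolean DataArray that is True wherever the cloud cover is exactly 0.
is_cloud_free = satellite_data.cloud_cover == 0

# Keep only the sun elevations that satisfy the condition.
cloud_free_elevations = satellite_data.sun_elevation.where(is_cloud_free, drop=True)
print(cloud_free_elevations.values)
```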
You can combine conditions, for example to filter for sun elevation values between 45 and 90 measured with a cloud cover of 0:
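A sketch of a combined filter on a synthetic stand-in (values invented). It assumes that "between 45 and 90" includes both bounds; conditions are combined with the element-wise & operator, and each one needs its own parentheses.

```python
import pandas as pd
import xarray as xr

# Synthetic stand-in (assumption): invented values.
satellite_data = xr.Dataset(
    {
        "sun_elevation": ("time", [44.2, 47.5, 51.0, 88.3, 30.1, 62.4]),
        "cloud_cover": ("time", [0, 10, 0, 0, 0, 35]),
    },
    coords={"time": pd.date_range("2022-05-01 11:28:28", periods=6, freq="D")},
)

# Combine three element-wise conditions; inclusive bounds are an assumption.
condition = (
    (satellite_data.cloud_cover == 0)
    & (satellite_data.sun_elevation >= 45)
    & (satellite_data.sun_elevation <= 90)
)

filtered = satellite_data.sun_elevation.where(condition, drop=True)
print(filtered.values)
```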
Selecting data by value
You can use the values of a dimension coordinate, such as time, to index your dataset.
For example, you can access the data point taken at 2022-05-01T11:28:28.249000:
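A minimal sketch of label-based selection with sel, assuming a synthetic stand-in whose first timestamp matches the one in the text (the values are invented):

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in (assumption): first timestamp from the text, invented values.
satellite_data = xr.Dataset(
    {"sun_elevation": ("time", [44.2, 47.5, 51.0, 88.3])},
    coords={"time": pd.date_range("2022-05-01T11:28:28.249", periods=4, freq="D")},
)

# sel looks up the data point by coordinate value instead of integer position.
timestamp = np.datetime64("2022-05-01T11:28:28.249000")
data_point = satellite_data.sel(time=timestamp)
print(data_point)
```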
When trying to access a value that is not in the dataset, a KeyError is raised.
The method parameter can be used to return the closest value instead of raising an error.
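Both behaviours can be sketched on a synthetic stand-in (timestamps and values invented): an exact lookup of a missing timestamp raises a KeyError, while method="nearest" returns the closest data point.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in (assumption): invented timestamps and values.
satellite_data = xr.Dataset(
    {"sun_elevation": ("time", [44.2, 47.5, 51.0, 88.3])},
    coords={"time": pd.date_range("2022-05-01 11:28:28", periods=4, freq="D")},
)

# This timestamp is not part of the time coordinate.
missing_time = np.datetime64("2022-05-02T12:00:00")

try:
    satellite_data.sel(time=missing_time)
    raised_key_error = False
except KeyError:
    raised_key_error = True

# method="nearest" falls back to the closest available timestamp.
nearest_point = satellite_data.sel(time=missing_time, method="nearest")
print(nearest_point.time.values, nearest_point.sun_elevation.item())
```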
Indexing requires the coordinate values to be unique. If there are duplicated values, Xarray raises an error, because it cannot determine which value to return. An easy way to avoid this is to call drop_duplicates before indexing.
satellite_data = satellite_data.drop_duplicates("time")
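The effect of drop_duplicates can be sketched on a synthetic stand-in that contains one repeated timestamp (all values invented). By default the first occurrence of each duplicated value is kept.

```python
import pandas as pd
import xarray as xr

# Synthetic stand-in (assumption): the second timestamp appears twice.
time = pd.to_datetime(
    ["2022-05-01 11:28:28", "2022-05-02 11:28:28", "2022-05-02 11:28:28", "2022-05-03 11:28:28"]
)
satellite_data = xr.Dataset(
    {"sun_elevation": ("time", [44.2, 47.5, 51.0, 88.3])},
    coords={"time": time},
)

had_unique_index = satellite_data.indexes["time"].is_unique

# Drop repeated time values, keeping the first occurrence of each.
satellite_data = satellite_data.drop_duplicates("time")
print(satellite_data.sun_elevation.values)
```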
Statistics
Xarray and NumPy offer a wide range of statistical functions that can be applied to a dataset or a DataArray. Here are a few examples:
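As a brief sketch on a synthetic stand-in (values invented), the common reductions are available as methods on a DataArray, and calling them on a Dataset reduces every variable at once:

```python
import pandas as pd
import xarray as xr

# Synthetic stand-in (assumption): invented values.
satellite_data = xr.Dataset(
    {
        "sun_elevation": ("time", [44.2, 47.5, 51.0, 88.3, 30.1, 62.4]),
        "cloud_cover": ("time", [0, 10, 0, 0, 0, 35]),
    },
    coords={"time": pd.date_range("2022-05-01 11:28:28", periods=6, freq="D")},
)
sun_elevation = satellite_data.sun_elevation

# Reductions return zero-dimensional DataArrays; item() unwraps them.
minimum = sun_elevation.min().item()
maximum = sun_elevation.max().item()
mean = sun_elevation.mean().item()
std = sun_elevation.std().item()

# Reducing a Dataset applies the function to each variable.
dataset_means = satellite_data.mean()
print(minimum, maximum, mean, std)
```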
You can also use many NumPy functions directly on a dataset or DataArray. For example, to find out how many bands the data contains, you can use np.unique to get all the unique values in the bands data array.
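A minimal sketch on a synthetic stand-in (band counts invented). np.unique converts the DataArray to a NumPy array and returns the sorted unique values.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in (assumption): invented band counts.
satellite_data = xr.Dataset(
    {"bands": ("time", [4, 4, 8, 4])},
    coords={"time": pd.date_range("2022-05-01 11:28:28", periods=4, freq="D")},
)

# The result is a plain NumPy array of the distinct band counts.
unique_bands = np.unique(satellite_data.bands)
print(unique_bands)
```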
Reading and writing files
Xarray also offers a convenient way to save and load datasets to and from files. This is especially useful if you want to share your data with others or if you want to persist your data for later use. Xarray supports a wide range of file formats, including NetCDF, Zarr, GRIB, and many more. For a full list of supported formats, please refer to the official documentation page.
You might need to install the netcdf4 package first. You can do this by running pip install netcdf4.
Here is how you can save the example dataset to a NetCDF file:
It creates a file called example_satellite_data.nc in the current directory. You can now load this file back into memory:
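The save and load steps can be sketched together as a round trip. This example assumes a synthetic stand-in dataset (values invented) and an installed NetCDF backend such as netcdf4. It writes to a temporary directory rather than the current directory so that it cleans up after itself.

```python
import tempfile
from pathlib import Path

import pandas as pd
import xarray as xr

# Synthetic stand-in (assumption): invented values.
satellite_data = xr.Dataset(
    {"sun_elevation": ("time", [44.2, 47.5, 51.0, 88.3])},
    coords={"time": pd.date_range("2022-05-01 11:28:28", periods=4, freq="D")},
)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "example_satellite_data.nc"

    # Write the dataset to a NetCDF file.
    satellite_data.to_netcdf(path)

    # load_dataset reads the file fully into memory and closes it again.
    loaded = xr.load_dataset(path)

print(loaded)
```

xr.open_dataset is the lazy alternative: it keeps the file open and reads values on demand, which can be preferable for files that do not fit into memory.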
In case you want to follow along with the examples in this section, you can download the example dataset as a NetCDF file here.
Further reading
This section only covered a few of the most common use-cases for Xarray. Xarray offers many more functions and features. For more information, please refer to the Xarray documentation or check out the Xarray Tutorials.
Some useful capabilities that this section did not cover include: