
An overview of the Xarray library and its suitability for N-dimensional data (such as Tilebox time series datasets) is available in the official Why Xarray? documentation page.
Familiarity
Familiarity
Xarray is based on NumPy and Pandas—two of the most widely used Python libraries for scientific computing. Familiarity with these libraries translates well to using Xarray.
Performance
Performance
Leveraging NumPy, which is built on C and Fortran, Xarray benefits from extensive performance optimizations. This allows Xarray to efficiently handle large datasets.
Interoperability
Interoperability
As a widely used library, Xarray easily integrates with many other libraries. Many third-party libraries are also available to expand Xarray’s capabilities for diverse use cases.
Flexibility
Flexibility
Xarray is versatile and supports a broad range of applications. It’s also easy to extend with custom features.
Example dataset
To understand how Xarray functions, below is a quick a look at a sample dataset as it might be retrieved from a Tilebox datasets client.Output
This basic dataset illustrates common use cases for Xarray. To follow along, you can download the dataset as a NetCDF file. The Reading and writing files section explains how to save and load Xarray datasets from NetCDF files.
- The
satellite_data
dataset contains dimensions, coordinates, and variables. - The
time
dimension has 514 elements, indicating that there are 514 data points in the dataset. - The
time
dimension coordinate contains datetime values representing when the data was measured. The*
indicates a dimension coordinate, which enables label-based indexing and alignment. - The
ingestion_time
non-dimension coordinate holds datetime values for when the data was ingested into Tilebox. Non-dimension coordinates carry coordinate data but are not used for label-based indexing. They can even be multidimensional. - The dataset includes 28 variables.
- The
bands
variable contains integers indicating how many bands the data contains. - The
sun_elevation
variable contains floating-point values representing the sun’s elevation when the data was measured.
Explore the xarray terminology overview to broaden your understanding of datasets, dimensions, coordinates, and variables.
Accessing data in a dataset
By index
You can access data in different ways. The Xarray documentation offers a comprehensive overview of these methods. To access thesun_elevation
variable:
Accessing values
Output
44.19904463
. It appears as an xarray.DataArray
object to allow access to the corresponding coordinates. To retrieve the plain Python object, use the item()
method:
Accessing raw values
Output
dt
(datetime) accessor for formatting time as a string:
Accessing and formatting datetime fields
Output
isel
method (index selection):
Accessing a whole datapoint by index
Output
Subsets of data
You can access subsets of the data as well. Here are methods to retrieve the first three and last three sun elevations.Accessing raw values
Output
Filtering data
Xarray allows convenient filtering of datasets based on conditions. For example, filter a dataset to only include sun elevation values where cloud cover is0
:
Filtering data by sensor
Output
45
and 90
with cloud cover 0
:
Filtering data by cloud_cover and sun_elevation value
Output
Selecting data by value
You can use dimension coordinate values to index your dataset. For instance, access the data point recorded at2022-05-01T11:28:28.249000
:
Selecting data by value requires unique coordinate values. In case of duplicates, you will encounter an
InvalidIndexError
. To avoid this, you can drop duplicates.Indexing by time
Output
KeyError
.
Indexing by time (not found)
method
.
Finding the closest data point
Dropping duplicates
Xarray allows you to drop duplicate values from a dataset. For example, to drop duplicate timestamps:Dropping duplicates
Statistics
Xarray and NumPy include a wide range of statistical functions that you can apply to a dataset or DataArray. Here are some examples:Computing dataset statistics
Output
Finding unique values
Output
Reading and writing files
Xarray provides a simple method for saving and loading datasets from files. This is useful for sharing your data or storing it for future use. Xarray supports many different file formats, including NetCDF, Zarr, GRIB, and more. For a complete list of supported formats, refer to the official documentation page. To save the example dataset as a NetCDF file:You may need to install the
netcdf4
package first.Saving a dataset to a file
example_satellite_data.nc
in your current directory. You can then load this file back into memory with:
Loading a dataset from a file