Querying data
Learn how to query and load data from Tilebox datasets.
Overview
This section provides an overview of the API for loading data from a collection. It includes usage examples for many common scenarios.
| Method | Description |
|---|---|
| collection.load | Query data points from a collection. |
| collection.find | Find a specific data point in a collection by its ID. |
The sections below cover common scenarios for loading data from collections.
To load data points from a dataset collection, use the load method. It requires a time_or_interval parameter that specifies the time or time interval to load.
Filtering by time
Time interval
To load data for a specific time interval, use a tuple in the form (start, end) as the time_or_interval parameter. Both start and end must be TimeScalars, which can be datetime objects or strings in ISO 8601 format.
The show_progress parameter is optional and can be used to display a tqdm progress bar while loading data.
A time interval specified as a tuple is interpreted as a half-closed interval: the start time is inclusive, and the end time is exclusive. For instance, with an end time of 2023-01-01, data points up to 2022-12-31 23:59:59.999 are included, but those from 2023-01-01 00:00:00.000 are excluded. This behavior mimics the Python range function and is useful for chaining time intervals.
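The inclusivity rule can be illustrated without the Tilebox client. The following is a minimal, standalone sketch in plain Python; the helper in_half_closed_interval is hypothetical and only mirrors the documented semantics, it is not part of the Tilebox API.

```python
from datetime import datetime, timezone


def in_half_closed_interval(t: datetime, start: datetime, end: datetime) -> bool:
    """Hypothetical helper: True if start <= t < end (start inclusive, end exclusive)."""
    return start <= t < end


start = datetime(2022, 1, 1, tzinfo=timezone.utc)
end = datetime(2023, 1, 1, tzinfo=timezone.utc)

# The last millisecond of 2022 is still inside the interval ...
last_ms_2022 = datetime(2022, 12, 31, 23, 59, 59, 999000, tzinfo=timezone.utc)
print(in_half_closed_interval(last_ms_2022, start, end))  # True

# ... but the first instant of 2023 is not.
print(in_half_closed_interval(end, start, end))  # False

# Chaining: an instant on the shared boundary falls into exactly one of two adjacent intervals.
next_end = datetime(2024, 1, 1, tzinfo=timezone.utc)
boundary_hits = [
    in_half_closed_interval(end, start, end),
    in_half_closed_interval(end, end, next_end),
]
print(boundary_hits.count(True))  # 1
```

This is why adjacent tuples such as (2022-01-01, 2023-01-01) and (2023-01-01, 2024-01-01) neither overlap nor leave a gap.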
A large time interval can be split into smaller chunks that are loaded in separate requests. Typically, this is not necessary, because the datasets client automatically paginates requests for large intervals.
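If you do want to split an interval manually, for example to process results chunk by chunk, the chunks can be generated as consecutive half-closed (start, end) tuples. Because each chunk's end is the next chunk's start, no data point is loaded twice or skipped. This is a minimal sketch: split_interval is a hypothetical helper, and the commented collection.load call assumes an already opened collection.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def split_interval(start: datetime, end: datetime, step: timedelta) -> list[tuple[datetime, datetime]]:
    """Hypothetical helper: split [start, end) into consecutive half-closed chunks of at most `step`."""
    chunks = []
    chunk_start = start
    while chunk_start < end:
        chunk_end = min(chunk_start + step, end)
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end  # the next chunk starts exactly where this one ends
    return chunks


chunks = split_interval(
    datetime(2023, 1, 1, tzinfo=timezone.utc),
    datetime(2023, 1, 31, tzinfo=timezone.utc),
    timedelta(days=7),
)
for chunk in chunks:
    # Each tuple could be passed as time_or_interval, e.g. collection.load(chunk)
    print(chunk[0].date(), "->", chunk[1].date())
```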
Endpoint inclusivity
For greater control over the inclusivity of start and end times, you can use the TimeInterval dataclass instead of a tuple of two TimeScalars. This class lets you specify the start and end times as well as whether each endpoint is inclusive. As a result, the same interval can be expressed in different but equivalent ways: for instance, a half-closed interval ending at 2023-01-01 covers the same data points as a closed interval ending at 2022-12-31 23:59:59.999.
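To illustrate that equivalence without depending on the real TimeInterval class, whose exact field names are not shown on this page, here is a minimal standalone sketch. SketchInterval and its to_closed method are hypothetical. The sketch assumes millisecond precision, as Tilebox uses for timestamps, so an exclusive endpoint is equivalent to an inclusive one shifted by 1 ms.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Assumption: timestamps have millisecond precision, so the smallest time step is 1 ms.
ONE_MS = timedelta(milliseconds=1)


@dataclass(frozen=True)
class SketchInterval:
    """Hypothetical stand-in for an interval with configurable endpoint inclusivity."""

    start: datetime
    end: datetime
    start_inclusive: bool = True
    end_inclusive: bool = False

    def to_closed(self) -> tuple[datetime, datetime]:
        """Normalize to the equivalent closed interval [first_ms, last_ms]."""
        first = self.start if self.start_inclusive else self.start + ONE_MS
        last = self.end if self.end_inclusive else self.end - ONE_MS
        return first, last


# Way 1: half-closed, end exclusive (same semantics as a (start, end) tuple).
half_closed = SketchInterval(
    datetime(2022, 1, 1, tzinfo=timezone.utc),
    datetime(2023, 1, 1, tzinfo=timezone.utc),
)
# Way 2: closed, ending on the last millisecond of 2022.
closed = SketchInterval(
    datetime(2022, 1, 1, tzinfo=timezone.utc),
    datetime(2022, 12, 31, 23, 59, 59, 999000, tzinfo=timezone.utc),
    end_inclusive=True,
)
print(half_closed.to_closed() == closed.to_closed())  # True
```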
Time scalars
You can load all data points linked to a specific timestamp by specifying a TimeScalar as the time query argument. A TimeScalar can be a datetime object or a string in ISO 8601 format. When passed to the load method, it retrieves all data points matching exactly that time, with millisecond precision.
A collection may contain multiple data points for the same millisecond, so multiple data points could still be returned. To fetch only a single data point, query the collection by ID instead.
Tilebox uses millisecond precision for timestamps, so loading all data points for a specific second requires a time interval query rather than a time scalar.
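A minimal sketch of building such an interval for one full second in plain Python. The helper second_as_interval is hypothetical; it returns a half-closed (start, end) tuple that, under the half-closed tuple rule described in this section, covers every millisecond from .000 to .999 of that second and could be passed as time_or_interval.

```python
from datetime import datetime, timedelta, timezone


def second_as_interval(t: datetime) -> tuple:
    """Hypothetical helper: the half-closed interval [s, s + 1 s) for the second containing t."""
    second_start = t.replace(microsecond=0)  # truncate to the start of the second
    return second_start, second_start + timedelta(seconds=1)


start, end = second_as_interval(datetime(2023, 5, 1, 12, 30, 15, 250000, tzinfo=timezone.utc))
print(start.isoformat(), end.isoformat())
```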
The output of the load method is an xarray.Dataset object. To learn more about Xarray, visit the dedicated Xarray page.
Time iterables
You can also specify a time interval by passing an iterable of TimeScalars as the time_or_interval parameter. This is especially useful when you want to use the output of a previous load call as input for another load.
This works by constructing a TimeInterval object from the first and last elements of the iterable, with both the start and end time inclusive.
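The following standalone sketch mirrors that rule in plain Python to show why both ends must be inclusive. closed_interval_from_times is a hypothetical helper, not the Tilebox implementation, and it assumes the iterable is in chronological order.

```python
from datetime import datetime, timezone
from typing import Iterable, Tuple


def closed_interval_from_times(times: Iterable[datetime]) -> Tuple[datetime, datetime]:
    """Hypothetical helper: take the first and last element as inclusive start and end."""
    items = list(times)
    if not items:
        raise ValueError("cannot build an interval from an empty iterable")
    return items[0], items[-1]


times = [
    datetime(2023, 3, 1, 10, 0, tzinfo=timezone.utc),
    datetime(2023, 3, 1, 10, 5, tzinfo=timezone.utc),
    datetime(2023, 3, 1, 10, 10, tzinfo=timezone.utc),
]
first, last = closed_interval_from_times(times)

# With both ends inclusive, every original timestamp lies inside the interval.
print(all(first <= t <= last for t in times))  # True
# A half-closed tuple (first, last) would have dropped the last timestamp.
print(all(first <= t < last for t in times))  # False
```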
Timezones
All TimeScalars specified as strings are treated as UTC if they do not include a timezone suffix. If you want to query data for a specific time or time range in another timezone, it’s recommended to use a datetime object. In this case, the Tilebox API converts the datetime to UTC before making the request.
The output will always contain UTC timestamps, which will need to be converted again if a different timezone is required.
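The conversion rules can be sketched with the standard datetime module. to_utc is a hypothetical helper that only approximates the documented behavior: strings without an offset are read as UTC, and timezone-aware values are converted to the same instant in UTC. Treating naive datetime objects as UTC as well is an assumption of this sketch, not something stated here. A fixed UTC+02:00 offset stands in for a local timezone.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def to_utc(value: str | datetime) -> datetime:
    """Hypothetical helper: normalize a TimeScalar-like value to an aware UTC datetime."""
    if isinstance(value, str):
        # ISO 8601 string; an explicit offset such as "+02:00" is kept if present.
        value = datetime.fromisoformat(value)
    if value.tzinfo is None:
        # No timezone information: interpret as UTC (documented for strings,
        # assumed for naive datetime objects in this sketch).
        return value.replace(tzinfo=timezone.utc)
    # Timezone-aware value: convert the same instant to UTC.
    return value.astimezone(timezone.utc)


cest = timezone(timedelta(hours=2))  # fixed UTC+02:00 offset, for illustration only
local = datetime(2023, 6, 1, 14, 0, tzinfo=cest)

print(to_utc(local))                        # 2023-06-01 12:00:00+00:00
print(to_utc("2023-06-01T14:00:00"))        # 2023-06-01 14:00:00+00:00
print(to_utc("2023-06-01T14:00:00+02:00"))  # 2023-06-01 12:00:00+00:00

# UTC timestamps in a result can be converted back to the local offset for display.
print(to_utc(local).astimezone(cest))       # 2023-06-01 14:00:00+02:00
```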
Filtering by area of interest
Spatio-temporal datasets also come with spatial filtering capabilities. When querying, you can specify not only a time interval but also a bounding box or a polygon as an area of interest to filter by.
Spatio-temporal datasets, including spatial filtering capabilities, are currently in development and not available yet. Stay tuned for updates!
Fetching only metadata
Sometimes it may be useful to load only the dataset metadata fields without the actual data fields. To do so, set the skip_data parameter to True.
For example, when only checking whether a data point exists, you may want to use skip_data=True to avoid loading the data fields.
If this flag is set, the response only includes the required fields for the given dataset type, without any additional custom data fields.
Empty response
The load method always returns an xarray.Dataset object, even if there are no data points for the specified query. In such cases, the returned dataset is empty, but no error is raised.
By datapoint ID
If you know the ID of the data point you want to load, you can use collection.find.
This method always returns a single data point or raises an exception if no data point with the specified ID exists.
Since find returns only a single data point, the output dataset does not include a time dimension.
You can also set the skip_data parameter when calling find to load only the metadata of the data point, as with load.
Possible errors
NotFoundError: raised if no data point with the given ID is found in the collection
ValueError: raised if the specified datapoint_id is not a valid UUID
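Because an invalid ID raises ValueError, it can be convenient to validate IDs before calling find, for example when they come from user input. The following is a hypothetical pre-check built on Python's standard uuid module; it is not part of the Tilebox client, and the commented call assumes an already opened collection.

```python
import uuid


def is_valid_datapoint_id(datapoint_id: str) -> bool:
    """Hypothetical pre-check: True if datapoint_id parses as a UUID."""
    try:
        uuid.UUID(datapoint_id)
    except ValueError:
        return False
    return True


print(is_valid_datapoint_id("12345678-1234-5678-1234-567812345678"))  # True
print(is_valid_datapoint_id("not-a-uuid"))  # False

# Hypothetical usage with an already opened collection:
# if is_valid_datapoint_id(some_id):
#     datapoint = collection.find(some_id)
```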