Overview

This section provides a quick overview of the API for listing and accessing datasets.

| Method | Description |
| --- | --- |
| client.datasets | List all available datasets. |
| client.dataset | Access an individual dataset by its name. |

You can create your own custom datasets via the Tilebox Console.

Dataset types

Each dataset has a specific type. Each dataset type defines a set of required fields for every data point. The dataset type also determines the query capabilities of a dataset, for example whether it supports time-based queries only, or spatially filtered queries as well.

To find out which fields are required for each dataset type, check out the documentation for the available dataset types below.

Dataset specific fields

Additionally, each dataset has a set of fields that are specific to that dataset. These fields are defined during dataset creation. That way, all data points in a dataset are strongly typed and are validated during ingestion. Together, the required fields of the dataset type and the custom fields specific to the dataset make up the dataset schema.

Once data has been ingested into a dataset, existing fields in its schema can no longer be removed or edited. However, you can always add new fields to a dataset, since all fields are optional.

The only exception to this rule is an empty dataset. If you empty all collections in a dataset, you can freely edit the data schema, since no conflicts with existing data points can occur.
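The schema rules above can be illustrated with a small in-memory model. This is only a sketch of the rules as described here, not the Tilebox API: the `DatasetSchema` class, its methods, and the `data_point_count` attribute are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetSchema:
    """Toy model of a dataset schema (hypothetical, not part of Tilebox)."""

    fields: dict[str, str] = field(default_factory=dict)  # field name -> field type
    data_point_count: int = 0  # data points currently stored across all collections

    def _require_empty(self, action: str) -> None:
        # Removing or editing fields is only allowed while the dataset is empty.
        if self.data_point_count > 0:
            raise PermissionError(f"cannot {action} a field: dataset contains data")

    def add_field(self, name: str, field_type: str) -> None:
        # Adding a field is always allowed, because every field is optional.
        if name in self.fields:
            raise ValueError(f"field {name!r} already exists")
        self.fields[name] = field_type

    def remove_field(self, name: str) -> None:
        self._require_empty("remove")
        del self.fields[name]

    def change_field_type(self, name: str, new_type: str) -> None:
        self._require_empty("edit")
        self.fields[name] = new_type


schema = DatasetSchema()
schema.add_field("cloud_cover", "float64")

schema.data_point_count = 10  # simulate ingested data
schema.add_field("quality_flag", "bool")  # still allowed

try:
    schema.remove_field("cloud_cover")  # rejected while data exists
except PermissionError as err:
    print(err)

schema.data_point_count = 0  # simulate emptying all collections
schema.remove_field("cloud_cover")  # allowed again
print(sorted(schema.fields))
```

In the real system these checks are enforced by Tilebox itself; the model only makes the order of allowed and rejected operations explicit.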

Field types

When defining the data schema, you can specify the type of each field. The following field types are supported.

Primitives

| Type | Description | Example value |
| --- | --- | --- |
| string | A string of characters of arbitrary length. | Some string |
| int64 | A 64-bit signed integer. | 123 |
| uint64 | A 64-bit unsigned integer. | 123 |
| float64 | A 64-bit floating-point number. | 123.45 |
| bool | A boolean. | true |
| bytes | A sequence of arbitrary length bytes. | 0xAF1E28D4 |

Time

| Type | Description | Example value |
| --- | --- | --- |
| Duration | A signed, fixed-length span of time represented as a count of seconds and fractions of seconds at nanosecond resolution. See Duration for more information. | 12s 345ms |
| Timestamp | A point in time, represented as seconds and fractions of seconds at nanosecond resolution in UTC Epoch time. See Timestamp for more information. | 2023-05-17T14:30:00Z |

Identifier

| Type | Description | Example value |
| --- | --- | --- |
| UUID | A universally unique identifier (UUID). | 126a2531-c98d-4e06-815a-34bc5b1228cc |

Geospatial

| Type | Description | Example value |
| --- | --- | --- |
| Geometry | Geospatial geometries of type Point, LineString, Polygon or MultiPolygon. | POLYGON ((12.3 -5.4, 12.5 -5.4, ...)) |

Arrays

Every type is also available as an array, which lets you ingest multiple values of the underlying type for each data point. The size of the array is flexible and can differ between data points.
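To make the field types more concrete, the following sketch writes one made-up data point using only Python standard library types. The field names are invented for illustration, and the mapping to Python types is an assumption: this page does not state how the Tilebox client represents each type. Note that Python's `timedelta` and `datetime` only offer microsecond resolution, while the Duration and Timestamp types are described as having nanosecond resolution.

```python
from datetime import datetime, timedelta, timezone
from uuid import UUID

# A hypothetical data point; keys are invented field names,
# comments name the field type each value stands in for.
data_point = {
    "name": "Some string",                                       # string
    "count": 123,                                                # int64 / uint64
    "ratio": 123.45,                                             # float64
    "valid": True,                                               # bool
    "payload": bytes.fromhex("AF1E28D4"),                        # bytes
    "exposure": timedelta(seconds=12, milliseconds=345),         # Duration
    "time": datetime(2023, 5, 17, 14, 30, tzinfo=timezone.utc),  # Timestamp
    "id": UUID("126a2531-c98d-4e06-815a-34bc5b1228cc"),          # UUID
    "location": "POINT (12.3 -5.4)",                             # Geometry, here as WKT text
    "tags": ["optical", "level-1"],                              # array of string
}

for key, value in data_point.items():
    print(f"{key}: {value!r}")
```

The array field shows that a data point can carry any number of values of the underlying type; another data point in the same dataset could hold a longer or an empty list.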

Creating a dataset

You can create a dataset in Tilebox using the Tilebox Console. Check out the Creating a dataset guide for an example of how to achieve this.

Listing datasets

You can use your client instance to access the datasets available to you. To list all available datasets, use the datasets method of the client.

from tilebox.datasets import Client

client = Client()
datasets = client.datasets()
print(datasets)
Output
open_data:
    asf:
        ers_sar: European Remote Sensing Satellite (ERS) Synthetic Aperture Radar (SAR) Granules
    copernicus:
        landsat8_oli_tirs: Landsat-8 is part of the long-running Landsat programme ...
        sentinel1_sar: The Sentinel-1 mission is the European Radar Observatory for the ...
        sentinel2_msi: Sentinel-2 is equipped with an optical instrument payload that samples ...
        sentinel3_olci: OLCI (Ocean and Land Colour Instrument) is an optical instrument used to ...
        ...

Once you have your dataset object, you can use it to list the available collections for the dataset.

If you’re using an IDE or an interactive environment with auto-complete, you can use it on your client instance to discover the datasets available to you. Type client. and trigger auto-complete after the dot to do so.

Accessing a dataset

Each dataset has an automatically generated code name that can be used to access it. The code name is the name of the group, followed by a dot, followed by the dataset name. For example, the code name for the Sentinel-2 MSI dataset above, which is part of the open_data.copernicus group, is open_data.copernicus.sentinel2_msi.
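The naming rule can be sketched in a few lines of plain Python. The helper functions `code_name` and `split_code_name` are hypothetical and exist only for this illustration; they are not part of the Tilebox client.

```python
def code_name(group: str, dataset_name: str) -> str:
    """Join a group and a dataset name into a code name."""
    return f"{group}.{dataset_name}"


def split_code_name(name: str) -> tuple[str, str]:
    """Split a code name at its last dot into (group, dataset name)."""
    group, _, dataset_name = name.rpartition(".")
    return group, dataset_name


print(code_name("open_data.copernicus", "sentinel2_msi"))
print(split_code_name("open_data.copernicus.sentinel2_msi"))
```

Splitting at the last dot reflects that the group itself may contain dots, as in open_data.copernicus, while the dataset name is the final segment.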

To access a dataset, use the dataset method of your client instance and pass the code name of the dataset as an argument.

from tilebox.datasets import Client

client = Client()
s2_msi_dataset = client.dataset("open_data.copernicus.sentinel2_msi")

Once you have your dataset object, you can use it to access available collections for the dataset.