Ingesting Data

You need write permission on a collection to ingest data into it.

Check out the examples below for common scenarios of ingesting data into a collection.

Dataset schema

Tilebox Datasets are strongly typed. This means you can only ingest data that matches the schema of a dataset. The schema is defined when the dataset is created. The examples on this page assume that you have access to a Timeseries dataset with the following schema:

Check out the Creating a dataset guide for an example of how to create such a dataset.

MyCustomDataset schema

Field name	Type	Description
`time`	Timestamp	Timestamp of the data point. Required by the Timeseries dataset type.
`id`	UUID	Auto-generated UUID for each datapoint.
`ingestion_time`	Timestamp	Auto-generated timestamp for when the data point was ingested into the Tilebox API.
`value`	float64	A numeric measurement value.
`sensor`	string	A name of the sensor that generated the data point.
`precise_time`	Timestamp	A precise measurement time in nanosecond precision.
`sensor_history`	Array[float64]	The last few measurements of the sensor.

A full overview of available data types can be found here.
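Because only schema-conforming data can be ingested, it can help to compare column names against the schema before calling ingest. The following is a minimal sketch and not part of the Tilebox client: `EXPECTED_FIELDS` and `check_columns` are hypothetical names, and the field set is taken from the table above. The auto-generated `id` and `ingestion_time` fields are left out on the assumption that you do not supply them yourself.

```python
# Hypothetical helper, not part of the Tilebox client: compares the column
# names of a dict of columns against the user-supplied schema fields.
EXPECTED_FIELDS = {"time", "value", "sensor", "precise_time", "sensor_history"}


def check_columns(columns: dict) -> tuple[set, set]:
    """Return (not_provided, unexpected) field names for a dict of columns."""
    provided = set(columns)
    not_provided = EXPECTED_FIELDS - provided
    unexpected = provided - EXPECTED_FIELDS
    return not_provided, unexpected


columns = {
    "time": ["2025-03-28T11:44:23Z"],
    "value": [45.16],
    "sensor": ["A"],
    "temperature": [21.5],  # not part of the schema
}
not_provided, unexpected = check_columns(columns)
print(sorted(not_provided))  # ['precise_time', 'sensor_history']
print(sorted(unexpected))    # ['temperature']
```

A field that is not provided is not necessarily an error; of the fields above, only `time` is stated to be required. An unexpected name, however, usually points to a typo or a mismatch with the schema.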

Once you’ve defined the schema and created a dataset, you can access it and create a collection to ingest data into.

from tilebox.datasets import Client

client = Client()
dataset = client.dataset("my_org.my_custom_dataset")
collection = dataset.get_or_create_collection("Measurements")

Preparing data for ingestion

Ingestion can be done either in Python or Go.

Python

collection.ingest supports a wide range of input types. The examples below use a pandas.DataFrame and an xarray.Dataset as input.

pandas DataFrame

A pandas.DataFrame is a representation of two-dimensional, potentially heterogeneous tabular data. It’s a powerful tool for working with structured data, and Tilebox supports it as input for ingest. The example below shows how to construct a pandas.DataFrame from scratch that matches the schema of the MyCustomDataset dataset and can be ingested into it.

import pandas as pd

data = pd.DataFrame({
    "time": [
      "2025-03-28T11:44:23Z",
      "2025-03-28T11:45:19Z",
    ],
    "value": [45.16, 273.15],
    "sensor": ["A", "B"],
    "precise_time": [
      "2025-03-28T11:44:23.345761444Z",
      "2025-03-28T11:45:19.128742312Z",
    ],
    "sensor_history": [
      [-12.15, 13.45, -8.2, 16.5, 45.16],
      [300.16, 280.12, 273.15],
    ],
})
print(data)

                   time   value sensor                    precise_time                      sensor_history
0  2025-03-28T11:44:23Z   45.16      A  2025-03-28T11:44:23.345761444Z  [-12.15, 13.45, -8.2, 16.5, 45.16]
1  2025-03-28T11:45:19Z  273.15      B  2025-03-28T11:45:19.128742312Z            [300.16, 280.12, 273.15]

Once you have the data ready in this format, you can ingest it into a collection.

# now that we have the data frame in the correct format
# we can ingest it into the Tilebox dataset
collection.ingest(data)

# To verify it now contains the 2 data points
print(collection.info())

Measurements: [2025-03-28T11:44:23.000 UTC, 2025-03-28T11:45:19.000 UTC] (2 data points)

You can now also head on over to the Tilebox Console and view the newly ingested data points there.

xarray Dataset

xarray.Dataset is the default format in which Tilebox Datasets returns data when querying a collection. Tilebox also supports it as input for ingestion. The example below shows how to construct an xarray.Dataset from scratch that matches the schema of the MyCustomDataset dataset and can then be ingested into it. To learn more about xarray.Dataset, visit the dedicated Tilebox xarray documentation page.

import numpy as np
import xarray as xr

data = xr.Dataset({
    "time": ("time", [
      "2025-03-28T11:46:13Z",
      "2025-03-28T11:46:54Z",
    ]),
    "value": ("time", [48.1, 290.12]),
    "sensor": ("time", ["A", "B"]),
    "precise_time": ("time", [
      "2025-03-28T11:46:13.345761444Z",
      "2025-03-28T11:46:54.128742312Z",
    ]),
    "sensor_history": (("time", "n_sensor_history"), [
      [13.45, -8.2, 16.5, 45.16, 48.1],
      [280.12, 273.15, 290.12, np.nan, np.nan],
    ]),
})
print(data)

<xarray.Dataset> Size: 504B
Dimensions:         (time: 2, n_sensor_history: 5)
Coordinates:
  * time            (time) <U20 160B '2025-03-28T11:46:13Z' '2025-03-28T11:46...
Dimensions without coordinates: n_sensor_history
Data variables:
    value           (time) float64 16B 48.1 290.1
    sensor          (time) <U1 8B 'A' 'B'
    precise_time    (time) <U30 240B '2025-03-28T11:46:13.345761444Z' '2025-0...
    sensor_history  (time, n_sensor_history) float64 80B 13.45 -8.2 ... nan nan

Array fields appear in xarray as an extra dimension, in this case n_sensor_history. If the arrays differ in length between data points, the remaining positions are padded with a fill value that depends on the dtype of the array. For float64 this is np.nan (not a number). When ingesting data into a Tilebox dataset, Tilebox automatically skips those padding values and does not store them in the dataset.
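If your measurements start out as lists of different lengths, they need to be padded to a rectangular shape before they can go into a two-dimensional xarray variable. The sketch below shows one way to do that with plain Python lists; `pad_ragged` is a hypothetical helper name, not a Tilebox or xarray function.

```python
import math


def pad_ragged(rows: list[list[float]], fill: float = math.nan) -> list[list[float]]:
    """Pad every row with `fill` so that all rows match the longest row."""
    width = max((len(row) for row in rows), default=0)
    return [row + [fill] * (width - len(row)) for row in rows]


history = [
    [13.45, -8.2, 16.5, 45.16, 48.1],
    [280.12, 273.15, 290.12],
]
padded = pad_ragged(history)
print(padded[1])  # [280.12, 273.15, 290.12, nan, nan]
```

The padded list of lists has the same shape as the sensor_history value used in the xarray.Dataset example above.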

Now that you have the xarray.Dataset in the correct format, you can ingest it into the Tilebox dataset collection.

collection = dataset.get_or_create_collection("OtherMeasurements")
collection.ingest(data)

# To verify it now contains the 2 data points
print(collection.info())

OtherMeasurements: [2025-03-28T11:46:13.000 UTC, 2025-03-28T11:46:54.000 UTC] (2 data points)

Go

Client.Datapoints.Ingest supports ingestion of data points in the form of a slice of protobuf messages.

Protobuf

Protobuf is Google’s language-neutral, platform-neutral, extensible mechanism for serializing structured data. More details on protobuf can be found in the protobuf section. In the example below, the v1.Modis type has been generated using tilebox-generate as described in the protobuf section.

datapoints := []*v1.Modis{
  v1.Modis_builder{
    Time:        timestamppb.New(time.Now()),
    GranuleName: proto.String("Granule 1"),
  }.Build(),
  v1.Modis_builder{
    Time:        timestamppb.New(time.Now().Add(-5 * time.Hour)),
    GranuleName: proto.String("Past Granule 2"),
  }.Build(),
}

ingestResponse, err := client.Datapoints.Ingest(ctx,
    collectionID,
    &datapoints,
    false,
)

Copying or moving data

Since ingest accepts the output of query as input, you can easily copy or move data from one collection to another.

Copying data like this also works across datasets, as long as the dataset schemas are compatible.

src_collection = dataset.collection("Measurements")
data_to_copy = src_collection.query(temporal_extent=("2025-03-28", "2025-03-29"))

dest_collection = dataset.collection("OtherMeasurements")
dest_collection.ingest(data_to_copy)  # copy the data to the other collection

# To verify it now contains 4 datapoints (2 we ingested already, and 2 we copied just now)
print(dest_collection.info())

OtherMeasurements: [2025-03-28T11:44:23.000 UTC, 2025-03-28T11:46:54.000 UTC] (4 data points)

Automatic batching

Tilebox automatically batches the ingestion requests for you, so you don’t have to worry about the maximum request size.
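Conceptually, batching means splitting a large sequence of data points into chunks that each fit into a single request. The sketch below only illustrates that idea: the `batched` helper and the batch size are assumptions made for this example, and the batching Tilebox performs internally may work differently.

```python
from collections.abc import Iterator, Sequence


def batched(items: Sequence, batch_size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


datapoints = list(range(10))  # stand-in for ten data points
sizes = [len(batch) for batch in batched(datapoints, 4)]
print(sizes)  # [4, 4, 2]
```

You do not need code like this when calling ingest; it is shown only to make clear what happens to a large input.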

Idempotency

Tilebox auto-generates each datapoint ID from the values of all its fields, except for the auto-generated ingestion_time, so ingesting the same data twice produces the same ID. By default, Tilebox silently skips any data points that duplicate existing ones in a collection. This is especially useful when implementing idempotent algorithms: re-running an ingestion task, for example after a retry, never results in duplicate data points. Alternatively, you can request that an error be raised if any of the generated datapoint IDs already exist, by setting the allow_existing parameter to False.

data = pd.DataFrame({
    "time": [
      "2025-03-28T11:44:23Z",
    ],
    "value": [45.16],
    "sensor": ["A"],
    "precise_time": [
      "2025-03-28T11:44:23.345761444Z",
    ],
    "sensor_history": [
      [-12.15, 13.45, -8.2, 16.5, 45.16],
    ],
})

# we already ingested the same data point previously, so with
# allow_existing=False this call raises the ArgumentError shown below
collection.ingest(data, allow_existing=False)

# we can still ingest it, by setting allow_existing=True
# but the total number of datapoints will still be the same
# as before in that case, since it already exists and therefore
# will be skipped
collection.ingest(data, allow_existing=True)  # no-op

ArgumentError: found existing datapoints with same id, refusing to ingest with "allow_existing=false"
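To see why content-derived IDs make re-ingestion safe, the following sketch derives a deterministic UUID from a data point's fields while ignoring ingestion_time. It is an illustration only: the `content_id` helper, the namespace, and the use of UUIDv5 over canonical JSON are assumptions for this example and do not describe how Tilebox actually computes its IDs.

```python
import json
import uuid

# Hypothetical namespace, chosen only for this illustration.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.invalid")


def content_id(datapoint: dict) -> uuid.UUID:
    """Derive a deterministic UUID from all fields except ingestion_time."""
    fields = {k: v for k, v in datapoint.items() if k != "ingestion_time"}
    canonical = json.dumps(fields, sort_keys=True)  # stable key order
    return uuid.uuid5(NAMESPACE, canonical)


first = {
    "time": "2025-03-28T11:44:23Z",
    "value": 45.16,
    "sensor": "A",
    "ingestion_time": "2025-03-28T12:00:00Z",
}
retry = {**first, "ingestion_time": "2025-03-28T12:05:00Z"}  # same data, later
changed = {**first, "value": 45.17}                          # different data

print(content_id(first) == content_id(retry))    # True
print(content_id(first) == content_id(changed))  # False
```

Because a retry yields the same ID, a store keyed by that ID can detect the duplicate and either skip it or raise an error, which matches the two behaviors selected by allow_existing.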

Ingestion from common file formats

With xarray and pandas you can also easily ingest existing datasets stored in file formats such as CSV, Parquet, Feather and more. Check out the Ingestion from common file formats guide for examples of how to achieve this.
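As a rough idea of what such a conversion involves, the sketch below parses CSV text into a dict of columns matching the MyCustomDataset schema. It deliberately uses only Python's standard csv module instead of pandas so that it runs without extra dependencies. The CSV layout, including the ";" separator inside sensor_history, and the `read_columns` helper are assumptions made for this example.

```python
import csv
import io

# Hypothetical CSV content; sensor_history values are separated by ";" here.
CSV_TEXT = """time,value,sensor,precise_time,sensor_history
2025-03-28T11:44:23Z,45.16,A,2025-03-28T11:44:23.345761444Z,-12.15;13.45;-8.2
2025-03-28T11:45:19Z,273.15,B,2025-03-28T11:45:19.128742312Z,300.16;280.12
"""


def read_columns(text: str) -> dict[str, list]:
    """Parse CSV text into a dict of columns with schema-compatible types."""
    columns: dict[str, list] = {
        "time": [], "value": [], "sensor": [],
        "precise_time": [], "sensor_history": [],
    }
    for row in csv.DictReader(io.StringIO(text)):
        columns["time"].append(row["time"])
        columns["value"].append(float(row["value"]))
        columns["sensor"].append(row["sensor"])
        columns["precise_time"].append(row["precise_time"])
        columns["sensor_history"].append(
            [float(v) for v in row["sensor_history"].split(";")]
        )
    return columns


columns = read_columns(CSV_TEXT)
print(columns["value"])           # [45.16, 273.15]
print(columns["sensor_history"])  # [[-12.15, 13.45, -8.2], [300.16, 280.12]]
```

The resulting dict has the same shape as the one passed to pd.DataFrame in the pandas example on this page, so it can be wrapped in a DataFrame and handed to collection.ingest.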

Geometries

Ingesting geometries can be tricky, especially when working with geometries that cross the antimeridian or cover a pole. Tilebox is designed to take away most of the friction involved in this, but it’s still recommended to follow the best practices for handling geometries.
