> ## Documentation Index
> Fetch the complete documentation index at: https://docs.tilebox.com/llms.txt
> Use this file to discover all available pages before exploring further.

# Ingesting Data

> Populate your dataset collections by defining schemas, preparing structured data points, and submitting them efficiently to Tilebox for storage and querying.

<Note>
  You need to have write permission on the collection to be able to ingest data.
</Note>

Check out the examples below for common scenarios of ingesting data into a collection.

## Dataset schema

Tilebox Datasets are strongly typed. This means you can only ingest data that matches the schema of a dataset. The schema is defined during dataset creation time.

The examples on this page assume that you have access to a [Timeseries dataset](/datasets/types/timeseries) that has the following schema:

<Accordion title="MyCustomDataset schema">
  <Tip>
    Check out the [Creating a dataset](/guides/datasets/create) guide for an example of how to create such a dataset.
  </Tip>

  **MyCustomDataset schema**

  | Field name       | Type                                                                                              | Description                                                                                         |
  | ---------------- | ------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------- |
  | `time`           | <span class="text-primary dark:text-primary-light font-mono font-semibold">Timestamp</span>       | Timestamp of the data point. Required by the [Timeseries dataset](/datasets/types/timeseries) type. |
  | `id`             | <span class="text-primary dark:text-primary-light font-mono font-semibold">UUID</span>            | Auto-generated UUID for each datapoint.                                                             |
  | `ingestion_time` | <span class="text-primary dark:text-primary-light font-mono font-semibold">Timestamp</span>       | Auto-generated timestamp for when the data point was ingested into the Tilebox API.                 |
  | `value`          | <span class="text-primary dark:text-primary-light font-mono font-semibold">float64</span>         | A numeric measurement value.                                                                        |
  | `sensor`         | <span class="text-primary dark:text-primary-light font-mono font-semibold">string</span>          | A name of the sensor that generated the data point.                                                 |
  | `precise_time`   | <span class="text-primary dark:text-primary-light font-mono font-semibold">Timestamp</span>       | A precise measurement time in nanosecond precision.                                                 |
  | `sensor_history` | <span class="text-primary dark:text-primary-light font-mono font-semibold">Array\[float64]</span> | The last few measurements of the sensor.                                                            |

  <Tip>
    A full overview of available data types can be found in the [here](/datasets/concepts/datasets#field-types).
  </Tip>
</Accordion>

Once you've defined the schema and created a dataset, you can access it and create a collection to ingest data into.

<CodeGroup>
  ```python Python theme={"system"}
  from tilebox.datasets import Client

  client = Client()
  dataset = client.dataset("my_org.my_custom_dataset")
  collection = dataset.get_or_create_collection("Measurements")
  ```

  ```go Go theme={"system"}
  package main

  import (
  	"context"
  	"log"

  	"github.com/tilebox/tilebox-go/datasets/v1"
  )

  func main() {
  	ctx := context.Background()
  	client := datasets.NewClient()

  	dataset, err := client.Datasets.Get(ctx, "my_org.my_custom_dataset")
  	if err != nil {
  		log.Fatalf("Failed to get dataset: %v", err)
  	}

  	collection, err := client.Collections.GetOrCreate(ctx, dataset.ID, "Measurements")
  	if err != nil {
  		log.Fatalf("Failed to get collection: %v", err)
  	}
  }
  ```
</CodeGroup>

## Preparing data for ingestion

Ingestion can be done either in Python or Go.

### Python

[`collection.ingest`](/api-reference/python/tilebox.datasets/Collection.ingest) supports a wide range of input types. Below is an example of using either a `pandas.DataFrame` or an `xarray.Dataset` as input.

#### pandas DataFrame

A [pandas.DataFrame](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html) is a representation of two-dimensional, potentially heterogeneous tabular data. It's a powerful tool for working with structured data, and Tilebox supports it as input for `ingest`.

The example below shows how to construct a `pandas.DataFrame` from scratch, that matches the schema of the `MyCustomDataset` dataset and can be ingested into it.

<CodeGroup>
  ```python Python theme={"system"}
  import pandas as pd

  data = pd.DataFrame({
      "time": [
        "2025-03-28T11:44:23Z",
        "2025-03-28T11:45:19Z",
      ],
      "value": [45.16, 273.15],
      "sensor": ["A", "B"],
      "precise_time": [
        "2025-03-28T11:44:23.345761444Z",
        "2025-03-28T11:45:19.128742312Z",
      ],
      "sensor_history": [
        [-12.15, 13.45, -8.2, 16.5, 45.16],
        [300.16, 280.12, 273.15],
      ],
  })
  print(data)
  ```
</CodeGroup>

<CodeGroup>
  ```plaintext Output theme={"system"}
                     time   value sensor                    precise_time                      sensor_history
  0  2025-03-28T11:44:23Z   45.16      A  2025-03-28T11:44:23.345761444Z  [-12.15, 13.45, -8.2, 16.5, 45.16]
  1  2025-03-28T11:45:19Z  273.15      B  2025-03-28T11:45:19.128742312Z            [300.16, 280.12, 273.15]
  ```
</CodeGroup>

Once you have the data ready in this format, you can `ingest` it into a collection.

<CodeGroup>
  ```python Python theme={"system"}
  # now that we have the data frame in the correct format
  # we can ingest it into the Tilebox dataset
  collection.ingest(data)

  # To verify it now contains the 2 data points
  print(collection.info())
  ```
</CodeGroup>

<CodeGroup>
  ```plaintext Output theme={"system"}
  Measurements: [2025-03-28T11:44:23.000 UTC, 2025-03-28T11:45:19.000 UTC] (2 data points)
  ```
</CodeGroup>

<Tip>
  You can now also head on over to the [Tilebox Console](/console) and view the newly ingested data points there.
</Tip>

#### xarray Dataset

[`xarray.Dataset`](/sdks/python/xarray) is the default format in which Tilebox Datasets returns data when
[querying data](/datasets/query/querying-data) from a collection.
Tilebox also supports it as input for ingestion. The example below shows how to construct an `xarray.Dataset`
from scratch, that matches the schema of the `MyCustomDataset` dataset and can then be ingested into it.
To learn more about `xarray.Dataset`, visit Tilebox dedicated [Xarray documentation page](/sdks/python/xarray).

<CodeGroup>
  ```python Python theme={"system"}
  import pandas as pd

  data = xr.Dataset({
      "time": ("time", [
        "2025-03-28T11:46:13Z",
        "2025-03-28T11:46:54Z",
      ]),
      "value": ("time", [48.1, 290.12]),
      "sensor": ("time", ["A", "B"]),
      "precise_time": ("time", [
        "2025-03-28T11:46:13.345761444Z",
        "2025-03-28T11:46:54.128742312Z",
      ]),
      "sensor_history": (("time", "n_sensor_history"), [
        [13.45, -8.2, 16.5, 45.16, 48.1],
        [280.12, 273.15, 290.12, np.nan, np.nan],
      ]),
  })
  print(data)
  ```
</CodeGroup>

<CodeGroup>
  ```plaintext Output theme={"system"}
  <xarray.Dataset> Size: 504B
  Dimensions:         (time: 2, n_sensor_history: 5)
  Coordinates:
    * time            (time) <U20 160B '2025-03-28T11:46:13Z' '2025-03-28T11:46...
  Dimensions without coordinates: n_sensor_history
  Data variables:
      value           (time) float64 16B 48.1 290.1
      sensor          (time) <U1 8B 'A' 'B'
      precise_time    (time) <U30 240B '2025-03-28T11:46:13.345761444Z' '2025-0...
      sensor_history  (time, n_sensor_history) float64 80B 13.45 -8.2 ... nan nan
  ```
</CodeGroup>

<Note>
  Array fields manifest in xarray using an extra dimension, in this case `n_sensor_history`. In case
  of different array sizes for each data point, remaining values are filled up with a fill value, depending on the
  `dtype` of the array. For `float64` this is `np.nan` (not a number).
  Don't worry - when ingesting data into a Tilebox dataset, Tilebox will automatically skip those padding fill values
  and not store them in the dataset.
</Note>

Now that you have the `xarray.Dataset` in the correct format, you can ingest it into the Tilebox dataset collection.

<CodeGroup>
  ```python Python theme={"system"}
  collection = dataset.get_or_create_collection("OtherMeasurements")
  collection.ingest(data)

  # To verify it now contains the 2 data points
  print(collection.info())
  ```
</CodeGroup>

<CodeGroup>
  ```plaintext Output theme={"system"}
  OtherMeasurements: [2025-03-28T11:46:13.000 UTC, 2025-03-28T11:46:54.000 UTC] (2 data points)
  ```
</CodeGroup>

### Go

[`Client.Datapoints.Ingest`](/api-reference/go/datasets/Datapoints.Ingest) supports ingestion of data points in the form of a slice of protobuf messages.

#### Protobuf

Protobuf is Google's language-neutral, platform-neutral, extensible mechanism for serializing structured data.

More details on protobuf can be found in the [protobuf section](/sdks/go/protobuf).

In the example below, `v1.Modis` type has been generated using [tilebox-generate](https://github.com/tilebox/tilebox-generate) as described in the [protobuf section](/sdks/go/protobuf).

```go Go theme={"system"}
datapoints := []*v1.Modis{
  v1.Modis_builder{
    Time:        timestamppb.New(time.Now()),
    GranuleName: proto.String("Granule 1"),
  }.Build(),
  v1.Modis_builder{
    Time:        timestamppb.New(time.Now().Add(-5 * time.Hour)),
    GranuleName: proto.String("Past Granule 2"),
  }.Build(),
}

ingestResponse, err := client.Datapoints.Ingest(ctx,
    collectionID,
    &datapoints
    false,
)
```

## Copying or moving data

Since `ingest` takes `query`'s output as input, you can easily copy or move data from one collection to another.

<Tip>
  Copying data like this also works across datasets in case the dataset schemas are compatible.
</Tip>

<CodeGroup>
  ```python Python theme={"system"}
  src_collection = dataset.collection("Measurements")
  data_to_copy = src_collection.query(temporal_extent=("2025-03-28", "2025-03-29"))

  dest_collection = dataset.collection("OtherMeasurements")
  dest_collection.ingest(data_to_copy)  # copy the data to the other collection

  # To verify it now contains 4 datapoints (2 we ingested already, and 2 we copied just now)
  print(dest_collection.info())
  ```

  ```go Go theme={"system"}
  dataset, err := client.Datasets.Get(ctx, "my_org.my_custom_dataset")
  if err != nil {
    log.Fatalf("Failed to get dataset: %v", err)
  }

  srcCollection, err := client.Collections.GetOrCreate(ctx, dataset.ID, "Measurements")
  if err != nil {
    log.Fatalf("Failed to get collection: %v", err)
  }

  startDate := time.Date(2025, time.March, 28, 0, 0, 0, 0, time.UTC)
  endDate := time.Date(2025, time.March, 29, 0, 0, 0, 0, time.UTC)

  var dataToCopy []*v1.MyCustomDataset
  err = client.Datapoints.QueryInto(ctx,
    dataset.ID,
    &dataToCopy,
    datasets.WithCollectionIDs(srcCollection.ID),
    datasets.WithTemporalExtent(query.NewTimeInterval(startDate, endDate)),
  )
  if err != nil {
    log.Fatalf("Failed to query datapoints: %v", err)
  }

  destCollection, err := client.Collections.GetOrCreate(ctx, dataset.ID, "OtherMeasurements")
  if err != nil {
    log.Fatalf("Failed to get collection: %v", err)
  }

  // copy the data to the other collection
  _, err = client.Datapoints.Ingest(ctx, destCollection.ID, &dataToCopy, false)
  if err != nil {
    log.Fatalf("Failed to ingest datapoints: %v", err)
  }

  // To verify it now contains 4 datapoints (2 we ingested already, and 2 we copied just now)
  updatedDestCollection, err := client.Collections.Get(ctx, dataset.ID, "OtherMeasurements")
  if err != nil {
    log.Fatalf("Failed to get collection: %v", err)
  }
  slog.Info("Updated collection", slog.String("collection", updatedDestCollection.String()))
  ```
</CodeGroup>

<CodeGroup>
  ```plaintext Output theme={"system"}
  OtherMeasurements: [2025-03-28T11:44:23.000 UTC, 2025-03-28T11:46:54.000 UTC] (4 data points)
  ```
</CodeGroup>

## Automatic batching

Tilebox automatically batches the ingestion requests for you, so you don't have to worry about the maximum request size.

## Idempotency

Tilebox will auto-generate datapoint IDs based on the data of all its fields - except for the auto-generated
`ingestion_time`, so ingesting the same data twice will result in the same ID being generated. By default, Tilebox
will silently skip any data points that are duplicates of existing ones in a collection. This behavior is especially
useful when implementing idempotent algorithms. That way, re-executions of certain ingestion tasks due to retries
or other reasons will never result in duplicate data points.

You can instead also request an error to be raised if any of the generated datapoint IDs already exist.
This can be done by setting the `allow_existing` parameter to `False`.

<CodeGroup>
  ```python Python theme={"system"}
  data = pd.DataFrame({
      "time": [
        "2025-03-28T11:45:19Z",
      ],
      "value": [45.16],
      "sensor": ["A"],
      "precise_time": [
        "2025-03-28T11:44:23.345761444Z",
      ],
      "sensor_history": [
        [-12.15, 13.45, -8.2, 16.5, 45.16],
      ],
  })

  # we already ingested the same data point previously
  collection.ingest(data, allow_existing=False)

  # we can still ingest it, by setting allow_existing=True
  # but the total number of datapoints will still be the same
  # as before in that case, since it already exists and therefore
  # will be skipped
  collection.ingest(data, allow_existing=True)  # no-op
  ```

  ```go Go theme={"system"}
  datapoints := []*v1.MyCustomDataset{
    v1.MyCustomDataset_builder{
      Time:          timestamppb.New(time.Date(2025, time.March, 28, 11, 45, 19, 0, time.UTC)),
      Value:         proto.Float64(45.16),
      Sensor:        proto.String("A"),
      PreciseTime:   timestamppb.New(time.Date(2025, time.March, 28, 11, 44, 23, 345761444, time.UTC)),
      SensorHistory: []float64{-12.15, 13.45, -8.2, 16.5, 45.16},
    }.Build(),
  }

  // we already ingested the same data point previously
  _, err = client.Datapoints.Ingest(ctx, collection.ID, &datapoints, false)
  if err != nil {
    log.Fatalf("Failed to ingest datapoints: %v", err)
  }

  // we can still ingest it, by setting allowExisting to true
  // but the total number of datapoints will still be the same
  // as before in that case, since it already exists and therefore
  // will be skipped
  _, err = client.Datapoints.Ingest(ctx, collection.ID, &datapoints, true) // no-op
  if err != nil {
    log.Fatalf("Failed to ingest datapoints: %v", err)
  }
  ```
</CodeGroup>

<CodeGroup>
  ```plaintext Output theme={"system"}
  ArgumentError: found existing datapoints with same id, refusing to ingest with "allow_existing=false"
  ```
</CodeGroup>

## Ingestion from common file formats

Through the usage of `xarray` and `pandas` you can also easily ingest existing datasets available in file
formats, such as CSV, [Parquet](https://parquet.apache.org/), [Feather](https://arrow.apache.org/docs/python/feather.html) and more.

Check out the [Ingestion from common file formats](/guides/datasets/ingest-format) guide for examples of how to achieve this.

## Geometries

Ingesting Geometries can traditionally be a bit tricky, especially when working with geometries that cross the antimeridian or cover a pole.
Tilebox is designed to take away most of the friction involved in this, but it's still recommended to follow the [best practices for handling geometries](/datasets/geometries).
