Learn how to ingest data into a Tilebox dataset.
MyCustomDataset schema
Field name | Type | Description |
---|---|---|
time | Timestamp | Timestamp of the data point. Required by the Timeseries dataset type. |
id | UUID | Auto-generated UUID for each datapoint. |
ingestion_time | Timestamp | Auto-generated timestamp for when the data point was ingested into the Tilebox API. |
value | float64 | A numeric measurement value. |
sensor | string | The name of the sensor that generated the data point. |
precise_time | Timestamp | A precise measurement time with nanosecond precision. |
sensor_history | Array[float64] | The last few measurements of the sensor. |
`collection.ingest` supports a wide range of input types. Below is an example of using either a `pandas.DataFrame` or an `xarray.Dataset` as input.
The example below shows how to construct a `pandas.DataFrame` from scratch that matches the schema of the `MyCustomDataset` dataset and can be ingested into it.
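A minimal sketch of such a DataFrame, with one column per user-defined field of the schema table. The values are invented for illustration. Because `id` and `ingestion_time` are auto-generated, they are left out. Representing the array field as a column of Python lists is an assumption, as is the `collection` object in the commented-out last line.

```python
import pandas as pd

# Three example data points; all values are invented for illustration.
data = pd.DataFrame(
    {
        # Required "time" field of a Timeseries dataset, parsed as UTC timestamps.
        "time": pd.to_datetime(
            ["2025-03-28T11:44:23Z", "2025-03-28T11:45:19Z", "2025-03-28T11:46:02Z"]
        ),
        "value": [45.16, 273.15, 12.7],
        "sensor": ["A", "B", "A"],
        # Timestamps with nanosecond precision (nine fractional digits).
        "precise_time": pd.to_datetime(
            [
                "2025-03-28T11:44:23.345761444Z",
                "2025-03-28T11:45:19.128742312Z",
                "2025-03-28T11:46:02.000000001Z",
            ]
        ),
        # Array field: one list per data point; lengths may differ (assumption).
        "sensor_history": [[-12.15, 13.45, -8.2, 16.5], [280.15, 280.3], [11.9]],
    }
)

print(data.dtypes)

# Assumed usage, where `collection` is a Tilebox collection obtained elsewhere:
# collection.ingest(data)
```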
Once the `pandas.DataFrame` is ready, you can `ingest` it into a collection.
`xarray.Dataset` is the default format in which Tilebox Datasets returns data when you query a collection. Tilebox also supports it as input for ingestion. The example below shows how to construct an `xarray.Dataset` from scratch that matches the schema of the `MyCustomDataset` dataset and can then be ingested into it.
To learn more about `xarray.Dataset`, visit the dedicated Tilebox Xarray documentation page.
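A minimal sketch of such a dataset, with invented values. It assumes that every field is a variable along the `time` dimension and that the array field gets a second dimension named `n_sensor_history`. The `collection` object in the commented-out last line is likewise an assumption.

```python
import numpy as np
import xarray as xr

data = xr.Dataset(
    data_vars={
        "value": ("time", [45.16, 273.15]),
        "sensor": ("time", ["A", "B"]),
        # Nanosecond-precision timestamps.
        "precise_time": (
            "time",
            np.array(
                ["2025-03-28T11:44:23.345761444", "2025-03-28T11:45:19.128742312"],
                dtype="datetime64[ns]",
            ),
        ),
        # Array field: a second dimension holds the per-datapoint values.
        # The shorter second row is padded with NaN to the common length.
        "sensor_history": (
            ("time", "n_sensor_history"),
            np.array(
                [
                    [-12.15, 13.45, -8.2, 16.5],
                    [280.15, 280.3, np.nan, np.nan],
                ]
            ),
        ),
    },
    coords={
        # Required "time" field, used as the coordinate of the time dimension.
        "time": np.array(
            ["2025-03-28T11:44:23", "2025-03-28T11:45:19"], dtype="datetime64[ns]"
        ),
    },
)

print(data)

# Assumed usage, where `collection` is a Tilebox collection obtained elsewhere:
# collection.ingest(data)
```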
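Array fields such as `sensor_history` appear in an `xarray.Dataset` with an extra dimension, in this case `n_sensor_history`. If the arrays differ in size between data points, the remaining values are padded with a fill value that depends on the `dtype` of the array. For `float64` this is `np.nan` (not a number).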
When ingesting data into a Tilebox dataset, Tilebox automatically skips these padding fill values and does not store them in the dataset.
Once the `xarray.Dataset` is in the correct format, you can ingest it into the Tilebox dataset collection.
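The padding of array fields can be sketched with a small helper that turns lists of differing lengths into a rectangular `float64` array. `pad_ragged` is a hypothetical name used for illustration and is not part of any Tilebox library.

```python
import numpy as np


def pad_ragged(rows, fill_value=np.nan):
    """Pad lists of differing lengths into a 2D float64 array."""
    width = max((len(row) for row in rows), default=0)
    padded = np.full((len(rows), width), fill_value, dtype=np.float64)
    for i, row in enumerate(rows):
        # Copy the real values; the rest of the row keeps the fill value.
        padded[i, : len(row)] = row
    return padded


history = pad_ragged([[-12.15, 13.45, -8.2, 16.5], [280.15, 280.3], []])
print(history.shape)
```

The resulting array can be used directly as the data of a variable with the dimensions `("time", "n_sensor_history")`.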
`Client.Datapoints.Ingest` supports ingestion of data points in the form of a slice of protobuf messages.
The `v1.Modis` type has been generated using `tilebox-generate`, as described in the protobuf section.
Because `ingest` accepts the output of `query` as input, you can easily copy or move data from one collection to another.
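The copy pattern can be sketched as a query followed by an ingest. To keep the example runnable without network access, `FakeCollection` is a hypothetical in-memory stand-in, not the Tilebox client. The `temporal_extent` keyword and the return value of `ingest` are assumptions.

```python
class FakeCollection:
    """Hypothetical in-memory stand-in for a Tilebox collection."""

    def __init__(self, rows=None):
        self.rows = list(rows or [])

    def query(self, temporal_extent):
        start, end = temporal_extent
        # ISO 8601 strings in the same format compare chronologically.
        return [row for row in self.rows if start <= row["time"] < end]

    def ingest(self, data):
        self.rows.extend(data)
        return len(data)  # the real return value may differ (assumption)


def copy_datapoints(source, destination, temporal_extent):
    """Query one collection and ingest the result unchanged into another."""
    data = source.query(temporal_extent=temporal_extent)
    return destination.ingest(data)


source = FakeCollection(
    [
        {"time": "2025-03-28T11:44:23Z", "value": 45.16},
        {"time": "2025-04-02T08:00:00Z", "value": 1.5},
    ]
)
destination = FakeCollection()
copied = copy_datapoints(
    source, destination, ("2025-03-01T00:00:00Z", "2025-04-01T00:00:00Z")
)
print(copied)
```

A copy leaves the source collection untouched. Moving data would additionally require deleting the queried data points from the source, which this sketch does not cover.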
Tilebox generates datapoint IDs from the field values of a data point, excluding the auto-generated `ingestion_time`, so ingesting the same data twice produces the same ID. By default, Tilebox silently skips any data points that duplicate existing ones in a collection. This behavior is especially useful for implementing idempotent algorithms: re-executing an ingestion task, for example after a retry, never results in duplicate data points.
Alternatively, you can request that an error be raised if any of the generated datapoint IDs already exist by setting the `allow_existing` parameter to `False`.
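The behavior can be illustrated with a toy model. `ToyCollection` is hypothetical and is not the Tilebox client: how it derives IDs (a UUIDv5 over the JSON-encoded fields) and the exception it raises (`ValueError`) are assumptions chosen only to make the idea concrete. Only the `allow_existing` parameter name comes from the text above.

```python
import json
import uuid


def content_id(row):
    """Derive a deterministic ID from all fields except ingestion_time."""
    payload = {k: v for k, v in row.items() if k != "ingestion_time"}
    return uuid.uuid5(uuid.NAMESPACE_OID, json.dumps(payload, sort_keys=True))


class ToyCollection:
    """Hypothetical in-memory model of duplicate handling during ingestion."""

    def __init__(self):
        self.datapoints = {}

    def ingest(self, rows, allow_existing=True):
        ids = [content_id(row) for row in rows]
        existing = [i for i in ids if i in self.datapoints]
        if existing and not allow_existing:
            raise ValueError(f"{len(existing)} datapoint(s) already exist")
        for datapoint_id, row in zip(ids, rows):
            # Duplicates are skipped: the stored data point is left unchanged.
            self.datapoints.setdefault(datapoint_id, row)
        return ids


collection = ToyCollection()
row = {"time": "2025-03-28T11:44:23Z", "value": 45.16, "sensor": "A"}

first = collection.ingest([row])
# A retry carries a different ingestion_time but yields the same ID.
second = collection.ingest([dict(row, ingestion_time="2025-03-29T00:00:00Z")])

try:
    collection.ingest([row], allow_existing=False)
    rejected = False
except ValueError:
    rejected = True

print(first == second, len(collection.datapoints), rejected)
```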
With `xarray` and `pandas`, you can also easily ingest existing datasets stored in file formats such as CSV, Parquet, Feather, and more.
Check out the Ingestion from common file formats guide for examples of how to achieve this.
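As a minimal sketch of the CSV case, the snippet below reads an invented in-memory CSV with `pandas.read_csv` and parses the `time` column as timestamps. The column names follow the schema above, and the `collection` object in the commented-out last line is an assumption.

```python
import io

import pandas as pd

# Invented CSV content; a file path can be passed to read_csv instead.
csv_text = """time,value,sensor
2025-03-28T11:44:23Z,45.16,A
2025-03-28T11:45:19Z,273.15,B
"""

data = pd.read_csv(io.StringIO(csv_text), parse_dates=["time"])
print(data.dtypes)

# Assumed usage, where `collection` is a Tilebox collection obtained elsewhere:
# collection.ingest(data)
```

For Parquet and Feather files, `pandas.read_parquet` and `pandas.read_feather` return a DataFrame in the same way.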