This page guides you through ingesting data into a Tilebox dataset. Starting from an existing dataset available as a file in the GeoParquet format, we’ll walk you through ingesting that data into Tilebox as a Timeseries dataset.

Downloading the example dataset

The dataset used in this example is available as a GeoParquet file. You can download it from here: modis_MCD12Q1.geoparquet.

Installing the necessary packages

This example uses a few Python packages for reading Parquet files and visualizing the dataset. Install the required packages using your preferred package manager. For new projects, we recommend using uv.

uv add tilebox-datasets geopandas folium matplotlib mapclassify
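Before continuing, you may want to confirm that these packages are installed in the environment you will run the example in. The sketch below is an optional check, not part of the Tilebox tooling. It uses only the standard library's `importlib.metadata` and looks up the same distribution names passed to `uv add`. The helper name `missing_packages` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names as passed to `uv add` above.
REQUIRED = ["tilebox-datasets", "geopandas", "folium", "matplotlib", "mapclassify"]


def missing_packages(names=REQUIRED):
    """Return the distribution names that are not installed in the current environment."""
    missing = []
    for name in names:
        try:
            version(name)  # raises PackageNotFoundError if the distribution is absent
        except PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(missing_packages() or "All required packages are installed.")
```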

Reading and previewing the dataset

The dataset is available as a GeoParquet file. You can read it using the geopandas.read_parquet function.

import geopandas as gpd

modis_data = gpd.read_parquet("modis_MCD12Q1.geoparquet")
modis_data.head(5)
Output
             time                  end_time                                   granule_name                                           geometry  horizontal_tile_number  vertical_tile_number   tile_id  file_size    checksum checksum_type day_night_flag browse_granule_id                     published_at
0 2001-01-01 00:00:00+00:00 2001-12-31 23:59:59+00:00  MCD12Q1.A2001001.h00v08.061.2022146024956.hdf  POLYGON ((-180 10, -180 0, -170 0, -172.62252 ...                       0                     8  51000008     275957   941243048         CKSUM            Day              None 2022-06-23 10:54:43.824000+00:00
1 2001-01-01 00:00:00+00:00 2001-12-31 23:59:59+00:00  MCD12Q1.A2001001.h00v09.061.2022146024922.hdf  POLYGON ((-180 0, -180 -10, -172.62252 -10, -1...                       0                     9  51000009     285389  3014510714         CKSUM            Day              None 2022-06-23 10:54:44.697000+00:00
2 2001-01-01 00:00:00+00:00 2001-12-31 23:59:59+00:00  MCD12Q1.A2001001.h00v10.061.2022146032851.hdf  POLYGON ((-180 -10, -180 -20, -180 -20, -172.6...                       0                    10  51000010     358728  2908215698         CKSUM            Day              None 2022-06-23 10:54:44.669000+00:00
3 2001-01-01 00:00:00+00:00 2001-12-31 23:59:59+00:00  MCD12Q1.A2001001.h01v08.061.2022146025203.hdf  POLYGON ((-172.62252 10, -170 0, -160 0, -162....                       1                     8  51001008     146979  1397661843         CKSUM            Day              None 2022-06-23 10:54:44.309000+00:00
4 2001-01-01 00:00:00+00:00 2001-12-31 23:59:59+00:00  MCD12Q1.A2001001.h01v09.061.2022146025902.hdf  POLYGON ((-170 0, -172.62252 -10, -162.46826 -...                       1                     9  51001009     148935  2314263965         CKSUM            Day              None 2022-06-23 10:54:44.023000+00:00
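The `granule_name` column follows the MODIS file naming convention: the product short name, `A` plus the acquisition year and day of year, the `hXXvYY` tile, the collection (`061`), and the production timestamp (year, day of year, time). A minimal sketch that parses these parts, for example to cross-check `horizontal_tile_number` and `vertical_tile_number` before ingesting, could look like this. The helpers `parse_granule_name` and `expected_tile_id` are hypothetical, and the `tile_id` formula is only inferred from the five preview rows above (h00v08 → 51000008, h01v09 → 51001009), not taken from a specification.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone

# <product>.A<YYYYDDD>.h<HH>v<VV>.<collection>.<YYYYDDDHHMMSS>.hdf
GRANULE_RE = re.compile(
    r"^(?P<product>\w+)\.A(?P<acquired>\d{7})\.h(?P<h>\d{2})v(?P<v>\d{2})"
    r"\.(?P<collection>\d{3})\.(?P<produced>\d{13})\.hdf$"
)


@dataclass(frozen=True)
class Granule:
    product: str
    acquired: datetime  # acquisition date, from A<YYYYDDD>
    h: int  # horizontal tile number
    v: int  # vertical tile number
    collection: str
    produced: datetime  # production timestamp, from <YYYYDDDHHMMSS>


def parse_granule_name(name: str) -> Granule:
    """Split a MODIS granule file name into its components (times interpreted as UTC)."""
    m = GRANULE_RE.match(name)
    if m is None:
        raise ValueError(f"not a MODIS granule name: {name!r}")
    return Granule(
        product=m["product"],
        acquired=datetime.strptime(m["acquired"], "%Y%j").replace(tzinfo=timezone.utc),
        h=int(m["h"]),
        v=int(m["v"]),
        collection=m["collection"],
        produced=datetime.strptime(m["produced"], "%Y%j%H%M%S").replace(tzinfo=timezone.utc),
    )


def expected_tile_id(h: int, v: int) -> int:
    # Assumption: pattern inferred from the preview rows, not from MODIS documentation.
    return 51_000_000 + h * 1_000 + v


g = parse_granule_name("MCD12Q1.A2001001.h00v08.061.2022146024956.hdf")
print(g.h, g.v, expected_tile_id(g.h, g.v))  # 0 8 51000008
```

Applied to the full GeoDataFrame, `modis_data["granule_name"].map(parse_granule_name)` yields one `Granule` per row, which you could then compare against the tile number and `tile_id` columns.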

Exploring it visually

GeoPandas comes with a built-in explorer for visually exploring the dataset.

modis_data.head(1000).explore(width=800, height=600)
[Image: Explore the MODIS dataset]

Create a Tilebox dataset

Next, we’ll create a Timeseries dataset with the same schema as the MODIS dataset. To do so, open the Tilebox Console, navigate to My Datasets, and click Create Dataset. Then select Timeseries Dataset as the dataset type.

For step-by-step instructions on creating a dataset, see the Creating a dataset guide.

Now, to match the given MODIS dataset, we’ll specify the following fields:

Field                    Type       Note
granule_name             string     MODIS granule name
geometry                 Geometry   Tile boundary coordinates of the granule
end_time                 Timestamp  Measurement end time
horizontal_tile_number   int64      Horizontal MODIS tile number (0-35)
vertical_tile_number     int64      Vertical MODIS tile number (0-17)
tile_id                  int64      MODIS tile ID
file_size                uint64     File size of the product in bytes
checksum                 string     Hash checksum of the file
checksum_type            string     Checksum algorithm (MD5 / CKSUM)
day_night_flag           int64      Day / Night / Both
browse_granule_id        string     Optional granule ID for browsing
published_at             Timestamp  The time the product was published
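Before ingesting, it can help to confirm that the GeoDataFrame's columns line up with the fields configured in the console, since a typo in a field name is easier to catch locally. The helper `compare_columns` below is a hypothetical sketch. Treating `time` as a built-in field of every Timeseries dataset is an assumption, based on the table omitting it while the data loaded later in this guide still contains it.

```python
# Field names from the table above.
EXPECTED_FIELDS = frozenset({
    "granule_name", "geometry", "end_time", "horizontal_tile_number",
    "vertical_tile_number", "tile_id", "file_size", "checksum", "checksum_type",
    "day_night_flag", "browse_granule_id", "published_at",
})

# Assumption: `time` is provided by the Timeseries dataset type itself.
BUILTIN_FIELDS = frozenset({"time"})


def compare_columns(columns, expected=EXPECTED_FIELDS, builtin=BUILTIN_FIELDS):
    """Return (missing, unexpected) column names relative to the dataset schema."""
    present = set(columns)
    missing = sorted(expected - present)
    unexpected = sorted(present - expected - builtin)
    return missing, unexpected
```

For the columns shown in the preview above, `compare_columns(modis_data.columns)` should return two empty lists.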

In the console, this will look like the following:

[Image: Tilebox Console]

Access the dataset from Python

Our newly created dataset is now available, so let’s access it from Python. For this, we need the dataset slug, which was assigned automatically based on the specified code_name. To find the slug, navigate to the dataset overview in the console.

[Image: Explore the MODIS dataset]

We can now instantiate the dataset client and access the dataset.

from tilebox.datasets import Client

client = Client()
dataset = client.dataset("tilebox.modis")  # replace with your dataset slug

Create a collection

Next, we’ll create a collection to insert our data into.

collection = dataset.get_or_create_collection("MCD12Q1")

Ingest the data

Now, we’ll finally ingest the MODIS data into the collection.

datapoint_ids = collection.ingest(modis_data)
print(f"Successfully ingested {len(datapoint_ids)} datapoints!")
Output
Successfully ingested 7245 datapoints!
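The example file is small enough to ingest in a single call. For much larger files you might prefer to ingest in batches; this guide does not state any request size limit, so treat the following as an optional pattern rather than a requirement. It assumes, as the code above does, that `collection.ingest` accepts a GeoDataFrame (here a row slice via `iloc`) and returns the IDs of the ingested datapoints. `batch_bounds` is a hypothetical helper.

```python
def batch_bounds(n_rows: int, batch_size: int):
    """Yield (start, stop) index pairs that cover range(n_rows) in chunks of batch_size."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, n_rows, batch_size):
        yield start, min(start + batch_size, n_rows)


# Hypothetical usage with the objects from this guide:
# datapoint_ids = []
# for start, stop in batch_bounds(len(modis_data), 1000):
#     datapoint_ids.extend(collection.ingest(modis_data.iloc[start:stop]))
```

With 7245 rows and a batch size of 1000, this produces eight batches, the last one covering rows 7000 to 7244.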

Query the newly ingested data

We can now query the newly ingested data. Let’s query a subset of the data for a specific time range.

Since the data is now stored directly in the Tilebox dataset, you can query and access it from anywhere.

data = collection.load(("2015-01-01", "2020-01-01"))
data
Output
<xarray.Dataset> Size: 403kB
Dimensions:                 (time: 1575)
Coordinates:
  * time                    (time) datetime64[ns] 13kB 2015-01-01 ... 2019-01-01
Data variables: (12/14)
    id                      (time) <U36 227kB '014aa2ca-b000-0155-ab96-2239bf...
    ingestion_time          (time) datetime64[ns] 13kB 2025-03-25T13:46:19.39...
    granule_name            (time) object 13kB 'MCD12Q1.A2015001.h12v02.061.2...
    geometry                (time) object 13kB POLYGON ((-175.4282639365679 6...
    end_time                (time) datetime64[ns] 13kB 2015-12-31T23:59:59 .....
    horizontal_tile_number  (time) int64 13kB 12 17 13 30 21 ... 32 21 13 31 30
    ...                      ...
    file_size               (time) uint64 13kB 16411708 146059 ... 2843975
    checksum                (time) object 13kB '2554844679' ... '3579360945'
    checksum_type           (time) object 13kB 'CKSUM' 'CKSUM' ... 'CKSUM'
    day_night_flag          (time) object 13kB 'Day' 'Day' 'Day' ... 'Day' 'Day'
    browse_granule_id       (time) object 13kB 'UR:10:DsShESDTUR:UR:15:DsShSc...
    published_at            (time) datetime64[ns] 13kB 2022-06-14T18:33:34.36...
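The loaded times end at 2019-01-01 even though the query's upper bound is 2020-01-01, which suggests that the end of a `(start, end)` tuple is treated as exclusive. The standard-library sketch below illustrates that half-open interval rule on hypothetical yearly timestamps, one per year on 1 January like the annual MCD12Q1 rows in the preview. It does not call Tilebox, and `select_half_open` is a hypothetical helper.

```python
from datetime import datetime, timezone


def parse_utc_date(s: str) -> datetime:
    """Interpret an ISO date string such as "2015-01-01" as midnight UTC."""
    return datetime.fromisoformat(s).replace(tzinfo=timezone.utc)


def select_half_open(times, start: str, end: str):
    """Keep timestamps t with start <= t < end (start included, end excluded)."""
    lo, hi = parse_utc_date(start), parse_utc_date(end)
    return [t for t in times if lo <= t < hi]


# Hypothetical candidates: one timestamp per year from 2014 through 2020.
candidates = [datetime(year, 1, 1, tzinfo=timezone.utc) for year in range(2014, 2021)]
selected = select_half_open(candidates, "2015-01-01", "2020-01-01")
print([t.year for t in selected])  # [2015, 2016, 2017, 2018, 2019]
```

The counts in this guide are consistent with that reading: 1575 datapoints over five years is 315 granules per year, and 7245 / 315 = 23 yearly layers in total. This is an inference from the numbers above, not something the guide states.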

For more information on accessing and querying data, see the Querying data guide.

View the data in the console

You can also view your data in the Console by navigating to the dataset, selecting the collection, and then clicking one of the data points.

[Image: Explore the MODIS dataset]

Next steps

Congrats! You’ve successfully ingested data into Tilebox. You can now explore the data in the console and use it for further processing and analysis.