This page guides you through the process of ingesting data into a Tilebox dataset. Starting from an existing
dataset available as file in the GeoParquet format, you’ll go through the process of
ingesting that data into Tilebox as a Timeseries dataset.
Downloading the example dataset
The dataset used in this example is available as a GeoParquet file. You can download it
from here: modis_MCD12Q1.geoparquet.
Installing the necessary packages
This example uses a couple of python packages for reading parquet files and for visualizing the dataset. Install the
required packages using your preferred package manager. For new projects, Tilebox recommend using uv.
uv add tilebox-datasets geopandas lonboard
Reading and previewing the dataset
The dataset is available as a GeoParquet file. You can read it using the geopandas.read_parquet
function.
import geopandas as gpd
modis_data = gpd.read_parquet("modis_MCD12Q1.geoparquet")
modis_data.head(5)
time end_time granule_name geometry horizontal_tile_number vertical_tile_number tile_id file_size checksum checksum_type day_night_flag browse_granule_id published_at
0 2001-01-01 00:00:00+00:00 2001-12-31 23:59:59+00:00 MCD12Q1.A2001001.h00v08.061.2022146024956.hdf POLYGON ((-180 10, -180 0, -170 0, -172.62252 ... 0 8 51000008 275957 941243048 CKSUM Day None 2022-06-23 10:54:43.824000+00:00
1 2001-01-01 00:00:00+00:00 2001-12-31 23:59:59+00:00 MCD12Q1.A2001001.h00v09.061.2022146024922.hdf POLYGON ((-180 0, -180 -10, -172.62252 -10, -1... 0 9 51000009 285389 3014510714 CKSUM Day None 2022-06-23 10:54:44.697000+00:00
2 2001-01-01 00:00:00+00:00 2001-12-31 23:59:59+00:00 MCD12Q1.A2001001.h00v10.061.2022146032851.hdf POLYGON ((-180 -10, -180 -20, -180 -20, -172.6... 0 10 51000010 358728 2908215698 CKSUM Day None 2022-06-23 10:54:44.669000+00:00
3 2001-01-01 00:00:00+00:00 2001-12-31 23:59:59+00:00 MCD12Q1.A2001001.h01v08.061.2022146025203.hdf POLYGON ((-172.62252 10, -170 0, -160 0, -162.... 1 8 51001008 146979 1397661843 CKSUM Day None 2022-06-23 10:54:44.309000+00:00
4 2001-01-01 00:00:00+00:00 2001-12-31 23:59:59+00:00 MCD12Q1.A2001001.h01v09.061.2022146025902.hdf POLYGON ((-170 0, -172.62252 -10, -162.46826 -... 1 9 51001009 148935 2314263965 CKSUM Day None 2022-06-23 10:54:44.023000+00:00
Exploring it visually
Geopandas comes with a built in explorer to visually explore the dataset.
from lonboard import viz
viz(modis_data, map_kwargs={"show_tooltip": True})
Create a Tilebox dataset
Now you’ll create a Spatio-temporal dataset with the same schema as the given MODIS dataset.
To do so, you’ll use the Tilebox Console, navigate to My Datasets
and click Create Dataset
. Then select
Spatio-temporal Dataset
as the dataset type.
For more information on creating a dataset, check out the Creating a dataset guide for a
Step by step guide.
Now, to match the given MODIS dataset, you’ll specify the following fields:
Field | Type | Note |
---|
granule_name | string | MODIS granule name |
end_time | Timestamp | Measurement end time |
horizontal_tile_number | int64 | Horizontal modis tile number (0-35) |
vertical_tile_number | int64 | Vertical modis tile number (0-17) |
tile_id | int64 | Modis Tile ID |
file_size | uint64 | File size of the product in bytes |
checksum | string | Hash checksum of the file |
checksum_type | string | Checksum algorithm (MD5 / CKSUM) |
day_night_flag | int64 | Day / Night / Both |
browse_granule_id | string | Optional granule ID for browsing |
published_at | Timestamp | The time the product was published |
In the console, this will look like the following:
Access the dataset from Python
Your newly created dataset is now available. You can access it from Python. For this, you’ll need to know the dataset slug,
which was assigned automatically based on the specified code_name
. To find out the slug, navigate to the dataset overview
in the console.
You can now instantiate the dataset client and access the dataset.
from tilebox.datasets import Client
client = Client()
dataset = client.dataset("tilebox.modis") # replace with your dataset slug
Create a collection
Next, you’ll create a collection to insert your data into.
collection = dataset.get_or_create_collection("MCD12Q1")
Ingest the data
Now, you’ll finally ingest the MODIS data into the collection.
datapoint_ids = collection.ingest(modis_data)
print(f"Successfully ingested {len(datapoint_ids)} datapoints!")
Successfully ingested 7245 datapoints!
Query the newly ingested data
You can now query the newly ingested data. You can query a subset of the data for a specific time range.
Since the data is now stored directly in the Tilebox dataset, you can query and access it from anywhere.
from shapely import Polygon
area = Polygon( # area roughly covering the US
((-124.45, 49.19), (-120.88, 29.31), (-66.87, 24.77), (-65.34, 47.84), (-124.45, 49.19)),
)
data = collection.query(
temporal_extent=("2015-01-01", "2020-01-01"),
spatial_extent=area
)
data
<xarray.Dataset> Size: 28kB
Dimensions: (time: 110)
Coordinates:
* time (time) datetime64[ns] 880B 2015-01-01 ... 2019-01-01
Data variables: (12/14)
id (time) <U36 16kB '014aa2ca-b000-100a-dd34-fae14c5...
ingestion_time (time) datetime64[ns] 880B 2025-07-09T09:21:34.70...
geometry (time) object 880B POLYGON ((-160 60, -124.457906...
granule_name (time) object 880B 'MCD12Q1.A2015001.h10v03.061.2...
end_time (time) datetime64[ns] 880B 2015-12-31T23:59:59 .....
horizontal_tile_number (time) int64 880B 10 13 12 11 11 10 ... 8 8 10 7 15
... ...
file_size (time) uint64 880B 11719196 10878403 ... 263319
checksum (time) object 880B '2878522088' ... '190901039'
checksum_type (time) object 880B 'CKSUM' 'CKSUM' ... 'CKSUM'
day_night_flag (time) object 880B 'Day' 'Day' 'Day' ... 'Day' 'Day'
browse_granule_id (time) object 880B 'UR:10:DsShESDTUR:UR:15:DsShSc...
published_at (time) datetime64[ns] 880B 2022-06-15T01:26:58.61...
For more information on accessing and querying data, check out querying data.
View the data in the console
You can also view your data in the Console, by navigate to the dataset, selecting the collection and then clicking
on one of the data points.
Next steps
Congrats. You’ve successfully ingested data into Tilebox. You can now explore the data in the console and use it for
further processing and analysis.