Ingesting from common file formats
Learn how to ingest data from common file formats into Tilebox
For ingesting data from common file formats, we recommend the Tilebox Python SDK. It supports many common formats out of the box through third-party libraries that load data as either a pandas.DataFrame or an xarray.Dataset, which can then be ingested directly into Tilebox.
Reading and previewing the data
To ingest data from a file, you first need to read it into a pandas.DataFrame or an xarray.Dataset.
How to do that depends on the file format. The following sections show examples for several common file formats.
CSV
Comma-separated values (CSV) is a common file format for tabular data, widely used in data science. Tilebox supports CSV ingestion using the pandas.read_csv function.
Assume you have a CSV file named data.csv. If you want to follow along, you can download the file here.
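As a minimal sketch of reading and previewing a CSV file, the snippet below parses a small in-memory stand-in for data.csv. The column names (time, value, sensor) and the values are hypothetical, chosen only for illustration; the real file may look different.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv; the real file's columns may differ.
csv_text = """time,value,sensor
2025-03-28T11:44:23Z,45.16,A
2025-03-28T11:45:19Z,273.15,B
"""

# pandas.read_csv accepts a path such as "data.csv" or any file-like object.
# parse_dates converts the "time" column from strings to timestamps.
data = pd.read_csv(io.StringIO(csv_text), parse_dates=["time"])

# Preview the column types and the first rows before ingesting.
print(data.dtypes)
print(data.head())
```

To read the file from disk instead, pass its path, for example pd.read_csv("data.csv").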
Parquet
Apache Parquet is an open source, column-oriented data file format designed for efficient data storage and retrieval. Tilebox supports Parquet ingestion using the pandas.read_parquet function.
The Parquet file used in this example is available here.
GeoParquet
GeoParquet is an extension of the Parquet file format that adds support for geospatial features. Tilebox supports GeoParquet ingestion using the geopandas.read_parquet function.
The GeoParquet file used in this example is available here.
For a step-by-step guide to ingesting a GeoParquet file, check out our Ingesting data guide.
Feather
Feather is a file format originating from the Apache Arrow project, designed for storing tabular data in a fast and memory-efficient way. Tilebox supports Feather ingestion using the pandas.read_feather function.
The Feather file used in this example is available here.
Mapping columns to dataset fields
Once data is read into a pandas.DataFrame or an xarray.Dataset, it can be ingested into Tilebox directly.
The column names of the pandas.DataFrame, or the variables and coordinates of the xarray.Dataset, are mapped to the fields of the Tilebox dataset to ingest into.
Depending on how closely the column names or variable/coordinate names match the field names in the Tilebox dataset, you might need to rename some columns/variables/coordinates before ingestion.
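Before renaming anything, it can help to compare the column names with the field names you expect. The sketch below does this with plain set operations; the expected_fields set and the sample columns are hypothetical and stand in for the fields of your own Tilebox dataset.

```python
import pandas as pd

# Hypothetical field names of the target dataset.
expected_fields = {"time", "value", "sensor_id"}

# Hypothetical data as it might come out of a file.
data = pd.DataFrame(
    {
        "timestamp": ["2025-03-28T11:44:23Z"],
        "value": [45.16],
        "notes": ["calibration run"],
    }
)

columns = set(data.columns)
missing = sorted(expected_fields - columns)  # fields with no matching column
extra = sorted(columns - expected_fields)    # columns with no matching field

print("missing fields:", missing)
print("extra columns:", extra)
```

Columns listed as extra are candidates for renaming or dropping, while missing fields point at columns that need to be renamed or added.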
Renaming fields
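A minimal sketch of renaming columns with pandas.DataFrame.rename, assuming hypothetical source columns timestamp and val that should map to dataset fields named time and value:

```python
import pandas as pd

# Hypothetical data whose column names do not match the dataset fields.
data = pd.DataFrame({"timestamp": ["2025-03-28T11:44:23Z"], "val": [45.16]})

# Map old column names to the (assumed) field names of the dataset.
# rename returns a new DataFrame and leaves the original untouched.
renamed = data.rename(columns={"timestamp": "time", "val": "value"})

print(list(renamed.columns))
```

For an xarray.Dataset, the rename method takes a similar mapping of old to new names for variables and coordinates.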
Dropping fields
In case you want to skip certain columns/variables/coordinates entirely, you can drop them before ingestion.
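As a sketch, dropping a column with pandas.DataFrame.drop could look like this. The notes column is a hypothetical example of a column that has no corresponding dataset field.

```python
import pandas as pd

# Hypothetical data with one column that should not be ingested.
data = pd.DataFrame(
    {
        "time": ["2025-03-28T11:44:23Z"],
        "value": [45.16],
        "notes": ["calibration run"],
    }
)

# drop returns a new DataFrame without the listed columns.
trimmed = data.drop(columns=["notes"])

print(list(trimmed.columns))
```

For an xarray.Dataset, drop_vars removes variables or coordinates by name in the same way.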
Ingesting the data
Once the data is in the correct format, you can ingest it into Tilebox.
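The sketch below wraps the ingestion step in a small helper. The helper name ingest_frame is hypothetical, and it assumes that a Tilebox collection object exposes an ingest method that accepts a DataFrame. The client calls in the comments reflect our understanding of the Tilebox Python SDK and should be checked against its documentation; the dataset slug and collection name are placeholders.

```python
import pandas as pd


def ingest_frame(collection, data: pd.DataFrame):
    """Send a prepared DataFrame to a collection (hypothetical helper).

    Assumes `collection` has an `ingest(data)` method, as we understand
    Tilebox collections to provide.
    """
    if data.empty:
        # Nothing to send, so skip the request entirely.
        return []
    return collection.ingest(data)


# Assumed SDK usage, to be verified against the Tilebox documentation:
#
#   from tilebox.datasets import Client
#
#   client = Client()
#   dataset = client.dataset("my_org.my_custom_dataset")           # placeholder slug
#   collection = dataset.get_or_create_collection("Measurements")  # placeholder name
#   datapoint_ids = ingest_frame(collection, data)
```

Keeping the call behind a helper like this makes it easy to add checks, such as verifying column names, before any data is sent.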