pandas.DataFrame or xarray.Dataset, which can then be directly ingested into Tilebox.
Reading and previewing the data
To ingest data from a file, you first need to read it into a pandas.DataFrame or an xarray.Dataset.
How that can be achieved depends on the file format. The following sections show examples for a couple of common
file formats.
CSV
Comma-separated values (CSV) is a common file format for tabular data, widely used in data science. Tilebox supports CSV ingestion using the pandas.read_csv function.
Assume you have a CSV file named data.csv with the following content. If you want to follow along, you can
download the file here.
ingestion_data.csv
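As a minimal sketch of this step, the snippet below reads CSV data with pandas.read_csv and previews the result. The inline rows and the column names (time, value, sensor) are made up for illustration and are not the contents of the downloadable file. With a real file you would pass its path instead, for example pd.read_csv("data.csv").

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file on disk.
csv_text = """time,value,sensor
2025-01-01T00:00:00Z,1.5,A
2025-01-02T00:00:00Z,2.5,B
"""

# read_csv accepts a path or any file-like object; parse_dates converts
# the named column from strings to timestamps.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["time"])

# Preview the first rows and the inferred column types before ingesting.
print(df.head())
print(df.dtypes)
```

Checking the inferred types at this point is useful, since timestamp columns are otherwise read as plain strings.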
Parquet
Apache Parquet is an open-source, column-oriented data file format designed for efficient data storage and retrieval. Tilebox supports Parquet ingestion using the pandas.read_parquet function.
The Parquet file used in this example is available here.
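Reading a Parquet file could look roughly like the sketch below. The helper name read_parquet_file is hypothetical, and the sketch assumes a Parquet engine such as pyarrow or fastparquet is installed, since pandas.read_parquet relies on one.

```python
import pandas as pd


def read_parquet_file(path: str) -> pd.DataFrame:
    """Read a Parquet file into a DataFrame and print a short preview."""
    # pandas delegates to an installed Parquet engine (pyarrow or fastparquet).
    df = pd.read_parquet(path)
    print(df.head())
    return df
```

You would call it with the path of the downloaded file, for example read_parquet_file("data.parquet"), where the file name is a placeholder.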
GeoParquet
GeoParquet is an extension of the Parquet file format that adds support for geospatial features. Tilebox supports GeoParquet ingestion using the geopandas.read_parquet function.
The GeoParquet file used in this example is available here.
For a step-by-step guide to ingesting a GeoParquet file, check out our Ingesting data guide.
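A rough sketch of reading a GeoParquet file is shown below. The helper name read_geoparquet_file is hypothetical, and the sketch assumes geopandas and a Parquet engine are installed. The import sits inside the function only so that the snippet can be loaded on systems without geopandas.

```python
def read_geoparquet_file(path: str):
    """Read a GeoParquet file into a geopandas.GeoDataFrame and preview it."""
    # Imported lazily so this snippet loads even where geopandas is missing.
    import geopandas as gpd

    gdf = gpd.read_parquet(path)
    # A GeoDataFrame is a pandas.DataFrame subclass with a geometry column
    # and a coordinate reference system (CRS).
    print(gdf.head())
    print(gdf.crs)
    return gdf
```

Because a GeoDataFrame is a DataFrame subclass, the usual preview tools such as head() work unchanged.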
Feather
Feather is a file format originating from the Apache Arrow project, designed for storing tabular data in a fast and memory-efficient way. Tilebox supports Feather ingestion using the pandas.read_feather function.
The Feather file used in this example is available here.
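Reading a Feather file follows the same pattern, as in the sketch below. The helper name read_feather_file is hypothetical, and the sketch assumes pyarrow is installed, which pandas.read_feather requires.

```python
import pandas as pd


def read_feather_file(path: str) -> pd.DataFrame:
    """Read a Feather file into a DataFrame and print a short preview."""
    # pandas.read_feather needs the pyarrow package at call time.
    df = pd.read_feather(path)
    print(df.head())
    return df
```

A call such as read_feather_file("data.feather") would load the downloaded file; the file name here is a placeholder.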
Mapping columns to dataset fields
Once data is read into a pandas.DataFrame or an xarray.Dataset, it can be ingested into Tilebox directly.
The column names of the pandas.DataFrame or the variables and coordinates of the xarray.Dataset are mapped to the fields of the
Tilebox dataset to ingest into.
Depending on how closely the column names or variable/coordinate names match the field names in the Tilebox dataset,
you might need to rename some columns/variables/coordinates before ingestion.
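The renaming step can be sketched with DataFrame.rename, as below. All column names and target field names in this example (timestamp, temp_c, time, temperature) are hypothetical; the real mapping depends on the fields defined in your Tilebox dataset.

```python
import pandas as pd

# Hypothetical source columns that do not match the dataset field names.
df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(["2025-01-01", "2025-01-02"]),
        "temp_c": [3.5, 4.1],
    }
)

# Hypothetical mapping from source column names to dataset field names.
column_to_field = {"timestamp": "time", "temp_c": "temperature"}

# rename returns a new DataFrame; columns not listed in the mapping are kept.
renamed = df.rename(columns=column_to_field)
print(list(renamed.columns))
```

For an xarray.Dataset, Dataset.rename accepts a similar mapping of old names to new names for variables and coordinates.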